Seneca ESG led Team Stoics in the third Eagle Alpha Data Hackathon in May 2021. As more corporations and funds commit to a net-zero economy and investment strategies, the Eagle Alpha Data Hackathon posed the challenge of how investors can obtain reliable ESG ratings free of greenwashing, social washing, and large gaps in governance reporting. The theme of the hackathon was to use alternative datasets to arrive at ESG insights that fill gaps in, or offer new and differentiated insights relative to, established ESG frameworks.
Seneca ESG collaborated with six recognized data vendors, namely Accern, Arabesque S-Ray, Carbon4Finance, Owl Analytics, Revelio Labs, and S-Factor, to develop new insights on two representative public companies. Within two weeks of preparation, our team processed all the data obtained and produced ESG performance insights on both companies. This article highlights the challenges and lessons of working with diverse ESG datasets and distilling high-quality insights from the data mix.
How We Integrated Various Datasets
Each data source used its own terminology and methodology, so it was critical to review the data dictionaries and methodologies and draw distinctions between the datasets. Each of the six vendors' datasets had a distinct focus, such as environmental impact, social issues, or overall ESG performance. We organized them into the following three specializations:
- Environmental-Focused: Carbon4Finance
- Social-Focused: Revelio Labs, S-Factor
- ESG Overall: Accern, Arabesque S-Ray, Owl Analytics
Since the datasets differed in historical time range, data frequency, and granularity, we normalized all data received for the period 2016 to 2021 for the purpose of this study, and organized it into three levels based on granularity: Pillar Score, Indicator Score, and Raw Data.
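As a rough illustration of this normalization step (not the team's actual code), the sketch below keeps only records dated within 2016 to 2021 and averages them to one value per year, so that vendors reporting at different frequencies become comparable. The `Record` fields, the annual averaging rule, and the vendor name in the usage note are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean

# Study window taken from the article; everything else here is hypothetical.
WINDOW_START = date(2016, 1, 1)
WINDOW_END = date(2021, 12, 31)


@dataclass(frozen=True)
class Record:
    vendor: str   # data vendor name
    level: str    # "pillar", "indicator", or "raw"
    name: str     # score or field name as labeled by the vendor
    as_of: date   # observation date
    value: float


def to_annual(records):
    """Drop records outside the window and average the rest per year.

    Returns a dict keyed by (vendor, level, name, year).
    Annual averaging is an assumed rule, chosen only for illustration.
    """
    buckets = defaultdict(list)
    for r in records:
        if WINDOW_START <= r.as_of <= WINDOW_END:
            buckets[(r.vendor, r.level, r.name, r.as_of.year)].append(r.value)
    return {key: mean(values) for key, values in buckets.items()}
```

For example, two observations of 10.0 and 20.0 in 2020 from a hypothetical "VendorA" collapse to a single 2020 value of 15.0, while a 2015 observation is dropped.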

Next, data from all six vendors were mapped onto a framework based on the SASB standard, with four pillars and 26 categories, combining Social Capital and Human Capital into one pillar. We also highlighted the industry groups of the two companies on SASB’s Materiality Map to identify their most pertinent indicators.
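One way to picture this mapping is a lookup table from each vendor's own field names to a pillar and category in the shared framework. The sketch below is illustrative only: the vendor names and field names are hypothetical, the pillar labels are assumed from SASB's dimension names with Social Capital and Human Capital merged, and only three of the 26 categories are shown.

```python
# Four pillars: SASB's five dimensions with Social Capital and Human Capital
# merged. The exact labels are assumed, not taken from the team's framework.
PILLARS = (
    "Environment",
    "Social and Human Capital",
    "Business Model and Innovation",
    "Leadership and Governance",
)

# Hypothetical lookup from (vendor, vendor field) to (pillar, category).
FIELD_MAP = {
    ("VendorA", "ghg_score"): ("Environment", "GHG Emissions"),
    ("VendorB", "workforce_diversity"): (
        "Social and Human Capital",
        "Employee Engagement, Diversity & Inclusion",
    ),
    ("VendorC", "supplier_audit"): (
        "Business Model and Innovation",
        "Supply Chain Management",
    ),
}


def map_to_framework(records):
    """Split (vendor, field, value) records into mapped and unmapped lists.

    Unmapped records are returned rather than dropped, so that gaps in the
    lookup table stay visible for manual review.
    """
    mapped, unmapped = [], []
    for vendor, field, value in records:
        target = FIELD_MAP.get((vendor, field))
        if target is None:
            unmapped.append((vendor, field, value))
        else:
            pillar, category = target
            mapped.append((pillar, category, vendor, value))
    return mapped, unmapped
```

Returning the unmapped records separately is a deliberate choice in this sketch: a vendor field with no home in the framework usually signals a terminology mismatch that needs a look at the data dictionary.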

Our analysis began with pillar-level data, where we looked for gaps in need of deeper investigation. We then zoomed into indicator-level data to search for relevant explanations, and finally referred to available raw data for missing details. Sentiment data were also referenced to support findings on specific score changes. The chart below illustrates our workflow.

Challenges and Lessons in Data Blending
Through this Hackathon experience, we summarized the following key challenges and lessons in ESG data blending and analysis:
Preparation for Data Ingestion. Before investigation of the obtained data can begin, data users need to consider many details, including but not limited to: the method by which the data will be obtained, the timing and frequency of data acquisition, and the best time to capture updated data. On top of that, users need to understand the data dictionary and methodology of each acquired dataset. Once these steps are complete, data cleaning and quality checks need to follow. This preparation work prior to analysis is usually time-consuming. Data users need not only robust IT knowledge and technical capacity to process such data, but also ESG expertise to interpret it correctly.
Benchmark Setting. While analysis based on raw data can retain the highest degree of data integrity and objectivity, setting proper benchmarks can be a significant challenge. For example, raw ESG datasets may provide a company’s employee gender and ethnicity ratios. Depending on the constraints imposed by the company’s industry or geographical location, its ideal gender and ethnicity ratios may differ greatly from those of other companies. To ensure a fair and sensible ESG rating, determining the ideal benchmark for each indicator based on raw data requires analysis of a large database of peer company performance in the same categories. For an individual company or investment manager, this can be a costly and labor-intensive process.
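To make the peer-based idea concrete, the sketch below scores a raw value by its percentile rank within a peer group instead of against a fixed "ideal" number. This is one common benchmarking approach, not necessarily the one used in the Hackathon; the function name and the mid-rank treatment of ties are assumptions.

```python
def percentile_rank(value, peer_values):
    """Return the percentile rank (0 to 100) of `value` within `peer_values`.

    Ties count as half below and half above (the mid-rank convention).
    `peer_values` should hold the same raw indicator for comparable
    companies, for example peers in the same industry and region.
    """
    if not peer_values:
        raise ValueError("peer_values must not be empty")
    below = sum(1 for v in peer_values if v < value)
    equal = sum(1 for v in peer_values if v == value)
    return 100.0 * (below + 0.5 * equal) / len(peer_values)
```

With a peer group whose ratios are 0.10, 0.20, 0.30, 0.40, and 0.50, a company at 0.30 lands at the 50th percentile. The hard part in practice remains the one described above: assembling a peer group large and comparable enough for the rank to mean something.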
Data Contradictions and Ambiguity. In the process of gathering and integrating different datasets, at some point the user is bound to run into contradictions between their datasets, as well as data that do not offer clear explanations. Since these are common issues in data interpretation, users should beware of the following traps:
- Discarding the contradicting data altogether. While this method eliminates biases in the selection of data, it exacerbates the problem of data scarcity, which is also a common issue. In cases where users cannot afford to discard conflicting data due to data scarcity, further investigation to identify the more reliable data source will be needed.
- Adding assumptions to explain contradictory data. This strategy runs the risk of introducing incorrect assumptions into the analysis and compromising the objectivity of the rating. For transparency, users should document and substantiate any assumptions they add.
- Selecting only data that affirms a predetermined conclusion. This method has the allure of convenience, yet it is a dangerous trap of cognitive bias.
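A simple way to avoid all three traps is to flag disagreements explicitly and keep every source on record for follow-up. The sketch below is a hypothetical illustration: it assumes vendor scores have already been rescaled to a common 0 to 100 range, and the 20-point threshold is an arbitrary placeholder rather than a recommended value.

```python
def flag_contradictions(scores, threshold=20.0):
    """Flag (pillar, year) cells where vendors disagree by more than `threshold`.

    `scores` maps (pillar, year) to a dict of {vendor: score}, with all
    scores assumed to be on the same 0-100 scale. Nothing is discarded:
    each flag lists the lowest and highest vendor so an analyst can
    investigate which source is more reliable.
    """
    flagged = []
    for (pillar, year), by_vendor in sorted(scores.items()):
        if len(by_vendor) < 2:
            continue  # a single source cannot contradict itself
        low_vendor = min(by_vendor, key=by_vendor.get)
        high_vendor = max(by_vendor, key=by_vendor.get)
        spread = by_vendor[high_vendor] - by_vendor[low_vendor]
        if spread > threshold:
            flagged.append({
                "pillar": pillar,
                "year": year,
                "low": low_vendor,
                "high": high_vendor,
                "spread": spread,
            })
    return flagged
```

The output is a review list, not a verdict. Deciding which source to trust still requires going back to each vendor's methodology.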
Sentiment Scores vs ESG Performance. Sentiment data products are being introduced as part of the arsenal for ESG rating. These scores are generated from media responses to a company’s ESG-related events, usually relying on Artificial Intelligence (AI) and Natural Language Processing (NLP) to gather real-time data. For a big company with frequent news coverage, sentiment scores can help indicate potential shortfalls in a timely manner. However, sentiment data will be scarce for companies with relatively little media coverage. In addition, media coverage tends towards sensationalism and speculation, and so may overstate the effects of certain ESG-related events or lack concrete evidence. In general, sentiment scores may offer a helpful pointer to which ESG indicators to watch in the near term, yet they do not necessarily contribute to an objective evaluation of a company’s ESG performance.
Case Study on a Notable Tech Company
One of the companies studied in the Hackathon was a notable tech company. Its Social pillar rating under an established ESG framework was mediocre, specifically in the categories of Human Capital Development and Supply Chain Labor Standards. We noted that under the SASB Materiality Map, two categories are crucial to this company’s industry: Employee Engagement, Diversity and Inclusion; and Supply Chain Management.
To conduct a quick comparison at the pillar level with our alternative datasets, we referenced data from Arabesque S-Ray and Owl Analytics for this tech company, and found that its Social pillar score was markedly lower than its Environmental and Governance pillar scores, consistent with the established ratings. The graph below illustrates the ESG trends of this tech company based on scores from both Arabesque S-Ray and Owl Analytics, with green lines for Environmental scores, red lines for Governance scores, and yellow lines for Social scores. In both datasets, the Social pillar is the laggard in this company’s ESG rating.

Zooming into indicator-level data in the Social pillar, we were able to identify specific areas where this tech company underperformed. The chart below shows the standing of this company among its industry peers on indicators in the Social pillar, with the tech company studied as the leftmost data points. The indicator-level comparison revealed that this company performed significantly worse than its peers in Occupational Health and Safety (orange) as well as Training and Development (magenta).

It is worth noting that our team was fortunate that the indicator-level data offered clear explanations for our target company’s poor performance in the Social categories. Had our target company been one of the nine peer companies shown in the graph, drawing insights from this data would likely have been much more difficult. This highlights the data ambiguity challenge.
Finally, sentiment data from Accern partially supported this finding. As seen from the graph below, sentiment scores on Labor Practices (orange) for this company were predominantly and consistently negative, with negative news concentrated around a few specific dates. Some of these negativity clusters correspond to drops in the company’s Social scores, while others were not necessarily reflected in the overall ESG pillar scores in the same time period.
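The comparison between negativity clusters and score changes can be sketched as a simple date alignment. The example below is hypothetical and does not reflect Accern's data format or the team's method: it assumes a list of cluster dates and a date-sorted score series, and the 90-day window and 1-point minimum drop are placeholder parameters.

```python
from datetime import date, timedelta


def drops_after(cluster_dates, score_series, window_days=90, min_drop=1.0):
    """Check whether a score fell within `window_days` after each cluster date.

    `score_series` is a list of (date, score) tuples sorted by date.
    Returns {cluster_date: True, False, or None}, where None means there
    was no score on or before the cluster, or none inside the window.
    """
    results = {}
    for cluster in cluster_dates:
        window_end = cluster + timedelta(days=window_days)
        before = [s for d, s in score_series if d <= cluster]
        after = [s for d, s in score_series if cluster < d <= window_end]
        if not before or not after:
            results[cluster] = None
            continue
        # Compare the last score before the cluster with the lowest one after.
        results[cluster] = (before[-1] - min(after)) >= min_drop
    return results
```

A result of False or None for a cluster corresponds to the case noted above, where negative coverage was not reflected in pillar scores for the same period. A True result shows only that the timing lines up, not that the news caused the drop.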

As the ESG data landscape becomes increasingly varied and complex, investors and businesses integrating ESG into their processes need the relevant expertise to use and interpret ESG data properly and accurately. At Seneca ESG, we possess both the expertise and the technical capacity, and our goal is to provide a robust solution for investors and businesses to incorporate diverse ESG data and assess ESG performance with customized frameworks. To ensure a fair comparison and establish a suitable methodology for evaluating your ESG performance, it is critical to determine the appropriate benchmarks for your business. The data providers mentioned can supply valuable reference data for comparing your company’s performance with that of its peers in the market.
Seneca ESG was honored to have collaborated with the data vendors mentioned above in this Hackathon challenge. We greatly appreciate Eagle Alpha for this fantastic opportunity to investigate the use of blended alternative data for new ESG insights.
About Seneca ESG
Seneca ESG is a business intelligence company delivering solutions for corporate sustainability assessment, reporting, and integration with financial services. The company’s flagship ZENO (for investment firms) and EPIC (for corporates) platforms facilitate ESG data management, sustainability-driven analyses, and workflow automation for both corporate and investment manager clients. ZENO and EPIC allow for complete customization of the ESG data collection, ingestion, analysis, scoring, and assessment process, while taking into consideration the entire range of data sources, reporting standards, and assessment frameworks that exist today. These standards include SASB, GRI, TCFD, CDP, CDSB, IIRC, PRI, GRESB, UN SDGs, UN GC, WEF Guidelines, and more.
About Eagle Alpha
Established in 2012, Eagle Alpha is the pioneer connecting the universe of alternative data. We are the leading alternative data aggregation platform with supporting advisory services for data buyers and data vendors.
First adopted by alpha-seeking hedge funds over 10 years ago, alternative data is now being sought for use in the wider asset management space, as well as the private equity and corporate verticals.
Eagle Alpha was one of the first companies to recognize the value from these new data sources and has been investing in educating and connecting alternative data vendors and buyers since 2012, in the process building trusted relationships with both sides of this market.
As of May 2021, Eagle Alpha has over 1,500 dataset profiles on the platform and provides annual solutions to data buyers and data vendors globally.
A unique breadth of datasets, knowledge of the industry and client relationships have cemented Eagle Alpha as the global leader and strategic partner in the alternative data space. To learn more about Eagle Alpha’s solutions for vendors and buyers visit www.eaglealpha.com.
PRESS INQUIRIES
Seneca ESG
Rebecca Yu
Eagle Alpha
Ronit Koren