Accessing Environmental Justice Datasets

Solving environmental problems, particularly in marginalized neighborhoods, is difficult. Community contamination from industrial pollution often goes unchecked for years before cleanup or mitigation occurs. However, data can help community organizers articulate grievances and energize action. This article walks through publicly accessible data sources, an analysis case study in New Jersey, and how environmental justice organizers can leverage data to support their work with little technical capacity.
June 6, 2022

Canal catastrophe

The environmental justice movement in the United States sprouted from poisonous soil. Historians and experts often cite the disaster and eventual response that unfolded in Love Canal, New York, as the harbinger of the environmental justice movement. A community the government ignored rose up and demanded support to address the harm inflicted on residents by industrial polluters. Love Canal embodies the havoc caused by toxic pollution on public health and the success of community organizing when paired with data.

In what would become the town of Love Canal, the Hooker Chemical Company saw value in turning a defunct 1890s canal project between Lake Ontario and Lake Erie into a disposal site for waste from its perfume, dye, and rubber industries. All told, the company dumped 43,700 pounds of waste into the canal before capping it with clay in 1953. It then sold the land to the local school board for $1, with a clause protecting the Hooker Chemical Company from future liability. Soon a neighborhood of 1,000 homes was built. Residents became unsuspecting participants in a live-in laboratory, developing terrible illnesses that demonstrated the effects of chemicals like benzene and chloroform on human health and the environment.

Data-driven change

For years, birth defects and cancer plagued residents. They voiced concerns to local and state officials but received no support. Eventually, fervent community organizing, investigative journalism, and government-collected data detecting harmful chemicals seeping from the ground sparked action. In 1980, almost three decades after the clay capping, Love Canal became the first site in the US Environmental Protection Agency’s (EPA) Superfund program. The Superfund program would go on to identify and clean up the country’s most significant environmental disasters. Currently, there are over 13,000 Superfund locations across the country.

In hindsight, we can identify the symbiotic relationship between community organizers’ energy and quantifiable data in delivering justice. Community organizing played an important role; however, organizers needed quality data to prove a causal relationship between the waste and the problems in their community for change to occur. Was this a success story? Sort of. The community of Love Canal was eventually razed, the residents relocated, and the site cleaned up. Unfortunately, the story of residential contamination from industrial activity continues near Love Canal and throughout the nation, despite increased federal oversight and monitoring through the Superfund program.

Forty years on

Superfund proximity demographics. Source: EPA

Across the country, areas like Cancer Alley between New Orleans and Baton Rouge, and Collegeville, Alabama, where mid-century coke plants belched toxins into the air and water, continue to suffer the impacts of pollution from industrial processes. Low-income residents and residents of Color are particularly at risk of unknowing exposure. Unfortunately, knowledge of local Superfund sites isn’t easy to come by, and the EPA and Federal Housing Authority have done a poor job of notifying residents. For example, a resident of Collegeville, Gerica Cammack, told investigators from Inside Climate News she knew there was industrial activity nearby when she moved with her two children into a subsidized apartment in May of 2021. However, she was shocked to find that her new home was less than a mile from the Collegeville 35th Street Superfund site, where cleanup began seven years prior in 2014. Her situation was not unique: communities co-exist with Superfund sites across the United States. Research shows that a startling 70% of Superfund sites are within a mile of a federal public housing development and nearly 50% of individuals living within 3 miles of a Superfund site belong to a minority group. Data demonstrates the environmental justice connection between underserved communities and these recognized hazards.

Bridging the gap

Map of Collegeville, Alabama, and federally assisted housing. Source: The Intercept

Despite the forty years between them, there are many parallels between Love Canal and Collegeville, Alabama. As was the case in Love Canal, data can spur cleanup action and protections for nearby residents, but that data needs to be accessible and in the hands of organizers and community members. Unlike forty years ago, much of that data is publicly available, but the roadmaps and resources to leverage the data are often unclear. Publicly accessible federal data can help bridge the gap between government and communities, but work needs to be done to get it into the hands of those who need it most, like Gerica Cammack and other vulnerable residents.

Local knowledge and observation galvanize most environmental justice initiatives, but organizers need to employ scientific evidence to force the government to take action. Where should organizers turn for empirical support? Agencies like the EPA, the Census Bureau, and the CDC produce data usable for raising awareness about the potential harm from pollution, giving power to residents at risk. Although many are unaware of the data, accessing it is within the reach of polluted communities and the groups that support them. This piece will walk through valuable data tools that every community organizer should add to their toolbox and explore processes for analyzing raw datasets. With this data, community organizers can tell data-driven stories to prevent or mitigate future Love Canals from robbing communities of their right to clean water and air.

Tools and Programmatic Analysis

Before exploring datasets and sources, it is useful to briefly address the available tools and programs for analyzing the data. Spreadsheet programs like Microsoft Excel, Apple Numbers, and Google Sheets are great entry points for budding analysts. Many organizers are already familiar with these programs. They allow for easy data manipulation and visualizations without the need for more technical skills. All data discussed below can be downloaded and ingested into one of these programs.

For those looking to unlock greater analysis capabilities, consider developing analyses in a statistical programming language. Free languages like R and Python have robust analysis and visualization capabilities and are easy to learn, with a wealth of resources and tutorials. Script-based workflows are easily modified for new or additional data, quickly fed into visualizations, and provide a recipe for analysts and developers to build on. Furthermore, you can connect many of the data sources in this article to your script via an API, eliminating the need for data downloads and streamlining workflows. Lastly, for those interested in spatial data analysis, check out QGIS, an open source program similar to the industry standard ESRI ArcMap. You can also handle spatial data in R and Python for programmatic spatial data manipulation. While not all of these tools are necessary for analyzing environmental justice datasets, keep an eye out for the benefits of programmatic analysis.

Envirofacts and Superfund Data

EPA Envirofacts Portal. Source: EPA

To find pollution information for a variety of hazards, the EPA offers the Envirofacts portal. Hazards range from Superfund sites and Toxics Release Inventory (TRI) facilities to underground storage tanks and facilities discharging pollution into waterways. With this portal, you can find facilities polluting the air, water, and land in your community. The EPA also offers a series of endpoints for directly accessing these types of facilities through its Envirofacts API service. Here you can easily request data programmatically for over a dozen facility registries, including Superfund sites. In the case of Superfund sites, however, the EPA Envirofacts data does not include geographic information. This is a deal-breaker for analysts looking to perform spatial analysis. Luckily, the EPA provides a separate map of Superfund sites with a CSV download option. This data includes latitude and longitude coordinates, allowing ingestion into spatial programs like ArcMap, QGIS, or R via the sf package. Gathering hazard data is a crucial component of evaluating environmental justice concerns. However, this data has minimal standalone use. Additional data describing those who live in proximity to hazards is needed to draw conclusions about potential health impacts. A synthesis of this information is available with the EPA EJScreen tool.
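As a rough illustration of the CSV-to-spatial step described above, the sketch below turns a table with latitude and longitude columns into an sf point layer. The column names (site_name, latitude, longitude) and the two example rows are assumptions made up for illustration; the real download may use different headers, and in practice the table would come from read.csv() on the downloaded file.

```r
library(sf)

# Stand-in for the downloaded CSV. Column names and coordinates are
# illustrative assumptions, not values taken from the EPA file.
sites <- data.frame(
  site_name = c("Example Site A", "Example Site B"),
  latitude  = c(40.73, 41.03),
  longitude = c(-74.17, -74.64)
)

# Convert to point geometries. coords is given in (x, y) order, meaning
# longitude first; EPSG:4326 is the usual CRS for plain latitude/longitude.
sites_sf <- st_as_sf(sites, coords = c("longitude", "latitude"), crs = 4326)

print(sites_sf)
```

Once the sites are an sf object, they can be mapped or joined with other spatial layers such as census tract boundaries.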

EPA EJScreen

EPA EJScreen Mapping Tool. Source: EPA

The US EPA’s EJScreen tool sits atop the environmental justice data tool heap. EJScreen combines a multitude of pollution hazard data, like Superfund sites and TRI facilities, with a trove of justice-related data. As such, organizers can use it to directly access hazard data and pinpoint risk to local communities rather than gathering their own hazard data, an approach more suitable for in-depth analysis. The tool was originally developed in California as CalEnviroScreen and was subsequently adopted by the EPA to cover the entire nation. Tools like EJScreen and CalEnviroScreen are unique in that they synthesize demographic and pollution data to produce indicators that show a community’s exposure and vulnerability to pollution. (It is important to note that, unlike CalEnviroScreen, EJScreen does not explicitly incorporate race categories due to Title VI of the Civil Rights Act, opting instead to use People of Color.)

There are a number of entry points for accessing EJScreen data. The easiest is the EJScreen Mapping Tool, which allows users to examine a plethora of scores ranging from air toxics and flooding risk to Superfund site proximity and lead paint prevalence. Users can also download EJScreen data for the entire country (watch out, it is a big file!) to have full access to the variables in their analysis tool of choice. Lastly, the EPA provides an additional mapping tool to help developers programmatically access EJScreen data for specific geographies using an API. You can use the tool to query for a particular census tract, location, or geographic extent. The tool will return the data and construct an API request to use in analysis scripts. By bringing together hazard and demographic data, EJScreen is a comprehensive environmental justice tool. A downside, however, is that analysts are restricted to the hazard sources and demographic variables used in EJScreen. For those who require specificity, Envirofacts covers all publicly available federal hazard data. For demographic data, the US Census has extensive coverage.

United States Census Bureau data

US Census Bureau data tables. Source: Census Bureau

Environmental justice data by definition incorporates not only environment-based data but also demographic data. As such, the U.S. Census Bureau is a wonderful resource for communities in need of robust data. Easily accessible online, the Bureau’s data offer detailed and disaggregated information for geographies ranging from census blocks (roughly 1,000 people) to the entire country. The Bureau also offers data from numerous surveys, including the decennial census and American Community Survey samples. When using the Census portal, users can generate tables for their geography, vintage, and variables of choice and download them in CSV and Excel formats. Just like EJScreen, the Census Bureau provides tools for accessing its data programmatically via a series of API endpoints. Unfortunately, these endpoints are relatively tedious and can be difficult for junior developers to use.

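library(tidycensus)

# Requires a Census API key, which can be set with census_api_key()
# Tract-level 2020 ACS estimates for New Jersey, with tract geometries attached
NJDemog <- get_acs(geography = "tract",
                   state = "NJ",
                   variables = c(BlackAlone = "B01001B_001",
                                 WhiteAlone = "B01001H_001",
                                 TotalPop = "B01003_001",
                                 Poverty = "B14006_002"),
                   year = 2020,
                   geometry = TRUE, output = "wide")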

Data practitioners using the popular open-source statistical programming language R can put the R package tidycensus to work to make census data easier to access. With tidycensus, a simple request using the function get_acs() retrieves the data a user requires in a usable format, which is much easier than creating a table using the interface provided by the Census Bureau. Armed with demographic data, analysts can take it one step further and compare pollution hazards pulled from Envirofacts with health outcomes.
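To show what might come next, here is a minimal sketch of deriving percentage columns from a wide-format get_acs() result. It assumes the usual tidycensus naming for wide output, where each named variable gets an "E" (estimate) column, and it uses a small made-up table in place of a live API call. Treating People of Color as total population minus the B01001H (White alone, not Hispanic or Latino) count, and dividing the poverty count by total population, are simplifying assumptions rather than the article's confirmed method.

```r
# Made-up stand-in for a wide-format get_acs() result.
# Tract identifiers and counts are illustrative only.
acs <- data.frame(
  GEOID       = c("tract_1", "tract_2"),
  TotalPopE   = c(4000, 2500),
  WhiteAloneE = c(1000, 2000),
  PovertyE    = c(600, 150)
)

# Share of residents who are not non-Hispanic White (assumed definition)
acs$pct_poc <- 100 * (acs$TotalPopE - acs$WhiteAloneE) / acs$TotalPopE

# Share of residents below the poverty level (approximation: the poverty
# table's universe can differ slightly from total population)
acs$pct_poverty <- 100 * acs$PovertyE / acs$TotalPopE

print(acs[, c("GEOID", "pct_poc", "pct_poverty")])
```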


CDC and Robert Wood Johnson Foundation PLACES map. Source: CDC and Robert Wood Johnson Foundation.

The CDC and Robert Wood Johnson Foundation created a nationwide, census tract level database called PLACES, or Population Level Analysis and Community Estimates. PLACES estimates the prevalence of diseases ranging from diabetes and heart disease to depression and cancer. PLACES differs from typical health data in that it models diseases for the entire country, allowing for comparisons across geographies. Typically, health data is generated by city, county, or state health departments or by academic institutions. While these data are often high quality, as they are derived from surveys or health records, they lack comparability across geographies.

Accessing PLACES data is straightforward. Users can view and explore data with the PLACES interactive map, or download data for the entire country at different geographies, including census tracts, zip code tabulation areas (ZCTAs), cities, counties, and entire states, from the PLACES data portal. Data from PLACES is also accessible programmatically through the Socrata API framework. Socrata builds data portals for many states and cities and facilitates access via multiple programming languages, including through the R package RSocrata. With a few lines of code, you can access PLACES data and easily integrate it into your analysis workflows.
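As a sketch of what those few lines might look like, the helper below builds a filtered Socrata request and passes it to read.socrata() from RSocrata. The dataset URL shown in the usage comment is a placeholder, not a real identifier, and the stateabbr field name is an assumption about the PLACES schema; copy the actual API endpoint and field names from the PLACES data portal before running it.

```r
# Build a Socrata query URL that filters rows to a single state.
# The field name `stateabbr` is an assumption; check the dataset's columns.
build_places_query <- function(dataset_url, state_abbr) {
  paste0(dataset_url, "?stateabbr=", state_abbr)
}

# Download the filtered records as a data frame (needs the RSocrata
# package and an internet connection).
fetch_places <- function(dataset_url, state_abbr) {
  RSocrata::read.socrata(build_places_query(dataset_url, state_abbr))
}

# Usage, with a placeholder endpoint to replace:
# places_nj <- fetch_places("https://data.cdc.gov/resource/xxxx-xxxx.json", "NJ")
```

Filtering in the request keeps the download small compared with pulling the national file and subsetting afterwards.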

Pulling it together

Now that we have a variety of data sources to pull from, let’s explore what that looks like in practice. For this example, we will conduct a brief analysis in R to evaluate a simple research question. As mentioned earlier, across the country, roughly 50% of people living within three miles of a Superfund site belong to a minority group. Let’s take New Jersey, a state with 152 Superfund sites, and see if this phenomenon persists. To do so, we need 2020 American Community Survey data from the Census Bureau and the locations of all Superfund sites from EPA’s Envirofacts. We will then create a three-mile buffer around the site locations and select the census tracts that intersect the buffer. For a more accurate analysis, we will perform a spatial interpolation: the process of allocating populations from larger geographies to sub-geographies. The R package areal provides support for this operation. If you omit this step, your results need to indicate that they represent all census tracts within a three-mile buffer, not the population within a three-mile buffer.
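The buffer, intersect, and interpolate steps above can be sketched on toy data. This example uses two made-up square "tracts" and one made-up site in a metre-based projected CRS (UTM zone 18N is assumed here), and it uses st_interpolate_aw() from the sf package for the area-weighted step rather than the areal package named in the text. It is an illustration of the technique, not the article's actual script.

```r
library(sf)

# Helper to draw a square polygon from its lower-left corner
make_square <- function(xmin, ymin, size) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + size, ymin), c(xmin + size, ymin + size),
    c(xmin, ymin + size), c(xmin, ymin)
  )))
}

# Two adjacent 10 km x 10 km toy tracts with made-up populations
tracts <- st_sf(
  tract_id  = c("T1", "T2"),
  total_pop = c(4000, 6000),
  geometry  = st_sfc(make_square(500000, 4500000, 10000),
                     make_square(510000, 4500000, 10000),
                     crs = 32618)
)

# One toy site sitting on the shared tract boundary
site <- st_sf(
  site_name = "Example site",
  geometry  = st_sfc(st_point(c(510000, 4505000)), crs = 32618)
)

# Three-mile buffer; the CRS units are metres, so convert miles to metres
buffer <- st_buffer(site, dist = 3 * 1609.344)

# Simple approach: keep every tract that intersects the buffer
touching <- st_filter(tracts, buffer)
sum(touching$total_pop)  # counts everyone in both tracts

# Area-weighted approach: allocate population by the share of each
# tract's area inside the buffer (assumes people are spread evenly)
est <- st_interpolate_aw(tracts["total_pop"], buffer, extensive = TRUE)
est$total_pop  # far fewer people than the whole-tract total
```

The gap between the two totals is why results based on intersecting tracts alone should be described as tract totals rather than as the population within three miles.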

Reading the results

Once we have completed the data preparation, we can analyze the data by calculating the percentage of minority residents and of people living in poverty within the three-mile buffer of Superfund sites. We will calculate these variables for the entire affected population, then look at averages across sites and compare both to state averages.

NJ Superfund Site Map; Total Population and Percent People of Color

Unsurprisingly, in the case of New Jersey, the analysis indicates that minorities and people in poverty are overrepresented within the zone of concern compared to White people and people living above the poverty line, agreeing with our hypothesis and national averages. The analysis shows that, summed across all sites, 58.4% of the population within three miles is a Person of Color, compared to the state average of 45.8%. Similarly, 11.6% of people living within three miles of a Superfund site are in poverty, 2 percentage points more than the state average of 9.6%.

Interestingly, when we look at site-specific averages, the demographic components fall below state averages. For example, the Person of Color and poverty rates averaged across all Superfund sites are 39.3% and 7.7%, compared to 45.8% and 9.6%, respectively. What explains this discrepancy between the total affected population statewide and site-specific averages? The answer is population density. Sites with larger populations living within a three-mile radius have higher rates of poverty and People of Color. For example, the A. O. Polymer Superfund site in rural Sparta, New Jersey, has 3,235 people living within three miles and a minority population rate of only 14.7%, a third of the state average. By comparison, in urban Newark, New Jersey, the White Chemical Corp Superfund site has 53,499 people living within three miles, and 91% of them belong to a minority group. The poverty rate there is more than twice the state average, 19.4% compared to 9.6%. Furthermore, many of these urban Superfund sites are clustered within just a few miles of each other, causing compound risk to residents.
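The gap between the pooled figure and the site-level average comes down to weighting. The sketch below uses only the two sites named above, with their reported populations and minority rates, to show how a simple average across sites differs from a population-weighted rate. It is a two-site illustration of the arithmetic, not a reproduction of the statewide results.

```r
# The two example sites and figures reported in the text
sites <- data.frame(
  site    = c("A. O. Polymer (Sparta)", "White Chemical Corp (Newark)"),
  pop_3mi = c(3235, 53499),   # people living within three miles
  pct_poc = c(14.7, 91.0)     # percent People of Color
)

# Site-level average: every site counts equally, whatever its population
site_average <- mean(sites$pct_poc)

# Pooled rate: every person counts equally, so dense urban sites dominate
pooled_rate <- weighted.mean(sites$pct_poc, w = sites$pop_3mi)

round(c(site_average = site_average, pooled_rate = pooled_rate), 1)
```

Because the Newark site has far more residents, the pooled rate sits close to its 91%, while the simple average lands midway between the two sites.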

NJ Superfund Site Chart; Total Population and Percent People of Color

While this analysis is preliminary, it offers useful insights. The analysis shows a pattern of disparate exposure to Superfund sites, particularly for high-population urban areas. In New Jersey, these areas have larger minority populations and more people living in poverty. These communities often have limited access to healthcare and fewer resources to advocate for cleanup. Further analysis could include health outcome data from PLACES, look at additional demographic variables like language isolation and healthcare coverage, or use disaggregated race variables instead of People of Color. Community groups can use this environmental justice analysis to prioritize outreach and organizing for lower-income and higher-minority communities living near a Superfund site. Using data for equitable prioritization closes the knowledge gap between the government, industry, and affected communities. Doing so paves a path towards equitable mitigation and support for these underserved communities facing possible public health crises due to their proximity to Superfund sites.

More work to be done

The data sources discussed in this article are just a few resources of many. Often, states and regional initiatives offer high-quality and easily accessible data usable by community organizers. If you’re a group looking to access additional data not discussed in this article, check out the EPA’s Environmental Dataset Gateway (EDG). There you can find datasets ranging from extreme temperature change and heat-related hospitalizations to region-specific datasets like drinking well uranium contamination data for the Navajo Nation. As with many resources discussed in this article, the EDG also has a series of endpoints for accessing the data programmatically. While you may not have the resources to build complex script-based analysis, you do have the capacity to take advantage of web-based portals like EJScreen or PLACES. Use this data to give credence to the lived experience and challenges faced by polluted neighborhoods. Although more can be done to bring data out of government systems and into the hands of those who need it most, a lot can be accomplished with a little knowledge and basic analytical skills. Once data is sourced and analyzed, it can be used at community meetings to raise awareness, leveraged for policy change in legislatures, and ultimately used to improve and save lives. If the experience of Love Canal tells us anything, it is that when data is combined with grassroots organizing, real change can occur.