Air Pollution in Sheffield


Hala Al-Jarah


Introduction


Background

Air pollution is a rapidly growing global public health emergency. Around 91% of the world’s population is exposed to levels of particulates (harmful, invisible, tiny solids that float in the air) that exceed “safe” limits, and Sheffield is no exception.

Sheffield is one of the 10 most polluted places in Britain, with exposure to fine particulate matter 20% above the “safe” level. This pollution causes many cardiovascular and respiratory problems, and 500 people in Sheffield die each year because of it.


Research Questions

According to the Sheffield CC Air Pollution Monitoring Data Presentation Site, the main pollutants of concern in Sheffield are:

  • Nitrogen dioxide (NO2)
  • Fine particulate matter (PM2.5)

This project therefore visualises the levels of NO2 and PM2.5, looking in particular at where in Sheffield these levels are highest and when they are highest.




Data and Visualisations


1 - Locating Air Pollution Hotspots

The Sheffield CC Air Pollution Monitoring Data Presentation Site shows air pollution data from all 5 of Sheffield’s automatic monitoring stations. Initial data was collected by selecting each station in turn and recording the average maximum score for both NO2 and PM2.5.


Data Origins From Air Pollution Monitoring Stations

The following are the average maximum scores recorded at each station on an average day for NO2 and PM2.5:

#loading data
Data1.1<- read.csv("Data/Data1.1.csv")
head(Data1.1)
##           Location NO2ppb PM2.5ug.m3
## 1      GH1 FIRVALE    8.6       10.5
## 2      GH2 TINSLEY   16.7       14.0
## 3     GH3 LOWFIELD   18.0       22.8
## 4   GH4 THE WICKER   11.6       19.6
## 5 GH5 KING ECGBERT    4.5        8.3
#visualising data
library(tidyverse)
Data1.2<- read.csv("Data/Data1.2.csv")
ggplot(data = Data1.2, aes(fill=Pollutant, x=Location, y=Pollutant_Level)) + 
  geom_bar(position = "dodge", stat = "identity") + 
  labs(y= "Pollutant Levels") + 
  ggtitle("Pollutant Levels Across Sheffield") + 
  scale_fill_manual(values=c("#42A5F5", "#66BB6A")) + 
  theme_light(base_size = 7) +
  theme(panel.grid.major.x = element_blank(), panel.grid.major.y = element_line( size=.1, color="light grey")) +
  expand_limits(y=25)
Figure 1: Average maximum scores recorded on an average day for NO2 and PM2.5



From visualising this data, it is clear that monitors at locations GH2, GH3, and GH4 produce a higher maximum score for both pollutants than those in surrounding areas. Upon identifying these locations on a map, it is apparent these monitors are located on the ring road and by the M1.

This would therefore suggest that high traffic areas are hotspots for this type of air pollution.
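The ranking described above can also be checked numerically rather than read off the chart. The following is a minimal sketch, not part of the original analysis: it re-enters the five rows printed by head(Data1.1), reshapes them into the long layout that Data1.2 appears to use (the column names Pollutant and Pollutant_Level are taken from the plotting code), and keeps the three highest stations for each pollutant. The object names stations and hotspots are illustrative.

```r
library(dplyr)
library(tidyr)

# Values copied from the head(Data1.1) output shown earlier
stations <- data.frame(
  Location = c("GH1 FIRVALE", "GH2 TINSLEY", "GH3 LOWFIELD",
               "GH4 THE WICKER", "GH5 KING ECGBERT"),
  NO2ppb = c(8.6, 16.7, 18.0, 11.6, 4.5),
  PM2.5ug.m3 = c(10.5, 14.0, 22.8, 19.6, 8.3)
)

# Reshape to one row per station and pollutant, rank within each pollutant
# (1 = highest level), and keep the top three stations for each
hotspots <- stations %>%
  pivot_longer(cols = -Location,
               names_to = "Pollutant",
               values_to = "Pollutant_Level") %>%
  group_by(Pollutant) %>%
  mutate(Rank = rank(-Pollutant_Level)) %>%
  ungroup() %>%
  filter(Rank <= 3) %>%
  arrange(Pollutant, Rank)

print(hotspots)
```

With these values the same three stations (GH2, GH3, and GH4) come out on top for both pollutants, although their order differs between NO2 and PM2.5.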

Having identified where levels of NO2 and PM2.5 are highest, we can now look at these 3 hotspots in more detail to see when these levels are highest.



2 - Air Pollution Across Time

The Department for Environment, Food & Rural Affairs states that air pollution in parts of Sheffield in April 2019 was double the legal limit. We can use April as an example to look at what is happening at these hotspots in more detail.


Preparing Data showing average pollutant levels over 24 hours

These are the levels of the pollutants NO2 and PM2.5 across an average 24 hours in April:

#loading data for NO2
Data2 <- read.csv("Data/Data2.csv")
head(Data2)
##   Time  GH2.NO2  GH3.NO3 GH4.NO2
## 1    1  3.00000  3.56666 7.97666
## 2    2  2.49667  3.48334 7.32334
## 3    3  2.15500  4.82500 6.61250
## 4    4  3.11750  3.04750 5.94750
## 5    5  6.20750  4.79250 5.99250
## 6    6 16.62000 10.65250 9.14999
#preparing data for NO2
names(Data2)
colnames(Data2) <- c("Time", "GH2", "GH3", "GH4") #shortening column names to station IDs (the dashes in the original headers were read in as dots)
df2 <- Data2 %>% #creating a new dataset
  gather(key = "variable", value = "value", -Time) #creating a new column to gather values (apart from time)
head(df2)
## [1] "Time"    "GH2.NO2" "GH3.NO3" "GH4.NO2"
##   Time variable    value
## 1    1      GH2  3.00000
## 2    2      GH2  2.49667
## 3    3      GH2  2.15500
## 4    4      GH2  3.11750
## 5    5      GH2  6.20750
## 6    6      GH2 16.62000
#loading data for PM2.5
Data3 <- read.csv("Data/Data3.csv")
head(Data3)
##   Time GH2.PM2.5 GH3.PM2.5 GH4.PM2.5
## 1    1   11.3825  12.27750   10.8275
## 2    2   16.1850   9.03300   10.8320
## 3    3   12.1300   7.72850   14.1750
## 4    4   14.4750  15.40500   13.9925
## 5    5   11.7350  14.35500   10.7125
## 6    6   11.3200   4.39575   10.0475
#preparing data for PM2.5
names(Data3)
colnames(Data3) <- c("Time", "GH2", "GH3", "GH4") #shortening column names to station IDs (the dashes in the original headers were read in as dots)
df3 <- Data3 %>% #creating a new dataset
  gather(key = "variable", value = "value", -Time) #creating a new column to gather values (apart from time)
head(df3)
## [1] "Time"      "GH2.PM2.5" "GH3.PM2.5" "GH4.PM2.5"
##   Time variable   value
## 1    1      GH2 11.3825
## 2    2      GH2 16.1850
## 3    3      GH2 12.1300
## 4    4      GH2 14.4750
## 5    5      GH2 11.7350
## 6    6      GH2 11.3200
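The reshaping step above uses gather(), which still works but has been superseded in tidyr by pivot_longer(). As a hedged sketch of the equivalent call, the example below re-enters the first three rows printed by head(Data2), with the renamed columns, and reshapes them the same way. The object names no2_wide and no2_long are illustrative only.

```r
library(tidyr)

# First three rows of Data2 as printed above, after renaming the columns
no2_wide <- data.frame(
  Time = 1:3,
  GH2 = c(3.00000, 2.49667, 2.15500),
  GH3 = c(3.56666, 3.48334, 4.82500),
  GH4 = c(7.97666, 7.32334, 6.61250)
)

# Same wide-to-long reshape as gather(key = "variable", value = "value", -Time)
no2_long <- pivot_longer(no2_wide,
                         cols = -Time,
                         names_to = "variable",
                         values_to = "value")

print(no2_long)
```

One difference to be aware of: pivot_longer() returns the rows grouped by Time, whereas gather() returns them grouped by station. This does not affect the line plots, because ggplot groups the lines by the variable column either way.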


Visualising Data showing average pollutant levels over 24 hours

#visualising NO2 data
ggplot(df2, aes(x = Time, y = value)) +
  geom_line(aes(color = variable), size = 1) +
  scale_color_manual(values = c("#90CAF9", "#42A5F5", "#1565C0"))+
  expand_limits(y=35) +
  theme_light(base_size = 10) +
  
  ggtitle("Average level of NO2 over 24 hours") + 
  scale_x_continuous(name="Time", breaks= seq(0, 24, 1)) +
  scale_y_continuous(name="NO2 Levels (ppb)", breaks= seq(0, 35, 5)) +
  
  theme(legend.position="bottom")+
  labs(col="Location")

#visualising PM2.5 data
ggplot(df3, aes(x = Time, y = value)) +
  geom_line(aes(color = variable), size = 1) +
  scale_color_manual(values = c("#AED581", "#66BB6A", "#1B5E20"))+
  expand_limits(y=55) +
  theme_light(base_size = 10) +
  
  ggtitle("Average level of PM2.5 over 24 hours") + 
  scale_x_continuous(name="Time", breaks= seq(0, 24, 1)) +
  scale_y_continuous(name="PM2.5 Levels (ug/m3)", breaks= seq(0, 55, 5)) +
  
  theme(legend.position="bottom") +
  labs(col="Location")


Upon visualising this data, it is clear that there are peaks in pollutant levels during rush hour, when traffic is likely to be heavy. Although this pattern is much stronger for NO2 than for PM2.5, it shows that air pollution is not only localised at vehicle hotspots but also peaks during high traffic, suggesting that this type of air pollution could be caused by vehicle emissions.
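The timing of the peaks can also be read from the long-format data rather than from the plots. The sketch below assumes a data frame laid out like df2 or df3 (columns Time, variable, value). The helper peak_hours() is a hypothetical name, not part of the original analysis. Only the six hours printed by head(Data2) are re-entered here, so the demonstration result covers the early morning only and is not the true daily peak.

```r
library(dplyr)

# Hypothetical helper: for each station, return the hour with the highest level
peak_hours <- function(long_df) {
  long_df %>%
    group_by(variable) %>%
    slice_max(value, n = 1, with_ties = FALSE) %>%
    ungroup() %>%
    select(Location = variable, Peak_Time = Time, Peak_Level = value)
}

# Partial NO2 data: hours 1 to 6 only, copied from the head(Data2) output
first_six_hours <- data.frame(
  Time = rep(1:6, times = 3),
  variable = rep(c("GH2", "GH3", "GH4"), each = 6),
  value = c(3.00000, 2.49667, 2.15500, 3.11750, 6.20750, 16.62000,
            3.56666, 3.48334, 4.82500, 3.04750, 4.79250, 10.65250,
            7.97666, 7.32334, 6.61250, 5.94750, 5.99250, 9.14999)
)

peaks <- peak_hours(first_six_hours)
print(peaks)
```

Calling peak_hours(df2) or peak_hours(df3) on the full 24-hour data would give one row per station, which could then be compared with typical rush-hour times.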


Summary



After exploring this data, one might conclude that vehicle emissions play a large role in the air pollution problem in Sheffield (due to the higher pollution levels recorded near busier roads, which peak during high traffic).


Perhaps an interesting way to further these findings would be to correlate the number of vehicles with pollutant levels, to check that the pattern observed in the second set of graphs is specifically related to high traffic and not to a confound that concerns the time of day in a different way.
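One possible shape for such a check is sketched below. It assumes hourly traffic counts were available, which they are not in this project: the helper traffic_correlation(), the traffic data frame, and the column names level and vehicles are all hypothetical placeholders. A Spearman rank correlation is used here only as one reasonable choice, since it does not assume a linear relationship.

```r
# Hypothetical helper: `pollution` is assumed to have columns Time and level,
# `traffic` is assumed to have columns Time and vehicles (hourly counts).
# Neither the traffic data nor these column names exist in the original project.
traffic_correlation <- function(pollution, traffic) {
  # Match the two data sets hour by hour
  merged <- merge(pollution, traffic, by = "Time")
  # Rank correlation between vehicle counts and pollutant levels
  cor.test(merged$vehicles, merged$level, method = "spearman", exact = FALSE)
}

# Example call, once real data frames are available:
# result <- traffic_correlation(no2_hourly, traffic_counts)
# result$estimate  # Spearman's rho
# result$p.value
```

Even a strong correlation here would not rule out every time-of-day confound, so a comparison across days with different traffic volumes would be a useful complement.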


Another interesting angle could be to compare the average scores from April 2019 (as shown) with those from April 2020, when UK COVID lockdown rules were in place, and explore whether air pollution levels decreased.


This data could also be compared against data from different cities in the UK.