Content
I am interested in environmental remediation: pollution modeling and the data analysis that comes with that are my jam. I also think that the tools we are learning in this class are useful in the corporate setting – performance metrics, etc. could all be run through tools like this.
I also have a blog where I post projects and classwork I’m particularly passionate about.
4. Tidying up Data / Answers and Tasks
Task 1. Convert the following table CO2 emissions per country since 1970 from wide to long format and output the first few rows into your Rmarkdown.
library(readxl) # install.packages('readxl')
url = 'http://edgar.jrc.ec.europa.eu/news_docs/CO2_1970-2014_dataset_of_CO2_report_2015.xls'
xls = '../data/co2_europa.xls'
# This download seems to corrupt on PCs... look at download.file helpfile, maybe try another method? For now we'll use a workaround I guess.
print(getwd())
if (!file.exists(xls)){
download.file(url, xls)
}
Quick workaround (not downloading from online)
library(readxl) # install.packages('readxl')
library(readr)
library(dplyr)
library(tidyr)
xls = '../data/co2_europa.xls'
co2=read_excel(xls, skip=12)
## DEFINEDNAME: 21 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 0d 3b 00 00 0c 00 e0 00 00 00 2c 00
## DEFINEDNAME: 21 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 0d 3b 00 00 0c 00 e0 00 00 00 2c 00
## DEFINEDNAME: 21 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 0d 3b 00 00 0c 00 e0 00 00 00 2c 00
## DEFINEDNAME: 21 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 0d 3b 00 00 0c 00 e0 00 00 00 2c 00
#Convert to long format. Display a few rows as example.
task1 = co2 %>%
gather("year", "n", -1)
head(task1)
## Source: local data frame [6 x 3]
##
## Country year n
## (chr) (chr) (dbl)
## 1 Afghanistan 1970.000000 1813.981234
## 2 Albania 1970.000000 4435.433037
## 3 Algeria 1970.000000 18850.745443
## 4 American Samoa 1970.000000 6.183023
## 5 Angola 1970.000000 8946.495011
## 6 Anguilla 1970.000000 2.168006
Task 2. Report the top 5 emitting countries (not World or EU28) for 2014 using your long format table. (You may need to convert your year column from factor to numeric, eg mutate(year = as.numeric(as.character(year)))
.
task2=co2 %>%
gather("year", "n", -1) %>%
mutate(year = as.numeric(as.character(year))) %>%
arrange(desc(n)) %>%
filter (year==2014) %>%
filter(Country != "World") %>%
filter(Country != "EU28")
head(task2)
## Source: local data frame [6 x 3]
##
## Country year n
## (chr) (dbl) (dbl)
## 1 China 2014 10540749.6
## 2 United States of America 2014 5334529.7
## 3 India 2014 2341896.8
## 4 Russian Federation 2014 1766427.3
## 5 Japan 2014 1278921.8
## 6 Germany 2014 767145.6
Task 3. Summarize the total emissions by country (not World or EU28) across years from your long format table and return the top 5 emitting countries. (As with most analyses, there are multiple ways to do this. I used the following functions: filter
, arrange
, desc
, head
).
task3=co2 %>%
gather("year", "n", -1) %>%
mutate(year = as.numeric(as.character(year))) %>%
group_by(Country) %>%
summarise(Sum = sum(n)) %>%
arrange(desc(Sum)) %>%
filter(Country != "World") %>%
filter(Country != "EU28")
colnames(task3)[colnames(task3)=="Sum"] <- "Sum of CO2 Emissions, 1970-2014 (kton CO2)"
head(task3)
## Source: local data frame [6 x 2]
##
## Country Sum of CO2 Emissions, 1970-2014 (kton CO2)
## (chr) (dbl)
## 1 United States of America 231948899
## 2 China 174045927
## 3 Russian Federation 81242427
## 4 Japan 51276329
## 5 Germany 43382205
## 6 India 39004218