Content

I’m very interested in investigating how dynamic ecosystem-based management strategies can be designed to protect and recover marine resources. In particular, I’m interested in reef-associated predators and their role in ecosystem stability and resilience. Some burning questions are:

I’m also passionate about ocean exploration, science communication, and outreach. I sail with the Ocean Exploration Trust doing deep-sea research aboard the E/V Nautilus. Follow our research at: http://www.nautiluslive.org

Techniques

I believe that a streamlined, transparent, and reproducible approach to managing data and conducting scientific analysis is of paramount importance for interdisciplinary and collaborative work. I’m looking forward to deepening my R skills, becoming comfortable with GitHub, and expanding my skills in visualization and communication of results.

Data

Currently, I don’t have data related to the specific research question stated above. The data that I’ll use in this assignment come from a long-term ecological assessment of reef fish populations in the lagoons of Rarotonga and Aitutaki for the years 2002 and 2014. These data were provided by Professor Hunter Lenihan for his course on Applied Marine Ecology.

# read csv
d1 = read.csv('data/juanmayorgahenao_hunterdata.csv')
surgeon <- subset(d1, Species == "Surgeonfish")
trout <- subset(d1, Species == "Coral Trout")
spotted <- subset(d1, Species == "Spotted Damselfish")
yellow <- subset(d1, Species == "Yellow Damselfish")
densities <- data.frame(surgeon$Adults, trout$Adults, spotted$Adults, yellow$Adults)
colnames(densities) <- c("Surgeon", "Coral Trout", "Spotted Damselfish", "Yellow Damselfish")
# output summary
summary(densities)
##     Surgeon       Coral Trout    Spotted Damselfish Yellow Damselfish
##  Min.   : 20.0   Min.   :  4.0   Min.   : 19.0      Min.   :32.0     
##  1st Qu.:192.5   1st Qu.: 31.0   1st Qu.:197.5      1st Qu.:35.0     
##  Median :305.0   Median : 80.0   Median :388.5      Median :58.0     
##  Mean   :255.0   Mean   : 83.5   Mean   :426.5      Mean   :59.5     
##  3rd Qu.:367.5   3rd Qu.:132.5   3rd Qu.:617.5      3rd Qu.:82.5     
##  Max.   :390.0   Max.   :170.0   Max.   :910.0      Max.   :90.0

Wrangling data

Reading Data with readr and dplyr

suppressWarnings(library(readr))
suppressWarnings(suppressMessages(library(dplyr)))
d = read_csv('../data/r-ecology/species.csv') %>%
  as_tibble() # read_csv() already returns a tibble; as_tibble() replaces the deprecated tbl_df()

knitr::kable(head(d))
species_id genus species taxa
AB Amphispiza bilineata Bird
AH Ammospermophilus harrisi Rodent
AS Ammodramus savannarum Bird
BA Baiomys taylori Rodent
CB Campylorhynchus brunneicapillus Bird
CM Calamospiza melanocorys Bird
knitr::kable(summary(d))
species_id genus species taxa
Length:54 Length:54 Length:54 Length:54
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character

Gather() and Spread()

# Loading all the required packages
suppressWarnings(library(readr))
suppressWarnings(library(tidyr))
suppressWarnings(library(knitr))
suppressWarnings(library(readxl))
library(dplyr)
library(EDAWR)
library(nycflights13)

# This is the data set being used
kable(cases) 
country 2011 2012 2013
FR 7000 6900 7000
DE 5800 6000 6200
US 15000 14000 13000
# Using the gather() function
cases %>% 
  gather("year","n",2:4) %>% # params: name of the new key column (string), name of the new value column, which columns to collapse
  kable() 
country year n
FR 2011 7000
DE 2011 5800
US 2011 15000
FR 2012 6900
DE 2012 6000
US 2012 14000
FR 2013 7000
DE 2013 6200
US 2013 13000
# Using the spread() function

casesLong <- gather(cases,"year","n",2:4)
casesLong %>% 
  spread(year,n) %>%  # params: column to use for new keys, column to use for values
  kable()
country 2011 2012 2013
DE 5800 6000 6200
FR 7000 6900 7000
US 15000 14000 13000
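
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The sketch below repeats the reshaping above with the newer functions. It rebuilds the cases table by hand from the output shown (cases_demo is a name made up for this example), so it does not need the EDAWR package. Note that pivot_longer() returns the rows country by country rather than year by year, so the row order differs from the gather() output.

```r
library(tidyr)

# Toy copy of the `cases` table, with values copied from the output above.
# `cases_demo` is a hypothetical name used only in this example.
cases_demo <- data.frame(
  country = c("FR", "DE", "US"),
  `2011` = c(7000, 5800, 15000),
  `2012` = c(6900, 6000, 14000),
  `2013` = c(7000, 6200, 13000),
  check.names = FALSE  # keep the year column names as "2011", "2012", "2013"
)

# Wide to long: every column except `country` becomes a (year, n) pair.
cases_long <- pivot_longer(cases_demo, cols = -country,
                           names_to = "year", values_to = "n")

# Long to wide: one column per distinct year, filled with the values in `n`.
cases_wide <- pivot_wider(cases_long, names_from = year, values_from = n)
```
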

Separate() and Unite()

storms %>% 
  kable()
storm wind pressure date
Alberto 110 1007 2000-08-03
Alex 45 1009 1998-07-27
Allison 65 1005 1995-06-03
Ana 40 1013 1997-06-30
Arlene 50 1010 1999-06-11
Arthur 45 1010 1996-06-17
MDYstorms <- separate(storms, date, c("year","month","day"), sep = "-") 
kable(MDYstorms) 
storm wind pressure year month day
Alberto 110 1007 2000 08 03
Alex 45 1009 1998 07 27
Allison 65 1005 1995 06 03
Ana 40 1013 1997 06 30
Arlene 50 1010 1999 06 11
Arthur 45 1010 1996 06 17
StormsUnite <- unite(MDYstorms, "date", year, month, day, sep = "-")
kable(StormsUnite)
storm wind pressure date
Alberto 110 1007 2000-08-03
Alex 45 1009 1998-07-27
Allison 65 1005 1995-06-03
Ana 40 1013 1997-06-30
Arlene 50 1010 1999-06-11
Arthur 45 1010 1996-06-17

Using dplyr

storms %>%
  select(storm, pressure) %>% # Selects some columns from the table 
  kable()
storm pressure
Alberto 1007
Alex 1009
Allison 1005
Ana 1013
Arlene 1010
Arthur 1010
storms %>%
  filter(wind >= 50, storm %in% c("Alberto", "Alex", "Allison")) %>% # %in% is group membership
  kable()
storm wind pressure date
Alberto 110 1007 2000-08-03
Allison 65 1005 1995-06-03
storms %>%
  mutate(ratio = pressure/wind, inverse = ratio^-1) %>%  # mutate() adds new columns computed from existing ones; later columns can use earlier ones
  kable(digits = 2)
storm wind pressure date ratio inverse
Alberto 110 1007 2000-08-03 9.15 0.11
Alex 45 1009 1998-07-27 22.42 0.04
Allison 65 1005 1995-06-03 15.46 0.06
Ana 40 1013 1997-06-30 25.32 0.04
Arlene 50 1010 1999-06-11 20.20 0.05
Arthur 45 1010 1996-06-17 22.44 0.04
pollution %>%
  summarise(median = median(amount), variance = var(amount), n = n()) %>%  # creates a summary table with the specified stats
  kable()
median variance n
22.5 1731.6 6
storms %>% 
  arrange(desc(wind)) %>%   # arrange() sorts rows in ascending order by default; desc() reverses the order
  arrange(wind, date) %>%   # this later call re-sorts by wind, then date, so it determines the final order
  kable()
storm wind pressure date
Ana 40 1013 1997-06-30
Arthur 45 1010 1996-06-17
Alex 45 1009 1998-07-27
Arlene 50 1010 1999-06-11
Allison 65 1005 1995-06-03
Alberto 110 1007 2000-08-03
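
To make the sorting behaviour explicit, here is a small sketch that rebuilds the storms table by hand from the output above (storms_demo is a name made up for this example; the date column is assumed to be of class Date). It shows that desc() puts the strongest wind first, that extra arguments to arrange() break ties, and that chaining two arrange() calls gives the same result as the last call alone.

```r
library(dplyr)

# Toy copy of the `storms` table, with values copied from the output above.
# `storms_demo` is a hypothetical name; the Date class of `date` is an assumption.
storms_demo <- data.frame(
  storm = c("Alberto", "Alex", "Allison", "Ana", "Arlene", "Arthur"),
  wind = c(110, 45, 65, 40, 50, 45),
  pressure = c(1007, 1009, 1005, 1013, 1010, 1010),
  date = as.Date(c("2000-08-03", "1998-07-27", "1995-06-03",
                   "1997-06-30", "1999-06-11", "1996-06-17")),
  stringsAsFactors = FALSE
)

# Strongest wind first.
by_wind_desc <- storms_demo %>% arrange(desc(wind))

# Weakest wind first; storms with equal wind are ordered by date.
by_wind_then_date <- storms_demo %>% arrange(wind, date)

# Chaining both calls: the second arrange() determines the final order.
chained <- storms_demo %>% arrange(desc(wind)) %>% arrange(wind, date)
```
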

Selecting the unit of analysis

pollution %>% 
  group_by(city) %>% 
  summarise(mean = mean(amount), sum = sum(amount), n = n()) %>% 
  kable()
city      mean  sum  n
Beijing   88.5  177  2
London    19.0   38  2
New York  18.5   37  2
pollution %>% 
  group_by(size) %>% 
  summarise(mean = mean(amount), sum = sum(amount), n = n()) %>% 
  kable()
size mean sum n
large 55.33333 166 3
small 28.66667 86 3
tb %>% 
  group_by(country, year) %>% 
  head() %>% 
  kable()
country year sex child adult elderly
Afghanistan 1995 female NA NA NA
Afghanistan 1995 male NA NA NA
Afghanistan 1996 female NA NA NA
Afghanistan 1996 male NA NA NA
Afghanistan 1997 female 5 96 1
Afghanistan 1997 male 0 26 0
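
In the tb example above, group_by() is followed only by head(), so the grouping has no visible effect on the printed rows. The sketch below shows what grouping by two variables would do once a summarise() step is added. It uses only the six Afghanistan rows shown above, rebuilt by hand (tb_demo and the summary column names are made up for this example). Note that sum(..., na.rm = TRUE) returns 0 for a group where every value is NA, so I also count the rows that actually contain data.

```r
library(dplyr)

# Toy copy of the first six rows of `tb`, with values copied from the output above.
# `tb_demo`, `cases`, and `rows_with_data` are hypothetical names.
tb_demo <- data.frame(
  country = "Afghanistan",
  year = c(1995, 1995, 1996, 1996, 1997, 1997),
  sex = rep(c("female", "male"), times = 3),
  child = c(NA, NA, NA, NA, 5, 0),
  adult = c(NA, NA, NA, NA, 96, 26),
  elderly = c(NA, NA, NA, NA, 1, 0),
  stringsAsFactors = FALSE
)

# One row per country-year: total cases across sexes and age classes.
cases_by_year <- tb_demo %>%
  group_by(country, year) %>%
  summarise(
    cases = sum(child, adult, elderly, na.rm = TRUE),  # all-NA groups sum to 0
    rows_with_data = sum(!is.na(adult)),               # flags groups with no data
    .groups = "drop"                                   # return an ungrouped table
  )
```
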

Joining data

bind_cols(y,z) %>% # adds all the columns into one df
  kable()
x1 x2 x1 x2
A 1 B 2
B 2 C 3
C 3 D 4
bind_rows(y,z) %>%  # adds all rows into a df
  kable()
x1 x2
A 1
B 2
C 3
B 2
C 3
D 4
union(y,z) %>% # rows that appear in either df, with duplicates removed
  kable()
x1 x2
D 4
C 3
B 2
A 1
intersect(y,z) %>% # rows that appear in both df
  kable()
x1 x2
B 2
C 3
setdiff(y,z) %>% # rows of y that do not appear in z
  kable()
x1 x2
A 1
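
The y and z data frames used above come from the EDAWR package and are never defined in these notes. As a self-contained sketch, they can be rebuilt by hand from the bind_rows() output (the values below are copied from that output; the result names are made up for this example). Row order in the results may differ between dplyr versions, so only the contents are compared here.

```r
library(dplyr)

# Hand-built copies of `y` and `z`, with values taken from the output above.
y <- data.frame(x1 = c("A", "B", "C"), x2 = c(1, 2, 3), stringsAsFactors = FALSE)
z <- data.frame(x1 = c("B", "C", "D"), x2 = c(2, 3, 4), stringsAsFactors = FALSE)

in_either <- union(y, z)      # A, B, C, D: shared rows appear only once
in_both   <- intersect(y, z)  # B, C: rows present in both data frames
only_in_y <- setdiff(y, z)    # A: rows of y that are absent from z
```
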
left_join(songs, artists, by = 'name') %>% # joins artists to songs using the variable "name" to relate both df
  kable()
song                 name   plays
Across the Universe  John   guitar
Come Together        John   guitar
Hello, Goodbye       Paul   bass
Peggy Sue            Buddy  NA
left_join(songs2, artists2, by = c('first','last')) %>% 
  kable()
song                 first  last       plays
Across the Universe  John   Lennon     guitar
Come Together        John   Lennon     guitar
Hello, Goodbye       Paul   McCartney  bass
Peggy Sue            Buddy  Holly      NA
inner_join(songs, artists, by = 'name')  %>% # like left_join(), but keeps only the rows of songs that have a match in artists
  kable()
song                 name  plays
Across the Universe  John  guitar
Come Together        John  guitar
Hello, Goodbye       Paul  bass
semi_join(songs, artists, by = 'name') %>% # keeps the rows of songs that have a match in artists, without adding columns from artists
  kable()
song                 name
Across the Universe  John
Come Together        John
Hello, Goodbye       Paul
anti_join(songs, artists, by = 'name') %>%  # keeps the rows of songs that have no match in artists
  kable()
song       name
Peggy Sue  Buddy
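
All the joins above keep songs as the reference table. For contrast, the sketch below also runs right_join() and full_join(), which keep the unmatched rows of artists. The songs table is rebuilt from the output above; the artists table is partly an assumption, since the output only confirms the John and Paul rows, and the George and Ringo rows are added here purely for illustration.

```r
library(dplyr)

# `songs` rebuilt from the join output above.
songs <- data.frame(
  song = c("Across the Universe", "Come Together", "Hello, Goodbye", "Peggy Sue"),
  name = c("John", "John", "Paul", "Buddy"),
  stringsAsFactors = FALSE
)

# `artists`: the John and Paul rows come from the output above;
# the George and Ringo rows are ASSUMED for this example.
artists <- data.frame(
  name = c("George", "John", "Paul", "Ringo"),
  plays = c("sitar", "guitar", "bass", "drums"),
  stringsAsFactors = FALSE
)

matched   <- semi_join(songs, artists, by = "name")   # songs with a known artist
unmatched <- anti_join(songs, artists, by = "name")   # songs without one (Peggy Sue)
all_right <- right_join(songs, artists, by = "name")  # every artist, song = NA if none
everyone  <- full_join(songs, artists, by = "name")   # every song and every artist
```

With these toy tables, right_join() returns five rows (two for John, one each for Paul, George, and Ringo) and full_join() returns six, because it also keeps Peggy Sue.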

4. Tidying data: Answers and Tasks

Transforming the CO2 data set

xls = '../data/co2_europa.xls'
co2 = read_excel(xls, skip=12)
co2Long <- co2 %>% 
  gather("Year","Emissions", -Country) %>% 
  mutate(Year = as.numeric(Year))

kable(head(co2Long), digits = 2)
Country         Year  Emissions
Afghanistan     1970    1813.98
Albania         1970    4435.43
Algeria         1970   18850.75
American Samoa  1970       6.18
Angola          1970    8946.50
Anguilla        1970       2.17

What are the top 5 emitting countries in 2014?

co2Long %>% 
  filter(Year == 2014, Country != "World", Country != "EU28") %>% 
  arrange(desc(Emissions)) %>% 
  head(n = 5) %>% 
  kable()
Country                   Year  Emissions
China                     2014   10540750
United States of America  2014    5334530
India                     2014    2341897
Russian Federation        2014    1766427
Japan                     2014    1278922

What are the total emissions of the top 5 emitting countries?

co2Long %>% 
  filter(Country != "World", Country != "EU28") %>% 
  group_by(Country) %>% 
  summarise(Total = sum(Emissions)) %>% 
  arrange(desc(Total)) %>% 
  head(n = 5) %>% 
  kable(format.args = list(big.mark = ","))
Country                         Total
United States of America  231,948,899
China                     174,045,927
Russian Federation         81,242,427
Japan                      51,276,329
Germany                    43,382,205