I’m very interested in investigating how dynamic ecosystem-based management strategies can be designed to protect and recover marine resources. In particular, I’m interested in reef-associated predators and their role in ecosystem stability and resilience. Some burning questions are:
- What can we learn from studying populations of reef-associated predators that can inform the design and implementation of dynamic ecosystem-based management?
- How can dynamic ecosystem-based management increase resource stewardship of coastal communities?
I’m also passionate about ocean exploration, science communication, and outreach. I sail with the Ocean Exploration Trust doing deep-sea research aboard the E/V Nautilus. Follow our research at:
I believe that a streamlined, transparent, and reproducible approach to managing data and conducting scientific analysis is of paramount importance for interdisciplinary and collaborative work. I’m looking forward to deepening my R skills, becoming comfortable with GitHub, and expanding my skills in visualizing and communicating results.
Currently, I don’t have data related to the specific research questions stated above. The data I’ll use in this assignment come from a long-term ecological assessment of reef fish populations in the lagoons of Rarotonga and Aitutaki in 2002 and 2014. The data were provided by Professor Hunter Lenihan for his course on Applied Marine Ecology.
# read csv
d1 = read.csv('data/juanmayorgahenao_hunterdata.csv')
# subset the records of each species
surgeon <- subset(d1, Species == "Surgeonfish")
trout <- subset(d1, Species == "Coral Trout")
spotted <- subset(d1, Species == "Spotted Damselfish")
yellow <- subset(d1, Species == "Yellow Damselfish")
# one column of adult counts per species
# (assumes every subset has the same number of rows, in the same order)
densities <- data.frame(surgeon$Adults, trout$Adults, spotted$Adults, yellow$Adults)
colnames(densities) <- c("Surgeon", "Coral Trout", "Spotted Damselfish", "Yellow Damselfish")
# output summary
summary(densities)
##     Surgeon       Coral Trout     Spotted Damselfish  Yellow Damselfish
##  Min.   : 20.0   Min.   :  4.0   Min.   : 19.0       Min.   :32.0
##  1st Qu.:192.5   1st Qu.: 31.0   1st Qu.:197.5       1st Qu.:35.0
##  Median :305.0   Median : 80.0   Median :388.5       Median :58.0
##  Mean   :255.0   Mean   : 83.5   Mean   :426.5       Mean   :59.5
##  3rd Qu.:367.5   3rd Qu.:132.5   3rd Qu.:617.5       3rd Qu.:82.5
##  Max.   :390.0   Max.   :170.0   Max.   :910.0       Max.   :90.0
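Building `densities` column by column from four separate subsets only lines up if every species has the same number of rows in the same order. A minimal base-R sketch of a group-wise alternative, assuming only the `Species` and `Adults` columns used above; the data frame `toy_counts` and its numbers are hypothetical:

```r
# Hypothetical toy data with the two columns used above (Species, Adults);
# the counts are invented for illustration only
toy_counts <- data.frame(
  Species = c(rep("Surgeonfish", 4), rep("Coral Trout", 3)),
  Adults  = c(10, 20, 30, 40, 2, 4, 9),
  stringsAsFactors = FALSE
)

# split() groups the Adults counts by species, so groups may differ in size
adults_by_species <- split(toy_counts$Adults, toy_counts$Species)

# sapply() runs summary() on each group and binds the results into a matrix:
# one row per statistic (Min., 1st Qu., Median, ...), one column per species
species_summary <- sapply(adults_by_species, summary)
species_summary
```

Because each species is summarised on its own, a species with fewer records simply contributes fewer values instead of shifting the other columns out of line.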
Wrangling data
Reading Data with readr and dplyr
library(readr)
library(dplyr)
library(knitr)
d = read_csv('../data/r-ecology/species.csv')
d %>%
  head() %>%
  kable()
| species_id | genus            | species         | taxa   |
|:-----------|:-----------------|:----------------|:-------|
| AB         | Amphispiza       | bilineata       | Bird   |
| AH         | Ammospermophilus | harrisi         | Rodent |
| AS         | Ammodramus       | savannarum      | Bird   |
| BA         | Baiomys          | taylori         | Rodent |
| CB         | Campylorhynchus  | brunneicapillus | Bird   |
| CM         | Calamospiza      | melanocorys     | Bird   |
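summary(d) %>%
  kable()

| species_id       | genus            | species          | taxa             |
|:-----------------|:-----------------|:-----------------|:-----------------|
| Length:54        | Length:54        | Length:54        | Length:54        |
| Class :character | Class :character | Class :character | Class :character |
| Mode :character  | Mode :character  | Mode :character  | Mode :character  |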
gather() and spread()
# Loading all the required packages
library(tidyr)
library(dplyr)
library(knitr)
# This is the data set being used
cases %>%
  kable()
| country |  2011 |  2012 |  2013 |
|:--------|------:|------:|------:|
| FR      |  7000 |  6900 |  7000 |
| DE      |  5800 |  6000 |  6200 |
| US      | 15000 | 14000 | 13000 |
# Using the gather() function
cases %>%
  gather("year", "n", 2:4) %>% # params: name of the new key column, name of the new value column, which columns to collapse
  kable()
| country | year |     n |
|:--------|:-----|------:|
| FR      | 2011 |  7000 |
| DE      | 2011 |  5800 |
| US      | 2011 | 15000 |
| FR      | 2012 |  6900 |
| DE      | 2012 |  6000 |
| US      | 2012 | 14000 |
| FR      | 2013 |  7000 |
| DE      | 2013 |  6200 |
| US      | 2013 | 13000 |
# Using the spread() function
casesLong <- gather(cases, "year", "n", 2:4)
casesLong %>%
  spread(year, n) %>% # params: column whose values become the new column names, column that supplies the cell values
  kable()
| country |  2011 |  2012 |  2013 |
|:--------|------:|------:|------:|
| DE      |  5800 |  6000 |  6200 |
| FR      |  7000 |  6900 |  7000 |
| US      | 15000 | 14000 | 13000 |
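In tidyr 1.0 and later, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. A sketch of the same round trip, rebuilding `cases` from the values in the tables above and assuming tidyr 1.0 or newer is installed:

```r
library(tidyr)

# Rebuild the cases table shown above; check.names = FALSE keeps the
# non-syntactic column names "2011", "2012", "2013" as they are
cases <- data.frame(
  country = c("FR", "DE", "US"),
  `2011`  = c(7000, 5800, 15000),
  `2012`  = c(6900, 6000, 14000),
  `2013`  = c(7000, 6200, 13000),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Wide -> long: the same reshaping as gather("year", "n", 2:4)
casesLong <- pivot_longer(cases, cols = 2:4, names_to = "year", values_to = "n")

# Long -> wide: the same reshaping as spread(year, n)
casesWide <- pivot_wider(casesLong, names_from = year, values_from = n)

casesLong
casesWide
```

The argument names (`names_to`/`values_to`, `names_from`/`values_from`) spell out the roles that are positional in `gather()` and `spread()`.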
separate() and unite()
storms %>%
  kable()
| storm   | wind | pressure | date       |
|:--------|-----:|---------:|:-----------|
| Alberto |  110 |     1007 | 2000-08-03 |
| Alex    |   45 |     1009 | 1998-07-27 |
| Allison |   65 |     1005 | 1995-06-03 |
| Ana     |   40 |     1013 | 1997-06-30 |
| Arlene  |   50 |     1010 | 1999-06-11 |
| Arthur  |   45 |     1010 | 1996-06-17 |
# split the date column into three columns at each "-"
MDYstorms <- separate(storms, date, c("year", "month", "day"), sep = "-")
MDYstorms %>%
  kable()
| storm   | wind | pressure | year | month | day |
|:--------|-----:|---------:|:-----|:------|:----|
| Alberto |  110 |     1007 | 2000 | 08    | 03  |
| Alex    |   45 |     1009 | 1998 | 07    | 27  |
| Allison |   65 |     1005 | 1995 | 06    | 03  |
| Ana     |   40 |     1013 | 1997 | 06    | 30  |
| Arlene  |   50 |     1010 | 1999 | 06    | 11  |
| Arthur  |   45 |     1010 | 1996 | 06    | 17  |
# glue year, month and day back into a single date column
StormsUnite <- unite(MDYstorms, "date", year, month, day, sep = "-")
StormsUnite %>%
  kable()
| storm   | wind | pressure | date       |
|:--------|-----:|---------:|:-----------|
| Alberto |  110 |     1007 | 2000-08-03 |
| Alex    |   45 |     1009 | 1998-07-27 |
| Allison |   65 |     1005 | 1995-06-03 |
| Ana     |   40 |     1013 | 1997-06-30 |
| Arlene  |   50 |     1010 | 1999-06-11 |
| Arthur  |   45 |     1010 | 1996-06-17 |
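By default `separate()` returns the pieces as character strings, which is why `month` and `day` keep their leading zeros above. A sketch, rebuilding `storms` from the tables above, that checks `unite()` undoes the split and shows the `convert = TRUE` argument turning the pieces into integers:

```r
library(tidyr)

# Rebuild the storms table shown above, keeping date as text
storms <- data.frame(
  storm    = c("Alberto", "Alex", "Allison", "Ana", "Arlene", "Arthur"),
  wind     = c(110, 45, 65, 40, 50, 45),
  pressure = c(1007, 1009, 1005, 1013, 1010, 1010),
  date     = c("2000-08-03", "1998-07-27", "1995-06-03",
               "1997-06-30", "1999-06-11", "1996-06-17"),
  stringsAsFactors = FALSE
)

# Text pieces in, text date out: unite() reverses separate()
MDYstorms   <- separate(storms, date, c("year", "month", "day"), sep = "-")
StormsUnite <- unite(MDYstorms, "date", year, month, day, sep = "-")

# convert = TRUE runs type.convert() on the pieces, so "08" becomes the integer 8
MDYnum <- separate(storms, date, c("year", "month", "day"), sep = "-", convert = TRUE)
str(MDYnum)
```

Uniting the converted columns would produce dates such as "2000-8-3", so keep the character version if the original date strings need to be rebuilt.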
Using dplyr
storms %>%
  select(storm, pressure) %>% # keeps only the named columns
  kable()
| storm   | pressure |
|:--------|---------:|
| Alberto |     1007 |
| Alex    |     1009 |
| Allison |     1005 |
| Ana     |     1013 |
| Arlene  |     1010 |
| Arthur  |     1010 |
storms %>%
  filter(wind >= 50, storm %in% c("Alberto", "Alex", "Allison")) %>% # keeps rows meeting every condition; %in% tests membership in a set
  kable()
| storm   | wind | pressure | date       |
|:--------|-----:|---------:|:-----------|
| Alberto |  110 |     1007 | 2000-08-03 |
| Allison |   65 |     1005 | 1995-06-03 |
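Commas between `filter()` conditions act as a logical AND, and `%in%` returns `TRUE` for each element found in the set on its right. A small sketch, rebuilding three `storms` columns from the tables above, that prints the `%in%` test on its own and checks that the comma form and an explicit `&` give the same rows:

```r
library(dplyr)

# storm, wind and pressure columns of the storms table shown above
storms <- data.frame(
  storm    = c("Alberto", "Alex", "Allison", "Ana", "Arlene", "Arthur"),
  wind     = c(110, 45, 65, 40, 50, 45),
  pressure = c(1007, 1009, 1005, 1013, 1010, 1010),
  stringsAsFactors = FALSE
)

# %in% on its own: one TRUE/FALSE per storm name
storms$storm %in% c("Alberto", "Alex", "Allison")

# Comma-separated conditions ...
by_comma <- filter(storms, wind >= 50, storm %in% c("Alberto", "Alex", "Allison"))
# ... are combined with AND, so this is the same filter
by_and <- filter(storms, wind >= 50 & storm %in% c("Alberto", "Alex", "Allison"))

by_comma$storm
```

Alex passes the `%in%` test but fails `wind >= 50`, which is why only two storms remain.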
storms %>%
  mutate(ratio = pressure/wind, inverse = ratio^-1) %>% # adds new columns computed from existing ones; inverse reuses ratio, created earlier in the same call
kable(digits = 2)
| storm   | wind | pressure | date       | ratio | inverse |
|:--------|-----:|---------:|:-----------|------:|--------:|
| Alberto |  110 |     1007 | 2000-08-03 |  9.15 |    0.11 |
| Alex    |   45 |     1009 | 1998-07-27 | 22.42 |    0.04 |
| Allison |   65 |     1005 | 1995-06-03 | 15.46 |    0.06 |
| Ana     |   40 |     1013 | 1997-06-30 | 25.32 |    0.04 |
| Arlene  |   50 |     1010 | 1999-06-11 | 20.20 |    0.05 |
| Arthur  |   45 |     1010 | 1996-06-17 | 22.44 |    0.04 |
pollution %>%
  summarise(median = median(amount), variance = var(amount), n = n()) # collapses the whole table into one row of summary statistics
storms %>%
  arrange(desc(wind)) # desc() sorts from max to min; without it arrange() sorts from min to max
storms %>%
  arrange(wind, date) %>% # sorts by wind, then by date to break ties
  kable()
| storm   | wind | pressure | date       |
|:--------|-----:|---------:|:-----------|
| Ana     |   40 |     1013 | 1997-06-30 |
| Arthur  |   45 |     1010 | 1996-06-17 |
| Alex    |   45 |     1009 | 1998-07-27 |
| Arlene  |   50 |     1010 | 1999-06-11 |
| Allison |   65 |     1005 | 1995-06-03 |
| Alberto |  110 |     1007 | 2000-08-03 |
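`desc()` reverses the sort order, and each extra column in `arrange()` breaks the ties left by the previous ones. A sketch, rebuilding `storms` from the tables above with `date` stored as a `Date`, that sorts from the strongest to the weakest wind and uses the date to order the two storms tied at 45:

```r
library(dplyr)

storms <- data.frame(
  storm = c("Alberto", "Alex", "Allison", "Ana", "Arlene", "Arthur"),
  wind  = c(110, 45, 65, 40, 50, 45),
  date  = as.Date(c("2000-08-03", "1998-07-27", "1995-06-03",
                    "1997-06-30", "1999-06-11", "1996-06-17")),
  stringsAsFactors = FALSE
)

# Strongest wind first; Alex and Arthur tie on wind, so the older date comes first
strongest_first <- arrange(storms, desc(wind), date)
strongest_first$storm
```

Wrapping `date` in `desc()` as well, as in `arrange(storms, desc(wind), desc(date))`, would put the more recent of the tied storms first.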
Selecting the unit of analysis
pollution %>%
  group_by(city) %>%
  summarise(mean = mean(amount), sum = sum(amount), n = n()) %>%
  kable()
| city     | mean | sum | n |
|:---------|-----:|----:|--:|
| Beijing  | 88.5 | 177 | 2 |
| London   | 19.0 |  38 | 2 |
| New York | 18.5 |  37 | 2 |
pollution %>%
  group_by(size) %>%
  summarise(mean = mean(amount), sum = sum(amount), n = n()) %>%
  kable()
| size  |     mean | sum | n |
|:------|---------:|----:|--:|
| large | 55.33333 | 166 | 3 |
| small | 28.66667 |  86 | 3 |
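The `pollution` table itself is not printed in this notebook, so the six amounts below are an assumption; they reproduce the city and size totals in the two tables above. The sketch contrasts `summarise()` on the whole table (one row) with `summarise()` after `group_by()` (one row per group):

```r
library(dplyr)

# Assumed pollution table: amounts chosen to match the totals shown above
pollution <- data.frame(
  city   = c("New York", "New York", "London", "London", "Beijing", "Beijing"),
  size   = c("large", "small", "large", "small", "large", "small"),
  amount = c(23, 14, 22, 16, 121, 56),
  stringsAsFactors = FALSE
)

# Without group_by(): one row summarising all six measurements
overall <- summarise(pollution, median = median(amount), variance = var(amount), n = n())

# With group_by(): one row per city, each computed only from that city's rows
by_city <- pollution %>%
  group_by(city) %>%
  summarise(mean = mean(amount), sum = sum(amount), n = n())

overall
by_city
```

Changing `group_by(city)` to `group_by(size)` changes the unit of analysis without touching the summary expressions.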
tb %>%
  group_by(country, year) %>% # tags the grouping only; the rows themselves are unchanged
  head() %>%
  kable()
| country     | year | sex    | child | adult | elderly |
|:------------|-----:|:-------|------:|------:|--------:|
| Afghanistan | 1995 | female |    NA |    NA |      NA |
| Afghanistan | 1995 | male   |    NA |    NA |      NA |
| Afghanistan | 1996 | female |    NA |    NA |      NA |
| Afghanistan | 1996 | male   |    NA |    NA |      NA |
| Afghanistan | 1997 | female |     5 |    96 |       1 |
| Afghanistan | 1997 | male   |     0 |    26 |       0 |
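Grouping only becomes visible once a summary is computed per group. A sketch on the six `tb` rows printed above that collapses each country-year group into one total; the `.groups = "drop"` argument assumes dplyr 1.0 or newer:

```r
library(dplyr)

# The six tb rows printed above
tb <- data.frame(
  country = "Afghanistan",
  year    = c(1995, 1995, 1996, 1996, 1997, 1997),
  sex     = rep(c("female", "male"), 3),
  child   = c(NA, NA, NA, NA, 5, 0),
  adult   = c(NA, NA, NA, NA, 96, 26),
  elderly = c(NA, NA, NA, NA, 1, 0),
  stringsAsFactors = FALSE
)

# summarise() returns one row per country-year group; years whose counts
# are all missing give NA because sum() does not drop NA by default
cases_per_year <- tb %>%
  group_by(country, year) %>%
  summarise(cases = sum(child + adult + elderly), .groups = "drop")

cases_per_year
```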
Joining data
bind_cols(y, z) # places the columns of z beside the columns of y (both must have the same number of rows)
bind_rows(y, z) # stacks the rows of z below the rows of y
union(y, z) # rows that appear in y or z, with duplicates removed
intersect(y, z) # rows that appear in both y and z
setdiff(y, z) # rows of y that do not appear in z (the order of y and z matters)
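`y` and `z` are not defined in this notebook, so the sketch below uses two hypothetical tables with the same columns that share two rows, to make the row counts of each set operation easy to check:

```r
library(dplyr)

# Hypothetical y and z: same columns, sharing the rows (B, 2) and (C, 3)
y <- data.frame(x1 = c("A", "B", "C"), x2 = c(1, 2, 3), stringsAsFactors = FALSE)
z <- data.frame(x1 = c("B", "C", "D"), x2 = c(2, 3, 4), stringsAsFactors = FALSE)

stacked  <- bind_rows(y, z)   # 6 rows; B and C appear twice
combined <- union(y, z)       # 4 rows: A, B, C, D
shared   <- intersect(y, z)   # 2 rows: B, C
only_y   <- setdiff(y, z)     # 1 row: A
only_z   <- setdiff(z, y)     # 1 row: D, because setdiff() is not symmetric
```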
left_join(songs, artists, by = 'name') %>% # keeps every row of songs and adds the matching columns of artists, using "name" as the key (NA where there is no match)
  kable()
| song                | name  | plays  |
|:--------------------|:------|:-------|
| Across the Universe | John  | guitar |
| Come Together       | John  | guitar |
| Hello, Goodbye      | Paul  | bass   |
| Peggy Sue           | Buddy | NA     |
left_join(songs2, artists2, by = c('first', 'last')) %>% # matches on both first and last name
  kable()
| song                | first | last      | plays  |
|:--------------------|:------|:----------|:-------|
| Across the Universe | John  | Lennon    | guitar |
| Come Together       | John  | Lennon    | guitar |
| Hello, Goodbye      | Paul  | McCartney | bass   |
| Peggy Sue           | Buddy | Holly     | NA     |
inner_join(songs, artists, by = 'name') %>% # like left_join(), but drops rows of songs with no match in artists
  kable()
| song                | name | plays  |
|:--------------------|:-----|:-------|
| Across the Universe | John | guitar |
| Come Together       | John | guitar |
| Hello, Goodbye      | Paul | bass   |
semi_join(songs, artists, by = 'name') %>% # keeps rows of songs that have a match in artists, without adding any columns
  kable()
| song                | name |
|:--------------------|:-----|
| Across the Universe | John |
| Come Together       | John |
| Hello, Goodbye      | Paul |
anti_join(songs, artists, by = 'name') # keeps rows of songs that have no match in artists
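A self-contained sketch contrasting the four joins, rebuilding `songs` from the tables above and keeping only the two `artists` rows that appear there (the full `artists` table may hold more players). Mutating joins (`left_join()`, `inner_join()`) add columns, while filtering joins (`semi_join()`, `anti_join()`) only keep or drop rows of `songs`:

```r
library(dplyr)

# songs as printed above; artists reduced to the two players shown above
songs <- data.frame(
  song = c("Across the Universe", "Come Together", "Hello, Goodbye", "Peggy Sue"),
  name = c("John", "John", "Paul", "Buddy"),
  stringsAsFactors = FALSE
)
artists <- data.frame(
  name  = c("John", "Paul"),
  plays = c("guitar", "bass"),
  stringsAsFactors = FALSE
)

left  <- left_join(songs, artists, by = "name")   # all 4 songs; plays is NA for Buddy
inner <- inner_join(songs, artists, by = "name")  # the 3 songs with a matching artist
semi  <- semi_join(songs, artists, by = "name")   # same 3 songs, columns of songs only
anti  <- anti_join(songs, artists, by = "name")   # Peggy Sue: Buddy has no artists row
```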
4. Tidying data: Answers and Tasks
What are the top 5 emitting countries for 2014?
co2Long %>%
  filter(Year == 2014, Country != "World", Country != "EU28") %>% # drop the World and EU28 aggregate rows
  arrange(desc(Emissions)) %>%
  head(n = 5) %>%
  kable()
| Country                  | Year | Emissions |
|:-------------------------|-----:|----------:|
| China                    | 2014 |  10540750 |
| United States of America | 2014 |   5334530 |
| India                    | 2014 |   2341897 |
| Russian Federation       | 2014 |   1766427 |
| Japan                    | 2014 |   1278922 |
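The two `!=` conditions can also be written as a single `!Country %in% c(...)` test, and dplyr 1.0 added `slice_max()`, which sorts and keeps the top rows in one step. A sketch on a hypothetical `co2Long`-shaped table: the five country values are copied from the 2014 table above, while the `World` and `EU28` rows carry placeholder numbers only to show that they are excluded:

```r
library(dplyr)

# Hypothetical co2Long-shaped data: country values from the 2014 table above,
# World and EU28 rows with placeholder values
co2Long <- data.frame(
  Country   = c("China", "United States of America", "India",
                "Russian Federation", "Japan", "World", "EU28"),
  Year      = 2014,
  Emissions = c(10540750, 5334530, 2341897, 1766427, 1278922, 30000000, 3000000),
  stringsAsFactors = FALSE
)

top5_2014 <- co2Long %>%
  filter(Year == 2014, !Country %in% c("World", "EU28")) %>%  # drop aggregate rows
  slice_max(Emissions, n = 5)                                  # 5 largest, in descending order

top5_2014
```

Without the filter, the placeholder `World` row would take first place, which is why aggregate rows have to be removed before ranking.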
What are the total emissions of the top 5 emitting countries, summed over all years in the dataset?
co2Long %>%
filter(Country != "World", Country != "EU28") %>%
group_by(Country) %>%
summarise(Total = sum(Emissions)) %>%
arrange(desc(Total)) %>%
head(n = 5) %>%
kable(format.args = list(big.mark = ","))
| Country                  |       Total |
|:-------------------------|------------:|
| United States of America | 231,948,899 |
| China                    | 174,045,927 |
| Russian Federation       |  81,242,427 |
| Japan                    |  51,276,329 |
| Germany                  |  43,382,205 |