Content
I’m very interested in investigating how dynamic ecosystem-based management strategies can be designed to protect and recover marine resources. In particular, I’m interested in reef-associated predators and their role in ecosystem stability and resilience. Some burning questions are:
- What can we learn from studying populations of reef-associated predators that can inform the design and implementation of dynamic ecosystem-based management?
- How can dynamic ecosystem-based management increase resource stewardship of coastal communities?
I’m also passionate about ocean exploration, science communication, and outreach. I sail with the Ocean Exploration Trust, doing deep-sea research aboard the E/V Nautilus. Follow our research at: http://www.nautiluslive.org
Techniques
I believe that a streamlined, transparent, and reproducible approach to managing data and conducting scientific analysis is essential for interdisciplinary and collaborative work. I’m looking forward to deepening my R skills, becoming comfortable with GitHub, and expanding my skills in visualizing and communicating results.
Data
Currently, I don’t have data related to the specific research questions stated above. The data that I’ll use in this assignment come from a long-term ecological assessment of reef fish populations in the lagoons of Rarotonga and Aitutaki for the years 2002 and 2014. The data were provided by Professor Hunter Lenihan for his course on Applied Marine Ecology.
# read the CSV file
d1 <- read.csv('data/juanmayorgahenao_hunterdata.csv')
surgeon <- subset(d1, Species == "Surgeonfish")
trout <- subset(d1, Species == "Coral Trout")
spotted <- subset(d1, Species == "Spotted Damselfish")
yellow <- subset(d1, Species == "Yellow Damselfish")
densities <- data.frame(surgeon$Adults, trout$Adults, spotted$Adults, yellow$Adults)
colnames(densities) <- c("Surgeon", "Coral Trout", "Spotted Damselfish", "Yellow Damselfish")
# output summary
summary(densities)
##     Surgeon        Coral Trout    Spotted Damselfish  Yellow Damselfish
##  Min.   : 20.0   Min.   :  4.0   Min.   : 19.0       Min.   :32.0
##  1st Qu.:192.5   1st Qu.: 31.0   1st Qu.:197.5       1st Qu.:35.0
##  Median :305.0   Median : 80.0   Median :388.5       Median :58.0
##  Mean   :255.0   Mean   : 83.5   Mean   :426.5       Mean   :59.5
##  3rd Qu.:367.5   3rd Qu.:132.5   3rd Qu.:617.5       3rd Qu.:82.5
##  Max.   :390.0   Max.   :170.0   Max.   :910.0       Max.   :90.0
Wrangling data
Reading Data with readr and dplyr
suppressWarnings(library(readr))
suppressWarnings(suppressMessages(library(dplyr)))
d = read_csv('../data/r-ecology/species.csv') %>%
  as_tibble() # read_csv() already returns a tibble; as_tibble() replaces the deprecated tbl_df()
knitr::kable(head(d))
| species_id | genus | species | taxa |
|---|---|---|---|
| AB | Amphispiza | bilineata | Bird |
| AH | Ammospermophilus | harrisi | Rodent |
| AS | Ammodramus | savannarum | Bird |
| BA | Baiomys | taylori | Rodent |
| CB | Campylorhynchus | brunneicapillus | Bird |
| CM | Calamospiza | melanocorys | Bird |
knitr::kable(summary(d))
|  | species_id | genus | species | taxa |
|---|---|---|---|---|
|  | Length:54 | Length:54 | Length:54 | Length:54 |
|  | Class :character | Class :character | Class :character | Class :character |
|  | Mode :character | Mode :character | Mode :character | Mode :character |
Gather() and Spread()
# Loading all the required packages
suppressWarnings(library(readr))
suppressWarnings(library(tidyr))
suppressWarnings(library(knitr))
suppressWarnings(library(readxl))
library(dplyr)
library(EDAWR)
library(nycflights13)
# This is the data set being used
kable(cases)
| country | 2011 | 2012 | 2013 |
|---|---|---|---|
| FR | 7000 | 6900 | 7000 |
| DE | 5800 | 6000 | 6200 |
| US | 15000 | 14000 | 13000 |
# Using the gather() function
cases %>%
  gather("year", "n", 2:4) %>% # params: name of the new key column, name of the new value column, which columns to collapse
  kable()
| country | year | n |
|---|---|---|
| FR | 2011 | 7000 |
| DE | 2011 | 5800 |
| US | 2011 | 15000 |
| FR | 2012 | 6900 |
| DE | 2012 | 6000 |
| US | 2012 | 14000 |
| FR | 2013 | 7000 |
| DE | 2013 | 6200 |
| US | 2013 | 13000 |
# Using the spread() function
casesLong <- gather(cases,"year","n",2:4)
casesLong %>%
spread(year,n) %>% # params:column to use for new keys, column to use for values
kable()
| country | 2011 | 2012 | 2013 |
|---|---|---|---|
| DE | 5800 | 6000 | 6200 |
| FR | 7000 | 6900 | 7000 |
| US | 15000 | 14000 | 13000 |
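In recent tidyr releases (1.0.0 and later), gather() and spread() are superseded by pivot_longer() and pivot_wider(). The sketch below repeats the same round trip with those functions. It rebuilds `cases` by hand from the table printed above so it can run without EDAWR; the column name `country` is an assumption.

```r
library(tidyr)

# Stand-in for EDAWR::cases, rebuilt from the table printed above.
# The column name `country` is an assumption.
cases <- data.frame(
  country = c("FR", "DE", "US"),
  `2011` = c(7000, 5800, 15000),
  `2012` = c(6900, 6000, 14000),
  `2013` = c(7000, 6200, 13000),
  check.names = FALSE
)

# Wide to long: plays the role of gather("year", "n", 2:4)
cases_long <- pivot_longer(cases, cols = -country,
                           names_to = "year", values_to = "n")

# Long to wide: plays the role of spread(year, n)
cases_wide <- pivot_wider(cases_long, names_from = year, values_from = n)
```

Note that pivot_longer() returns the rows grouped by country rather than by year, so the row order differs from the gather() output even though the content is the same.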
Separate() and Unite()
storms %>%
kable()
| storm | wind | pressure | date |
|---|---|---|---|
| Alberto | 110 | 1007 | 2000-08-03 |
| Alex | 45 | 1009 | 1998-07-27 |
| Allison | 65 | 1005 | 1995-06-03 |
| Ana | 40 | 1013 | 1997-06-30 |
| Arlene | 50 | 1010 | 1999-06-11 |
| Arthur | 45 | 1010 | 1996-06-17 |
MDYstorms <- separate(storms, date, c("year","month","day"), sep = "-")
kable(MDYstorms)
| storm | wind | pressure | year | month | day |
|---|---|---|---|---|---|
| Alberto | 110 | 1007 | 2000 | 08 | 03 |
| Alex | 45 | 1009 | 1998 | 07 | 27 |
| Allison | 65 | 1005 | 1995 | 06 | 03 |
| Ana | 40 | 1013 | 1997 | 06 | 30 |
| Arlene | 50 | 1010 | 1999 | 06 | 11 |
| Arthur | 45 | 1010 | 1996 | 06 | 17 |
StormsUnite <- unite(MDYstorms, "date", year, month, day, sep = "-")
kable(StormsUnite)
| storm | wind | pressure | date |
|---|---|---|---|
| Alberto | 110 | 1007 | 2000-08-03 |
| Alex | 45 | 1009 | 1998-07-27 |
| Allison | 65 | 1005 | 1995-06-03 |
| Ana | 40 | 1013 | 1997-06-30 |
| Arlene | 50 | 1010 | 1999-06-11 |
| Arthur | 45 | 1010 | 1996-06-17 |
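By default separate() leaves the new columns as character strings, which is why the month and day above keep their leading zeros. A minimal sketch of the alternative, assuming numeric date parts are wanted: `convert = TRUE` turns them into integers, and sprintf() restores the zero padding when the date is rebuilt. The `storms` data frame here is a hand-built stand-in copied from the table above, with `date` stored as text.

```r
library(tidyr)

# Stand-in for EDAWR::storms, copied from the table printed above.
# Storing `date` as character is an assumption made for this sketch.
storms <- data.frame(
  storm = c("Alberto", "Alex", "Allison", "Ana", "Arlene", "Arthur"),
  wind = c(110, 45, 65, 40, 50, 45),
  pressure = c(1007, 1009, 1005, 1013, 1010, 1010),
  date = c("2000-08-03", "1998-07-27", "1995-06-03",
           "1997-06-30", "1999-06-11", "1996-06-17"),
  stringsAsFactors = FALSE
)

# convert = TRUE makes year, month and day integers instead of strings
parts <- separate(storms, date, c("year", "month", "day"),
                  sep = "-", convert = TRUE)

# Rebuild zero-padded ISO dates from the integer parts
rebuilt <- sprintf("%04d-%02d-%02d", parts$year, parts$month, parts$day)
```

With integer parts, a plain unite() would give "2000-8-3" rather than "2000-08-03", so the explicit formatting step matters if the result is to be parsed as a date again.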
Using dplyr
storms %>%
select(storm, pressure) %>% # Selects some columns from the table
kable()
| storm | pressure |
|---|---|
| Alberto | 1007 |
| Alex | 1009 |
| Allison | 1005 |
| Ana | 1013 |
| Arlene | 1010 |
| Arthur | 1010 |
storms %>%
filter(wind >= 50, storm %in% c("Alberto", "Alex", "Allison")) %>% # %in% is group membership
kable()
| storm | wind | pressure | date |
|---|---|---|---|
| Alberto | 110 | 1007 | 2000-08-03 |
| Allison | 65 | 1005 | 1995-06-03 |
storms %>%
  mutate(ratio = pressure/wind, inverse = ratio^-1) %>% # creates new columns computed from existing ones; inverse reuses the ratio column defined just before it
  kable(digits = 2)
| storm | wind | pressure | date | ratio | inverse |
|---|---|---|---|---|---|
| Alberto | 110 | 1007 | 2000-08-03 | 9.15 | 0.11 |
| Alex | 45 | 1009 | 1998-07-27 | 22.42 | 0.04 |
| Allison | 65 | 1005 | 1995-06-03 | 15.46 | 0.06 |
| Ana | 40 | 1013 | 1997-06-30 | 25.32 | 0.04 |
| Arlene | 50 | 1010 | 1999-06-11 | 20.20 | 0.05 |
| Arthur | 45 | 1010 | 1996-06-17 | 22.44 | 0.04 |
pollution %>%
summarise(median = median(amount), variance = var(amount), n = n()) %>% # creates a summary table with the specified stats
kable()
storms %>%
  arrange(desc(wind)) %>% # arrange() sorts rows: ascending by default, descending with desc()
  arrange(wind, date) %>% # this second call re-sorts ascending by wind, then by date, so it overrides the line above
  kable()
| storm | wind | pressure | date |
|---|---|---|---|
| Ana | 40 | 1013 | 1997-06-30 |
| Arthur | 45 | 1010 | 1996-06-17 |
| Alex | 45 | 1009 | 1998-07-27 |
| Arlene | 50 | 1010 | 1999-06-11 |
| Allison | 65 | 1005 | 1995-06-03 |
| Alberto | 110 | 1007 | 2000-08-03 |
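The table above is sorted from the weakest to the strongest wind because each arrange() call re-sorts the whole table, so only the last call in a chain determines the order. The sketch below makes that visible on a hand-built stand-in for `storms` (copied from the tables above, with `date` kept as text, which is an assumption).

```r
library(dplyr)

# Stand-in for EDAWR::storms, copied from the table printed above
storms <- data.frame(
  storm = c("Alberto", "Alex", "Allison", "Ana", "Arlene", "Arthur"),
  wind = c(110, 45, 65, 40, 50, 45),
  pressure = c(1007, 1009, 1005, 1013, 1010, 1010),
  date = c("2000-08-03", "1998-07-27", "1995-06-03",
           "1997-06-30", "1999-06-11", "1996-06-17"),
  stringsAsFactors = FALSE
)

# Strongest wind first
by_wind_desc <- arrange(storms, desc(wind))

# Two chained calls: the second one decides the final order
chained <- storms %>%
  arrange(desc(wind)) %>%
  arrange(wind, date)

# A single call gives the same order as the chain
direct <- arrange(storms, wind, date)
```

To sort by several keys at once, pass them all to one call, for example `arrange(storms, desc(wind), date)`.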
Selecting the unit of analysis
pollution %>%
group_by(city) %>%
summarise(mean = mean(amount), sum = sum(amount), n = n()) %>%
kable()
| city | mean | sum | n |
|---|---|---|---|
| Beijing | 88.5 | 177 | 2 |
| London | 19.0 | 38 | 2 |
| New York | 18.5 | 37 | 2 |
pollution %>%
group_by(size) %>%
summarise(mean = mean(amount), sum = sum(amount), n = n()) %>%
kable()
| size | mean | sum | n |
|---|---|---|---|
| large | 55.33333 | 166 | 3 |
| small | 28.66667 | 86 | 3 |
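The two tables above differ only in the grouping variable: group_by() sets the unit of analysis and summarise() then returns one row per group. The sketch below runs both on a hand-built stand-in for `pollution`. The six `amount` values are an assumption, chosen so that they reproduce the per-city and per-size sums shown above; they are not printed anywhere in this document.

```r
library(dplyr)

# Hypothetical stand-in for EDAWR::pollution. The amounts are assumed values
# that are consistent with the grouped summaries shown above.
pollution <- data.frame(
  city = c("New York", "New York", "London", "London", "Beijing", "Beijing"),
  size = c("large", "small", "large", "small", "large", "small"),
  amount = c(23, 14, 22, 16, 121, 56)
)

# No grouping: the whole table is one unit, so the result has a single row
overall <- pollution %>%
  summarise(median = median(amount), variance = var(amount), n = n())

# One row per city
by_city <- pollution %>%
  group_by(city) %>%
  summarise(mean = mean(amount), sum = sum(amount), n = n())

# One row per particle size
by_size <- pollution %>%
  group_by(size) %>%
  summarise(mean = mean(amount), sum = sum(amount), n = n())
```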
tb %>%
group_by(country, year) %>%
head() %>%
kable()
| country | year | sex | child | adult | elderly |
|---|---|---|---|---|---|
| Afghanistan | 1995 | female | NA | NA | NA |
| Afghanistan | 1995 | male | NA | NA | NA |
| Afghanistan | 1996 | female | NA | NA | NA |
| Afghanistan | 1996 | male | NA | NA | NA |
| Afghanistan | 1997 | female | 5 | 96 | 1 |
| Afghanistan | 1997 | male | 0 | 26 | 0 |
Joining data
bind_cols(y, z) %>% # places the columns of y and z side by side in one df (rows are matched by position)
  kable()
bind_rows(y, z) %>% # stacks the rows of z below the rows of y
  kable()
union(y, z) %>% # rows that appear in y or z, with duplicates removed
  kable()
intersect(y, z) %>% # rows that appear in both y and z
  kable()
setdiff(y, z) %>% # rows that appear in y but not in z
  kable()
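The output of these calls is not shown above, so here is a minimal sketch on two small data frames. The contents of `y` and `z` below are hypothetical values made up for illustration; only the function calls come from the code above.

```r
library(dplyr)

# Hypothetical example tables with identical column names and types
y <- data.frame(x1 = c("A", "B", "C"), x2 = c(1, 2, 3))
z <- data.frame(x1 = c("B", "C", "D"), x2 = c(2, 3, 4))

stacked <- bind_rows(y, z)   # all 6 rows, duplicates kept
either  <- union(y, z)       # 4 distinct rows: A, B, C, D
both    <- intersect(y, z)   # 2 rows shared by y and z: B, C
only_y  <- setdiff(y, z)     # 1 row found in y but not in z: A
```

The set operations compare whole rows, so both tables need the same columns. setdiff() is not symmetric: `setdiff(z, y)` would return the D row instead.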
left_join(songs, artists, by = 'name') %>% # joins artists to songs using the variable "name" to relate both df
kable()
| song | name | plays |
|---|---|---|
| Across the Universe | John | guitar |
| Come Together | John | guitar |
| Hello, Goodbye | Paul | bass |
| Peggy Sue | Buddy | NA |
left_join(songs2, artists2, by = c('first','last')) %>%
kable()
| song | first | last | plays |
|---|---|---|---|
| Across the Universe | John | Lennon | guitar |
| Come Together | John | Lennon | guitar |
| Hello, Goodbye | Paul | McCartney | bass |
| Peggy Sue | Buddy | Holly | NA |
inner_join(songs, artists, by = 'name') %>% # like left_join(), but keeps only the rows of songs that have a match in artists
  kable()
| song | name | plays |
|---|---|---|
| Across the Universe | John | guitar |
| Come Together | John | guitar |
| Hello, Goodbye | Paul | bass |
semi_join(songs, artists, by = 'name') %>% # keeps the rows of songs that have a match in artists, without adding any columns from artists
  kable()
| song | name |
|---|---|
| Across the Universe | John |
| Come Together | John |
| Hello, Goodbye | Paul |
anti_join(songs, artists, by = 'name') %>% # returns the rows of songs that have no match in artists
  kable()
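The four joins can be compared side by side on small stand-in tables. The `songs` and `artists` data frames below are rebuilt by hand from the outputs above; the column names `song` and `plays`, and the assumption that `artists` holds only John and Paul, are mine and may not match the original EDAWR tables exactly.

```r
library(dplyr)

# Stand-ins rebuilt from the join outputs shown above.
# Column names `song` and `plays` are assumptions.
songs <- data.frame(
  song = c("Across the Universe", "Come Together", "Hello, Goodbye", "Peggy Sue"),
  name = c("John", "John", "Paul", "Buddy")
)
artists <- data.frame(
  name = c("John", "Paul"),
  plays = c("guitar", "bass")
)

left  <- left_join(songs, artists, by = "name")   # all 4 songs, NA where no artist matches
inner <- inner_join(songs, artists, by = "name")  # only the 3 songs with a matching artist
semi  <- semi_join(songs, artists, by = "name")   # same 3 rows, but only the songs columns
anti  <- anti_join(songs, artists, by = "name")   # the 1 song without a matching artist
```

Here anti_join() returns the "Peggy Sue" row, the same row that carries the NA in the left_join() output.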
4. Tidying data: Answers and Tasks
What are the top 5 emitting countries for 2014?
co2Long %>%
filter(Year == 2014, Country != "World", Country != "EU28") %>%
arrange(desc(Emissions)) %>%
head(n = 5) %>%
kable()
| Country | Year | Emissions |
|---|---|---|
| China | 2014 | 10540750 |
| United States of America | 2014 | 5334530 |
| India | 2014 | 2341897 |
| Russian Federation | 2014 | 1766427 |
| Japan | 2014 | 1278922 |
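The arrange(desc()) plus head() pattern used above can also be written with slice_max(), available in dplyr 1.0.0 and later. This is only a sketch of an alternative: it runs on a small data frame holding the five 2014 rows printed above, since the full `co2Long` table is not reproduced here.

```r
library(dplyr)

# The five 2014 rows from the table above (not the full co2Long table)
co2_2014 <- data.frame(
  Country = c("China", "United States of America", "India",
              "Russian Federation", "Japan"),
  Year = 2014,
  Emissions = c(10540750, 5334530, 2341897, 1766427, 1278922)
)

# Top 3 by emissions, returned in descending order
top3 <- co2_2014 %>%
  slice_max(Emissions, n = 3)

# The pattern used in this document, for comparison
top3_arrange <- co2_2014 %>%
  arrange(desc(Emissions)) %>%
  head(n = 3)
```

One difference to keep in mind: slice_max() keeps tied rows by default (`with_ties = TRUE`), so it can return more than `n` rows, whereas head() always cuts at exactly `n`.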
What are the total emissions of the top 5 emitting countries?
co2Long %>%
filter(Country != "World", Country != "EU28") %>%
group_by(Country) %>%
summarise(Total = sum(Emissions)) %>%
arrange(desc(Total)) %>%
head(n = 5) %>%
kable(format.args = list(big.mark = ","))
| Country | Total |
|---|---|
| United States of America | 231,948,899 |
| China | 174,045,927 |
| Russian Federation | 81,242,427 |
| Japan | 51,276,329 |
| Germany | 43,382,205 |