Markdown Assignment 1

Content
Techniques
Data
Wrangling data
4. Tidying data: Answers and Tasks

Content

I’m very interested in investigating how dynamic ecosystem-based management strategies can be designed to protect and recover marine resources. In particular, I’m interested in reef-associated predators and their role in ecosystem stability and resilience. Some burning questions are:

What can we learn from studying the populations of reef-associated predators that can inform the design and implementation of dynamic ecosystem-based management?
How can dynamic ecosystem-based management increase resource stewardship of coastal communities?

I’m also passionate about ocean exploration, science communication, and outreach. I sail with the Ocean Exploration Trust doing deep-sea research onboard the E/V Nautilus. Follow our research at: http://www.nautiluslive.org

Techniques

I believe that a streamlined, transparent, and reproducible approach to managing data and conducting scientific analysis is of paramount importance for interdisciplinary and collaborative work. I’m looking forward to deepening my R skills, becoming comfortable with GitHub, and expanding my skills in visualization and communication of results.

Data

Currently, I don’t have data related to the specific research questions stated above. The data that I’ll use in this assignment pertains to a long-term ecological assessment of reef fish populations in the lagoons of Rarotonga and Aitutaki for the years 2002 and 2014. This data was provided by Professor Hunter Lenihan for his course on Applied Marine Ecology.

# read the reef fish survey data
d1 <- read.csv('data/juanmayorgahenao_hunterdata.csv')
# one subset per species
surgeon <- subset(d1, Species == "Surgeonfish")
trout <- subset(d1, Species == "Coral Trout")
spotted <- subset(d1, Species == "Spotted Damselfish")
yellow <- subset(d1, Species == "Yellow Damselfish")
# Adults values side by side; this assumes every species has the same number of rows
densities <- data.frame(surgeon$Adults, trout$Adults, spotted$Adults, yellow$Adults)
colnames(densities) <- c("Surgeon", "Coral Trout", "Spotted Damselfish", "Yellow Damselfish")
# output summary
summary(densities)

##     Surgeon       Coral Trout    Spotted Damselfish Yellow Damselfish
##  Min.   : 20.0   Min.   :  4.0   Min.   : 19.0      Min.   :32.0     
##  1st Qu.:192.5   1st Qu.: 31.0   1st Qu.:197.5      1st Qu.:35.0     
##  Median :305.0   Median : 80.0   Median :388.5      Median :58.0     
##  Mean   :255.0   Mean   : 83.5   Mean   :426.5      Mean   :59.5     
##  3rd Qu.:367.5   3rd Qu.:132.5   3rd Qu.:617.5      3rd Qu.:82.5     
##  Max.   :390.0   Max.   :170.0   Max.   :910.0      Max.   :90.0
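
The subset-and-combine approach above only works when each species has the same number of rows. As a rough sketch of an alternative, the same kind of per-species summary can be computed with group_by() and summarise(). The values below are made up for illustration; only the column names Species and Adults come from the code above.

```r
library(dplyr)

# Hypothetical values; only the column names Species and Adults
# are taken from the original code
fish <- data.frame(
  Species = c("Surgeonfish", "Surgeonfish", "Surgeonfish",
              "Coral Trout", "Coral Trout"),
  Adults = c(20, 305, 390, 4, 170),
  stringsAsFactors = FALSE
)

# One summary row per species, however many rows each species has
adult_summary <- fish %>%
  group_by(Species) %>%
  summarise(min = min(Adults), median = median(Adults),
            max = max(Adults), n = n())

adult_summary
```

Because the grouping is done on the Species column itself, a species with more or fewer survey rows does not break the summary.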

Wrangling data

Reading Data with readr and dplyr

suppressWarnings(library(readr))
suppressWarnings(suppressMessages(library(dplyr)))
d = read_csv('../data/r-ecology/species.csv') # read_csv() already returns a tibble, so no tbl_df() call is needed

knitr::kable(head(d))

species_id	genus	species	taxa
AB	Amphispiza	bilineata	Bird
AH	Ammospermophilus	harrisi	Rodent
AS	Ammodramus	savannarum	Bird
BA	Baiomys	taylori	Rodent
CB	Campylorhynchus	brunneicapillus	Bird
CM	Calamospiza	melanocorys	Bird

knitr::kable(summary(d))

species_id	genus	species	taxa
Length:54	Length:54	Length:54	Length:54
Class :character	Class :character	Class :character	Class :character
Mode :character	Mode :character	Mode :character	Mode :character

gather() and spread()

# Loading all the required packages
suppressWarnings(library(readr))
suppressWarnings(library(tidyr))
suppressWarnings(library(knitr))
suppressWarnings(library(readxl))
library(dplyr)
library(EDAWR)
library(nycflights13)

# This is the data set being used
kable(cases)

country	2011	2012	2013
FR	7000	6900	7000
DE	5800	6000	6200
US	15000	14000	13000

# Using the gather() function
cases %>% 
  gather("year","n",2:4) %>% # params: name of the new key column (string), name of the new value column, which columns to collapse
  kable()

country	year	n
FR	2011	7000
DE	2011	5800
US	2011	15000
FR	2012	6900
DE	2012	6000
US	2012	14000
FR	2013	7000
DE	2013	6200
US	2013	13000

# Using the spread() function

casesLong <- gather(cases,"year","n",2:4)
casesLong %>% 
  spread(year,n) %>%  # params: column to use for new keys, column to use for values
  kable()

country	2011	2012	2013
DE	5800	6000	6200
FR	7000	6900	7000
US	15000	14000	13000
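
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The sketch below repeats the round trip with the newer functions. It rebuilds the cases table by hand from the values printed above so that it runs without the EDAWR package, and the names cases_long and cases_wide are mine.

```r
library(tidyr)
library(tibble)

# Same values as the cases table printed above, rebuilt by hand
cases <- tribble(
  ~country, ~`2011`, ~`2012`, ~`2013`,
  "FR",        7000,    6900,    7000,
  "DE",        5800,    6000,    6200,
  "US",       15000,   14000,   13000
)

# Wide to long: the counterpart of gather("year", "n", 2:4)
cases_long <- pivot_longer(cases, cols = -country,
                           names_to = "year", values_to = "n")

# Long to wide: the counterpart of spread(year, n)
cases_wide <- pivot_wider(cases_long, names_from = year, values_from = n)

cases_long
cases_wide
```

One difference to keep in mind: pivot_longer() lists all years for one country before moving to the next, while gather() lists all countries for one year first, so the row order of the long table differs from the output above.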

separate() and unite()

storms %>% 
  kable()

storm	wind	pressure	date
Alberto	110	1007	2000-08-03
Alex	45	1009	1998-07-27
Allison	65	1005	1995-06-03
Ana	40	1013	1997-06-30
Arlene	50	1010	1999-06-11
Arthur	45	1010	1996-06-17

MDYstorms <- separate(storms, date, c("year","month","day"), sep = "-") 
kable(MDYstorms)

storm	wind	pressure	year	month	day
Alberto	110	1007	2000	08	03
Alex	45	1009	1998	07	27
Allison	65	1005	1995	06	03
Ana	40	1013	1997	06	30
Arlene	50	1010	1999	06	11
Arthur	45	1010	1996	06	17

StormsUnite <- unite(MDYstorms, "date", year, month, day, sep = "-")
kable(StormsUnite)

storm	wind	pressure	date
Alberto	110	1007	2000-08-03
Alex	45	1009	1998-07-27
Allison	65	1005	1995-06-03
Ana	40	1013	1997-06-30
Arlene	50	1010	1999-06-11
Arthur	45	1010	1996-06-17
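
The round trip above gives back the original dates because separate() keeps the pieces as character strings ("08", "03"). A small sketch of what changes with convert = TRUE, using two rows copied from the storms output above (storms_small and the other object names are mine):

```r
library(tidyr)

# Two rows copied from the storms table above, with date stored as text
storms_small <- data.frame(
  storm = c("Alberto", "Alex"),
  date = c("2000-08-03", "1998-07-27"),
  stringsAsFactors = FALSE
)

# Default: year, month and day stay character, leading zeros included
as_text <- separate(storms_small, date, c("year", "month", "day"), sep = "-")

# convert = TRUE: the pieces become integers, so "08" turns into 8
as_int <- separate(storms_small, date, c("year", "month", "day"),
                   sep = "-", convert = TRUE)

# unite() pastes whatever it is given, so only the text version
# reproduces the original date strings
from_text <- unite(as_text, "date", year, month, day, sep = "-")
from_int <- unite(as_int, "date", year, month, day, sep = "-")

from_text$date
from_int$date
```

So convert = TRUE is handy when the parts are needed as numbers, but the leading zeros are lost if the columns are united again afterwards.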

Using dplyr

storms %>%
  select(storm, pressure) %>% # Selects some columns from the table 
  kable()

storm	pressure
Alberto	1007
Alex	1009
Allison	1005
Ana	1013
Arlene	1010
Arthur	1010

storms %>%
  filter(wind >= 50, storm %in% c("Alberto", "Alex", "Allison")) %>% # %in% is group membership
  kable()

storm	wind	pressure	date
Alberto	110	1007	2000-08-03
Allison	65	1005	1995-06-03

storms %>%
  mutate(ratio = pressure/wind, inverse = ratio^-1) %>%  # mutate() adds new columns computed from existing ones; a later column can use an earlier one (inverse uses ratio)
  kable(digits = 2)

storm	wind	pressure	date	ratio	inverse
Alberto	110	1007	2000-08-03	9.15	0.11
Alex	45	1009	1998-07-27	22.42	0.04
Allison	65	1005	1995-06-03	15.46	0.06
Ana	40	1013	1997-06-30	25.32	0.04
Arlene	50	1010	1999-06-11	20.20	0.05
Arthur	45	1010	1996-06-17	22.44	0.04

pollution %>%
  summarise(median = median(amount), variance = var(amount), n = n()) %>%  # creates a summary table with the specified stats
  kable()

median	variance	n
22.5	1731.6	6

storms %>% 
  arrange(desc(wind)) %>%   # arrange() sorts rows: ascending by default, descending with desc()
  arrange(wind, date) %>%   # this second call replaces the first ordering: ascending wind, ties broken by date
  kable()

storm	wind	pressure	date
Ana	40	1013	1997-06-30
Arthur	45	1010	1996-06-17
Alex	45	1009	1998-07-27
Arlene	50	1010	1999-06-11
Allison	65	1005	1995-06-03
Alberto	110	1007	2000-08-03

Selecting the unit of analysis

pollution %>% 
  group_by(city) %>% 
  summarise(mean = mean(amount), sum = sum(amount), n = n()) %>% 
  kable()

city	mean	sum	n
Beijing	88.5	177	2
London	19.0	38	2
New York	18.5	37	2

pollution %>% 
  group_by(size) %>% 
  summarise(mean = mean(amount), sum = sum(amount), n = n()) %>% 
  kable()

size	mean	sum	n
large	55.33333	166	3
small	28.66667	86	3

tb %>% 
  group_by(country, year) %>% 
  head() %>% 
  kable()

country	year	sex	child	adult	elderly
Afghanistan	1995	female	NA	NA	NA
Afghanistan	1995	male	NA	NA	NA
Afghanistan	1996	female	NA	NA	NA
Afghanistan	1996	male	NA	NA	NA
Afghanistan	1997	female	5	96	1
Afghanistan	1997	male	0	26	0

Joining data

bind_cols(y,z) %>% # places the columns of z next to the columns of y, matching rows by position
  kable()

x1	x2	x1	x2
A	1	B	2
B	2	C	3
C	3	D	4

bind_rows(y,z) %>%  # adds all rows into a df
  kable()

x1	x2
A	1
B	2
C	3
B	2
C	3
D	4

union(y,z) %>% # rows that appear in y or in z, with duplicates removed
  kable()

x1	x2
D	4
C	3
B	2
A	1

intersect(y,z) %>% # rows that appear in both y and z
  kable()

x1	x2
B	2
C	3

setdiff(y,z) %>% # rows of y that do not appear in z
  kable()

x1	x2
A	1

left_join(songs, artists, by = 'name') %>% # joins artists to songs using the variable "name" to relate both df
  kable()

song	name	plays
Across the Universe	John	guitar
Come Together	John	guitar
Hello, Goodbye	Paul	bass
Peggy Sue	Buddy	NA

left_join(songs2, artists2, by = c('first','last')) %>% 
  kable()

song	first	last	plays
Across the Universe	John	Lennon	guitar
Come Together	John	Lennon	guitar
Hello, Goodbye	Paul	McCartney	bass
Peggy Sue	Buddy	Holly	NA

inner_join(songs, artists, by = 'name')  %>% # like left_join(), but rows of songs without a match in artists are dropped
  kable()

song	name	plays
Across the Universe	John	guitar
Come Together	John	guitar
Hello, Goodbye	Paul	bass

semi_join(songs, artists, by = 'name') %>% # keeps the rows of songs that have a match in artists, without adding columns from artists
  kable()

song	name
Across the Universe	John
Come Together	John
Hello, Goodbye	Paul

anti_join(songs, artists, by = 'name') %>%  # keeps the rows of songs that have no match in artists
  kable()

song	name
Peggy Sue	Buddy
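
One way to see how the four joins relate is to count rows. The sketch below rebuilds songs from the outputs above. The artists table is an assumption: only the John and Paul rows can be read off the outputs, and the George and Ringo rows are added purely for illustration.

```r
library(dplyr)

# songs rebuilt from the outputs above
songs <- data.frame(
  song = c("Across the Universe", "Come Together",
           "Hello, Goodbye", "Peggy Sue"),
  name = c("John", "John", "Paul", "Buddy"),
  stringsAsFactors = FALSE
)

# Assumed contents: the George and Ringo rows are hypothetical
artists <- data.frame(
  name = c("George", "John", "Paul", "Ringo"),
  plays = c("sitar", "guitar", "bass", "drums"),
  stringsAsFactors = FALSE
)

matched   <- semi_join(songs, artists, by = "name")  # songs with a known artist
unmatched <- anti_join(songs, artists, by = "name")  # songs without one

# semi_join() and anti_join() split songs into two parts with no overlap
nrow(matched) + nrow(unmatched) == nrow(songs)

# left_join() keeps every song, inner_join() only the matched ones
n_left  <- nrow(left_join(songs, artists, by = "name"))
n_inner <- nrow(inner_join(songs, artists, by = "name"))
c(left = n_left, inner = n_inner)
```

The row counts line up this neatly only because each name appears once in artists. If a name were repeated there, left_join() and inner_join() would return one row per match, while semi_join() would still return each song once.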

4. Tidying data: Answers and Tasks

Transforming the CO₂ data set

xls = '../data/co2_europa.xls'
co2 = read_excel(xls, skip=12)

co2Long <- co2 %>% 
  gather("Year","Emissions", -Country) %>% 
  mutate(Year = as.numeric(Year))

kable(head(co2Long), digits = 2)

Country	Year	Emissions
Afghanistan	1970	1813.98
Albania	1970	4435.43
Algeria	1970	18850.75
American Samoa	1970	6.18
Angola	1970	8946.50
Anguilla	1970	2.17

What are the top 5 emitting countries for 2014?

co2Long %>% 
  filter(Year == 2014, Country != "World", Country != "EU28") %>% 
  arrange(desc(Emissions)) %>% 
  head(n = 5) %>% 
  kable()

Country	Year	Emissions
China	2014	10540750
United States of America	2014	5334530
India	2014	2341897
Russian Federation	2014	1766427
Japan	2014	1278922

What are the total emissions, summed over all years, of the top 5 emitting countries?

co2Long %>% 
  filter(Country != "World", Country != "EU28") %>% 
  group_by(Country) %>% 
  summarise(Total = sum(Emissions)) %>% 
  arrange(desc(Total)) %>% 
  head(n = 5) %>% 
  kable(format.args = list(big.mark = ","))

Country	Total
United States of America	231,948,899
China	174,045,927
Russian Federation	81,242,427
Japan	51,276,329
Germany	43,382,205

Markdown Assignment 1

Juan S. Mayorga - https://github.com/fish-ecol/fish-ecol.github.io

January 16, 2016
