Author: Niko Hartline
Acknowledgements: BBest for the great .Rmd guide
## Content
I’d love to examine the locations of endangered species of fish as listed by the IUCN Red List. In freshwater systems it would be interesting to find how nitrogen use may affect endangerment status.
## Techniques
Since the data will come from different sources (the IUCN Red List and FAOSTAT), effective data merging will be key.
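As a rough illustration of combining two country-level tables, here is a minimal sketch using base R's merge(). The two data frames and their values are invented for the example, and the column names threatened_fish and nitrogen_use are hypothetical; only ISO3 matches a column that appears in the data summarized in this document.

```r
# Toy data only: every value below is made up for illustration.
red_list <- data.frame(
  ISO3 = c("ALB", "AGO", "AFG"),
  threatened_fish = c(4, 7, 2),        # hypothetical column
  stringsAsFactors = FALSE
)
faostat <- data.frame(
  ISO3 = c("AGO", "ALB", "DZA"),
  nitrogen_use = c(10.5, 32.1, 18.0),  # hypothetical column
  stringsAsFactors = FALSE
)

# Inner join: keeps only countries present in both tables
combined <- merge(red_list, faostat, by = "ISO3")

# Left join: keeps every Red List row; missing FAOSTAT values become NA
combined_all <- merge(red_list, faostat, by = "ISO3", all.x = TRUE)

print(combined)
print(combined_all)
```

Whether an inner or a left join is appropriate would depend on how countries missing from one source should be treated.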
## Data
IUCN Red List and FAOSTAT will be the two primary data sources for this endeavor.
Singleton_Country_Aggregates = read.csv('./data/zebos1_Singleton_Country_Aggregates.csv')
summary(Singleton_Country_Aggregates)
##       ISO3      All_Species_Table    FAO_Region     FAO_Country
##  ABW    :  1   Afghanistan  :  1    Africa :58   Afghanistan   :  1
##  AFG    :  1   Albania      :  1    America:51   Albania       :  1
##  AGO    :  1   Algeria      :  1    Asia   :51   Algeria       :  1
##  AIA    :  1   AmericanSamoa:  1    Europe :48   American Samoa:  1
##  ALA    :  1   Andorra      :  1    Oceania:25   Andorra       :  1
##  ALB    :  1   Angola       :  1    NA's   :17   (Other)       :228
##  (Other):244   (Other)      :244                 NA's          : 17
##  CR_EN_VU_AMPHIBIA_Singleton CR_EN_VU_AVES_Singleton
##  Min.   :  0.000             Min.   : 0.000
##  1st Qu.:  0.000             1st Qu.: 0.000
##  Median :  0.000             Median : 0.000
##  Mean   :  6.096             Mean   : 3.236
##  3rd Qu.:  1.000             3rd Qu.: 1.000
##  Max.   :150.000             Max.   :89.000
##
##  CR_EN_VU_MAMMALIA_Singleton CR_EN_VU_REPTILIA_Singleton
##  Min.   :  0.00              Min.   :  0.000
##  1st Qu.:  0.00              1st Qu.:  0.000
##  Median :  0.00              Median :  0.000
##  Mean   :  3.14              Mean   :  2.912
##  3rd Qu.:  1.00              3rd Qu.:  1.000
##  Max.   :112.00              Max.   :132.000
##
##  CR_EN_VU_PLANTAE_Singleton CR_EN_VU_CHORDATA_Singleton
##  Min.   :   0.00            Min.   :  0.00
##  1st Qu.:   0.00            1st Qu.:  0.00
##  Median :   1.00            Median :  1.00
##  Mean   :  35.88            Mean   : 15.38
##  3rd Qu.:  14.00            3rd Qu.:  6.00
##  Max.   :1750.00            Max.   :338.00
##
##  CR_EN_VU_TOTAL_Singleton
##  Min.   :   0.00
##  1st Qu.:   0.00
##  Median :   3.00
##  Mean   :  51.27
##  3rd Qu.:  22.75
##  Max.   :1883.00
##
## Data Wrangling
The .. tells R to back up one directory, which is extremely useful! A single . refers to the current folder. Backing out two directories takes ../.. (three dots are not a path shortcut in R).
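To make the relative-path notation concrete, here is a small sketch that builds paths with file.path(). The file name example.csv is a placeholder, not a file from this project, and the paths are only constructed, never opened.

```r
# Paths are resolved relative to the working directory, see getwd()
same_folder <- file.path(".", "data", "example.csv")         # current folder
one_up      <- file.path("..", "data", "example.csv")        # parent folder
two_up      <- file.path("..", "..", "data", "example.csv")  # two levels up

print(c(same_folder, one_up, two_up))

# normalizePath() shows which absolute directory ".." currently points to
print(normalizePath("..", mustWork = FALSE))
```

When knitting, the working directory is normally the folder containing the .Rmd file, which may differ from the Console's working directory, so the same relative path can resolve differently in the two settings.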
# Run this chunk only once in your Console
# Do not evaluate when knitting Rmarkdown
# list of packages
pkgs = c(
  'readr',        # read csv
  'readxl',       # read xls
  'dplyr',        # data frame manipulation
  'tidyr',        # data tidying
  'nycflights13', # test dataset of NYC flights for 2013
  'gapminder')    # test dataset of life expectancy and population
# install packages if not found
for (p in pkgs){
  if (!require(p, character.only=TRUE)){
    install.packages(p)
  }
}
read_csv from readr differs from base R's read.csv: among other things, it shows the class of each variable. If you see this, Ben/Naomi: what are some other differences between the two functions?
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
read_csv('../data/r-ecology/surveys.csv') %>%
  select(species_id, year) %>%
  # filter(species_id == 'NL') %>%
  group_by(species_id, year) %>%
  count(species_id, year) %>%
  head()
## Source: local data frame [6 x 3]
## Groups: species_id [1]
##
## species_id year n
## (chr) (int) (int)
## 1 AB 1980 5
## 2 AB 1981 7
## 3 AB 1982 34
## 4 AB 1983 41
## 5 AB 1984 12
## 6 AB 1985 14
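The grouped tally above can be tried without the surveys file. This is a minimal sketch, assuming dplyr is installed; the rows in toy_surveys are made up. It also relies on the fact that count() groups by the columns it is given, so a separate group_by() is not needed for the tally itself, although the printed grouping information would then differ from the output shown above.

```r
library(dplyr)

# Toy stand-in for the surveys data: invented rows, same two column names
toy_surveys <- data.frame(
  species_id = c("AB", "AB", "AB", "NL"),
  year       = c(1980L, 1980L, 1981L, 1980L),
  stringsAsFactors = FALSE
)

# One row per species_id/year combination, with the row count in column n
counts <- toy_surveys %>%
  count(species_id, year)

print(counts)
```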