Week 3: Reading and Wrangling Data
This week we’ll dive into reading and manipulating, i.e. “wrangling” (cowboy style), data. Yeehaw!
We’ll start by recapping the “conversation on code” we began on GitHub, especially through pull requests and issues.
We’ll also hear about best practices for data management from UCSB librarian Stephanie Tulley.
8:30 - 9:30 am: Wrangling Data (individual)
9:30 - 10:30 am: Data Management Plan (group)
10:30 - 11:30 am: Wrangling Data (group)
- group assignment to generate a data management plan
Due: Jan 28, Thursday 5pm
Ensure you have the latest from bren-ucsb/env-info by issuing a pull request to your <user>/env-info. (You may need to “switch the base”.) Since you have write permissions on <user>/env-info, you should then Merge changes.
Work through the wk03_dplyr and wrangling-webinar.pdf PDFs by typing in code as R chunks into your env-info/students/<user>.Rmd. I recommend starting this section with a ## Data Wrangling header and using subheaders below to match the instructions, like ### Multiple Variables. Be sure to knit to students/<user>.html, commit changes locally with a message, push to your github.com/<user>/env-info and submit a pull request.
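As a rough illustration of the kind of code you might type into an R chunk under those headers, here is a minimal sketch of a few chained dplyr verbs. It is not taken from the PDFs: the built-in mtcars data frame, the column choices, and the kpl variable are stand-ins chosen only so the example runs on its own.

```r
# Hypothetical chunk contents; mtcars ships with R and merely stands in
# for whatever dataset the PDFs actually use.
library(dplyr)

cars <- mtcars %>%
  mutate(model = rownames(mtcars)) %>%   # keep the row names as a regular column
  select(model, mpg, cyl, wt) %>%        # keep only a few variables
  filter(cyl == 4) %>%                   # keep only four-cylinder cars
  mutate(kpl = mpg * 0.425144) %>%       # add kilometers per liter (from US mpg)
  arrange(desc(mpg))                     # sort from most to least efficient

head(cars, 3)
```

In an .Rmd file this code would sit between a chunk opener and closer, so that knitting runs it and shows the output in the .html.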
Generate a Data Management Plan
Use the DMP Tool and select the DMP Template for National Science Foundation > NSF-EAR: Earth Sciences.
Transfer the headings and your group’s specific text into an index.Rmd in your group project’s <org>.github.io repository. When you knit the index.Rmd, the output index.html will become your group project’s home page.
Per your GitHub workflow, be sure to pull the latest changes from other members, commit changes with a message, and push. When I look at the GitHub blame history of your group’s index.Rmd file, I want to see that every member has contributed by pulling and pushing changes from their computer.
data folder and csv/xls/etc. files inside. (Note that Git does not recognize empty folders, only folders that have files inside.)
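One way to set up a data folder with a file in it, directly from R, might look like the sketch below. The repo location, the file name example.csv, and the values in it are all made up for illustration; the sketch works in a temporary directory so it does not touch a real repository.

```r
# Hypothetical repo location inside a temporary directory.
repo <- file.path(tempdir(), "example-repo")
data_dir <- file.path(repo, "data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

# An empty data/ folder would be invisible to Git, so put a file in it.
# These values are invented placeholders, not real measurements.
obs <- data.frame(
  site = c("A", "B", "C"),
  temp_c = c(14.2, 15.1, 13.8)
)
csv_path <- file.path(data_dir, "example.csv")
write.csv(obs, csv_path, row.names = FALSE)

# Confirm the file exists and reads back as expected.
list.files(data_dir)
obs_read <- read.csv(csv_path)
str(obs_read)
```

Once the folder contains at least one file, it can be committed and pushed like any other change.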
At the bottom of your group repo’s index.Rmd, add a header ## Data Question and type a question similar to wk03_dplyr for a csv of your choice (besides surveys.csv and hopefully relevant to your group’s area of study) like “How many observations of species ‘NL’ appear each year?” Answering your question should require chaining functions. Include the R chunk below the question and knit to index.html. Be sure to push your results so they show up on the site.
When I look at the Blame for your index.Rmd, I would like to see that every member of the group contributed. You can make up another question, add comments, improve code, etc.
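To make the example question concrete, here is a minimal sketch of how “How many observations of species ‘NL’ appear each year?” could be answered with a dplyr chain. The data frame below is invented so the code runs on its own, and the column names year and species_id are assumptions; the choice of filter, group_by, and summarize is likewise one plausible chain, not necessarily the one the assignment has in mind.

```r
library(dplyr)

# Made-up rows with the two columns the question needs; not real survey data.
# Column names (year, species_id) are assumed for illustration.
surveys <- data.frame(
  year = c(1990, 1990, 1990, 1991, 1991, 1992),
  species_id = c("NL", "NL", "DM", "NL", "DM", "NL")
)

nl_per_year <- surveys %>%
  filter(species_id == "NL") %>%   # keep only observations of species NL
  group_by(year) %>%               # form one group per year
  summarize(n = n())               # count the rows in each group

nl_per_year
```

For your own csv, you would replace the made-up data frame with a read.csv() call on a file in your data folder and adapt the column names to match.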
- Best Practices Primer | DataONE
- Data Management Guide for Public Participation | DataONE
- Education Modules | DataONE
Data Wrangling in R
Git, GitHub and RStudio
- Git and GitHub cheat sheet
- Git and GitHub with RStudio
- PLOS Computational Biology: A Quick Introduction to Version Control with Git and GitHub