Week 3: Reading and Wrangling Data

Objectives

This week we’ll dive into reading and manipulating, i.e. “wrangling” (cowboy style), the data. Yeehaw!

We’ll start by recapping the “conversation on code” we started having by using Github, especially through pull requests and issues.

We’ll also hear about best practices for data management from UCSB librarian Stephanie Tulley.

Schedule

8:30 - 9:30 am: Wrangling Data (individual)
- wk03_dplyr: recap Github, command line navigation, readr, dplyr, tidyr
- wrangling-webinar.pdf
- individual assignment to work on env-info/students/<user>.Rmd
9:30 - 10:30 am: Data Management Plan (group)
- Break [10 min]
- Introduction to the Data Management Planning Tool (DMP Tool) by Stephanie Tulley from UCSB Library
- group assignment to generate a data management plan
10:30 - 11:30 am: Wrangling Data (group)
- group assignment to generate a data management plan

Assignment

Due: Jan 28, Thursday 5pm

Individual

Ensure you have the latest from ucsb-bren/env-info by issuing a pull request to your <user>/env-info. (You may need to “switch the base”.) Since you have write permissions on <user>/env-info, you should then merge the changes.
Work through [**wk03_dplyr**](/ESM296-3W-2016/wk03_dplyr.html) and wrangling-webinar.pdf by typing in code as R chunks into your env-info/students/<user>.Rmd. I recommend starting this section with a ## Data Wrangling header and using subheaders below to match the instructions, like ### Multiple Variables. Be sure to knit to students/<user>.html, commit changes locally with a message, push to your github.com/<user>/env-info and submit as a pull request to github.com/ucsb-bren/env-info.

Group

Generate a Data Management Plan
- Use the DMP Tool and select the DMP Template for National Science Foundation > NSF-EAR: Earth Sciences.
- Transfer the headings and your group’s specific text into an index.Rmd from your group project’s <org>.github.io repository. When you knit the index.Rmd, the output index.html will become your group project’s home page viewable at http://<org>.github.io.
- Per your github workflow, be sure to pull the latest changes from other members, commit changes with a message, and push to your github.com/<org>/<repo>.
- When I look at the github blame history of your group’s index.Rmd file, I want to see that every member has contributed by pulling and pushing changes from their computer.
Wrangle Data
- Add a data folder and put csv/xls/etc files inside. (Note that Git does not track empty folders; a folder is only recognized once it has files inside.)
- At the bottom of your group repo’s index.Rmd, add a header ## Data Question and type a question similar to those in [**wk03_dplyr**](/ESM296-3W-2016/wk03_dplyr.html) for a csv of your choice (other than surveys.csv, and hopefully relevant to your group’s area of study), like _How many observations of species ‘NL’ appear each year?_ Answering your question should require chaining the following dplyr functions:
  - select()
  - filter()
  - group_by()
  - summarize()
  Include the R chunk below the question and knit the index.Rmd into index.html. Be sure to push your results so they show up on the site http://<org>.github.io.
  
  When I look at the Blame for your index.Rmd, I would like to see that every member of the group contributed. You can make up another question, add comments, improve code, etc.
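
As a minimal sketch of what such a chain might look like, here is the example question answered on a small made-up data frame. The `surveys_toy` data, its column names (`year`, `species_id`, `weight`) and the result name `nl_per_year` are assumptions for illustration only; in your own chunk you would read a csv from your data folder instead.

```r
library(dplyr)

# Hypothetical toy data standing in for a real csv (column names are assumed)
surveys_toy <- data.frame(
  year       = c(1977, 1977, 1978, 1978, 1978, 1979),
  species_id = c("NL", "DM", "NL", "NL", "DM", "NL"),
  weight     = c(NA, 40, 180, 175, 42, 190),
  stringsAsFactors = FALSE
)

# How many observations of species 'NL' appear each year?
nl_per_year <- surveys_toy %>%
  select(species_id, year) %>%      # keep only the columns the question needs
  filter(species_id == "NL") %>%    # keep only rows for species NL
  group_by(year) %>%                # one group per year
  summarize(n = n())                # count the rows in each group

nl_per_year
```

Each step hands its result to the next through the pipe, so the chunk reads top to bottom in the same order as the question: pick columns, pick rows, group, then count.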
