Syllabus: PCE Data Science 350, Methods for Data Analysis
Instructor: Nick McClure (nickmc at uw dot edu)
Course Description
This course builds on what students have already learned about structuring and manipulating data. It introduces core statistical techniques that data scientists use to extract insights from data. Students will be asked to apply the course content to real-life scenarios and to think both creatively and critically about the problems they encounter. Although the class focuses mostly on statistics, major parts of it are devoted to programming and implementing algorithms, so students will also develop more advanced programming skills.
Course Learning Objectives
By the end of the class, students will be able to apply these methods to data and to interpret and communicate their results. Specifically, students will be able to:
- Understand and implement various statistical procedures in R.
- Describe and interpret the results of such procedures and algorithms.
- Expand R programming skills to be able to write/test/log code from scratch.
Course Format
Each course session will be a mixture of lecture and in-class exercises. Typically, the materials for each evening include presentation slides, one or more data sets, and R scripts with illustrations and exercises related to the material. There will be 8 weekly homework assignments, each combining programming and reading, and a final individual project that students will work on over the course of the class.
Course Materials
There are no required textbooks. All required reading will be available online as articles or free PDFs. Additional optional reading, which may include books or textbooks, is listed for students who wish to read more on a subject.
- Required Reading Sources:
  - "An Introduction to Data Science" by Jeffrey Stanton.
  - "Statistical Thinking for Programmers" by Allen B. Downey.
- Additional Resources (optional):
  - "Computational Statistics Using R and R Studio: An Introduction for Scientists" by Randall Pruim.
  - "Online Statistics Education: A Multimedia Course of Study" by Rice University. I recommend section 1 (all), section 2 (all), and section 5 (parts A, B) as a brush-up on statistics.
  - "Team Leada R Tutorial".
  - "Data Camp R Tutorial".
  - "Code School - Try R".
  - "An Introduction to Statistical Learning with Applications in R" by James et al. (free PDF available).
- Further Fun Reading (optional):
  - "The Signal and the Noise" by Nate Silver. Penguin Press HC, 2012.
  - "Dataclysm" by Christian Rudder. Crown Publishing Group, 2014.
  - "The Master Algorithm" by Pedro Domingos. Basic Books, 2015.
Technical Requirements
Students are expected to use personal machines in class that are able to:
- Run R [http://cran.r-project.org/] and the RStudio IDE [http://www.rstudio.com/]
- Run the other free tools that we will spend one day exploring, such as:
  - Python 2.x
  - Gephi (Note: Gephi is particular about the Java version it runs on. Last I checked, it required Java 1.6 or later.)
Course Topics and Assignments by Date:
(Topics and Dates are tentative and subject to change)
Lecture | Topic | Reading |
---|---|---|
Week 1 | Introduction; Data Exploration; R overview | -Intro DS Ch 3,9; -StatThink Ch 2. |
Week 2 | Probability Distributions; Conditional Prob; Missing Data; Getting/Storing Data | -Intro DS Ch 7,10; -StatThink Ch 4. |
Week 3 | Outliers and Missing Data; Intro to Hypothesis Testing | -Intro DS Ch 6 |
Week 4 | Hypothesis Testing; The Central Limit Theorem; Intro to Regression | -StatThink Ch 6, 7 |
Week 5 | More on Regression; Extra Topic #1 | -StatThink Pg 93-97 |
Week 6 | Regression and Feature Selection | -Intro DS Ch 16 |
Week 7 | Time Series; Spatial Statistics | None |
Week 8 | Bayesian and Computational Statistics | -StatThink Pg 97-101 |
Week 9 | Guest Lecture; Extra Topic | None |
Week 10 | Review; Possible Extra Topics | None |
Student Assessment
Students MUST attend at least 6 of the 10 classes. Your grade will be based on eight homework assignments and one individual project. Details on these will be distributed on the first day. For the complete homework rubric, please see the class syllabus on the Canvas page.
Extra Articles to read about working in the analytics/data science field:
- "Is Data Scientist the Right Career Choice? Candid Advice"
- Overview of an Analytics Career
- Trey Causey on Data Science Interviews
- Trey Causey on Hiring Data Scientists
- "Crushed it: Landing a Data Science Job" by Erin Shellman
- "Stuff I’ve Messed Up While Interviewing" by Ellen Chisa
- "Doing Data Science at Twitter" by Robert Chang
- "Advice for Data Scientists on Where to Work", Multiple Authors
- "50 Years of Data Science" by David Donoho. This is a ~40-page document, but it is full of great insights into the questions people tend to have about data science in general.
Policies and Values:
What you gain from this course depends heavily on your attendance and your completion of the exercises. I fully expect students to participate actively (asking questions, doing the homework, helping others).
Students are expected to behave professionally and abide by all student policies outlined in the University of Washington Student Conduct Code. [http://www.washington.edu/cssc/]