Datascience350

Notes for Data Science 350 Class


Syllabus: PCE Data Science 350, Methods for Data Analysis

Instructor: Nick McClure (nickmc at uw dot edu)

Course Description

This course builds on what students have learned so far about structuring and manipulating data. By introducing core statistical techniques, it gives students the tools data scientists use to extract insights from data. Students will be asked to apply the course content to real-life scenarios and to think through issues both creatively and critically. While the class focuses mostly on statistics, major parts of it are devoted to programming and implementing algorithms, so students will also develop more advanced programming skills.

Course Learning Objectives

By the end of the class, students will be able to apply statistical methods to data and to interpret and communicate the results. Specifically, students will be able to:

  • Understand and implement various statistical procedures in R.
  • Describe and interpret the results of these procedures and algorithms.
  • Expand their R programming skills to write, test, and log code from scratch.

Course Format

Each course session will combine lecture and in-class exercises. The materials for each evening typically include presentation slides, one or more data sets, and R scripts with illustrations and exercises related to the material. There will also be 8 weekly homework assignments combining programming and reading, plus a final individual project that students will work on over the course of the class.

Course Materials

There are no required textbooks. All required reading will be available online as articles or free PDFs. Additional optional reading, which may include books or textbooks, will be available for students who wish to explore a subject further.

Technical Requirements

Students are expected to use personal machines in class that are able to:

Course Topics and Assignments by Date:

(Topics and Dates are tentative and subject to change)

Lecture | Topic | Reading
Week 1 | Introduction; Data Exploration; R Overview | Intro DS Ch. 3, 9; StatThink Ch. 2
Week 2 | Probability Distributions; Conditional Probability; Missing Data; Getting/Storing Data | Intro DS Ch. 7, 10; StatThink Ch. 4
Week 3 | Outliers and Missing Data; Intro to Hypothesis Testing | Intro DS Ch. 6
Week 4 | Hypothesis Testing; The Central Limit Theorem; Intro to Regression | StatThink Ch. 6, 7
Week 5 | More on Regression; Extra Topic #1 | StatThink pp. 93-97
Week 6 | Regression and Feature Selection | Intro DS Ch. 16
Week 7 | Time Series; Spatial Statistics | None
Week 8 | Bayesian and Computational Statistics | StatThink pp. 97-101
Week 9 | Guest Lecture; Extra Topic | None
Week 10 | Review; Possible Extra Topics | None

Student Assessment

Students MUST attend at least 6 of the 10 classes. Your grade will be based on eight homework assignments and one individual project. Details on these will be distributed on the first day. For the complete homework rubric, please see the class syllabus on the Canvas page.

Extra Articles to read about working in the analytics/data science field:

Policies and Values:

What you gain from this course depends heavily on your attendance and on completing the exercises. I fully expect students to participate actively (asking questions, doing the homework, helping others).
Students are expected to behave professionally and abide by all student policies outlined in the University of Washington Student Conduct Code [http://www.washington.edu/cssc/].