AGILE DATA SCIENCE Syllabus

Fall 2016

Professor: Dr. Eloi Puertas

Dates: Monday: 17h - 19h
Friday: 15h - 17h

Location: B1 classroom.

Email: epuertas@ub.edu

Tutorial times: On demand.

Lab Instructor: TBA

Teaching Assistant: TBA

Prerequisites:

Some programming experience in Python

Goals of Course:

1) Learn about agile methodologies for software development.

2) Learn about what it’s like to be a data scientist.

3) Learn about infrastructures for data science.

4) Learn about ethics and data privacy.

Course Topics:

I. Software Engineering for Data Scientists

Software Development Methodologies

II. Data Science Workflow

III. Infrastructure for Data Science

(Brief intro to Data Engineering)

IV. Ethics and Data Privacy

Course Structure:

During the first 4 weeks, I will teach the basic concepts of data science and agile methodologies to build a sufficient foundation.

After that, we will work through a data science project, following each step of the data science workflow. The project will be the same for the whole class. Each class will be divided into two parts: (1) Review of previous material and introduction of any new material needed to solve the current step of the project. (2) Teamwork to design and implement a solution for the current step of the project.

Finally, in the last 4 weeks we will cover fundamental aspects of ethics and data privacy in the data science context.

Course Schedule

Weeks   Topic
1       Introduction to Data Science
2-4     Software Development Methodologies
5       Data Science Workflow: Breaking down a Data Science Problem
6       Data Science Workflow: Setting up Data Science Infrastructure
7-8     Data Science Workflow: Data Preparation
9       Data Science Workflow: Analysis
10      Data Science Workflow: Dissemination and Deployment
11-14   Ethics and Data Privacy

Course requirements and Grading

50 % In Class Project
40 % Final In Class Exam
10 % Attendance / Participation
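
As an illustration of how the weights above combine, here is a minimal Python sketch. The 0-10 score scale, the function name final_grade, and the example scores are assumptions made for illustration only; they are not part of the official grading procedure.

```python
# Weights taken from the grading breakdown above.
WEIGHTS = {"project": 0.50, "exam": 0.40, "participation": 0.10}


def final_grade(scores):
    """Return the weighted average of the component scores.

    Assumes every component is scored on the same scale (0-10 here).
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one student, for illustration only.
example = {"project": 8.0, "exam": 7.0, "participation": 10.0}
print(round(final_grade(example), 2))  # prints 7.8
```

With these example scores the project contributes 4.0 points, the exam 2.8 and participation 1.0.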

In Class Project

The in-class project will be a typical data science problem with real data. You will form teams and work together. The project starts at the end of October and the deadline will be in December. More details to come in October.

A Note on Programming Languages

Most of my instruction will involve Python and Python bindings.

Recommended Texts and Readings

  1. Clean Code: A Handbook of Agile Software Craftsmanship. (Prentice Hall) Robert C. Martin
  2. Agile Data Science: Building Data Analytics Applications with Hadoop. (O'Reilly) Russell Jurney
  3. Data Science from Scratch: First Principles with Python. (O'Reilly) Joel Grus
  4. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. (O'Reilly) Wes McKinney
  5. Doing Data Science: Straight Talk from the Frontline. (O'Reilly) Cathy O'Neil, Rachel Schutt
  6. Python Data Science Handbook: Tools and Techniques for Developers. (O'Reilly) Jake VanderPlas