Welcome!

CS 498: End-to-End Data Science

David Dalpiaz

January 17, 2024

Course Website

Plan for First Two Weeks

Week 01

  • Wednesday [Lecture]: Welcome!
  • Friday [Discussion]: Course Goals and Looking Ahead

Week 02

  • Monday [Lecture]: What is Data Science?
  • Wednesday [Lecture]: History of Data Science
  • Friday [Tutorial]: Computing Setup

General Course Tempo

  • Monday: Lecture
  • Wednesday: Lecture
  • Friday: Discussion / Tutorial / Lab

Breaks / “work time” during weeks when MPs are due.

What is This Course?

Course Explorer Description

Broad coverage of the principles, tools, and products of data science. Throughout the course, students will build data products such as models, packages, dashboards, and APIs using real-world data for real-world applications. Emphasis will be given to open-source tools that exist in or connect with the Jupyter languages (Python, R, and Julia). Their applications, interactions, and tradeoffs will be discussed.

Figure 1: Gromit, an animated dog, laying train track in front of himself as he rides a train down the track he is creating, from Wallace and Gromit.

Figure 2: A humorous tweet from Josh Wills that “defines” a data scientist in a somewhat meaningless way.

Broad Course Goals

  • Students will have a better understanding of the broad data science landscape
  • Students will develop their data science “stack”
    • Think tools you can list on your resume
  • Students will complete a project that will demonstrate their data science skills
    • Think something that could be part of a “portfolio”

Breadth-First Course

What This Course Is Not

This course is not…

  • a machine learning course.
  • an artificial intelligence course.
  • a statistics course.
  • a methods course.

Generally, we will do a “deep dive” into almost no topics!

Who Should Take This Course?

Anyone interested in doing data science, or working with data scientists!

The specific target audience is computer science students who want to work as data scientists.

The course could also be useful for those who want to work as…

  • machine learning engineers.
  • data engineers.
  • software engineers in the data space.
  • data analysts.

What Should You Know Before This Course?

Generally, the more experience with and interest in data science you have, the better. Specifically, you should…

  • have a working knowledge of Python.
  • have a probability and statistics background at the level of CS 361.

Machine learning knowledge at the level of CS 441 will be very helpful, but is not strictly required.

What Are You Going To Do?

During the course you will…

  • create a well-documented and well-tested Python package for data science.
  • create a machine learning model and “put it in production” for use in a larger system.
  • create an interactive data dashboard or web application using reactive programming.

As a final project, you will repeat one (or some combination) of these, with data of your choosing.

The Syllabus

Questions?

Who Am I?

David Dalpiaz

Figure 3: A name tag with the name Dave written on it, in all caps.

My Background

I’ve been an Illini for a while…

  • 2005 - 2009: BS in Mathematics from Illinois
    • Minors in Computer Science and Statistics
  • 2009 - 2014: PhD in Statistics from Illinois
  • 2014 - 2023: Taught for Statistics @ Illinois
    • 2018 - 2019: Taught for Statistics @ OSU
  • Now: Teaching for Computer Science @ Illinois

Current Work

Other and Previous Work

  • Bioinformatics (RNA-Seq Data Analysis) during PhD
  • Developed and launched STAT 420 for the MCS-DS @ Illinois
  • CS @ Illinois Artificial Intelligence Research Area Member
  • Illinois and Chicago Cubs Research Program
  • Taught all across the Statistics curriculum at Illinois:
    • Intro: STAT 100, STAT 200, STAT 212
    • Mathematical Statistics: STAT 400, STAT 410
    • Modeling and Machine Learning: STAT 420, STAT 432
    • Programming: STAT 385

Who Are You, The Students?

Survey

Check your email later today! We’ll be sending out a survey to collect data that will help us better understand who you are.

Introduce Yourself

Also, don’t forget to respond to the “Introduce Yourself” thread on Ed.

Discussion Questions

  • What is…
    • Data? Data Science? Data Analysis?
    • Computing and Computer Science?
    • Probability? Statistics (thing) and Statistics (field)?
    • A (Statistical) Model?
    • Machine Learning? Deep Learning? Artificial Intelligence?

That’s All Folks!