A Very Biased and Incomplete History of Data Science

Standing on the Shoulders of Giants

David Dalpiaz

January 24, 2024

History of Data Science

Data Scientist: The Sexiest Job of the 21st Century

Let’s start in 2012! Thomas H. Davenport and DJ Patil, Harvard Business Review: Data Scientist: The Sexiest Job of the 21st Century

Modern Usage

The 2012 article Data Scientist: The Sexiest Job of the 21st Century claims that the term “Data Scientist” was coined in 2008.

DJ Patil

Jeff Hammerbacher

Gotta Get Back in Time

Data Science did not begin in 2012, nor did it start in 2008.

  • A lot of data science has been done without being called data science.
  • The history of Data Science depends on several other fields and ideas with long histories.

50 Years of Data Science

David Donoho’s paper, published in 2017 but first circulated as Version 1.00 in September 2015.

The Future of Data Analysis

For a long time I have thought I was a statistician, interested in inferences from the particular to the general. But as I have watched mathematical statistics evolve, I have had cause to wonder and to doubt. … All in all I have come to feel that my central interest is in data analysis, which I take to include, among other things: procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data.

FoDA: Data Analysis as a “New Science”

Four major influences act on data analysis today:

  1. The formal theories of statistics
  2. Accelerating developments in computers and display devices
  3. The challenge, in many fields, of more and ever larger bodies of data
  4. The emphasis on quantification in an ever wider variety of disciplines

FoDA: Data Analysis is Complex

… data analysis is a very difficult field. It must adapt itself to what people can and need to do with data. In the sense that biology is more complex than physics, and the behavioral sciences are more complex than either, it is likely that the general problems of data analysis are more complex than those of all three. It is too much to ask for close and effective guidance for data analysis from any highly formalized structure, either now or in the near future. Data analysis can gain much from formal statistics, but only if the connection is kept adequately loose.

Chambers, Wu, Cleveland, and Others: Moving Statistics Forward

Greater or Lesser Statistics: A Choice for Future Research

The statistics profession faces a choice in its future research between continuing concentration on traditional topics—based largely on data analysis supported by mathematical statistics—and a broader viewpoint—based on an inclusive concept of learning from data. The latter course presents severe challenges as well as exciting opportunities. The former risks seeing statistics become increasingly marginal …

Data Science: An Action Plan for Expanding the Technical Areas of the field of Statistics

  • Multidisciplinary Investigations (25%)
  • Models and Methods for Data (20%)
  • Computing with Data (15%)
  • Pedagogy (15%)
  • Tool Evaluation (5%)
  • Theory (20%)

Statistical Modeling: The Two Cultures

The Data Modeling Culture

  • Focus on inference

The Algorithmic Modeling Culture

  • Focus on prediction

Selected Pioneers and Milestones

Ada Lovelace

Wikipedia: Ada Lovelace

  • Worked on Charles Babbage’s Analytical Engine.
  • Thought to have written the first computer program!

John Snow

Wikipedia: John Snow

  • Considered one of the founders of modern epidemiology and early germ theory.
  • 1854 Broad Street Cholera Outbreak

Florence Nightingale

Wikipedia: Florence Nightingale

Ronald Fisher

Wikipedia: Ronald Fisher

  • “A genius who almost single-handedly created the foundations for modern statistical science.”
  • A reminder of the ugly history of the field of statistics.

Edgar F. Codd

Wikipedia: Edgar F. Codd

R

Wikipedia: R (programming language)

  • August 1993: Ihaka and Gentleman
  • Open-source implementation of Chambers’ S.

NumPy

Wikipedia: NumPy

  • 2005: Travis Oliphant
  • Sparked the rise of data work in Python.

Netflix Prize

Wikipedia: Netflix Prize

  • 2006: $1,000,000 competition
  • Improve Netflix recommendations
  • Inspired modern Kaggle competitions

Fei-Fei Li

Nate Silver

Wikipedia: Nate Silver

  • Founder of FiveThirtyEight
  • Writer for Baseball Prospectus
    • Developed PECOTA
  • Known for election forecasting

Attention Is All You Need

Wikipedia: Attention Is All You Need

  • 2017 research paper by researchers at Google
  • Known for introducing the transformer architecture

Data Science at Illinois

The Future of Data Science

Is Data Scientist Still the Sexiest Job of the 21st Century?

A look back ten years later, in 2022.

Thomas H. Davenport and DJ Patil, Harvard Business Review: Is Data Scientist Still the Sexiest Job of the 21st Century?

Additional References

That’s All Folks!