The Jupyter Languages

Julia, Python, and R: The Glue of Data Science

David Dalpiaz

January 26, 2024

Disclaimer

Some of the statements made in these slides are opinions.

  • Some of the opinions are held by Dave.
  • Some of the opinions are held by others.

Before Jupyter

How Did Data Science Get Done Before Jupyter?

Good question! But I wasn’t around.

  • I think the answer was partially spreadsheets?
    • Maybe Excel as early as the 1980s.
  • Maybe MATLAB?
    • Or whatever else folks were using for scientific computing.
  • S was around at Bell Labs in the 1970s.

You Suck at Excel

YouTube: You Suck at Excel with Joel Spolsky

Why Not Spreadsheets for Data Science?

They combine code and data!

This, plus some other quirks, can cause massive problems.
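To make the contrast concrete, here is a minimal sketch of the alternative: the data lives in a plain CSV and the logic lives in a separate function. The CSV contents, the column names (`name`, `score`), and the function `mean_score` are invented for illustration.

```python
import csv
import io

# Data: plain text, no formulas. In practice this would be a file on disk;
# an in-memory string is used here so the sketch is self-contained.
RAW_DATA = """name,score
Ada,90
Grace,85
Alan,95
"""


def mean_score(csv_text):
    """Code: read the rows and compute the average of the score column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sum(float(row["score"]) for row in rows) / len(rows)


print(mean_score(RAW_DATA))
```

Because the data is never edited by the code, the same script can be rerun on a new file, and both pieces can be version controlled separately.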

If you must:

The Jupyter Languages

What is Jupyter?

The Jupyter Project is an organization that develops open-source software, much of which is targeted at data science. It spun off from the popular IPython project.

  • Jupyter Notebook
  • JupyterLab
  • JupyterHub

What are the Jupyter Languages?

Jupyter has kernels for many languages, but the three that are most used, and that make up its name, are:

  • Julia
  • Python
  • R

We will call these the Jupyter Languages.

What do the Jupyter Languages have in common?

Lots!

But generally they are:

  • High level
  • Somewhat general purpose
  • Scripting languages
  • That can serve as the glue for data science

Why Use a Scripting Language for DS?

The needs of a data science language are highly variable. Some common needs:

  • MATLAB-style array structures and computations
  • Data frames: a tabular data structure
  • Graphical capabilities
  • “Fast” computation
  • Automating system tasks

Other Languages

Other languages are used, but can they meet these needs?

  • Bash
    • Sed
    • Awk
    • Grep
  • SQL
  • JavaScript
  • Fortran
  • C / C++ / Rust
  • MATLAB
  • Mathematica

A Note About SQL

Do you know SQL?

If not, learn it this weekend.

Yes, I’m serious.
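For a taste of what that weekend looks like, here is a small sketch that runs a typical query (group, aggregate, sort) from Python using the standard library's `sqlite3` module. The table name `penguins`, its columns, and its rows are made up for this example.

```python
import sqlite3

# An in-memory database; the table and its rows are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE penguins (species TEXT, mass_g REAL)")
conn.executemany(
    "INSERT INTO penguins VALUES (?, ?)",
    [("Adelie", 3700.0), ("Adelie", 3800.0), ("Gentoo", 5000.0)],
)

# A typical query: group the rows, aggregate within groups, sort the result.
query = """
    SELECT species, COUNT(*) AS n, AVG(mass_g) AS mean_mass
    FROM penguins
    GROUP BY species
    ORDER BY species
"""
result = conn.execute(query).fetchall()
print(result)
conn.close()
```

Note that the glue language only passes the query along; the database does the work.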

Comparing and Contrasting the Jupyter Languages

What is R?

The R Project for Statistical Computing

R Quirks and Features

To understand computations in R, two slogans are helpful:

  • Everything that exists is an object.
  • Everything that happens is a function call.

John Chambers

More R Quirks and Features

  • Interpreted
  • Multi-paradigm: procedural, object-oriented, functional, imperative
    • Usage feels functional
  • One-based indexing
  • Multiple OOP systems
  • Dynamic typing
  • Can use <- for assignment.
    • Dave’s note: Please don’t!
  • When things go wrong? Blame S!
    • R maintains backwards compatibility.
  • Comprehensive R Archive Network (CRAN)

R Resources

What is Python?

Welcome to Python.org

Python Quirks and Features

  • Interpreted
  • Monty Python’s Flying Circus
  • Multi-paradigm: procedural, object-oriented, functional, imperative
    • Usage feels object-oriented
  • Zero-based indexing
  • Dynamic typing (allows annotations)
  • Whitespace indentation
    • Dave’s note: Don’t mix tabs and spaces! Use spaces!
  • Python Package Index (PyPI)
    • The Cheese Shop!
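A few of these quirks can be seen in a handful of lines. This is only an illustrative sketch; the variable and function names are arbitrary.

```python
# Zero-based indexing: the first element is at index 0.
letters = ["a", "b", "c"]
first = letters[0]
last = letters[-1]  # negative indices count from the end

# Dynamic typing: a name can be rebound to a value of a different type.
x = 1
x = "one"


# Annotations are allowed, but Python itself does not enforce them at runtime.
def double(n: int) -> int:
    return n * 2


doubled_text = double("ab")  # runs despite the int annotation

# Usage feels object-oriented: behavior is reached through methods.
shout = "spam".upper()

print(first, last, x, doubled_text, shout)
```

Whitespace matters here too: the body of `double` is defined by its indentation, not by braces.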

Python Resources

R versus Python

Much has been said about the decision between R versus Python. Much of it is wrong, or simply unimportant.

  • “R is slower than Python.”
    • No, it isn’t. They are both slow interpreted languages.

Some legitimate comparisons:

  • Python’s community is much larger, but R’s is more focused.
  • Python feels object-oriented and R feels functional, but they are both multi-paradigm.
  • Python is more popular for machine learning. R has more software for specific statistical methods.

Norm Matloff: R versus Python for Data Science

What is Julia?

The Julia Programming Language

Julia Quirks and Features

  • JIT compiled
    • Still has REPL
  • Multi-paradigm: multiple dispatch, procedural, object-oriented, functional, imperative
  • One-based indexing
  • Dynamic typing
  • Solves the so-called “two-language problem”
    • Performance approaches that of statically typed languages like C
  • Designed with parallel computation in mind
  • Suffers from a slow “time to first plot”
  • Built-in package manager
    • Packages often distributed via GitHub

Julia Resources

Data Frames

While array programming is important and available in each language (via NumPy for Python), the data frame structure, which R inherited from S and popularized, truly allows for getting a lot of data science done efficiently.
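To show the idea without assuming any particular library, the toy sketch below stores a table as named columns of equal length and performs a row filter and a grouped mean. Real data frame libraries provide these operations directly; the helpers `filter_rows` and `group_mean` and the data are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

# A toy table: each key is a column name, each value is a column of equal length.
table = {
    "species": ["Adelie", "Gentoo", "Adelie", "Gentoo"],
    "mass_g": [3700.0, 5000.0, 3800.0, 5200.0],
}


def filter_rows(tbl, column, predicate):
    """Keep only the rows whose value in `column` satisfies `predicate`."""
    keep = [i for i, value in enumerate(tbl[column]) if predicate(value)]
    return {name: [values[i] for i in keep] for name, values in tbl.items()}


def group_mean(tbl, by, value):
    """Mean of the `value` column within each group of the `by` column."""
    groups = defaultdict(list)
    for key, v in zip(tbl[by], tbl[value]):
        groups[key].append(v)
    return {key: sum(vs) / len(vs) for key, vs in groups.items()}


heavy = filter_rows(table, "mass_g", lambda m: m > 3750)
means = group_mean(table, "species", "mass_g")
print(heavy)
print(means)
```

The point of a data frame is that the columns stay aligned row by row while you filter, group, and summarize, which is exactly the bookkeeping the helpers above do by hand.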

Jupyter Notebooks

I Don’t Like Notebooks

YouTube: Joel Grus - I Don’t Like Notebooks

The Trouble with Notebooks

  • Hard to version control
  • Make state confusing
  • Poor editing tools
  • Encourages bad coding practices
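The "confusing state" complaint can be simulated in a few lines. In this sketch, each string stands in for a notebook cell and a shared dictionary stands in for the kernel's namespace; the cell contents and the `run` helper are invented for illustration and are not how Jupyter is implemented.

```python
# Each string stands in for a notebook cell; all cells share one namespace,
# much as the cells of a notebook share one kernel.
cells = {
    1: "x = 10",
    2: "y = x * 2",
    3: "x = 100",
}


def run(order):
    """Execute the cells in the given order and return the final namespace."""
    namespace = {}
    for number in order:
        exec(cells[number], namespace)
    return namespace


top_to_bottom = run([1, 2, 3])
out_of_order = run([1, 3, 2])

print(top_to_bottom["y"])
print(out_of_order["y"])
```

The same three cells give different values of `y` depending on execution order, and nothing on the page records which order was actually used.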

I Like Notebooks

YouTube: Jeremy Howard - I Like Notebooks

Alternatives

That’s All Folks