Julia, Python, and R: The Glue of Data Science
January 26, 2024
Some of the statements made in these slides are opinions.
Good question! But I wasn’t around.
They combine code and data!
This, plus some other quirks, can cause massive problems.
If you must:
The Jupyter Project is an organization that develops open-source software, much of which is targeted at data science. It was spun off from the popular IPython project.
Jupyter has kernels for many languages, but the three that are most used, and that make up its name, are Julia, Python, and R.
We will call these the Jupyter Languages.
Lots!
But generally they are:
The needs of a data science language are highly variable. Some common needs:
Other languages are used, but can they meet these needs?
Do you know SQL?
If not, learn it this weekend.
Yes, I’m serious.
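For a taste of what that weekend looks like, here is a minimal sketch that runs a grouped SQL query from Python using the standard library's sqlite3 module. The penguins table and its values are hypothetical, invented only for illustration.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical toy table, invented for this example.
conn.execute("CREATE TABLE penguins (species TEXT, mass_g REAL)")
conn.executemany(
    "INSERT INTO penguins VALUES (?, ?)",
    [("Adelie", 3700.0), ("Adelie", 3800.0), ("Gentoo", 5000.0)],
)

# The core SQL pattern: select columns, group rows, aggregate each group.
rows = conn.execute(
    """
    SELECT species, COUNT(*) AS n, AVG(mass_g) AS mean_mass
    FROM penguins
    GROUP BY species
    ORDER BY species
    """
).fetchall()
conn.close()

for species, n, mean_mass in rows:
    print(species, n, mean_mass)
```

The same SELECT / GROUP BY pattern carries over to essentially any SQL database; only the connection step changes.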
The R Project for Statistical Computing
To understand computations in R, two slogans are helpful:
- Everything that exists is an object.
- Everything that happens is a function call.
R uses <- for assignment.
Much has been said about the choice between R and Python. Much of it is wrong, or simply unimportant.
Some legitimate comparisons:
The Julia Programming Language
While array programming is important and available in each language (via NumPy for Python), the data frame structure, popularized by R (which inherited it from S), is what truly allows a lot of data science to get done efficiently.
data.frame
tibble / dplyr
data.table
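To make the data frame idea concrete, here is a minimal sketch in Python using pandas, which is an assumption of this example; the slides above list R's own implementations. The column names and values are hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
df = pd.DataFrame(
    {
        "language": ["Julia", "Python", "R", "Python", "R"],
        "runtime_s": [1.2, 3.4, 2.8, 3.0, 2.6],
    }
)

# Columns are named and individually typed, unlike a plain numeric array.
# Filtering rows by a condition on one column:
slow = df[df["runtime_s"] > 2.5]

# Split-apply-combine: group rows by a column, then summarize each group.
mean_runtime = df.groupby("language")["runtime_s"].mean()

print(slow)
print(mean_runtime)
```

Filtering by a condition and summarizing by group are the two operations that data.frame, dplyr, and data.table each express in their own syntax.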