Array Programming

Arrays and Vectorization for Scientific Computing

David Dalpiaz

January 31, 2024

Array Programming in Python

NumPy Resources

From Python to NumPy

Nicolas P. Rougier: From Python to NumPy

Other Languages

MATLAB
R
Julia

These languages have support for arrays by default.

In R, vectorization is a core feature of the language, as nearly all objects are vectors.

Motivation

Vectorization

NumPy (Numerical Python) is all about vectorization.

vectors
arrays
views
ufuncs

Why? Gotta go fast!

Also: Smaller memory footprint!

Example: A Simple Random Walk

Object Oriented Approach

import random

class RandomWalker:
    def __init__(self):
        self.position = 0

    def walk(self, n):
        self.position = 0
        for i in range(n):
            yield self.position
            self.position += 2*random.randint(0, 1) - 1

walker = RandomWalker()

%timeit -n 1000 -r 10 [position for position in walker.walk(1000)]

462 µs ± 13.8 µs per loop (mean ± std. dev. of 10 runs, 1,000 loops each)

Procedural Approach

import random

def random_walk(n):
    position = 0
    walk = [position]
    for i in range(n):
        position += 2*random.randint(0, 1)-1
        walk.append(position)
    return walk

%timeit -n 1000 -r 10 random_walk(1000)

409 µs ± 3.37 µs per loop (mean ± std. dev. of 10 runs, 1,000 loops each)

Vectorized NumPy Approach

import numpy as np

def random_walk_numpy(n):
    steps = np.random.choice([-1, +1], n)
    return np.cumsum(steps)

%timeit -n 1000 -r 10 random_walk_numpy(1000)

20.7 µs ± 1.1 µs per loop (mean ± std. dev. of 10 runs, 1,000 loops each)

A Tradeoff: Speed for Readability

# readable, kinda
def function_python(seq, sub):
    return [i for i in range(len(seq) - len(sub)) if seq[i:i+len(sub)] == sub]

# fast, unreadable
def function_numpy(seq, sub):
    target = np.dot(sub, sub)
    candidates = np.where(np.correlate(seq, sub, mode='valid') == target)[0]
    check = candidates[:, np.newaxis] + np.arange(len(sub))
    mask = np.all((np.take(seq, check) == sub), axis=-1)
    return candidates[mask]

Basics

`ndarray`: The Array Class

An array is the central data structure of the NumPy library.

An array is a grid of values and it contains information about the raw data, how to locate an element, and how to interpret an element.

import numpy as np

x = np.array([1, 2, 42, 42, 42])

print(x)
print(type(x))

[ 1  2 42 42 42]
<class 'numpy.ndarray'>

`ndarray` Example

import numpy as np

a = np.arange(15).reshape(3, 5)
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

a.size

a.ndim

a.shape

(3, 5)

a.dtype.name

'int64'

a.itemsize

Live Coding

To Python!

Array Programming

Array Programming in Python

NumPy Resources

From Python to NumPy

Other Languages

Motivation

Vectorization

Example: A Simple Random Walk

Object Oriented Approach

Procedural Approach

Vectorized NumPy Approach

A Tradeoff: Speed for Readability

Basics

`ndarray`: The Array Class

`ndarray` Example

Live Coding

That’s All Folks!

TODO

Array Programming

Array Programming in Python

NumPy Resources

From Python to NumPy

Other Languages

Motivation

Vectorization

Example: A Simple Random Walk

Object Oriented Approach

Procedural Approach

Vectorized NumPy Approach

A Tradeoff: Speed for Readability

Basics

ndarray: The Array Class

ndarray Example

Live Coding

That’s All Folks!

TODO

`ndarray`: The Array Class

`ndarray` Example