Array Programming

Arrays and Vectorization for Scientific Computing

David Dalpiaz

January 31, 2024

Array Programming in Python

NumPy Resources

From Python to NumPy

Other Languages

  • MATLAB
  • R
  • Julia

These languages have support for arrays by default.

In R, vectorization is a core feature of the language, as nearly all objects are vectors.

Motivation

Vectorization

NumPy (Numerical Python) is all about vectorization.

  • vectors
  • arrays
  • views
  • ufuncs

Why? Gotta go fast!

  • Also: Smaller memory footprint!

Example: A Simple Random Walk

Object Oriented Approach

import random

class RandomWalker:
    def __init__(self):
        self.position = 0

    def walk(self, n):
        self.position = 0
        for i in range(n):
            yield self.position
            self.position += 2*random.randint(0, 1) - 1

walker = RandomWalker()

%timeit -n 1000 -r 10 [position for position in walker.walk(1000)]
462 µs ± 13.8 µs per loop (mean ± std. dev. of 10 runs, 1,000 loops each)

Procedural Approach

import random

def random_walk(n):
    position = 0
    walk = [position]
    for i in range(n):
        position += 2*random.randint(0, 1)-1
        walk.append(position)
    return walk

%timeit -n 1000 -r 10 random_walk(1000)
409 µs ± 3.37 µs per loop (mean ± std. dev. of 10 runs, 1,000 loops each)

Vectorized NumPy Approach

import numpy as np

def random_walk_numpy(n):
    steps = np.random.choice([-1, +1], n)
    return np.cumsum(steps)

%timeit -n 1000 -r 10 random_walk_numpy(1000)
20.7 µs ± 1.1 µs per loop (mean ± std. dev. of 10 runs, 1,000 loops each)

A Tradeoff: Speed for Readability

# readable, kinda
def function_python(seq, sub):
    return [i for i in range(len(seq) - len(sub)) if seq[i:i+len(sub)] == sub]

# fast, unreadable
def function_numpy(seq, sub):
    target = np.dot(sub, sub)
    candidates = np.where(np.correlate(seq, sub, mode='valid') == target)[0]
    check = candidates[:, np.newaxis] + np.arange(len(sub))
    mask = np.all((np.take(seq, check) == sub), axis=-1)
    return candidates[mask]

Basics

ndarray: The Array Class

An array is the central data structure of the NumPy library.

An array is a grid of values and it contains information about the raw data, how to locate an element, and how to interpret an element.

import numpy as np

x = np.array([1, 2, 42, 42, 42])

print(x)
print(type(x))
[ 1  2 42 42 42]
<class 'numpy.ndarray'>

ndarray Example

import numpy as np

a = np.arange(15).reshape(3, 5)
a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
a.size
15
a.ndim
2
a.shape
(3, 5)
a.dtype.name
'int64'
a.itemsize
8

Live Coding

To Python!

That’s All Folks!