Arrays and Vectorization for Scientific Computing
January 31, 2024
These languages have support for arrays by default.
In R, vectorization is a core feature of the language, as nearly all objects are vectors.
NumPy (Numerical Python) is all about vectorization.
Why? Gotta go fast!
import random

class RandomWalker:
    def __init__(self):
        self.position = 0

    def walk(self, n):
        self.position = 0
        for i in range(n):
            yield self.position
            self.position += 2*random.randint(0, 1) - 1

walker = RandomWalker()
%timeit -n 1000 -r 10 [position for position in walker.walk(1000)]
462 µs ± 13.8 µs per loop (mean ± std. dev. of 10 runs, 1,000 loops each)
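For comparison, a vectorized version of the same walk might look like the sketch below. The function name random_walk_numpy and its seed parameter are illustrative and not part of the original slides, and the sketch assumes n >= 1. All steps are drawn in a single call and accumulated with a cumulative sum, so no Python-level loop is needed.

```python
import numpy as np

def random_walk_numpy(n, seed=None):
    """Return the first n positions of a +/-1 random walk starting at 0 (assumes n >= 1)."""
    rng = np.random.default_rng(seed)
    # Draw n - 1 steps at once; integers(0, 2) yields 0 or 1, mapped to -1 or +1
    steps = 2 * rng.integers(0, 2, size=n - 1) - 1
    # Prepend the starting position, then accumulate the steps
    return np.concatenate(([0], np.cumsum(steps)))

positions = random_walk_numpy(1000, seed=0)
print(positions[:10])
```

Like the generator above, this returns the position before each step, so the first entry is always 0. Timing it with %timeit on your own machine is the fair way to compare the two.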
import numpy as np

# readable, kinda
def function_python(seq, sub):
    # + 1 so that a match ending at the last element of seq is also checked
    return [i for i in range(len(seq) - len(sub) + 1) if seq[i:i+len(sub)] == sub]

# fast, unreadable
def function_numpy(seq, sub):
    # a window equal to sub correlates to dot(sub, sub); other windows can too, so verify
    target = np.dot(sub, sub)
    candidates = np.where(np.correlate(seq, sub, mode='valid') == target)[0]
    check = candidates[:, np.newaxis] + np.arange(len(sub))
    mask = np.all((np.take(seq, check) == sub), axis=-1)
    return candidates[mask]
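To make the NumPy version less opaque, the sketch below runs its steps one at a time on a small input. The input values and the intermediate names scores, windows, and matches are chosen for illustration only. The input includes a false positive: the window [0, 7, 0] has the same correlation score as a true match, which is why the final element-by-element check is needed.

```python
import numpy as np

seq = np.array([0, 7, 0, 1, 2, 3])
sub = np.array([1, 2, 3])

# Score that a window identical to sub would produce: 1*1 + 2*2 + 3*3 = 14
target = np.dot(sub, sub)

# Dot product of sub with every length-3 window of seq
scores = np.correlate(seq, sub, mode='valid')   # [14, 10, 8, 14]

# Windows whose score equals the target are only candidates
candidates = np.where(scores == target)[0]      # [0, 3]

# Gather each candidate window and compare it to sub element by element
windows = np.take(seq, candidates[:, np.newaxis] + np.arange(len(sub)))
mask = np.all(windows == sub, axis=-1)          # [False, True]

matches = candidates[mask]
print(matches)                                  # [3]
```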
ndarray: The Array Class
An array is the central data structure of the NumPy library.
An array is a grid of values and it contains information about the raw data, how to locate an element, and how to interpret an element.
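These three pieces of information can be inspected directly on any array. The sketch below is illustrative: the dtype is fixed to int64 so that the byte counts are the same on every platform, and the element lookup through tobytes and frombuffer is only a way to show what the strides mean, not how NumPy indexes internally.

```python
import numpy as np

a = np.arange(15, dtype=np.int64).reshape(3, 5)

# How to interpret an element: its type and size in bytes
print(a.dtype, a.itemsize)      # int64 8

# How to locate an element: grid shape and bytes to skip per axis
print(a.shape, a.strides)       # (3, 5) (40, 8)

# Byte offset of element (1, 2) computed by hand from the strides
i, j = 1, 2
offset = i * a.strides[0] + j * a.strides[1]    # 1*40 + 2*8 = 56

# Read that element straight from a copy of the raw data
raw = a.tobytes()
value = np.frombuffer(raw, dtype=a.dtype, count=1, offset=offset)[0]
print(value, a[i, j])           # 7 7
```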
ndarray Example
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
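One way to build this array and operate on it without loops is sketched below. The slides do not say how the example was created, so the arange and reshape call is an assumption, and the variable names are illustrative.

```python
import numpy as np

# Assumed construction: 15 consecutive integers laid out as 3 rows by 5 columns
a = np.arange(15).reshape(3, 5)

row = a[1]                  # second row: [5, 6, 7, 8, 9]
col = a[:, 0]               # first column: [0, 5, 10]
doubled = a * 2             # elementwise, no explicit loop
col_sums = a.sum(axis=0)    # sum down each column: [15, 18, 21, 24, 27]

print(row, col, col_sums)
```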
To Python!