In Part 1 I showed that a large share of the run time is spent in the gol_defs updateState function, indexing the neighbors of each cell to see which are alive and which are dead. For a 500 cell x 500 cell grid run for 50 cycles this accounts for 35% of the execution time. So much time is spent in this one location because the code iterates over every cell in the grid, checks the state of each of its neighbors, and adds to the count for each one that is alive.

for neighbor in neighbors:
    # % does modulo indexing to create toroidal universe
    neighborR = (Rindex + neighbor[1]) % len(currentState)
    neighborC = (Cindex + neighbor[0]) % len(currentState[0])
    if currentState[neighborR][neighborC] == 1:
        count += 1

Since Python is an interpreted language this is a slow process, but fortunately there are much faster ways to do the same thing. One way to speed it up is to borrow an idea from image processing: convolution. Convolution takes a small matrix, the kernel, and repeatedly applies it to each cell of a larger matrix in the following way. If the kernel is represented as:

a b c
d e f
g h i

and the larger grid is

A B C D E F
G H I J K L
M N O P Q R
S T U V W X

Then applying the kernel to cell H gives:

Result = a*A + b*B + c*C + d*G + e*H + f*I + g*M + h*N + i*O
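As a minimal sketch, the formula above can be written as a pure-Python double loop over the 3x3 kernel (the helper name apply_kernel is mine, not code from the project):

```python
def apply_kernel(grid, kernel, r, c):
    """Weighted sum of the 3x3 neighborhood centered on grid[r][c].

    Assumes (r, c) is an interior cell; no boundary handling.
    """
    result = 0
    for dr in (-1, 0, 1):       # kernel row offset from the center
        for dc in (-1, 0, 1):   # kernel column offset from the center
            result += kernel[dr + 1][dc + 1] * grid[r + dr][c + dc]
    return result
```

Applying this at every cell of the grid is exactly the convolution described above.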

If the kernel is defined as

1 1 1
1 0 1
1 1 1

then for each cell we apply it to, the result is the number of alive neighbors around that cell. The 0 in the middle keeps the cell itself from being counted. This is exactly what's needed to calculate the number of neighbors so we can apply the game of life rules.

For instance if the grid contains

1 0 1 0
1 0 0 0
0 0 0 0
0 0 0 0

and we apply the kernel to the second cell in the second row from the top, we get

Result = 1*1 + 0*1 + 1*1 + 1*1 + 0*0 + 0*1 + 0*1 + 0*1 + 0*1 = 3

which is the count of alive neighbors. Since the count is 3, according to the game of life rules the cell will change from dead to alive.
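The same worked example can be checked directly with SciPy (a sketch; the grid and kernel here are just the small example above):

```python
import numpy as np
from scipy.signal import convolve2d

grid = np.array([[1, 0, 1, 0],
                 [1, 0, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])

kernel = np.array([[1, 1, 1],
                   [1, 0, 1],   # 0 in the middle: don't count the cell itself
                   [1, 1, 1]])

# mode="same" keeps the output the same shape as the input grid;
# boundary="wrap" wraps the edges around (toroidal universe)
counts = convolve2d(grid, kernel, mode="same", boundary="wrap")
print(counts[1, 1])  # → 3, matching the hand calculation
```

Note that convolve2d flips the kernel before applying it, but this kernel is symmetric, so that makes no difference here.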

The kernel is held in the neighbors variable in the main gol.py module. Instead of the relative offsets previously used to calculate a neighbor's index, the variable now holds a 1 in each location that we want to consider as a neighbor. This kernel is then applied to each cell in the grid to calculate the number of alive neighbors.

So how is this faster? All I’ve done is replace a huge number of iterations with a huge number of iterations.

The difference is that I can now use SciPy, a math, science, and engineering library for Python that can speed up matrix operations tremendously. Replacing the existing code with a call to the SciPy convolve2d function does the convolution in a fraction of the time.

# Module imports
import copy
import scipy.signal

# Function Definitions
def updateState(currentState, neighbors):
    nextState = copy.deepcopy(currentState)
    neighborCount = scipy.signal.convolve2d(currentState, neighbors,
                                            mode="same", boundary="wrap")
    Rindex = 0
    for row in neighborCount:
        Cindex = 0
        for cell in row:
            count = neighborCount[Rindex, Cindex]
            if count == 3:
                nextState[Rindex][Cindex] = 1
            elif count != 2:   # count == 3 was already handled above
                nextState[Rindex][Cindex] = 0
            Cindex += 1
        Rindex += 1
    return nextState

This takes the currentState grid and the neighbors kernel and convolves them to create a new matrix called neighborCount. Each cell in neighborCount contains the number of alive neighbors for that cell. Then all I have to do is iterate through neighborCount and apply the game of life rules to find the next state of each cell in the grid. This happens much faster than the code in Hurry Up Part 1. Now if I run the profiler I get the following:

New Code:

Old Code:

The new code takes only a little over 3% of the run time, versus the 35% the old code took. So what does this mean for the overall run time? For 500 iterations I get:

Grid size              New          Old          Improvement
100 cells x 100 cells  10 seconds   15 seconds   33%
250 cells x 250 cells  63 seconds   92 seconds   32%
300 cells x 300 cells  89 seconds   151 seconds  41%
500 cells x 500 cells  243 seconds  393 seconds  38%

The improvement seen by the profiler shows up as a real improvement in the runtime. Notice that the improvement actually increases for larger grid sizes since the relative amount of time spent iterating over the grid is larger.

Getting better.

The next place to work on is the update to the nextState array, which again requires iterating through the whole array.

Rindex = 0
for row in neighborCount:
    Cindex = 0
    for cell in row:
        count = neighborCount[Rindex, Cindex]
        if count == 3:
            nextState[Rindex][Cindex] = 1
        elif count != 2:   # count == 3 was already handled above
            nextState[Rindex][Cindex] = 0
        Cindex += 1
    Rindex += 1

Using NumPy, the array library that SciPy is built on, I can replace this with the NumPy where function.

nextState[np.where(neighborCount == 3)] = 1
nextState[np.where((neighborCount != 2) & (neighborCount != 3))] = 0
return nextState

The where function returns the indices of all cells that match the condition. These indices are then used to set the values of those cells in nextState. Of course, to do this I had to go through and change all of the standard 2D lists I'd been using into NumPy arrays, but it was worth it. This and a few other small improvements result in (500 cycles):
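Putting the pieces together, the fully vectorized update looks roughly like this (a sketch, not the exact project code; it assumes the grid is already a NumPy integer array):

```python
import numpy as np
from scipy.signal import convolve2d

# 1 marks each position counted as a neighbor; 0 excludes the cell itself
neighbors = np.array([[1, 1, 1],
                      [1, 0, 1],
                      [1, 1, 1]])

def updateState(currentState, neighbors):
    # Count live neighbors for every cell at once via convolution
    neighborCount = convolve2d(currentState, neighbors,
                               mode="same", boundary="wrap")
    nextState = currentState.copy()
    # Any cell with exactly 3 live neighbors is alive next cycle
    nextState[np.where(neighborCount == 3)] = 1
    # Any cell with neither 2 nor 3 live neighbors is dead next cycle;
    # cells with exactly 2 keep their current state
    nextState[np.where((neighborCount != 2) & (neighborCount != 3))] = 0
    return nextState

# A vertical blinker in a 5x5 universe
blinker = np.zeros((5, 5), dtype=int)
blinker[1:4, 2] = 1
```

Running updateState(blinker, neighbors) turns the column of three live cells into a row of three, as the Game of Life rules require.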

Grid size              New           Old          Improvement
100 cells x 100 cells  1.2 seconds   15 seconds   92%
250 cells x 250 cells  5.8 seconds   92 seconds   94%
300 cells x 300 cells  8.3 seconds   151 seconds  95%
500 cells x 500 cells  22.4 seconds  393 seconds  94%