Make your Python code 10,000 times faster with parallel numpy!

Python is widely used in scientific research for tasks such as data processing, analysis and visualisation. Although many HPC applications have high-level Python interfaces, Python itself is not yet widely used for implementing large-scale modelling and simulation programs because of its performance: it is primarily designed for ease of use and flexibility, not for speed. For example, a C program naively translated into Python can often run over 50 times slower.

However, there are techniques that can dramatically increase the speed of Python programs, such as fast array processing using numpy, parallelisation using MPI message-passing and running on HPC systems.
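As a small illustration of the first of these techniques, the sketch below computes the same quantity twice: once with an explicit Python loop and once with a single numpy array expression. The function names and test data are invented for this example and are not taken from the webinar material. On large arrays the array version is typically much faster, although the actual speed-up depends on the machine and the problem size.

```python
import numpy as np

def sum_of_squares_loop(values):
    """Explicit Python loop: one interpreted iteration per element."""
    total = 0.0
    for v in values:
        total += v * v
    return total

def sum_of_squares_numpy(values):
    """Array expression: the loop runs inside numpy's compiled code."""
    return float(np.sum(values * values))

# Hypothetical test data, chosen only for illustration.
data = np.arange(1000, dtype=np.float64)
loop_result = sum_of_squares_loop(data)
numpy_result = sum_of_squares_numpy(data)
```

Both functions return the same value; only the place where the loop executes differs.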

In this short webinar we will illustrate these techniques in practice by applying them to a toy application that simulates traffic flow using a simple cellular automaton model. Having tested performance on a laptop, we will then move to the UK National Supercomputer ARCHER2, which has in excess of 750,000 CPU-cores. With a combination of numpy and mpi4py we will aim for a performance increase of more than a factor of 10,000 compared to the original program.
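To give a flavour of what such a model can look like, here is a minimal numpy sketch of one possible traffic cellular automaton. The update rule (a car moves forward one cell if the cell ahead is empty), the circular road, and the names `update_road`, `road` and `n_cars` are all assumptions made for this illustration; the application used in the webinar may differ in its details.

```python
import numpy as np

def update_road(road):
    """Advance the road by one time step.

    road is a 1D array of 0 (empty cell) and 1 (car).
    Assumed rule: a car moves forward if the cell ahead is empty.
    Assumed geometry: a circular road (periodic boundaries).
    Returns the new road and the number of cars that moved.
    """
    ahead = np.roll(road, -1)   # ahead[i] == road[i + 1]
    behind = np.roll(road, 1)   # behind[i] == road[i - 1]
    stays = (road == 1) & (ahead == 1)      # blocked car stays put
    arrives = (road == 0) & (behind == 1)   # car behind moves in
    new_road = (stays | arrives).astype(road.dtype)
    moved = int(np.sum((road == 1) & (ahead == 0)))
    return new_road, moved

# Hypothetical setup: 1000 cells, roughly half of them occupied.
rng = np.random.default_rng(seed=0)
road = (rng.random(1000) < 0.5).astype(np.int32)
n_cars = int(road.sum())

for _ in range(10):
    road, moved = update_road(road)
```

The whole road is updated with a handful of array operations rather than a loop over cells. In a parallel version, each MPI process would typically hold one section of the road and exchange its boundary cells with neighbouring processes; that step is not shown here.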