Many scientific applications produce arrays that are mostly empty or filled with zeros. These are aptly named sparse arrays. How they are stored, however, is a matter of choice. One option is to store the full array, zeros included, but this incurs a significant cost in both memory and performance.

An alternative is to store them in a dedicated data structure that keeps track of only the nonzero entries. This often improves performance and memory consumption, but most operations on sparse arrays then have to be rewritten. sparse provides one such data structure. It isn’t the only library that does this: scipy.sparse and Pysparse do so as well.
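To make the idea of tracking only the nonzero entries concrete, here is a minimal pure-Python sketch of coordinate-style storage. It is an illustration only, not how sparse is implemented; the helper names to_coo and get are hypothetical and the sample data is made up.

```python
from itertools import product


def to_coo(dense, shape):
    """Collect the index tuples and values of the nonzero entries of a nested list."""
    coords, data = [], []
    for idx in product(*(range(n) for n in shape)):
        value = dense
        for i in idx:  # walk down one nesting level per dimension
            value = value[i]
        if value != 0:
            coords.append(idx)
            data.append(value)
    return coords, data


def get(coords, data, idx):
    """Look up one element; anything that is not stored is an implicit zero."""
    try:
        return data[coords.index(idx)]
    except ValueError:
        return 0


# A 2 x 2 x 3 array with only two nonzero entries (illustrative data).
dense = [[[0, 0, 0], [0, 5, 0]],
         [[0, 0, 0], [0, 0, 7]]]
shape = (2, 2, 3)

coords, data = to_coo(dense, shape)
```

Only two index tuples and two values are kept instead of all twelve elements. The linear search in get also hints at why operations on such a structure need their own implementations.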


So why use sparse? The other libraries mentioned are mostly limited to two-dimensional arrays, and their interoperability with numpy is hit-or-miss. sparse supports general N-dimensional arrays, strives for interoperability with numpy.ndarray, and provides mostly the same API. It defers to scipy.sparse when it is convenient to do so, and uses custom implementations of operations where this isn’t possible.

Where to from here?

If you’re new to this library, start with the user manual page. If you’re already familiar with it, or you want to dive straight in, jump to the API reference. The full contents are also listed in the sidebar.