# Sparse

This implements sparse arrays of arbitrary dimension on top of `numpy` and `scipy.sparse`.
It generalizes the `scipy.sparse.coo_matrix` and `scipy.sparse.dok_matrix` layouts,
but extends beyond just rows and columns to an arbitrary number of dimensions.

Additionally, this project maintains compatibility with the `numpy.ndarray` interface
rather than the `numpy.matrix` interface used in `scipy.sparse`.

These differences make this project useful in certain situations where `scipy.sparse` matrices are not well suited, but it should not be considered a full replacement. It lacks layouts that are not easily generalized, such as CSR/CSC, and it depends on `scipy.sparse` for some computations.

## Motivation

Sparse arrays, or arrays that are mostly empty or filled with zeros, are common in many scientific applications. To save space we often avoid storing these arrays in traditional dense formats, and instead choose different data structures. Our choice of data structure can significantly affect our storage and computational costs when working with these arrays.

## Design

The main data structure in this library follows the Coordinate List (COO) layout for sparse matrices, but extends it to multiple dimensions.

The COO layout stores the row index, column index, and value of every stored element:

row | col | data |
---|---|---|
0 | 0 | 10 |
0 | 2 | 13 |
1 | 3 | 9 |
3 | 8 | 21 |
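
As a minimal sketch, the table above maps directly onto `scipy.sparse.coo_matrix`, which takes the values together with the row and column index arrays. The shape `(4, 9)` is an assumption here: it is simply the smallest shape that fits the indices shown.

```python
import numpy as np
import scipy.sparse

# Row indices, column indices, and values, one entry per table row.
row = np.array([0, 0, 1, 3])
col = np.array([0, 2, 3, 8])
data = np.array([10, 13, 9, 21])

# Assumed shape: the smallest one that contains every index above.
m = scipy.sparse.coo_matrix((data, (row, col)), shape=(4, 9))

# Only four values are stored, although the dense form has 36 entries.
dense = m.toarray()
```

Converting to a dense array with `toarray()` is only done here to inspect the result; on a large array it would defeat the purpose of the sparse layout.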

It is straightforward to extend the COO layout to an arbitrary number of dimensions:

dim1 | dim2 | dim3 | … | data |
---|---|---|---|---|
0 | 0 | 0 | . | 10 |
0 | 0 | 3 | . | 13 |
0 | 2 | 2 | . | 9 |
3 | 1 | 4 | . | 21 |
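
The same idea can be sketched with plain NumPy arrays, restricted to three dimensions for illustration. The names `coords`, `data`, and `shape`, and the shape `(4, 3, 5)` itself, are assumptions chosen for this example and are not taken from the library's internals.

```python
import numpy as np

# One row per dimension, one column per stored element
# (the first three columns of the table above, transposed).
coords = np.array([
    [0, 0, 0, 3],  # dim1
    [0, 0, 2, 1],  # dim2
    [0, 3, 2, 4],  # dim3
])
data = np.array([10, 13, 9, 21])

# Assumed shape: the smallest one that contains every coordinate.
shape = (4, 3, 5)

# Scatter the stored values into a dense array to check the layout.
dense = np.zeros(shape, dtype=data.dtype)
dense[tuple(coords)] = data
```

Storing the coordinates as a single `(ndim, nnz)` integer array keeps each dimension's indices contiguous, which makes per-axis operations such as permuting dimensions a matter of reordering rows.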

This makes it easy to *store* a multidimensional sparse array, but we still
need to reimplement all of the array operations like transpose, reshape,
slicing, tensordot, reductions, etc., which can be challenging in general.

Fortunately, in many cases we can leverage the existing `scipy.sparse`
algorithms if we can intelligently transpose and reshape our multi-dimensional
array into an appropriate 2-d sparse matrix, perform a modified sparse matrix
operation, and then reshape and transpose back. These reshape and transpose
operations can all be done at numpy speeds by modifying the arrays of
coordinates. After `scipy.sparse` runs its operations (often written in C),
we can convert back using the same path of reshapings and transpositions in
reverse.
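
The following is a hedged sketch of that strategy, not the library's actual implementation. It contracts axis 1 of a 3-d COO array with a small dense matrix `w` by permuting the coordinate rows (a transpose), flattening two axes with `np.ravel_multi_index` (a reshape), handing the resulting 2-d matrix to `scipy.sparse`, and reshaping the product back. All variable names, the shape `(4, 3, 5)`, and the choice of `w` are assumptions made for illustration.

```python
import numpy as np
import scipy.sparse

# A 3-d sparse array in COO form: coordinates (ndim, nnz) plus values.
shape = (4, 3, 5)
coords = np.array([
    [0, 0, 0, 3],
    [0, 0, 2, 1],
    [0, 3, 2, 4],
])
data = np.array([10.0, 13.0, 9.0, 21.0])

# Hypothetical dense operand: contract axis 1 (length 3) against its rows.
w = np.arange(6, dtype=float).reshape(3, 2)

# Transpose: move the contracted axis last by reordering coordinate rows.
perm = [0, 2, 1]
t_coords = coords[perm]
t_shape = tuple(shape[i] for i in perm)  # (4, 5, 3)

# Reshape: fold the first two axes into one row index.
rows = np.ravel_multi_index((t_coords[0], t_coords[1]), t_shape[:2])
cols = t_coords[2]
mat = scipy.sparse.coo_matrix(
    (data, (rows, cols)), shape=(t_shape[0] * t_shape[1], t_shape[2])
)

# Let scipy.sparse do the 2-d product, then undo the reshape.
out2d = mat.tocsr() @ w
result = np.asarray(out2d).reshape(t_shape[0], t_shape[1], w.shape[1])
```

Because `w` is dense, the product is a dense array here, and its axis order matches what `np.tensordot(x, w, axes=([1], [0]))` would give for the equivalent dense `x`. A sparse result would instead need its 2-d coordinates unraveled back into n-d coordinates, for example with `np.unravel_index`.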