Software
I am a core contributor and maintainer of several widely used tools for scientific computing in Python. See my GitHub profile for the full list.
This page covers the general-purpose libraries. Many of my research projects also have their own open source releases — including NeuralGCM, JAX-CFD, Dinosaur and others. See Research for the full picture.
Xarray
Xarray is a library for multi-dimensional labeled data in Python that has been widely adopted across weather and climate, geospatial analysis, plasma physics, genomics, and finance. The project is downloaded ~5m times per week, with 4,000+ stars and 500+ contributors on GitHub.
I created Xarray in 2014 while at The Climate Corporation and continue to lead the project, working with a team of 21 volunteer and grant-funded developers. Xarray has received over $500,000 in grant funding from the NSF, NASA, Nvidia and the Chan-Zuckerberg Initiative.
Xarray-Beam
I am a primary author of Xarray-Beam, which implements reliable large-scale data pipelines on Xarray data using the map-reduce paradigm (via Apache Beam). Six weather research teams at Google use it to process petabytes of data.
Coordax
With Dmitrii Kochkov, I built Coordax — labeled axes for JAX, in the spirit of Xarray but tailored to physics- and AI-based simulation codes like NeuralGCM. Coordax supports arbitrary JAX transformations, lossless conversion to and from Xarray, and easy wrapping of code not written for labeled arrays.
NumPy
I served on the NumPy steering council from 2015 to 2024 and authored six NumPy Enhancement Proposals, focused on interoperability between NumPy and other array libraries. Contributions include widely used functionality such as broadcast_to, stack, moveaxis and __array_function__.
JAX
I have contributed 100+ pull requests to JAX, focused on matrix-free linear algebra and scientific computing — including much of jax.scipy.
Other projects
- h5netcdf — an alternative implementation of the netCDF4 file format in Python.
- numbagg — fast N-dimensional aggregation functions built with Numba and NumPy ufuncs.
- cyordereddict — a Cython port of the Python standard library’s
OrderedDict, 2–6× faster.
In the past I was also a core developer for Dask and Pandas.