Useful Python packages for science

I’ve spent 5 years doing scientific research and development, mainly in Python, and have used the major scientific computing libraries. While this topic may seem intimidating, the actual usage of each library is typically pretty brief (a few lines of code). You don’t need to be an expert in each one; you just need a general idea of what they do so you can reach for them when you need to. Here’s a brief overview with some of the most common code explained.

Installation

All of these packages are heavily used in scientific computing, so they’re available through pip or conda (the two main Python package managers).

For arrays: Numpy

Numpy is the library you use to turn data into arrays in Python. It is open source on GitHub. You can use these arrays to perform linear algebra. Before PyTorch/TensorFlow and tensors came along, Numpy was the standard way to work with arrays in Python. Now, for AI, you’ll probably use tensors more, but arrays are still great to know how to use.

General Usage
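As a rough illustration of what everyday Numpy usage looks like, here is a minimal sketch. The variable names and the numbers are made up for the example; only the Numpy calls themselves are real.

```python
import numpy as np

# Build a 2x2 matrix and a length-2 vector from plain Python lists.
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])
vector = np.array([1.0, 1.0])

# Arithmetic on arrays is elementwise by default.
doubled = matrix * 2

# The @ operator performs matrix multiplication.
product = matrix @ vector

# np.linalg holds the linear algebra routines,
# e.g. solving the system matrix @ x = vector for x.
solution = np.linalg.solve(matrix, vector)

print(product)
print(solution)
```

Most day-to-day use is some combination of these: build an array, do elementwise math on it, and occasionally call something from np.linalg.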

Creating Arrays
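There are a handful of common ways to create an array. The sketch below shows the ones I would expect to come up most often; the variable names are just for illustration.

```python
import numpy as np

# From an existing Python list.
from_list = np.array([1, 2, 3])

# Pre-filled arrays of a given shape.
zeros = np.zeros((2, 3))   # 2 rows, 3 columns of 0.0
ones = np.ones(4)          # 1D array of four 1.0 values

# Ranges of numbers.
steps = np.arange(0, 10, 2)       # start, stop (exclusive), step
grid = np.linspace(0.0, 1.0, 5)   # 5 evenly spaced points, endpoints included

# Reshape a flat range into a 2D array.
reshaped = np.arange(6).reshape(2, 3)

print(zeros.shape, steps, grid)
```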

Indexing Arrays
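Indexing works like Python lists, extended to multiple dimensions. A minimal sketch, using a made-up 3x4 array so the results are easy to check by eye:

```python
import numpy as np

# A 3x4 array holding the numbers 0 through 11, row by row.
data = np.arange(12).reshape(3, 4)

element = data[1, 2]         # row 1, column 2 (zero-based)
first_row = data[0]          # the whole first row
last_column = data[:, -1]    # every row, last column
block = data[:2, 1:3]        # first two rows, columns 1 and 2

# Boolean masks select the elements where a condition holds.
evens = data[data % 2 == 0]

print(element, last_column, block, evens)
```

Slices like data[:2, 1:3] return views into the original array rather than copies, so writing into a slice changes the original.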

For loading data: Pandas

Pandas is a great all-around data analysis toolkit for scientific computing. It is free and open source on GitHub. I mostly use it to load external files. For AI, this is crucial, since you’re often loading external datasets. It also handles a variety of other tasks well, like dealing with missing data, merging datasets, and converting other Python/Numpy data structures into Pandas DataFrames, but I mostly use the same three lines.

Pandas is easy to install: since it’s a heavily used Python package, it’s available through pip and conda.

General Usage
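A typical load-and-inspect workflow might look something like the sketch below. The column names and values are invented, and an in-memory string stands in for an external file so the example runs on its own; with a real dataset you would pass a file path to pd.read_csv instead.

```python
import io

import pandas as pd

# Stand-in for an external file (hypothetical data). With a real dataset
# you would write something like pd.read_csv("measurements.csv").
csv_text = io.StringIO(
    "sample,temperature,pressure\n"
    "a,20.5,1.01\n"
    "b,21.0,\n"
    "c,19.8,0.99\n"
)

# Load the file into a DataFrame.
df = pd.read_csv(csv_text)

# Peek at the first few rows to check it loaded as expected.
preview = df.head()

# Drop rows with missing values, then hand the numbers to Numpy.
cleaned = df.dropna()
values = cleaned[["temperature", "pressure"]].to_numpy()

print(preview)
print(values)
```

The empty pressure field for sample b is read as a missing value, which is why dropna removes that row.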

For scientific algorithms: Scipy

Scipy is a package built on Numpy that adds a lot of built-in functions for integration, optimization, interpolation, signal processing, linear algebra, Fourier transforms, eigenvalue problems, and more. Yes, as you can already see, there is some overlap between different packages. It is free and open source on GitHub.

Scipy is not an everyday package. You’ll use it for specific scenarios, so there’s not really any code you should memorize. Just be aware of what kinds of things Scipy can do, and know that you don’t need to reinvent the wheel every time you want to integrate or optimize an equation.
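To give a feel for what "not reinventing the wheel" looks like, here is a small sketch of numerical integration and minimization with Scipy. The functions being integrated and minimized are arbitrary examples chosen because their exact answers are known.

```python
import numpy as np
from scipy import integrate, optimize

# Numerically integrate sin(x) from 0 to pi. The exact answer is 2.
# quad returns the integral and an estimate of the absolute error.
area, error_estimate = integrate.quad(np.sin, 0.0, np.pi)


def parabola(x):
    """Example function with a known minimum of 1 at x = 3."""
    return (x - 3.0) ** 2 + 1.0


# Find the x that minimizes a function of one variable.
result = optimize.minimize_scalar(parabola)

print(area, error_estimate)
print(result.x, result.fun)
```

In both cases you just hand Scipy a Python function and it does the numerical work for you.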

For making visualizations: Matplotlib

Matplotlib is the go-to plotting tool for making data visualizations. It is free and open source on GitHub. You can make many types of plots with it, including line plots, scatter plots, bar charts, and histograms.

General Usage
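A basic plot takes only a few lines. This is a minimal sketch with made-up data (a sine curve and a few cosine samples), using the fig/ax style of the Matplotlib API:

```python
import matplotlib.pyplot as plt
import numpy as np

# Example data: 100 points between 0 and 2*pi.
x = np.linspace(0.0, 2.0 * np.pi, 100)

# Create a figure with a single set of axes.
fig, ax = plt.subplots()

# A line plot and a scatter plot on the same axes.
ax.plot(x, np.sin(x), label="sin(x)")
ax.scatter(x[::10], np.cos(x[::10]), label="cos(x) samples")
```

To actually see the result, call plt.show() in a script or save it with fig.savefig and a filename of your choice.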

Editing the plot

[Image: matplotlib_commands.png]
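Most plot editing comes down to a few calls on the axes object. The sketch below shows some common ones; it is my own selection with invented data and labels, and is not necessarily the same set of commands as in the image.

```python
import matplotlib.pyplot as plt

# figsize sets the figure size in inches (width, height).
fig, ax = plt.subplots(figsize=(6, 4))

# Line color, line style, and marker are set per plot call.
ax.plot([0, 1, 2, 3], [0, 1, 4, 9],
        color="tab:red", linestyle="--", marker="o", label="y = x^2")

# Title and axis labels.
ax.set_title("Example plot")
ax.set_xlabel("x")
ax.set_ylabel("y")

# Axis ranges.
ax.set_xlim(0, 3)
ax.set_ylim(0, 10)

# Background grid, plus a legend built from the label= arguments.
ax.grid(True)
ax.legend()
```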

For more visualizations: Seaborn

Check out Seaborn if you need more advanced viz than matplotlib. I’ve never used it, but I’ve heard it’s cool.