I’ve done 5 years of science research and development using mainly Python, and have used the major scientific computing libraries. While this topic may seem intimidating, the actual usage of each library is typically pretty brief (a few lines of code), so you don’t need to be an expert in each library, you just need to generally know what they do so you can use them when you need to. Here’s a brief overview with some of the most common code explained.

Installation

All of these package are heavily-used scientific computing Python packages, so they’re available through pip or conda (the two main Python package managers).

  • Pip or Conda? I use pip as much as I can since it’s simple, and then I use Conda when I can’t find something on pip.
  • pip install numpy pandas matplotlib # Just type em out, no need for commas
  • conda install -c conda-forge numpy pandas matplotlib # Same, but specify conda-forge for the channel (-c) because cf is cool.

For arrays: Numpy

Numpy is the library you use to turn data into arrays in Python. It is open source on Github. You can use these arrays to perform linear algebra. Back before PyTorch/TensorFlow and tensors, Numpy was the only good way to create arrays in Python. Now, for AI, you’ll probably use Tensors more, but arrays are still great to know how to use.

General Usage

  • import numpy as np # Everyone imports numpy as ’np’ for efficiency lol

Creating Arrays

  • A Numpy array is called an ndarray
  • An ndarray is processed much faster than a normal list
  • An array holds one type of data, e.g. floats, so it can’t store stuff like words and numbers together. For that, use a Pandas dataframe.
  • Once created, the size of the array can’t change, but the items within the array are mutable i.e. you can change them
  • a1D = np.array([1, 2, 3, 4]) # Creates a 1D array… and so on
  • a2D = np.array(1, 2], [3, 4) # Create a 2D array
  • a3D = np.array([1, 2], [3, 4]], [[5, 6], [7, 8]) # Create a 3D array
  • You can create arrays using np.ones(), np.zeros(), np.arange(), np.linspace(), np.eye(), and more functions for specific behaviors.

Indexing Arrays

  • a1D[:3] returns array([1,2,3]), the 0th-2nd elements of the array
  • a2D[0,1] returns 2, the element in row 0, column 1
  • a2D.shape prints out (2, 2) the dimensions of the array

For loading data: Pandas

Pandas is a great all-around data analysis toolkit for scientific computing. Free and open-source on Github. I pretty much use it to load external files. For AI, this is crucial, since you’re often loading external datasets. It also does a great job at a variety of tasks like handling missing data, merging datasets together, converting other Python/Numpy datasets into Pandas Dataframes, and more, but I mostly use the same three lines.

It’s pretty easy to install Pandas since it’s a heavily used Python package, it’s available through Pip and Conda.

General Usage

  • import pandas as pd # Are you seeing a theme?
  • data = {'column1': [1, 2, 3], 'column2': [4, 5, 6]} # An example dict
  • df_dict = pd.DataFrame(data) # Create a dataframe (df) from the data dictionary declared above
  • df_csv = pd.read_csv('file.csv') # Create a df from a csv file
  • df_xlsx = pd.read_excel('file.xlsx') # Create a df from an Excel file
  • df['column1'] select the data in column1
  • df'column1','column2' # Select the data in columns 1 and 2
  • You can do a lot with selecting which columns and rows you want, ommitting data, replacing bad data with NaNs, handling missing data
  • df.isnull() # Check if there are any NULL values in the df
  • df.fillna(0, inplace=True) # Replace missing values with a 0
  • df.to_csv('filename.csv', index=False) # Save the data into a file of your choice

For scientific algorithms: Scipy

Scipy is a package built on Numpy that adds a lot of built-in functions to run integrations, optimization, interpolation, signal processing, linear algebra, Fourier transforms, eigenvalues, and more. Yes, as you can already see, there is some overlap between different packages. Free and open source on Github.

Scipy is not an everyday style of package, you’ll use it more for specific scenarios, so as such, there’s not really any code you should memorize. Moreso, just be aware of what kinds of things Scipy can do, and know that you don’t need to reinvent the whell every time you want to integrate or optimize an equation.

For making visualizations: Matplotlib

The go-to plotting tool for making data visualizations. Free and open source on Github. You can make the following types of plots:

  • Pairwise: plot(x,y), scatter(x,y), bar(x,height), stem(x,y), fill_between(x,y1,y2), stackplot(x,y), stairs(values)
  • Statistical: hist(x), boxplot(x), errorbar(x,y,yerr,xerr), violinplot(D), eventplot(D), hist2d(x,y), hexbin(x,y,C), pie(x), ecdf(x)
  • Gridded: imshow(Z), pcolormesh(X,Y,Z), contour(X,Y,Z), contourf(X,Y,Z), barbs(X,Y,U,V), quiver(X,Y,U,V), streamplot(X,Y,U,V)
  • Plus Irregular Gridded Data and 3D and Volumetric data

General Usage

  • import matplotlib.pyplot as plt # This one’s a mouthful, but you’ll typically see “plt” everywhere
  • fig, ax = plt.subplots() # Create the figure
  • ax.plot([1, 2, 3, 4], [1, 4, 2, 3]) # Plot some data on the Axes.
  • plt.show() # Show the figure.

Editing the plot !matplotlib_commands.png

For more visualizations: Seaborn

Check out Seaborn if you need more advanced viz than matplotlib. I’ve never used it, but I’ve heard it’s cool.