
Python

Awesome

AST

  • Parsimonious
  • Pyparsing

AsyncIO

  • curio — The coroutine concurrency library. BSD
  • tokio — Asyncio event loop written in Rust. Apache
  • trio — A friendly Python library for async concurrency and I/O; see the sketch after this list.
  • unsync — Unsynchronize asyncio by using an ambient event loop, or executing in separate threads or processes.
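
A minimal trio sketch of the nursery pattern; the tasks and delays are made up for illustration:

import trio

async def say(word, delay):
    await trio.sleep(delay)                 # cooperative sleep, never blocks the event loop
    print(word)

async def main():
    # The nursery waits for every task started inside it before leaving the block
    async with trio.open_nursery() as nursery:
        nursery.start_soon(say, "hello", 1)
        nursery.start_soon(say, "world", 2)

trio.run(main)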

Bioinformatics

  • biotite — Generate a variety of bioinformatics figures, including protein secondary structure ribbon diagrams. [Docs, Paper]
  • logomaker — Generate DNA and protein LOGOs using pandas and matplotlib! [Docs, Paper]

CLI

  • click — Composable command-line interface toolkit; an example follows this list.
  • docopt
  • python-fire
  • colorama
  • coloredlogs ★ — Logs are not coloured when redirected to files. Good command-line interface. Can output to HTML. MIT
  • colorlog
  • ptpython — A better Python REPL.
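
A minimal click sketch; the greet command, its argument, and its option are hypothetical:

import click

@click.command()
@click.argument("name")
@click.option("--count", default=1, show_default=True, help="Number of greetings.")
def greet(name, count):
    """Greet NAME a given number of times."""
    for _ in range(count):
        click.echo(f"Hello, {name}!")

if __name__ == "__main__":
    greet()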

C / C++ integration

  • cffi — C Foreign Function Interface for Python. Interact with almost any C code from Python, based on C-like declarations that you can often copy-paste from header files or documentation. A minimal sketch follows this list.
  • cppimport
  • ctypes — Foreign function interface in the standard library: load a shared library and declare the argument/return types of the functions you call, all from Python. Works on PyPy but is slow there. For all intents and purposes, deprecated in favour of cffi.
  • cython — Use a Pythonesque language to write and interface with C / C++ code.
  • numba — Compiles Python code to LLVM IR on the fly and uses the generated machine code for faster execution.
  • pybind11 — Evolution of Boost.Python for writing Python bindings to C++ code.
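
A minimal cffi sketch in ABI mode, adapted from the cffi documentation: paste a declaration, open the C standard library, and call it (dlopen(None) works on Unix only).

from cffi import FFI

ffi = FFI()
ffi.cdef("int printf(const char *format, ...);")   # declaration copied from a header / man page
C = ffi.dlopen(None)                                # None loads the C standard library on Unix
arg = ffi.new("char[]", b"world")                   # a C string owned by cffi
C.printf(b"hi there, %s.\n", arg)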


Dashboards

  • anvil — Compiles Python to JS, allowing for full-stack development in Python. Includes a WYSIWYG editor. [Website]
  • dash — Interactive, Reactive Web Apps for Python.
  • lux — Visualize pandas dataframes in Jupyter notebooks.
  • pyxley
  • redash
  • streamlit — Python-based dashboards using its own widget style. A short sketch follows this list.
  • voila — Jupyter-based dashboards leveraging IPywidgets.
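
A minimal streamlit sketch (the random-walk content is made up); save it as app.py and serve it with streamlit run app.py:

import numpy as np
import pandas as pd
import streamlit as st

st.title("Random walk demo")
n = st.slider("Number of steps", min_value=10, max_value=1000, value=100)
walk = pd.DataFrame({"position": np.random.randn(n).cumsum()})
st.line_chart(walk)          # the script reruns top to bottom on every widget change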

Data types

  • attrs — "Python classes without boilerplate".
  • dataclasses — New in Python 3.7. Nice way to define type-annotated data classes, illustrated after this list. See PEP 557.
  • pydantic — Python dataclasses with native support for serialization / deserialization. Also supports reading configs from .env files.
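
A short sketch contrasting the two (the Point classes are made up): the dataclass stores whatever it is given, while the pydantic model validates and coerces the fields.

from dataclasses import dataclass
from pydantic import BaseModel

@dataclass
class PointDC:
    x: float
    y: float

class PointModel(BaseModel):
    x: float
    y: float

PointDC(x="1", y=2)      # no validation: x stays the string "1"
PointModel(x="1", y=2)   # validation and coercion: x becomes the float 1.0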

Data engineering

  • dagster — A data orchestrator for machine learning, analytics, and ETL.
  • dataset
  • dbt — Write data transformations using SQL.
  • dvc — Seems like a neat way to version the data consumed and generated by a machine learning project.
  • faust — Python Stream Processing. [BSD]
  • great_expectations — Always know what to expect from your data.
  • intake — Extends conda to manage data packages. [BSD]
  • mimesis — Fast and easy-to-use library for generating synthetic data for a variety of purposes, in a variety of languages.
  • pickle5 — Backport of pickle protocol 5, which supports efficient serialization of out-of-band data buffers (see PEP 574). [BSD]
  • retriever — Data retriever.

Data science

  • dask — Distributed data processing engine, sketched after this list.
  • modin — Parallel pandas. Can use either dask or ray as the data processing engine.
  • pandas — Needs no explanation.
  • pyspark
  • ray — Distributed data processing engine and library. As of yet, no conda support.
  • vaex — Lazy pandas.
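
A minimal dask.dataframe sketch; the CSV path and column names are hypothetical. The API mirrors pandas, but nothing runs until .compute().

import dask.dataframe as dd

df = dd.read_csv("data/sales-*.csv")            # lazily reads every matching file
result = df.groupby("city")["amount"].mean()    # builds a task graph, no work done yet
print(result.compute())                         # executes the graph in parallel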

Debugging

  • ipdb — Exports functions to access the IPython debugger, which features tab completion, syntax highlighting, better tracebacks, and better introspection, with the same interface as the pdb module. Typical usage is sketched below.
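
Typical usage, sketched with a made-up function:

import ipdb

def buggy(x):
    ipdb.set_trace()     # execution pauses here with an IPython prompt
    return x / 0

# Alternatively run a whole script under the debugger:  python -m ipdb my_script.py
# or set PYTHONBREAKPOINT=ipdb.set_trace and use the built-in breakpoint().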

Deployment

  • chalice — Python Serverless Microframework for AWS.
  • zappa — Deploy WSGI apps to AWS Lambda.

Documentation

File system

  • atomicwrites — Powerful Python library for atomic file writes.
  • cloudpickle — Serialize things that can't be serialized by pickle. Seems to be better than dill.
  • fasteners — A python package that provides useful locks.
  • glob — glob.glob has a recursive= parameter which allows ** to expand to zero or more directories; see the example after this list.
  • pyfilesystem2 — Python's filesystem abstraction layer. Work with files on the filesystem, in memory, in tar and zip archives, and on S3 using the same API.
  • smart_open — Utils for streaming large files (S3, HDFS, gzip, bz2...).
  • urllib.parse — Tools for parsing and constructing URLs and URIs (part of the standard library).
  • xopen — Open any compressed file in Python.
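
The recursive glob mentioned above, as a short sketch:

import glob

# With recursive=True, '**' expands to zero or more directories,
# so this matches .py files here and in every subdirectory.
python_files = glob.glob("**/*.py", recursive=True)

# Roughly equivalent with pathlib:
#   list(pathlib.Path(".").rglob("*.py"))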

Functional

Graph analysis

  • graph-tool — Efficient Python module for manipulation and statistical analysis of graphs (a.k.a. networks). [📙]
  • networkit — Sort of like graph-tool, again. [📙]
  • networkx — Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. A tiny example follows this list.
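
A tiny networkx sketch (the edges are made up):

import networkx as nx

G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")])
print(nx.shortest_path(G, "a", "c"))    # ['a', 'b', 'c'] (the path via 'd' is equally short)
print(nx.degree_centrality(G))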

Java integration

  • [GraalVM]
  • voc — A transpiler that converts Python code into Java bytecode. The beeware team now recommends including a Python runtime in Android projects instead.
  • py4j — Py4J enables Python programs running in a Python interpreter to dynamically access Java objects in a Java Virtual Machine. Methods are called as if the Java objects resided in the Python interpreter and Java collections can be accessed through standard Python collection methods. Py4J also enables Java programs to call back Python objects. "Py4J uses a remote tunnel to operate the JVM. This has the advantage that the remote JVM does not share the same memory space and multiple JVMs can be controlled. It provides a fairly general API, but the overall integration to Python is as one would expect when operating a remote channel operating more like an RPC front-end. It seems well documented and capable. Although I haven’t done benchmarking, a remote access JVM will have a transfer penalty when moving data." Source: https://jpype.readthedocs.io/en/latest/userguide.html.
  • rubicon-java
  • jpype — JPype is a Python module to provide full access to Java from within Python. Start the JVM at Python startup and then access the runtime through the JPype API. py4j is better if the goal is to start a JVM ad hoc. Garbage collection can become an issue, especially with circular dependencies in and out of the JVM. A minimal sketch follows.
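
A minimal jpype sketch of the start-the-JVM-at-startup pattern described above:

import jpype
import jpype.imports          # enables `from java.xxx import ...` style imports

jpype.startJVM()              # start the JVM once, at program startup
from java.lang import System  # only works after the JVM is running

System.out.println("Hello from the JVM")
jpype.shutdownJVM()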

Jupyter

See Jupyter.

Linters

  • jedi — Auto-completion and static analysis.
  • pre-commit — A framework for managing and maintaining multi-language pre-commit hooks.
  • black

Machine learning frameworks

  • chainer — A Powerful, Flexible, and Intuitive Framework for Neural Networks. (Website)
  • CuPy — A NumPy-compatible matrix library accelerated by CUDA. (Website)
  • deepdive — Transform "dark data" into SQL tables.
  • edward — A library for probabilistic modelling, inference, and criticism. Deep generative models, variational inference. Runs on TensorFlow.
  • faiss — A library for efficient similarity search and clustering of dense vectors.
  • LightGBM
  • pomegranate — Package which implements fast, efficient, and extremely flexible probabilistic models ranging from probability distributions to Bayesian networks to mixtures of hidden Markov models. The most basic level of probabilistic modelling is a simple probability distribution.
  • prophet — Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
  • pymc3 — Probabilistic Programming in Python. A new version is being developed on top of JAX.
  • SerpentAI — Game Agent Framework. Helping you create AIs / Bots to play any game you own!
  • snorkel — Write functions that go over your data to generate features (one feature per function). Then combine these features in a smart way in order to build a predictor which performs well.
  • statsmodels — Statistical modelling and econometrics in Python. An OLS example follows this list.
  • surprise — A Python scikit for building and analyzing recommender systems.
  • tpot — Automates model selection and hyperparameter optimization.
  • venture — Probabilistic programming in Python.
  • [xgboost]
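
An ordinary least squares example for statsmodels, on synthetic data:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=100)

X = sm.add_constant(x)        # adds the intercept column
fit = sm.OLS(y, X).fit()
print(fit.params)             # roughly [1.0, 2.0]
print(fit.summary())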

PyTorch ecosystem

Machine learning lifecycle

  • mlflow — Open source platform for the machine learning lifecycle. A short example follows this list.
  • torch-elastic — TorchElastic enables distributed PyTorch training jobs to be executed in a fault tolerant and elastic manner.
  • torch-serve — TorchServe is a flexible and easy to use tool for serving PyTorch models.
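
A minimal mlflow sketch; the experiment name, parameter, and metric are made up. Runs can then be browsed with mlflow ui.

import mlflow

mlflow.set_experiment("demo")
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("val_loss", 0.42)
    mlflow.log_metric("val_loss", 0.37, step=1)   # metrics can be logged per step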

Operating system

  • monotonic — An implementation of time.monotonic() for Python 2 & Python 3.

ORMs

  • sqlfluff — SQL autoformatter.
  • peewee — Too limited in my experience.
  • PonyORM — Very Pythonic, but probably slow and limited?
  • sqlalchemy — De facto standard. A minimal example follows this list.
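
A minimal SQLAlchemy example using the 1.4+ Core API and an in-memory SQLite database:

from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///:memory:")

with engine.begin() as conn:      # begin() commits at the end of the block
    conn.execute(text("CREATE TABLE users (name TEXT)"))
    conn.execute(text("INSERT INTO users VALUES (:name)"), {"name": "Alice"})

with engine.connect() as conn:
    rows = conn.execute(text("SELECT name FROM users")).fetchall()

print(rows)                       # [('Alice',)]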

Profiling

Rust integration

  • milksnake — A setuptools/wheel/cffi extension to embed binary data in wheels.
  • PyO3 — Rust bindings for the Python interpreter.

Standard library

  • __future__ — from __future__ import annotations allows using classes as type annotations before those classes are defined.
  • trace — Code coverage and tracing.
  • sched — Schedule the execution of things.
  • time — time.perf_counter() (useful for timing the execution of things).
  • urllib.parse — Parse URI strings (including database connection strings) into objects.
  • typing.NamedTuple — Nice way to define type-annotated namedtuples.
  • datetime — Use datetime.isoformat() and datetime.fromisoformat() to serialize datetime objects to and from strings. You can divide two timedelta objects to get the elapsed time as a floating-point value in a desired denomination. Additional ways of parsing datetime strings and iterating over datetime ranges are available in the dateutil package (using the parse and rrule modules, respectively). See the sketch after this list.
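
A few of the items above in one sketch: perf_counter for timing, NamedTuple for typed records, and the datetime round trip plus timedelta division.

import time
from datetime import datetime, timedelta
from typing import NamedTuple

class Point(NamedTuple):
    x: float
    y: float

start = time.perf_counter()
p = Point(1.0, 2.0)
elapsed = time.perf_counter() - start                  # seconds, as a float

stamp = datetime(2021, 5, 17, 12, 30).isoformat()      # '2021-05-17T12:30:00'
restored = datetime.fromisoformat(stamp)               # round-trips exactly

hours = timedelta(days=1) / timedelta(hours=1)         # 24.0 (dividing two timedeltas gives a float)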

Testing

Text processing

  • bounter — Efficient Counter that uses a limited (bounded) amount of memory regardless of data size.
  • fuzzywuzzy — Fuzzy String Matching in Python. See the examples after this list.
  • gensim — Topic Modelling for Humans.
  • pdftabextract — A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
  • spacy — Industrial-strength Natural Language Processing (NLP) with Python and Cython.
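
A couple of fuzzywuzzy calls, following the examples from its README:

from fuzzywuzzy import fuzz, process

fuzz.ratio("this is a test", "this is a test!")                            # 97
fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")  # 100, word order ignored

choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
process.extractOne("cowboys", choices)                                     # ('Dallas Cowboys', 90)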

Visualization

Web async

  • aiohttp — Async HTTP client/server framework using asyncio. Did not use uvloop originally (not sure about now), so slower than alternatives.
  • apistar — Self-documenting REST API. Supports both WSGI and asyncio. Uses the OpenAPI standard to declare types. By the same author as django-rest-framework. Authors moved over to working on starlette.
  • fastapi 🌟 — Combines the speed of starlette with the user experience of APIStar. Uses starlette. A minimal app is sketched after this list.
  • httptools — Python binding for the Node.js HTTP parser (the default Python parser is slow).
  • hug — Auto-documenting APIs.
  • sanic — Async web framework that looks like Flask.
  • starlette — Lightweight ASGI framework/toolkit, which is ideal for building high performance asyncio services. Faster than sanic. Uses uvicorn.
  • uvicorn — Fast ASGI server. Uses uvloop and httptools.
  • uvloop — Fast asyncio event loop, implemented using Cython and libuv (used by Node.js).
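
A minimal fastapi app (the items route is made up); serve it with uvicorn main:app, assuming the file is main.py.

from typing import Optional

from fastapi import FastAPI

app = FastAPI()

@app.get("/items/{item_id}")
async def read_item(item_id: int, q: Optional[str] = None):
    # item_id is parsed and validated from the path; q is an optional query parameter
    return {"item_id": item_id, "q": q}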


Web sync

  • django — The Web framework for perfectionists with deadlines.
  • falcon — Very fast, particularly on PyPy2.
  • flask — A microframework based on Werkzeug, Jinja2 and good intentions. A minimal app follows this list.
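
A minimal flask app (the ping route is made up):

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/ping")
def ping():
    return jsonify(status="ok")

if __name__ == "__main__":
    app.run(debug=True)   # development server only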

Web crawlers

  • gain — Web crawling framework based on asyncio.

Resources

Interface with C / C++

matplotlib

Add figure letters:

# Assumes a loop over subplot panels where `ax` is the current Axes and `i` its index; requires `import string`.
plt.text(0, 1.10, string.ascii_uppercase[i], transform=ax.transAxes, size=20, weight='bold')

Control colorbar label size:

ax = sns.heatmap(...)
# The colorbar is the last Axes that seaborn adds to the figure
cbar_ax = ax.figure.axes[-1]
plt.setp(cbar_ax.get_yticklabels(), fontsize=12)

graph-tool

Create graph from sparse matrix:

# `a` is a scipy.sparse adjacency matrix; Graph comes from graph_tool, np is numpy
g = Graph()
g.add_edge_list(np.array(a.nonzero()).T)

Things to keep in mind

Unnecessary pass

It is not necessary to use pass when there is already a docstring.

class Error(Exception):
    """My custom error"""
    # pass ← Not needed here

Method chaining in pandas

def show_df(df):
    # `display` is available in Jupyter / IPython; use print(df) outside a notebook
    display(df)
    return df

ans = (
    df
    .loc[lambda x: x['name'] != 'Bob']
    .query("city == 'London'")
    .pipe(show_df)
    .loc[lambda x: x['age'].between(10,50)]
    .eval("age_above_20 = age > 20")
    .assign(dummy=lambda x: x['age'] * 2)  # assign adds a derived column
)