curio— The coroutine concurrency library.
tokio— Asyncio event loop written in Rust.
trio— A friendly Python library for async concurrency and I/O.
asyncio by using an ambient event loop, or executing in separate threads or processes.
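The libraries above all revolve around coroutine-based concurrency; the stdlib asyncio baseline they build on or replace looks roughly like this (names and delays invented for illustration):

```python
import asyncio

async def fetch(tag, delay):
    # Stand-in for an I/O-bound operation; real code would await a socket.
    await asyncio.sleep(delay)
    return tag

async def main():
    # gather() runs the coroutines concurrently: total time is roughly
    # the longest delay, not the sum of both.
    return await asyncio.gather(fetch("a", 0.01), fetch("b", 0.02))

print(asyncio.run(main()))  # ['a', 'b']
```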
biotite— Generate a variety of bioinformatics figures, including protein secondary structure ribbon diagrams. [Docs, Paper]
logomaker— Generate DNA and protein LOGOs using pandas and matplotlib! [Docs, Paper]
coloredlogs★ — Logs are not coloured when redirected to files. Good command-line interface. Can output to HTML.
ptpython— A better Python REPL.
C / C++ integration
cffi— C Foreign Function Interface for Python. Interact with almost any C code from Python, based on C-like declarations that you can often copy-paste from header files or documentation.
ctypes— Load a dynamic library and call its functions by declaring their signatures in pure Python; part of the standard library. Slow on PyPy. For all intents and purposes, deprecated in favour of cffi.
cython— Use a Pythonesque language to write and interface with C / C++ code.
numba— Compiles Python code to LLVM IR on the fly and uses the generated code for faster execution.
pybind11— Evolution of Boost Python for writing Python bindings to C++ code.
- Speeding up Pandas file parsing with Cython
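The ctypes route can be sketched without writing any C. This assumes a Unix-like system where the C math library can be located (on Windows the C runtime DLL would be loaded instead):

```python
import ctypes
import ctypes.util

# Locate and load the C math library (assumes a Unix-like system).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# ctypes defaults to int arguments and return values, so declare the
# actual C signature: double sqrt(double).
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(2.0))  # 1.4142135623730951
```

Declaring `argtypes`/`restype` is the part people forget; without it ctypes silently truncates doubles to ints.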
Other language bindings:
anvil— Compiles Python to JS, allowing for full-stack development in Python. Includes a WYSIWYG editor.
dash— Interactive, Reactive Web Apps for Python.
lux— Visualize pandas dataframes in Jupyter notebooks.
streamlit— Python-based dashboards using its own widget style.
voila— Jupyter-based dashboards leveraging IPywidgets.
attrs— "Python classes without boilerplate".
dataclasses— New in Python 3.7. Nice way to define type-annotated data classes. See PEP 557.
pydantic— Python dataclasses with native support for serialization / deserialization. Also supports reading configs from
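A minimal stdlib dataclasses sketch of the pattern the entries above share (pydantic and attrs layer validation and serialization on top of similar declarations):

```python
from dataclasses import asdict, dataclass

@dataclass
class User:
    name: str
    age: int = 0  # fields may declare defaults

u = User(name="Ada", age=36)
print(asdict(u))  # {'name': 'Ada', 'age': 36}
```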
dagster— A data orchestrator for machine learning, analytics, and ETL.
dbt— Write data transformations using SQL.
dvc— Seems like a neat way to version data input and generated by a machine learning project.
faust— Python Stream Processing.
great_expectations— Always know what to expect from your data.
intake— Extends conda to manage data packages.
mimesis— A fast and easy-to-use library for generating synthetic data for a variety of purposes in a variety of languages.
pickle5— Backport of pickle's protocol 5, which supports efficient serialization of out-of-band data buffers (see PEP 574).
retriever— Downloads, cleans, and installs publicly available datasets.
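Protocol 5 (the pickle5 entry above) ships with the stdlib from Python 3.8 onward; a minimal out-of-band sketch:

```python
import pickle
from pickle import PickleBuffer

data = bytearray(b"x" * 1000)

buffers = []
# Wrapping the buffer in PickleBuffer lets pickle hand the view to
# buffer_callback instead of copying the bytes into the pickle stream.
payload = pickle.dumps(PickleBuffer(data), protocol=5,
                       buffer_callback=buffers.append)

# Deserialization takes the collected buffers back, in the same order.
restored = pickle.loads(payload, buffers=buffers)
assert bytes(restored) == bytes(data)
```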
dask— Distributed data processing engine.
pandas. Can use either dask or ray as the data processing engine.
pandas— Needs no explanation.
ray— Distributed data processing engine and library. As of yet, no conda support.
ipdb— Exports functions to access the IPython debugger, which features tab completion, syntax highlighting, better tracebacks, better introspection with the same interface as the pdb module.
pdoc— Simple documentation for Python code. Developed by @burntsushi and maintained by
nbsphinx— Sphinx source parser for Jupyter notebooks.
sphinx_rtd_theme— Sphinx theme for readthedocs.org.
sphinx.ext.autosummary— This extension generates function/method/attribute summary lists.
sphinx-autodoc-typehints— Moves typehints from function / method declaration to the docs for easy display with Sphinx.
atomicwrites— Powerful Python library for atomic file writes.
cloudpickle— Serialize things that can't be serialized by pickle. Seems to be better than
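What plain pickle rejects (and cloudpickle handles by shipping the code object itself) is easy to demonstrate with the stdlib alone:

```python
import pickle

square = lambda x: x * x  # anonymous function

try:
    pickle.dumps(square)
except pickle.PicklingError as exc:
    # pickle serializes functions by reference (module + qualified name);
    # a lambda has no importable name, so the lookup fails.
    print("pickle failed:", exc)
```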
fasteners— A python package that provides useful locks.
recursive= parameter which allows the expansion of ** to zero or more directories.
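This appears to describe the stdlib glob module's recursive flag; a sketch, with an invented temp-directory layout for illustration:

```python
import glob
import os
import tempfile

# Invented layout: root/a/b/notes.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "a", "b"))
open(os.path.join(root, "a", "b", "notes.txt"), "w").close()

# Without recursive=True, ** behaves like a plain * (one level only).
# With it, ** matches zero or more intermediate directories.
hits = glob.glob(os.path.join(root, "**", "*.txt"), recursive=True)
print(hits)  # the single notes.txt path
```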
pyfilesystem2— Python's filesystem abstraction layer. Work with files on the filesystem, in memory, in tar and zip archives, and on S3 using the same API.
smart_open— Utils for streaming large files (S3, HDFS, gzip, bz2...).
urllib.parse— Tools for parsing and constructing URLs and URIs (part of the standard library).
xopen— Open any compressed file in Python.
graph-tool— Efficient Python module for manipulation and statistical analysis of graphs (a.k.a. networks).
networkit— Sort of like graph-tool, again.
networkx— Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
voc— A transpiler that converts Python code into Java bytecode. The beeware team now recommends including a Python runtime in Android projects instead.
py4j— Py4J enables Python programs running in a Python interpreter to dynamically access Java objects in a Java Virtual Machine. Methods are called as if the Java objects resided in the Python interpreter and Java collections can be accessed through standard Python collection methods. Py4J also enables Java programs to call back Python objects. "Py4J uses a remote tunnel to operate the JVM. This has the advantage that the remote JVM does not share the same memory space and multiple JVMs can be controlled. It provides a fairly general API, but the overall integration to Python is as one would expect when operating a remote channel operating more like an RPC front-end. It seems well documented and capable. Although I haven’t done benchmarking, a remote access JVM will have a transfer penalty when moving data." Source: https://jpype.readthedocs.io/en/latest/userguide.html.
jpype— JPype is a Python module to provide full access to Java from within Python. Start the JVM at Python startup and then access the runtime through the JPype API.
py4j is better if the goal is to start a JVM ad hoc. Garbage collection can become an issue, especially with circular dependencies in and out of the JVM.
jedi— Auto-completion and static analysis.
pre-commit— A framework for managing and maintaining multi-language pre-commit hooks.
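A minimal `.pre-commit-config.yaml` sketch; the repository URL and `rev` below are illustrative and should be pinned to a real release tag:

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0  # illustrative; pin to an actual release
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
```

Running `pre-commit install` then wires the configured hooks into `.git/hooks`.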
Machine learning frameworks
chainer— A Powerful, Flexible, and Intuitive Framework for Neural Networks. (Website)
CuPy— A NumPy-compatible matrix library accelerated by CUDA. (Website)
deepdive— Transform "dark data" into SQL tables.
edward— A library for probabilistic modelling, inference, and criticism. Deep generative models, variational inference. Runs on TensorFlow.
faiss— A library for efficient similarity search and clustering of dense vectors.
pomegranate— Package which implements fast, efficient, and extremely flexible probabilistic models ranging from probability distributions to Bayesian networks to mixtures of hidden Markov models. The most basic level of probabilistic modelling is a simple probability distribution.
prophet— Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
pymc3— Probabilistic Programming in Python. A new version is being developed on top of JAX.
SerpentAI— Game Agent Framework. Helping you create AIs / Bots to play any game you own!
snorkel— Write functions that go over your data to generate features (one feature per function). Then combine these features in a smart way in order to build a predictor which performs well.
statsmodels— Statistical modelling and econometrics in Python.
surprise— A Python scikit for building and analyzing recommender systems.
tpot— Automates model selection and hyperparameter optimization.
venture— Probabilistic programming in Python.
Accelerate— A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision.
GPyTorch— A highly efficient and modular implementation of Gaussian Processes in PyTorch. Used by e.g. finite_ntk.
Pyro— Deep universal probabilistic programming with Python and PyTorch.
PyTorch Lightning— Rewrite Pytorch code such that the logic and the underlying architecture are decoupled.
Machine learning lifecycle
mlflow— Open source platform for the machine learning lifecycle.
torch-elastic— TorchElastic enables distributed PyTorch training jobs to be executed in a fault tolerant and elastic manner.
torch-serve— TorchServe is a flexible and easy to use tool for serving PyTorch models.
monotonic— An implementation of time.monotonic() for Python 2 & Python 3.
sqlfluff— SQL autoformatter.
peewee— Too limited in my experience.
PonyORM— Very Pythonic, but probably slow and limited?
sqlalchemy— De facto standard.
vmprof— vmprof is a platform to understand and resolve performance bottlenecks in your code. It includes a lightweight profiler for CPython 2.7, CPython 3 and PyPy and an assembler log visualizer for PyPy. (Source Code)
coverage.py— Provides code coverage.
milksnake— A setuptools/wheel/cffi extension to embed binary data in wheels.
PyO3— Rust bindings for the Python interpreter.
annotations— from __future__ import annotations postpones evaluation of type annotations (PEP 563), allowing classes to be used as types before they are defined.
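A short sketch of the postponed-evaluation behaviour; without the future import, the return annotation below would raise NameError while the class is still being defined:

```python
from __future__ import annotations  # must be the first statement in the file

import typing

class Node:
    def child(self) -> Node:  # "Node" is not defined yet; stored as a string
        return Node()

# get_type_hints() resolves the deferred string back to the class itself.
hints = typing.get_type_hints(Node.child)
print(hints["return"] is Node)  # True
```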
trace— Code coverage and tracing.
sched— Schedule the execution of things.
time.perf_counter()— Useful for timing the execution of things.
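A small timing sketch using perf_counter(), the monotonic high-resolution clock meant for benchmarking short sections:

```python
import time

start = time.perf_counter()
total = sum(range(1_000_000))  # the work being timed
elapsed = time.perf_counter() - start

print(f"summed to {total} in {elapsed:.6f}s")
```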
urllib.parse— Parse URI strings (including database connection strings) into objects.
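For example, a database connection string (an invented URI, shown for illustration) decomposes cleanly:

```python
from urllib.parse import urlparse

# Invented connection string for illustration.
url = urlparse("postgresql://alice:secret@db.example.com:5432/mydb")

print(url.scheme)    # postgresql
print(url.hostname)  # db.example.com
print(url.port)      # 5432
print(url.path)      # /mydb
```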
typing.NamedTuple— Nice way to define type-annotated namedtuples.
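A minimal NamedTuple sketch (field names invented for illustration):

```python
from typing import NamedTuple

class Point(NamedTuple):
    x: float
    y: float = 0.0  # defaults are allowed after required fields

p = Point(1.5)
print(p.x, p.y)         # 1.5 0.0
print(p == (1.5, 0.0))  # True: still an ordinary tuple underneath
```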
datetime objects to and from strings. Can divide two timedelta objects to get time as a floating point value in a desired denomination. Additional ways of parsing datetime strings and iterating over datetime ranges are available in the dateutil package (using the
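The conversions and timedelta division mentioned above, stdlib only:

```python
from datetime import datetime, timedelta

# strptime / strftime convert between datetime objects and strings.
dt = datetime.strptime("2021-03-01 12:30", "%Y-%m-%d %H:%M")
print(dt.strftime("%Y-%m-%d"))  # 2021-03-01

# Dividing two timedeltas yields a float in the chosen denomination.
span = timedelta(hours=3)
print(span / timedelta(minutes=1))  # 180.0
```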
golem— Test automation framework for testing websites.
MechanicalSoup— A Python library for automating interaction with websites.
bounter— Efficient Counter that uses a limited (bounded) amount of memory regardless of data size.
fuzzywuzzy— Fuzzy String Matching in Python.
gensim— Topic Modelling for Humans.
pdftabextract— A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
spacy— Industrial-strength Natural Language Processing (NLP) with Python and Cython.
ivis— Project high-dimensional data to two dimensions for easy visualization.
matplotlib— Note that the authors recommend using the object-oriented interface over the pyplot interface when possible. Also, see the MatPlotLib cheatsheet.
pdvega— Interactive plotting for Pandas using Vega-Lite.
aiohttp— Async HTTP client/server framework using asyncio. Did not use uvloop originally (not sure about now), so slower than alternatives.
apistar— Self-documenting REST API. Supports both WSGI and asyncio. Uses the OpenAPI standard to declare types. By the same author as django-rest-framework. Authors moved over to working on starlette.
fastapi🌟 — Combines the speed of starlette with the user experience of
httptools— Python binding for the Node.js HTTP parser (the default Python parser is slow).
hug— Auto-documenting APIs.
sanic— Async web framework that looks like Flask.
starlette— Lightweight ASGI framework/toolkit, which is ideal for building high performance asyncio services. Faster than
uvicorn— Fast ASGI server. Uses the uvloop asyncio event loop, implemented using libuv (used by Node.js).
django— The Web framework for perfectionists with deadlines.
falcon— Very fast, particularly on PyPy2.
flask— A microframework based on Werkzeug, Jinja2 and good intentions.
gain— Web crawling framework based on asyncio.
Add figure letters:
plt.text(0, 1.10, string.ascii_uppercase[i], transform=ax.transAxes, size=20, weight='bold')
Control colorbar label size:
ax = sns.heatmap(...)
cbar_ax = ax.figure.axes[-1]
plt.setp(cbar_ax.get_yticklabels(), fontsize=12)
Create graph from sparse matrix:
g = Graph()
g.add_edge_list(np.array(a.nonzero()).T)
Things to keep in mind
It is not necessary to use pass when there is already a docstring.
class Error(Exception):
    """My custom error"""  # pass not needed here
Method chaining in pandas:
def show_df(df):
    display(df)  # IPython display; shows the intermediate frame
    return df

ans = (
    df
    .loc[lambda x: x['name'] != 'Bob']
    .query("city == 'London'")
    .pipe(show_df)
    .loc[lambda x: x['age'].between(10, 50)]
    .eval("age_above_20 = age > 20")
    .assign(dummy=lambda x: 0)  # placeholder column
)