A Poetic Apology
Or Why Should You Use Poetry to Manage Python Dependencies
Posted by Pedro Ferrari on August 21, 2020 · 12 mins read
If you ever spent some time trying to write a Python application you have probably experienced Python's Infamous Dependency Hell at some point. You've probably also gathered by now that it has become a folk tradition to start any piece on the subject with the following notorious xkcd comic:

[xkcd #1987: "Python Environment"]
Luckily for you (and all of us) at the time of writing, there are some good solutions to the entanglement of pains brilliantly depicted in the image above. In fact, you most likely already know that if you want to develop against multiple Python versions you can readily use pyenv to start unraveling your twisted environment. You might have also learned that if you happen to be working at the same time on multiple projects with conflicting dependencies then you can employ virtual environments to isolate clashing libraries. In this document, we'll introduce yet another tool, Poetry, and make an argument about why you should probably add it to your own toolkit.
The Problem
Imagine one lonely night you decide to start a simple dummy Python project, accurately named `foo`, with the following structure:

foo
├── foo
│   ├── bar
│   │   └── data.py
│   └── constants.py
└── README.md
Since this is not your first Python project and you want to avoid spending more endless nights fixing incompatibilities between your system and project modules, you diligently initiate a virtual environment from your shell with
$> python -m venv ~/Desktop/venv/foo-venv
and activate it within the newly created project with
$> source ~/Desktop/venv/foo-venv/bin/activate
Equipped with an isolated environment, you triumphantly proceed to install the ubiquitous Pandas data library. To do so, you use Python's de facto package manager, `pip`, and carefully pin the library version to ensure replicability
$> pip install pandas==0.25.3
Since you are a bit lazy when it comes to exploratory data analysis, you also install the nifty `pandas-profiling` module to help you with that tedious work
$> pip install pandas-profiling==2.5.0
After all this flirting, you finally start coding (assuming that adding the following lines to the `data.py` file can, in fact, be called that)

import pandas as pd
from pandas_profiling import ProfileReport

df = pd.DataFrame([['a', 1], ['b', None]], columns=['category', 'value'])
df['category'] = df['category'].astype('category')

if __name__ == '__main__':
    ProfileReport(df).to_file('foo.html')
Given that your days of abusing `print` statements for debugging purposes are long gone, you install the beautiful and handy `pdbpp` library to check that these lines work as intended

$> pip install pdbpp==0.10.2

and run your code in post-mortem debugging mode with `python -m pdb -cc data.py`.
Happy with the clean run, you now realize that in order to ship your voluptuous application without falling into the "works on my machine" trap you need a way to collect all your dependencies. A quick Google search will show you that pip's `freeze` subcommand allows you to record the current environment's packages into a `requirements.txt` file by means of the following incantation
$> pip freeze > requirements.txt
which lets anyone use your project by simply installing the needed dependencies with
$> pip install -r requirements.txt
Just as you are about to reveal your masterpiece project to the world, you become aware that the improved debugging module is actually only used by you while developing. With the idea of splitting the frozen requirements into separate production and development files, you take a peek into the generated file only to discover that every single sub-dependency of your application's dependencies is listed therein and locked to a specific version, along the lines of the abridged excerpt below.
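(The packages and pins here are illustrative; the real file depends on whatever pip resolved on that lonely night.)

attrs==19.3.0
fancycompleter==0.9.1
numpy==1.18.1
pandas==0.25.3
pandas-profiling==2.5.0
pdbpp==0.10.2
python-dateutil==2.8.1
pytz==2019.3
six==1.14.0
wmctrl==0.3
...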
Foreseeing the nightmare of maintaining this immense list, you uninstall the `pdbpp` library to ensure a clean requirements file again by means of
$> pip uninstall -y pdbpp && pip freeze > requirements.txt
A quick glance at the modified requirements file, however, shows that things didn't quite turn out as expected: `pdbpp` was indeed removed but its dependencies, such as `fancycompleter`, are still installed. Since this seems a dead end, you choose to start from scratch by manually creating a `requirements.txt` file with only production dependencies

pandas==0.25.3
pandas-profiling==2.5.0

and an equivalent development file, `requirements_dev.txt`, solely containing

pdbpp==0.10.2
Impressed by the cleverness that has seemingly helped you dodge the dreaded Python dependency hell by keeping a record of isolated top-level packages, you decide to call it a day and give a final spin to your application the next day.
When you wake up in the morning, the news is all over the place: Pandas v1 is finally out (after only twelve years!). A couple of hours procrastinating over the incredibly long changelog makes you conclude that your complex foo-project will surely gain notable improvements by updating to the brand-new version. Now, since you've locked Pandas to an exact version, you cannot simply run
$> pip install -U -r requirements.txt
Instead you must execute
$> pip install pandas==1.0.0
which leads to a particularly bizarre and confusing situation: an error pops up in your terminal
ERROR: pandas-profiling 2.5.0 has requirement pandas==0.25.3, but you'll have pandas 1.0.0 which is incompatible.
but the installation of pandas 1.0.0 nonetheless takes place. Assuming this to be a warning that `pip` mistakes for an error, you update your `requirements.txt` file accordingly and joyfully proceed to run your `data.py` module one last time, only to discover that it throws an enigmatic `TypeError`. Feeling betrayed by pip's apparent inability to resolve dependencies, you roll back your changes and stick with Pandas' (now) outdated version.
At this point you seem to have a working project, but i) you are unsure whether reverting the Pandas version might have broken the desired replicability of your application, ii) the code could definitely look better and iii) after a good night's sleep you acknowledge that the overall functionality of your application is not as complex and rich as you thought it was the night before. To address the first two issues, you first add the `black` formatter to your `requirements_dev.txt`

black==19.10b0

and then, within your project directory, you recreate your virtual environment with

$> rm -rf ~/Desktop/venv/foo-venv
$> python -m venv ~/Desktop/venv/foo-venv
$> source ~/Desktop/venv/foo-venv/bin/activate
$> pip install -r requirements_dev.txt
$> pip install -r requirements.txt
Now you run `black` in your project root (with `black .`) and are mostly satisfied with the prettifying job it did, but to abide by Mutt Data's format style (which is coincidentally consistent with your dislike of making every single quote a double quote), you add a `pyproject.toml` telling `black` to skip its appalling string-normalization default setting

[tool.black]
skip-string-normalization = true
The code looks great now and a new post-mortem debugging run shows that things seem to work fine in the new (replicable) environment. The only thing left to be done before deploying the code on the server or sharing it with the world is to avoid having constants, such as the report name, hardcoded all around the code. You thus decide to add the following line to your empty `constants.py` module
REPORT_FILE = 'foo.html'
and modify `data.py` to import that constant from the parent module via a relative import
from ..constants import REPORT_FILE
A new `data.py` run, however, now shows the following error
ImportError: attempted relative import with no known parent package
which according to the omniscient SO makes sense, since Python relative imports only work within a package; therefore, if you want to import from a parent directory you should either create such a package or hack `sys.path` (the latter is sketched below for the curious). As a true purist Pythonista, you pick the former path and create a `setup.py` with the following contents
from setuptools import setup

with open('requirements.txt') as f:
    install_requires = f.read().splitlines()
with open('requirements_dev.txt') as f:
    extras_dev_requires = f.read().splitlines()

setup(
    name='foo',
    version='0.0.1',
    author='Mutt',
    author_email='info@muttdata.ai',
    install_requires=install_requires,
    extras_require={'dev': extras_dev_requires},
    packages=['foo'],
)
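For completeness, the `sys.path` hack you just dismissed would have looked something like this inside `data.py` (a sketch of the road not taken; brittle because it depends on the file's location on disk):

import os
import sys

# Make the package root (where constants.py lives) importable
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from constants import REPORT_FILE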
Now, in a brand-new virtual env, you install your package in editable mode with `pip install -e .[dev]`, change the import line in `data.py` to account for the package structure
from foo.constants import REPORT_FILE
and cross your fingers hoping everything finally works...
Everything does indeed (brittlely) work, but somehow all the hoop-jumping to make it function makes you uneasy. A brief introspection reveals several reasons for the wave of apprehension:
- Since you plan to work on multiple Python projects at the same time, isolation is a fundamental piece of your workflow. Virtual environments do solve this problem, but the activate/deactivate process is cumbersome and easy to forget.
- Having dependencies isolated between projects doesn't address dependency clashes within a project. Proper dependency resolution is the top required feature of any package manager worthy of respect, yet `pip` will only get such a feature by October 2020. Manually guaranteeing dependency consistency in complex projects is a dead end.
- If you want to install your application/project as a package, you must go through the overhead of adding a `setup.py` on top of your already multiple requirements files. However, you've read PEPs 517-518 and want to try out the simpler and safer build mechanisms mentioned therein.
- You thought about trying your application on a different machine, but realized it ran Python 3.7 while your local box runs 3.8. To use `pyenv` with your isolated virtual envs, you need an extra plugin, `pyenv-virtualenv`, which makes managing venvs even more burdensome.
- You briefly played with Pipenv, which promised to bring to Python the envied features of other languages' more mature package managers (such as JavaScript's `yarn`/`npm` or Rust's `cargo`), only to become quickly disappointed. Not only did Pipenv misleadingly claim to be Python's officially recommended packaging tool (when it was truly designed for writing applications and not packages), but it also didn't do a release for more than a year and still hangs endlessly when creating the lock file that ensures repeatable/deterministic builds.
In a state of hopeless despair, you frantically start searching online to see if a solution addressing all these problems already exists. Amid a plethora of partial/incomplete candidates, you at last encounter one that incredibly cracks them all: it's called Poetry.
The Solution
Installation (with Pipx)
Poetry is a CLI app written in Python, so you can simply install it with `pip install --user poetry`. However, you probably already installed (or will install) other Python CLI apps, for instance the fancy PostgreSQL client `pgcli` or `youtube-dl` to download YouTube videos. If you install these with your system's package manager (say `apt`, `yay` or `brew`) they will be installed at a global level and their dependencies could potentially clash. You could create an individual venv for each instead, but in order to use them you would have to go through the hassle of activating the environment first... To circumvent this annoying scenario you can use pipx, which will install each package in an isolated virtual environment while making its executables readily available in your shell (i.e. adding them to your `$PATH`). On top of exposing CLI apps for global access, it also makes it easy to list, upgrade and uninstall these apps. To install Poetry with `pipx` you first install `pipx` itself with

$> python -m pip install --user pipx
$> python -m pipx ensurepath
and then directly do
$> pipx install poetry
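Since pipx keeps each app in its own venv, managing Poetry afterwards is equally painless:

$> pipx list               # show every app installed via pipx
$> pipx upgrade poetry     # upgrade a single app within its venv
$> pipx uninstall poetry   # remove the app together with its venv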
If you prefer living on the edge (like I do), you can alternatively install a pre-release version with `pipx install --pip-args='--pre' poetry`.
Usage
Now you are all set to try the wonders promised by Poetry. To that effect, you create a new folder/project called `foo-poetry` with your `.py` files from above and then run `poetry init`. An interactive prompt will start, asking you to provide basic information about your package (name, author, etc.) that will be used to create a `pyproject.toml` file. This is essentially the same metadata you previously added to the `setup.py`, with some minimal variations
This command will guide you through creating your pyproject.toml config.

Package name [foo-poetry]: foo
Version [0.1.0]: 0.0.1
Description []:
Author [petobens <petobens@yahoo.com>, n to skip]: Mutt <info@muttdata.ai>
License []:
Compatible Python versions [^3.8]: ~3.7

Would you like to define your main dependencies interactively? (yes/no) [yes] no
Would you like to define your development dependencies interactively? (yes/no) [yes] no

Generated file

[tool.poetry]
name = "foo"
version = "0.0.1"
description = ""
authors = ["Mutt <info@muttdata.ai>"]

[tool.poetry.dependencies]
python = "^3.7"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core>=1.0.0a5"]
build-backend = "poetry.core.masonry.api"

Do you confirm generation? (yes/no) [yes] yes
The two relevant settings to highlight are the build system and the Python version specification. The only thing you need to know for the time being about the first one is that it uses the standards in PEPs 517-518 to define an alternative way to build a project from source code without `setuptools` (and hence removes the need for a `setup.py` file). Regarding the second setting, to understand the syntax that specifies the Python version constraints you should read Poetry's versions docs, where you will find out that the caret (`^`) requirement means that only minor and patch updates are allowed (i.e. that our application will work with Python 3.7 and 3.8 but not with 4.0).
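To make the semantics concrete, here is how a few common constraint styles from those docs translate (hypothetical pyproject.toml entries, with the allowed ranges as comments):

[tool.poetry.dependencies]
python = "^3.7"    # caret: >=3.7.0 <4.0.0 (minor and patch updates allowed)
pandas = "~1.0.0"  # tilde: >=1.0.0 <1.1.0 (only patch updates allowed)
numpy = "1.18.*"   # wildcard: >=1.18.0 <1.19.0
six = "1.15.0"     # exact pin: this version and nothing else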
So far, you only have a TOML file (which you can also use to centralize your `black` configuration). How do you specify dependencies? Simply run
$> poetry add pandas==0.25.3
which results in
Creating virtualenv foo-KLaC03aC-py3.8 in /home/pedro/.cache/pypoetry/virtualenvs
Updating dependencies
Resolving dependencies... (0.6s)

Writing lock file

Package operations: 5 installs, 0 updates, 0 removals

  - Installing six (1.15.0)
  - Installing numpy (1.19.1)
  - Installing python-dateutil (2.8.1)
  - Installing pytz (2020.1)
  - Installing pandas (0.25.3)
In other words, an initial `add` command will i) create a virtual environment, ii) install the requested packages and their subdependencies, iii) write the exact version of each downloaded dependency to the `poetry.lock` file (which you should commit to your VCS so as to enforce replicability) and iv) append a line with the newly added package to the `tool.poetry.dependencies` section of the `pyproject.toml` file. The last item also signals that if you want to install a new dependency you can either reuse the `add` command or directly add such a line to your `pyproject.toml` file. For instance, if you now want to add the `pandas-profiling` library, you can modify the pyproject so as to have
pandas-profiling = "2.5.0"
Since at this stage a `poetry.lock` file already exists, if you now run `poetry install` then Poetry will resolve and install dependencies using the versions specified in that lock file (to ensure version consistency). However, because you added the new dependency manually to the `pyproject.toml` file, the `install` command will fail. Therefore, in this case, you need to run `poetry update`, which is essentially equivalent to deleting the lock file and running `poetry install` again.
Adding a development dependency works in a similar fashion, with the only caveat that you need to use the `--dev` flag when executing the `add` command

$> poetry add pdbpp==0.10.2 --dev
$> poetry add black==19.10b0 --dev

and the resulting packages will be appended to the `tool.poetry.dev-dependencies` section.
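After those two commands, that section of your `pyproject.toml` should contain something along these lines:

[tool.poetry.dev-dependencies]
pdbpp = "0.10.2"
black = "19.10b0"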
Now that dependencies are set, you can run your `data.py` file by executing
$> poetry run python data.py
which will execute the command within the project's virtualenv. Alternatively, you can spawn a shell within the active venv simply by running
$> poetry shell
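Inside that shell the virtualenv is already active, so you can drop the `poetry run` prefix (the prompt prefix below is illustrative):

$> poetry shell
(foo-KLaC03aC-py3.8) $> python data.py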
Now imagine that you want to update the Pandas version, as you did before when exposing pip's inability to enforce dependency resolution. To do that, you try to bump the pinned constraint with
$> poetry add pandas==1.0.0
which this time correctly fails with the following error
Updating dependencies
Resolving dependencies... (0.0s)

[SolverProblemError]
Because pandas-profiling (2.5.0) depends on pandas (0.25.3)
and foo depends on pandas (1.0.0), pandas-profiling is forbidden.
So, because foo depends on pandas-profiling (2.5.0), version solving failed.
By now, you notice that Poetry seems to address the first two requests you listed in the previous section (namely easy project isolation and proper automatic dependency resolution). Before getting your hopes up, you proceed to verify whether it can straightforwardly package your code (particularly without a `setup.py`). Notably, this simply boils down to including the following line in the `tool.poetry` section of the `pyproject.toml` file
packages = [{include = "foo"}]
followed by the execution of a new `poetry install`, which will by default install the project in editable mode.
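Putting all of the above together, your `pyproject.toml` should by now look roughly like this (the `black` section comes from the formatting detour earlier):

[tool.poetry]
name = "foo"
version = "0.0.1"
description = ""
authors = ["Mutt <info@muttdata.ai>"]
packages = [{include = "foo"}]

[tool.poetry.dependencies]
python = "^3.7"
pandas = "0.25.3"
pandas-profiling = "2.5.0"

[tool.poetry.dev-dependencies]
pdbpp = "0.10.2"
black = "19.10b0"

[tool.black]
skip-string-normalization = true

[build-system]
requires = ["poetry-core>=1.0.0a5"]
build-backend = "poetry.core.masonry.api"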
Excited by Poetry's simplicity and ease of use, you start to wonder if Poetry is the ultimate tool you've been looking for. Can it check all the boxes? To conclusively answer that question, you want to see if it is easy to switch between different Python versions. Given that your local machine uses Python 3.8 by default, you consequently install 3.7.7 with `pyenv install 3.7.7` (installing an earlier release would not have worked down the road, since you set 3.7 as the lower bound in your application's `pyproject.toml`). To make this version locally available, you add a `.python-version` file to the root of your project containing a single line with `3.7.7` and then tell Poetry to create and use a virtualenv with that version with
$> poetry env use 3.7
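You can verify that it is now the environment in use with `poetry env list`, whose output should look something along these lines (the hash will differ on your machine):

foo-KLaC03aC-py3.7 (Activated)
foo-KLaC03aC-py3.8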
Once you've checked it's correctly activated, you install all dependencies with `poetry install` and ultimately run your code, which (unsurprisingly) finishes without issues.
Marveled by its intuitive plainness, you conclude that Poetry is exactly what you needed. In fact, you don't yet know this, but you got way more than you bargained for since you've only scratched the surface of features. You still have to discover that it installs packages in parallel, throws beautiful colored exceptions when all hell breaks loose, integrates with your IDE/editor of choice (if that's vim you can try your humble servant's shameless take on the matter), has a command to directly publish a package and, among other countless delights, is scheduled to have a plugin system for further extensibility.
One thing is crystal clear: Poetry is tomorrow's Python package manager. You might as well start using it today.
Note: this article belongs to our internal onboarding docs and has been slightly adjusted for general readership.