setup.py vs requirements.txt
There’s a lot of misunderstanding about setup.py and requirements.txt and their roles. Many people feel the two files duplicate information and have even created tools to handle this “duplication”.
Python Libraries
A Python library in this context is something that has been developed and
released for others to use. You can find a number of them on PyPI that others
have made available. A library has a number of pieces of metadata that need to
be provided in order to successfully distribute it. These are things such as
the name, version, dependencies, etc. The setup.py file gives you the ability to specify this metadata like so:
from setuptools import setup

setup(
    name="MyLibrary",
    version="1.0",
    install_requires=[
        "requests",
        "bcrypt",
    ],
    # ...
)
This is simple enough: you have the required pieces of metadata declared. However, something you don’t see is a specification of where you’ll be getting those dependencies from. There’s no URL or filesystem path from which to fetch these dependencies; it’s just “requests” and “bcrypt”. This is important, and for lack of a better term I call these “abstract dependencies”. They are dependencies which exist only as a name and an optional version specifier. Think of it like duck typing your dependencies: you don’t care what specific “requests” you get as long as it looks like “requests”.
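The duck typing analogy can be made concrete with a minimal sketch. The class and function names below are hypothetical and are not part of setuptools or pip, and the version check is simplified to an exact match, whereas real version specifiers also allow ranges. The point is only that the source of a candidate is never consulted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AbstractDependency:
    """A dependency known only by name and an optional exact version."""
    name: str
    version: Optional[str] = None  # None means "any version will do"


@dataclass(frozen=True)
class Candidate:
    """A hypothetical package found at some location."""
    name: str
    version: str
    source: str  # URL or path; the abstract dependency never looks at this


def satisfies(candidate: Candidate, dep: AbstractDependency) -> bool:
    """Compare only name and version, ignoring where the candidate came from."""
    if candidate.name.lower() != dep.name.lower():
        return False
    return dep.version is None or candidate.version == dep.version


dep = AbstractDependency("requests")
from_pypi = Candidate("requests", "1.2.0", "https://pypi.python.org/simple/")
from_fork = Candidate("requests", "1.2.0", "file:///home/me/forks/requests")
other = Candidate("bcrypt", "1.0.2", "https://pypi.python.org/simple/")

print(satisfies(from_pypi, dep))  # True
print(satisfies(from_fork, dep))  # True
print(satisfies(other, dep))      # False
```

Both the PyPI copy and the local fork look like “requests”, so either one satisfies the dependency.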
Python Applications
Here when I speak of a Python application I’m going to typically be speaking about something that you specifically deploy. It may or may not exist on PyPI but it’s something that likely does not have much in the way of reusability. An application that does exist on PyPI typically requires a deploy specific configuration file and this section deals with the “deploy specific” side of a Python application.
An application typically has a set of dependencies, often times even a very complex set of dependencies, that it has been tested against. Being a specific instance that has been deployed, it typically does not have a name, nor any of the other packaging related metadata. This is reflected in the abilities of a pip requirements file. A typical requirements file might look something like:
# This is an implicit value, here for clarity
--index-url https://pypi.python.org/simple/
MyPackage==1.0
requests==1.2.0
bcrypt==1.0.2
Here you have each dependency shown along with an exact version specifier. While a library tends to want wide, open-ended version specifiers, an application wants very specific dependencies. It may not have mattered up front what version of requests was installed, but you want the same version to install in production as you developed and tested with locally.
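If you want to check that an application’s requirements file really is pinned this way, a small script can do it. The sketch below is an illustration only: the function name is hypothetical, it handles just comments, option lines, and plain requirement lines, and it does not cover everything pip accepts in a requirements file.

```python
from typing import List


def unpinned_requirements(text: str) -> List[str]:
    """Return requirement lines that lack an exact '==' pin."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or full-line comment
        if line.startswith("-"):
            continue  # option line such as --index-url or -e
        line = line.split(" #", 1)[0].strip()  # drop a trailing comment
        if "==" not in line:
            unpinned.append(line)
    return unpinned


sample = """\
# This is an implicit value, here for clarity
--index-url https://pypi.python.org/simple/
MyPackage==1.0
requests==1.2.0
bcrypt
"""

print(unpinned_requirements(sample))  # ['bcrypt']
```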
At the top of this file you’ll also notice a --index-url https://pypi.python.org/simple/ line. A typical requirements.txt won’t list this explicitly unless the project is not using PyPI; it is, however, an important part of a requirements.txt. This single line is what turns the abstract dependency of requests==1.2.0 into a “concrete” dependency of “requests 1.2.0 from https://pypi.python.org/simple/”. This is not like duck typing; this is the packaging equivalent of an isinstance() check.
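To illustrate how an index URL makes a name concrete, here is a minimal sketch that builds the page an installer would look at for a project. It assumes the index follows the “simple” repository layout, where each project lives under its normalized name; the function name and the private index URL are hypothetical.

```python
import re


def project_page(index_url: str, name: str) -> str:
    """Combine an index URL and a project name into a concrete location."""
    # Normalize the name: runs of '-', '_' and '.' become a single '-',
    # and the result is lowercased.
    normalized = re.sub(r"[-_.]+", "-", name).lower()
    return index_url.rstrip("/") + "/" + normalized + "/"


# The same abstract name becomes a different concrete dependency
# depending on which index it is combined with.
print(project_page("https://pypi.python.org/simple/", "requests"))
# https://pypi.python.org/simple/requests/
print(project_page("https://packages.example.com/simple", "requests"))
# https://packages.example.com/simple/requests/
```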
So Why Do Abstract and Concrete Matter?
You’ve read this far and maybe you’ve said, OK, I know that setup.py is designed for redistributable things and that requirements.txt is designed for non-redistributable things, but I already have something that reads a requirements.txt and fills out my install_requires=[...], so why should I care?
This split between abstract and concrete is an important one. It is what allows the PyPI mirroring infrastructure to work. It is what allows a company to host its own private package index. It is even what enables you to fork a library to fix a bug or add a feature and use your own fork. Because an abstract dependency is a name and an optional version specifier, you can install it from PyPI, from Crate.io, or from your own filesystem. You can fork a library, change the code, and as long as it has the right name and satisfies the version specifier, the library that depends on it will happily go on using it.
A more extreme version of what can happen when you use a concrete requirement where an abstract requirement should be used can be found in the Go language. In Go the default package manager (go get) allows you to specify your imports via a URL inside the code, which the package manager collects and downloads. This would look something like:
import (
    "github.com/foo/bar"
)
Here you can see that an exact URL to a dependency has been specified. Now if I used a library that specified its dependencies this way and I wanted to change the “bar” library because of a bug that was affecting me or a feature I needed, I would not only need to fork the bar library, but I would also need to fork the library that depended on it to update the import. Even worse, if the bar library was, say, 5 levels deep, then that’s potentially 5 different packages that I would need to fork and modify only to point them at a slightly different “bar”.
A Setuptools Misfeature
Setuptools has a feature similar to the Go example. It’s called dependency links and it looks like this:
from setuptools import setup

setup(
    # ...
    dependency_links=[
        "http://packages.example.com/snapshots/",
        "http://example2.com/p/bar-1.0.tar.gz",
    ],
)
This “feature” of setuptools removes the abstractness of its dependencies and hardcodes an exact URL from which to fetch each dependency. Now, very much as with Go, if we want to modify packages, or simply fetch them from a different server, we’ll need to go in and edit each package in the dependency chain in order to update the dependency_links.
Developing Reusable Things or How Not to Repeat Yourself
The “Library” and “Application” distinction is all well and good, but whenever you’re developing a library, in a way it becomes your application. You want a specific set of dependencies fetched from a specific location, and you know that you should have abstract dependencies in your setup.py and concrete dependencies in your requirements.txt, but you don’t want to maintain two separate lists which will inevitably go out of sync. As it turns out, pip requirements files have a construct to handle just such a case.
Given a directory with a setup.py inside of it, you can write a requirements file that looks like:
--index-url https://pypi.python.org/simple/
-e .
Now your pip install -r requirements.txt will work just as before. It will first install the library located at the file path . and then move on to its abstract dependencies, combining them with its --index-url option, turning them into concrete dependencies, and installing them.
This method grants another powerful ability. Let’s say you have two or more libraries that you develop as a unit but release separately, or maybe you’ve just split out part of a library into its own piece and haven’t officially released it yet. If your top level library still depends on just the name, then you can install the development version when using the requirements.txt and the release version when not, using a file like:
--index-url https://pypi.python.org/simple/
-e git+https://github.com/foo/bar.git#egg=bar
-e .
This will first install the bar library from https://github.com/foo/bar.git, registering it under the name “bar”, and then will install the local package, again combining its dependencies with the --index-url option and installing them. This time, since the “bar” dependency has already been satisfied, it will skip it and continue to use the in-development version.
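The order of events described above can be modeled with a toy simulation. This is only a sketch of the behavior described here, not how pip is implemented: the function and the “mylibrary” project name are hypothetical, and the bar library is assumed to have no dependencies of its own.

```python
from typing import Dict, List, Tuple

# (project name, explicit source, abstract dependency names)
Entry = Tuple[str, str, List[str]]


def simulate_install(entries: List[Entry], index_url: str) -> Dict[str, str]:
    """Return a mapping of each installed name to where it came from."""
    installed: Dict[str, str] = {}
    for name, source, dependencies in entries:
        # The entry itself is concrete: it comes from its explicit source.
        installed.setdefault(name, source)
        for dep in dependencies:
            # A dependency that is already satisfied by name is kept as is;
            # anything else is fetched from the index.
            installed.setdefault(dep, index_url)
    return installed


entries = [
    ("bar", "https://github.com/foo/bar.git", []),
    ("mylibrary", ".", ["bar", "requests"]),
]
result = simulate_install(entries, "https://pypi.python.org/simple/")

for name, source in result.items():
    print(name, "<-", source)
```

Here “bar” keeps its Git source because it was installed before the local package asked for it, while “requests” falls through to the index.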
Recognition: This post was inspired by Yehuda Katz’s blog post on a similar issue in Ruby with Gemfile and gemspec.