Tools of the Modern Python Hacker: Virtualenv, Fabric and Pip

July 05, 2009

In the jargon of the computer programmer, a hacker is someone who strives to solve problems in elegant and ingenious ways. Part of the path to elegantly solving larger problems is to use tools that solve sub-problems very well.

For the modern Python programmer, some of the most important tools to aid in reduced complexity and repetition are virtualenv, Fabric, and pip. Although these tools have no strict relationship (in the sense that many people may use one or two of these tools often, yet aren’t even aware of the others), they form a powerful suite when combined. An excellent example is the following:

You have a Django project that you want to deploy automatically to both your staging and production environments (Fabric). The project has several dependencies that need to be met exactly at deploy time (pip), and the remote production environment disallows global site-packages installation, so you need an isolated environment in which to install all dependencies (virtualenv).

I have found virtualenv, pip, and Fabric together to be invaluable for my larger personal projects such as Wikipedia Game, but even for the smallest of projects, I always use at least one for something.

In this article I will discuss the benefits and specifics of each of the three tools.

Note: In each example, consider replacing each usage of `easy_install` with a call to `pip` instead - it's just that `easy_install` is still better known and more prevalent. Ironically, an easy way of getting `pip` is by typing `easy_install pip`, although there are other ways; see Google. A discussion of the `pip` vs `easy_install` dilemma can be found in the last section, entitled **pip**.
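
For what it's worth, bootstrapping pip on a machine that only has setuptools is a one-liner, and a quick check confirms where it landed:

  easy_install pip
  which pip   # confirm the pip script is now on your PATH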

virtualenv

virtualenv is a great tool that creates isolated Python environments, and it is very easy to use. So easy, in fact, that you basically have no excuse not to use it :-). Try switching over to your shell and typing exactly the following:

  easy_install virtualenv
  virtualenv myvirtenv
  cd myvirtenv
  source bin/activate

Badda-bing, you have used virtualenv! Now for some real world usage. Let’s say that you have some older version of Django installed, but you want to try out Django 1.0.2-final (the latest at the time of this writing). You are worried that installing the very latest Django might break a couple of apps that you are working on, but you want to see if things will just work. So in the same shell in which you just activated “myvirtenv”, copy over an instance of your Django app (I’ll call it “mydjangoapp”) and then type:

  easy_install -U django
  cd mydjangoapp
  python manage.py runserver #uses the latest Django
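
If you want to convince yourself that the isolation is real, a quick check like the following does the trick; running deactivate drops you back to your global Python and whatever older Django it has:

  python -c "import django; print django.VERSION"   # the virtualenv's (latest) Django
  deactivate
  python -c "import django; print django.VERSION"   # the older, globally installed Django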

There are small bits of quirky behaviour I have hit up against while using virtualenv. One such instance was when I went to use the indispensable IPython inside one of my virtualenvs. It appears that some packages don’t play nice with virtualenv (the specific reasons escape me right now). So to use IPython inside a virtualenv, I have resorted to installing IPython inside it, and specifically running the IPython found in my virtualenv’s bin directory, like so:

  easy_install ipython
  bin/ipython

Maybe there is a better way, but this works for me for now.

So virtualenv is a big one. If you have not started using virtualenv for at least your local development environment, it’s worthwhile to give it a shot. In the last couple of months I have adopted the mindset of mostly avoiding installing Python packages into the global ‘site-packages’, as I’ve found that it pays to manage dependencies on a per-project basis, rather than installing globally and hoping all goes well.
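
Concretely, that per-project workflow looks roughly like this (the environment and package names here are just placeholders):

  virtualenv --no-site-packages myproject_env   # ignore the global site-packages entirely
  cd myproject_env
  source bin/activate
  pip install somepackage                       # installs into this environment only
  deactivate                                    # drop back to the global Python when done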

Fabric

Fabric is all about minimizing the laborious repetition involved in taking local development code and putting it into a remote deployment environment. In practice, deploying always involves several very boring steps, and to quote an excellent article on deploying Django with Fabric:

Repetition leads to boredom, boredom to horrifying mistakes, horrifying mistakes to God-I-wish-I-was-still-bored

So what Fabric allows you to do is write a simple file called fabfile.py using several simple Fabric calls like local (execute a local command), run (execute a remote command on all specified hosts), put (copy a local file to a remote destination), and so on.

For example, let’s say that you have a simple website that you are developing for a client. Every time you want to show the client a new change, you need to deploy to the production server so the client can see the changes on the site. This always involves several steps like packaging up the source from your source code management system, putting the source in the correct place remotely, and restarting the remote web server. This can be very tedious by hand, especially for frequent, small changes.

The following shows how to automate these steps with Fabric. Put the code below in a file named fabfile.py:

  set(fab_user='alex',
       fab_hosts=['clientsite.com'],
       root='/home/alex/websites/',
       site='clientsite')

  def deploy():
      local("git archive --format=tar HEAD | gzip > $(site).tar.gz")
      run("rm -rf $(root)$(site)")
      put("$(site).tar.gz', '$(root)$(site).tar.gz")
      run("cd $(root)$(site)")
      run("tar zxf $(site).tar.gz")
      restart()

  def restart():
      run('sh $(root)$(site)/restart.sh')

And now from the command line you simply run

  fab deploy

and Fabric 1) packages the source locally into a tarball, 2) removes the old remotely deployed code, 3) puts the source on the remote host, and 4) restarts the site. Great features to notice are the chaining of local-command, remote-command, and remote-copy operations, the substitution of variables, and the splitting out of distinct operations (as in the case of ‘restart’) so you can run them separately, with a fab restart, for example.

Many more commands and advanced uses of Fabric can be found in the Fabric docs.

pip

In my earlier examples you saw me using easy_install, and what I’m about to preach is that in fact you should be using pip. Truth is, this mild taste of hypocrisy is because I myself am in a state of transition, exploring whether I should completely adopt pip and never look back. pip is the future of Python package management, or at least that’s what it seems like. Let’s hear the arguments.

pip vs. easy_install

To start, here is a quote from an article on Python packaging by Django hacker James Bennett:

Please, for the love of Guido, stop using setuptools and easy_install, and use distutils and pip instead.

You can see that some are pretty passionate about the subject, and I have found there are real reasons for this once you start doing non-trivial work with multiple python packages. Soon after the posting of that article by James Bennett, the creator of pip responded with a posting of his own describing The state of Python packaging which includes, among other good bits of discussion, an excellent glossary of all the parts involved.

I have also heard exactly the same sentiment while hanging out on the Twisted IRC channel. In fact, Twisted itself uses setuptools to fetch its dependencies (i.e. zope.interface), but only when installed via easy_install (and hence with setuptools present). One major shortcoming of distutils that setuptools attempts to solve is resolving dependencies automatically at install time (as Debian’s apt-get does so wonderfully; that said, apt-get does not allow multiple versions of packages - a problem virtualenv is designed to solve).
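
The dependency information that setuptools, easy_install, and pip act on is declared in a project’s setup.py via install_requires; a minimal sketch (reusing the hypothetical “mydjangoapp” from earlier) looks like this:

  from setuptools import setup

  setup(
      name='mydjangoapp',                 # hypothetical project from the earlier example
      version='0.1',
      packages=['mydjangoapp'],
      install_requires=['Django>=1.0'],   # dependencies resolved automatically at install time
  )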

A final important note is that pip finds packages in the same way that easy_install does, so most packages that can be easy_install‘ed can be pip install‘ed.

pip features and how-to

Some great features of pip:

  • Downloads all dependencies before installation. This is important if you are using pip as part of an automated remote deployment and you want to ensure install fidelity.
  • Installs straight from version control systems such as Git, Mercurial, Subversion, and Bazaar.
  • The freeze command - puts all currently installed packages (exact versions) into a requirements file (see the short example after this list).
  • The bundle command - creates pybundles (archives containing multiple packages).
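
The freeze/requirements pair in particular is what makes repeatable installs possible; a minimal round trip looks like this:

  pip freeze > requirements.txt       # record the exact versions of everything installed
  pip install -r requirements.txt     # recreate the same set of packages elsewhere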

Another great feature of pip is how well it works with virtualenv. This is no coincidence at all, as both tools are written by Python hacker Ian Bicking.

The following example of installing Codenode shows the elegant interplay of pip and virtualenv.

  virtualenv --no-site-packages mycodenode_env
  pip -E mycodenode_env install -U twisted django
  pip -E mycodenode_env install -e git://github.com/codenode/codenode.git#egg=codenode

First we create a fresh virtualenv, absent of all the pre-installed packages that already exist in the global site-packages. Then we get the very latest Twisted and Django and install them directly into our new virtualenv. Lastly, we install the latest Codenode directly from its source code repository. A lot of goodness in three lines, to say the least.

Conclusions

At first glance, it may seem that some of these tools just add a layer of complexity - especially in the case of smaller projects. To that, I would say: it is the decreased complexity and increased efficiency you will gain in the long run - a key pursuit of the true hacker - that make these tools worth a small investment of time to learn. Finally, I’d like to give special thanks to the resources below, which all helped in achieving the ‘ah-ha’ moments that are crucial when trying out something new.

Honorable Mention

  • Yolk: Tool for obtaining information about packages installed by setuptools, easy_install and distutils.
  • zc.buildout: Configuration-driven build tool that is ‘coarse-grained’ (applications, config files, databases, as opposed to single files) and supports repeatable builds via specification files.

Resources