OK, there’s no programming revolution, but a friend of mine made this fantastic picture of me back in the crazy beard days and I had to share.
]]>When I first started learning Python, I quickly became a fan of the syntax. Tuples, dictionaries and list comprehensions are some cool language features that remove a lot of boilerplate. But as I waded into existing projects I kept running into environmental concepts like eggs, easy_install, pip, virtualenv and other scaffolding that have nothing to do with the language per se, but that I needed to understand to do anything useful with it.
However, I couldn’t find any overview of the Python ecosystem that I needed to navigate.
It’s easy to start grumbling, “this Python thing is a disorganized mess”, but I recognized that if someone were to walk into a world I already know quite well they would be just as lost. A Java newbie might rightly ask:
What is a class path?
What is a JAR?
An EAR?
Wait, who declared WAR?
Why would I want to use Ant or Maven or Ivy?
So I took a step back and did a survey of the land. And here’s what I found.
Of course you want to get all the cool libraries that people are writing. But how do you get them?
An egg is a distribution of a Python package. Eggs are similar to Java JARs but have more in common with Ruby gems, as eggs support declaring dependencies and defining multiple entry points.
Built eggs are how you usually get Python packages. They are typically zip files that contain all the code. A built egg can also be an un-zipped directory structure and there’s a whole specification for how it’s laid out, but you don’t care about that crap for now.
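To see why a zip file full of code is enough, here is a minimal sketch of the underlying mechanism: Python can import modules straight out of a zip archive that sits on its search path. This is not a real egg (there is no egg metadata), and the archive name hello_egg.zip and the module hello_egg are made up for the example.

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny zip archive containing one module, hello_egg.py.
# The archive and module names are hypothetical, chosen for this example.
work_dir = tempfile.mkdtemp()
archive_path = os.path.join(work_dir, "hello_egg.zip")
with zipfile.ZipFile(archive_path, "w") as archive:
    archive.writestr(
        "hello_egg.py",
        "def greet():\n    return 'hello from inside a zip'\n",
    )

# Putting the archive itself on sys.path makes its contents importable.
sys.path.insert(0, archive_path)
import hello_egg

message = hello_egg.greet()
print(message)
print(hello_egg.__file__)  # the reported path points inside the archive
```

A zipped built egg relies on this same import-from-zip behaviour, with the extra metadata layered on top.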
A development egg is an in-place installation of an egg, for when you are working on an egg. It’s just a normal directory of Python code with some ProjectName.egg-info subdirectories.
Egg links are pointers to where the real egg is on the filesystem. They only exist to support platforms which do not have native symbolic links (ahem, Windows). They are *.egg-link files that contain the name of a built or development egg.
So now you know what an egg is. Well damn, it’s basically just a bunch of Python code! So let me download some and get to work! Hold on there, tiger. First you need to know how Python finds these eggs. It simply looks in the sys.path variable for paths to packages. But where does this path come from?
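If you want to look at that variable for yourself, a quick sketch like the following prints it. The exact entries depend entirely on your installation, so none are shown here.

```python
import sys

# Each entry is a directory (or zip archive) that Python searches, in order,
# when it handles an import statement.
search_path = list(sys.path)
for position, entry in enumerate(search_path):
    print(position, entry)
```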
This is the global spot for a Python installation to store packages. I’d highly recommend not globally installing many things on your system’s default Python, as this really dirties the waters. More on that later.
If you’re curious, you can run this cryptic little bugger to find out where your Python’s site-packages directory is located:
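One way to ask the interpreter, sketched here with the standard-library sysconfig and site modules (this is not necessarily the one-liner originally shown):

```python
import site
import sysconfig

# The directory where third-party pure-Python packages get installed
# for the interpreter running this script.
purelib_dir = sysconfig.get_paths()["purelib"]
print(purelib_dir)

# site.getsitepackages() lists every global site-packages directory.
# Some old virtualenv releases shipped a site module without this function,
# hence the guard.
if hasattr(site, "getsitepackages"):
    for directory in site.getsitepackages():
        print(directory)
```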
Python also grabs packages from the environment variable PYTHONPATH. Not much else I can say about that. Yep.
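A small sketch can at least show the effect. It starts a second interpreter with PYTHONPATH set to a throwaway directory and checks that the directory shows up in that interpreter’s sys.path. The directory is a temporary one created just for this example.

```python
import os
import subprocess
import sys
import tempfile

# A throwaway directory standing in for "some place where I keep packages".
extra_dir = os.path.realpath(tempfile.mkdtemp())

# Start a child interpreter with PYTHONPATH pointing at that directory
# and ask it to print its sys.path, one entry per line.
child_env = dict(os.environ, PYTHONPATH=extra_dir)
output = subprocess.check_output(
    [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
    env=child_env,
)
child_path = output.decode().splitlines()

# Reports whether the PYTHONPATH entry made it into the child's search path.
print(extra_dir in child_path)
```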
OK, now I know how Python finds these eggs. So where do I get them and how do I install them? First of all, let me warn you that the single most confusing thing about Python is the state of the package install tools. But the good news is that Python has a well-maintained central repository.
PyPI is short for Python Package Index. It is the official repository for third-party Python packages. It is similar to Perl’s CPAN and Java’s somewhat less formal Apache Maven central repository.
Do not confuse this with PyPy! PyPy is an implementation of Python written in (surprise!) Python.
For the longest time everybody just used easy_install to download and install their packages. easy_install is part of the setuptools project by the ubiquitous P.J. Eby. But people started wanting newer features and I think Mr. Eby just didn’t have the time to dedicate to a bunch of new work. So setuptools got forked.
The setuptools project was forked into the Distribute project. Now most people use pip instead of easy_install. I’ll let the guys who wrote pip explain why:
pip was originally written to improve on easy_install in the following ways:
- All packages are downloaded before installation. Partially-completed installation doesn’t occur as a result.
- Care is taken to present useful output on the console.
- The reasons for actions are kept track of. For instance, if a package is being installed, pip keeps track of why that package was required.
- Error messages should be useful.
- The code is relatively concise and cohesive, making it easier to use programmatically.
- Packages don’t have to be installed as egg archives, they can be installed flat (while keeping the egg metadata).
- Native support for other version control systems (Git, Mercurial and Bazaar)
- Uninstallation of packages.
- Simple to define fixed sets of requirements and reliably reproduce a set of packages.
pip doesn’t do everything that easy_install does. Specifically:
- It cannot install from eggs. It only installs from source. (In the future it would be good if it could install binaries from Windows .exe or .msi – binary install on other platforms is not a priority.)
- It is incompatible with some packages that extensively customize distutils or setuptools in their setup.py files.
Buildout. What is it? Alright, the name makes it sound like a build tool, but it’s not really. It’s a way to “build out” a development environment for a project. It’s really good at:
python script that includes the paths for your project dependencies along with your project code so you don’t need to poison your system Python
It does a lot of other stuff, but start thinking of it that way and not necessarily like make or rake or Ant/Maven.
So let’s say you want to play around with some packages but not necessarily set up a project and declare a bunch of dependencies and stuff. Dammit, you want to use pip! Enter virtualenv.
All virtualenv does is copy your Python executable somewhere safe and create an isolated site-packages directory so easy_install and pip will keep their dirty mitts out of your system Python. That way you can play around and if something gets all screwed up, you can just blow away your virtual environment and start over.
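The same idea can be sketched with the standard-library venv module, which ships with later Python 3 releases. To be clear, this is not the virtualenv tool itself, just a close relative that is handy for illustration. The directory name sandbox is made up.

```python
import os
import tempfile
import venv

# Create an isolated environment in a throwaway directory.
# with_pip=False keeps the example quick; real environments usually want pip.
env_dir = os.path.join(tempfile.mkdtemp(), "sandbox")
venv.create(env_dir, with_pip=False)

# The environment gets its own interpreter directory and its own config file.
bin_dir = os.path.join(env_dir, "Scripts" if os.name == "nt" else "bin")
config_file = os.path.join(env_dir, "pyvenv.cfg")
print(os.listdir(bin_dir))
print(os.path.exists(config_file))
```

Deleting the sandbox directory is all it takes to start over.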
Oddly enough, to use virtualenv, you’ll need to pip install virtualenv into your system Python. I know, that’s kind of hypocritical. Deal with it.
I highly recommend using a set of helper scripts called virtualenvwrapper for creating and using Python virtual environments.
I hope this helps clear the way for you a little bit. If I’ve made any errors, please add a comment and let me know!
]]>Not many things are more annoying than a build failure that is nobody’s fault:
The code is fine, but our build depends on a website that we don’t control!
We use Jenkins for our automated builds, and each build runs Buildout to assemble our project for deployment. Buildout checks the PyPI index for any new packages that are needed or any new versions of packages that are not version-pinned. So if PyPI is unavailable (or very slow) for a while, our build fails. That sucks because Jenkins won’t try another build until it detects a new changelist in our repository.
I started looking into how we could cache PyPI locally to avoid this problem altogether. I found several ways to achieve this, but finally settled on collective-eggproxy. I mainly chose it for 2 reasons:
1. It doesn’t cache/sync all 30+ gigs of PyPI
2. We already have an Apache instance and it can be run as a mod_python module
I ran into a couple of installation and configuration problems so I thought I’d share our setup.
BeautifulSoup is an awesome HTML/XML parser but whoever manages their index on PyPI has the wrong links. I’ve seen a couple of packages that rely on BeautifulSoup versions <= 3.09, but those seem to fail. I finally figured out that adding a find-links hint to easy_install fixed me right up:
Note that I’m just installing it in the system Python site-packages. You could use a virtualenv, but this is just our CI server.
I use Ubuntu so I got mod_python via apt:
In /etc/eggproxy.conf I used:
I added a dedicated virtualhost to Apache like so:
In our project’s buildout.cfg I simply had to add:
This has worked out great for us. Not only are our builds more stable, but there’s a noticeable speed improvement. Totally worth it!
]]>This weekend I was tasked with setting up a new Jenkins server on EC2. I’ve used a server built from Amazon’s official Linux AMI before and was happy enough with it. So I went that route. But it didn’t end very well…
I did the typical wrestling with Apache, Tomcat and SSL configuration. Our project uses Python 2.7 and when the first test build failed I was a bit surprised that it wasn’t installed. I confidently typed
sudo yum install python27
but there was no package for me. Python 2.7 was released almost 2 years ago at this point, what the hell? I searched and searched for a yum repo but there was none.
Very reluctantly I installed Python 2.7 from source. No, I’m not allergic to configure and make, but I like the ability to upgrade everything with one shot. Programmers typically make terrible system administrators so I don’t need yet another thing to remember about this server. But whatever, I installed it and went on with my life.
Our project also uses Ant. When the second test build failed I learned that we specifically require Ant 1.8 and the latest available from the Amazon yum repository is 1.7. Now this is a bit ridiculous, 1.8 was released over 2 years ago. Even though it was past 2am at this point, I decided to kick this installation to the curb.
I chose Ubuntu’s Cloud Guest AMI, and it’s brilliant. Most of the defaults are exactly what I’d choose. And by golly, they have packages for Python 2.7 and Ant 1.8. There’s even a package with Jenkins on Tomcat for me (oh and look at that, they have a package browser on the web, how clever).
]]>On the side I’ve just started helping out on a fun iOS project. iPhone development is one of those things that I’ve been meaning to do for literally 2 years now. Finally getting that off the ground feels good.
Being one of those people who prefers instructor-led learning over a book any day, I was thrilled to find that Stanford offers CS 193P iPhone Application Development via iTunes U for absolutely free. They even provide the course materials for free.
While it rained on a Saturday, I was able to “attend” the first week of classes and am busy finishing my homework. But I had to take a break to give a shout out to Stanford and Apple for providing something so awesome.
Take a look around iTunes U and you’ll definitely find something that interests you. I remember checking it out when Apple first announced it, but it has grown very large since then.
]]>
Anyone could have missed that – I even knew something was potentially wrong in the date formatting code and seeing SimpleDateFormat threw up some immediate alarms, but it still took me quite a while to walk the dependency chain and then realize the EJB-transitioning-to-Spring-bean was a singleton.
What I’m getting at is, the problem really isn’t Spring or singletons or that we should flog the programmer who didn’t read every line of code in the Spring bean when he was converting it from an EJB. We Java programmers need to ditch using thread-unsafe classes just like we ditched using pointers and managing memory. These days there’s no good reason to use SimpleDateFormat. It is evil. I’ve seen it kick puppies. Cute ones.
The good news is that for date formatting, there is an easy drop-in replacement in Apache’s Commons Lang library called FastDateFormat that is not only thread-safe, but faster to boot.
If you want to take it a step further (because pretty much anything date-related in the JDK sucks), check out the Joda Time API. It can do thread-safe formatting and parsing.
So if you are touching a class and you see a reference to SimpleDateFormat, please consider taking a few minutes to replace it with something better. Think of the puppies.
]]>