Posted by matijs 17/02/2013 at 20h15
I love Travis CI. I love git bisect. I used both recently to track down a bug in GirFFI.
Suddenly, builds were failing on JRuby. The problem did not occur on my own 64-bit machine, so it seemed hard to debug. I tried making Travis use different JVMs, but that didn’t help: the builds just crashed in a different way (faster, too, which was nice).
Building a Travis box
Using the travis-boxes repository, I created a VM as used by Travis. This is currently not documented well in the READMEs, so I’m writing it down here, slightly out of order of actual events.
I cloned the following three repositories:
First, I created a base box in veewee-definitions, according to its README. In this case, I created a precise32 box, since that’s the box Travis uses for the builds. The final stage, export, creates a precise32.box file.
Then, I moved the precise32.box file to travis-boxes/boxes, making a base box available there. There is a Thor task to create just such a base box right there, but it doesn’t work, and seems to be deprecated anyway, since veewee is no longer supposed to be used in that repository.
With a base box available in travis-boxes, I used the following command to create a fully functional box for testing Rubies:
bundle exec thor travis:box:build -b precise32 ruby
Oddly, this didn’t produce a box named travis-ruby, but it did produce one named travis-development, which I could then manipulate using vagrant.
Hunting down the bug
I ssh’d into my fresh travis box using vagrant ssh. After a couple of minutes getting to know rvm (I use rbenv myself), I was able to confirm the crash on JRuby. After some initial poking around trying to pin down the problem to one particular test case and failing, I decided to use git bisect. As my check I used the test:introspection task, which reliably crashed when the problem was present.
While it’s possible to automate git bisect, I like to use it manually, since the test used as a check may fail for unrelated reasons. Also, since git bisect is a really fast process, there is a pleasant lack of tedium.
Anyway, after a couple of iterations, I was able to locate the problematic commit. By checking the different bits of the commit I then found the culprit: I had accidentally broken the code that creates layout definitions, in particular the one used by GValue. Going back to master, I added a simple test and a fix. I will have to revisit the code later to clean it up and make it more robust.