Saturday, July 23, 2011

Git and RSpec

I just watched a talk by Linus Torvalds at Google about Git. One of the questions touched on one of Linus' motivations for Git. That is Merging. I want to talk about how Git merging dove-tails nicely with Ruby's testing tool RSpec.

Some source control managers (SCMs) make a big deal about "branching" being cheap. Linus points out that branching is not the problem, merging is the problem people deal with. Git makes merging easy by removing the number of conflicts the developer has to deal with. It does this two ways. First is algorithmic and the second is how Git changes your workflow.

Git allows for, and relies on, three way merging. CVS/SVN by contrast only does two way. Three way merging diffs the original file and each of the two conflicting versions of that file. Git can do so because it stores the whole files, not just a long string of diffs (ala CVS/SVN which I'll just call SVN from now on). Three way merges allow the merge algorithm to look at the context of the text being diffed. SVN can only look at line 245 and see it changed. Git can notice that the line didn't change, it mearly moved. Say you and the conflicting file inserted a text before the same line 245. So there is no conflict just code motion in the file. Three way diffs see this, two way diffs can't.

The second way Git deals with merge conflicts is not as obvious; it stems from the work flow. Specifically, branching is inherent in Git's design and merging is easier and better. So you branch often and merge often. More to the point you start your branch from the central "master" branch for your feature add; change & commit a bunch of times to that small branch; then pull or push your change back into "master". By repeating this branch-change-merge-push cycle early and often you have less opportunity for merge conflicts. Put another way, your branch exists for less time and syncs with the central master more often, resulting in less opportunity for two people to introduce genuine conflicts.

So alot of talk is generated about Git working in a fully distributed way. However, that ignores that the Git, encourages syncing up those distributed merges early and often.

So imagine you replace SVN with Git. First you don't have to do it to your whole tree at once. You can do it by subsystem. Also there is Git-SVN gateway tools to help do this. You can still have the central tree ala SVN, but you just branch off it and merge back more often. In SVN you are usually doing this anyways. Each SVN commits end up being after the feature is completed; much like the final merge of a Git branch, but without all the Git commits in the mean time. Or you create a longer lived development branch and merge that back as a monster merge.

I've explored other SVN branching/merging workflows but all they do push the pain to a different part of the process. A smart colleague of mine called it "squeezing the balloon". What you need to do is keep the master branch stable. That can be achieved by branching and merging smaller and faster. The SCM (or VCS if you prefer) makes all the difference.

What also helps is having a test suite you can rely on. First the test suite has to be aimed at the internal API level. Second, it has to be easy and fast to run. This is where the choice of Test framework dove tails with the SCM choice. "Early and Often" is the catch phrase for both.

Testing the API's "contract" is where you should aim your testing framework. As a side note: you are forced to state what that "contract" is which is a level of internal documentation that often gets forgotten. The contract is what each function takes as input and what the results should be; especially the edge cases. Edge cases are given a function "sum" adds all the elements of an array, what if the function is given an empty array or a nil as input. So that is the trivial case. Then there is the out-of-bounds case, like each element of an IP address (dot-quad) is between 0(inclusive) and 256(exclusive). Correctness should be tested if you can. Most of the time, correctness can't be tested with out recreating the logic of the function or testing pre-canned input and results. But that leads of who-tests-the-tester (my favorite quote: "Qui custodiet ipsos custodes?")

Another variation of this "contract" is how it applied to methods of a object. Methods have input and return results, but they may alter the objects state. Input and result have to be tested as above, edge case and out-of-bounds case. But internal state? One internal state to test is internal consistency. Again there are edge cases of initialized(trivial states) and known inconsistent states.

RSpec is a good ruby oriented tool-set built explicitly to test the contracts in your APIs. What are other API test frameworks for other languages is left as an exercise for the reader :)

No comments:

Post a Comment