Friday, July 29, 2011

Morans! I am surrounded by Morans!

I posted a response on a blog recently. The fellow claimed, quite explicitly, that "Fibers and EventMachine were a response to Ruby's poor Threading performance". I responded that Fibers and EventMachine were not made in "response" to Ruby's Thread implementation. I am strangely polite in online discussions, where in person I'd have ripped this guy a new asshole.

I explained that Threads exist because n = read(fd, buf, len) blocks. Threads allow other work to continue in parallel with blocked IO. He had it backwards: event-driven IO, like EventMachine, exists so a user-space program can continue to execute work while the IO finishes in the kernel. While user-space programmers don't explicitly invoke the switch from one thread to another, context switching is not costless. In fact, the cost of a context switch is many hundreds of CPU instructions.
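A minimal sketch of the point about Threads: the read blocks, but it blocks on its own thread, so the main thread keeps working in parallel.

```ruby
# The reader thread blocks in gets; the main thread keeps computing.
r, w = IO.pipe

reader = Thread.new do
  r.gets            # blocks until the writer produces a line
end

# Main thread continues doing work while the read is blocked.
work = (1..5).map { |i| i * i }

w.puts "hello"      # unblock the reader
w.close

line = reader.value # join the thread and collect its result
```

Event-driven IO (ala EventMachine) gets you the same overlap without paying for a thread context switch.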

Relatedly, Fibers are cheap where Threads are expensive. Fibers switch into a parallel call stack, much like Threads, with the equivalent of a register switch and a long jump. Additionally, the ability to return results from a Fiber while preserving the Fiber's call stack state allows remarkable new behaviors to be constructed that cannot be done without them.
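To make that concrete, here's a tiny Fiber: each resume hands back a value while the Fiber's stack, including its local n and the loop it's inside, stays frozen in place between calls.

```ruby
# A Fiber suspends itself with Fiber.yield, returning a value to the
# caller; its call stack is preserved until the next resume.
counter = Fiber.new do
  n = 0
  loop do
    Fiber.yield n   # hand n back, then sleep here until resumed
    n += 1
  end
end

first  = counter.resume   # 0
second = counter.resume   # 1
third  = counter.resume   # 2
```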

It seems these Ruby coders come in one of two types: one, HTML monkeys that have learned how to code, or two, Java refugees that got tired of the Bondage-and-Discipline style of programming. Both of these kinds of MORANS don't understand that Ruby is a remarkable programming language with features not available in other languages. Neither Perl nor Java has the concept that every variable is an object. Nor does either language have Fibers/coroutines. These features change the very nature of the code you write.

P.S.
I plan to post some code that shows synchronous read/write IO calls while doing all the actual IO inside EventMachine. So you get the best of both worlds: synchronous-style IO without blocking. Stay tuned; same Bat Time, same Bat Channel...

Saturday, July 23, 2011

Git and RSpec

I just watched a talk by Linus Torvalds at Google about Git. One of the questions touched on one of Linus' motivations for Git: merging. I want to talk about how Git merging dovetails nicely with Ruby's testing tool RSpec.

Some source control managers (SCMs) make a big deal about "branching" being cheap. Linus points out that branching is not the problem; merging is the problem people deal with. Git makes merging easy by reducing the number of conflicts the developer has to deal with. It does this in two ways: the first is algorithmic, and the second is how Git changes your workflow.

Git allows for, and relies on, three-way merging. CVS/SVN, by contrast, only do two-way. A three-way merge diffs the original file against each of the two conflicting versions of that file. Git can do this because it stores whole files, not just a long string of diffs (ala CVS/SVN, which I'll just call SVN from now on). Three-way merges allow the merge algorithm to look at the context of the text being diffed. SVN can only look at line 245 and see that it changed. Git can notice that the line didn't change, it merely moved. Say both you and the conflicting version inserted text before the same line 245: there is no conflict, just code motion in the file. Three-way diffs see this; two-way diffs can't.
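A toy sketch of why the base version matters (this is my line-by-line simplification, not Git's actual algorithm): with the common ancestor in hand, a hunk only conflicts when both sides changed it differently.

```ruby
# Three-way resolution of one hunk: base is the common ancestor.
def merge3(base, ours, theirs)
  return ours   if ours == theirs   # both made the same change (or none)
  return theirs if ours == base     # only they changed it
  return ours   if theirs == base   # only we changed it
  :conflict                         # both changed it, differently
end

base   = ["a", "b", "c"]
ours   = ["a", "BBB", "c"]   # we changed line 2
theirs = ["a", "b", "CCC"]   # they changed line 3
merged = base.each_index.map { |i| merge3(base[i], ours[i], theirs[i]) }
# merged == ["a", "BBB", "CCC"] -- no conflict; the changes don't overlap
```

A two-way diff of ours against theirs would flag both lines 2 and 3 as differences; only the base tells you each side's change is unopposed.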

The second way Git deals with merge conflicts is not as obvious; it stems from the workflow. Specifically, branching is inherent in Git's design, and merging is easier and better. So you branch often and merge often. More to the point, you start a branch from the central "master" branch for your feature; change and commit a bunch of times on that small branch; then pull or push your change back into "master". By repeating this branch-change-merge-push cycle early and often you have less opportunity for merge conflicts. Put another way, your branch exists for less time and syncs with the central master more often, resulting in less opportunity for two people to introduce genuine conflicts.
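The cycle looks something like this in a scratch repo (the file and branch names are just placeholders for the demo):

```shell
# Branch-change-merge, small and often, in a throwaway repo.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email dev@example.com   # placeholder identity for the demo
git config user.name  dev
echo base > app.txt
git add app.txt
git commit -qm "initial"

git checkout -qb feature-x        # branch off for one small feature
echo change >> app.txt
git commit -qam "small change"    # commit early and often

git checkout -q -                 # back to the branch we started from
git merge -q feature-x            # merge while the branch is still young
git rev-list --count HEAD         # fast-forward merge: 2 commits total
```

Because the branch lived for minutes rather than weeks, the merge is trivial; repeat the cycle and the monster merge never builds up.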

So a lot of talk is generated about Git working in a fully distributed way. However, that ignores that Git encourages syncing up those distributed merges early and often.

So imagine you replace SVN with Git. First, you don't have to do it to your whole tree at once; you can do it by subsystem. There are also Git-SVN gateway tools to help with this. You can still have the central tree ala SVN, but you just branch off it and merge back more often. In SVN you are usually doing this anyway: each SVN commit ends up landing after the feature is completed, much like the final merge of a Git branch, but without all the Git commits in the meantime. Or you create a longer-lived development branch and merge that back as a monster merge.

I've explored other SVN branching/merging workflows, but all they do is push the pain to a different part of the process. A smart colleague of mine called it "squeezing the balloon". What you need to do is keep the master branch stable. That can be achieved by branching and merging smaller and faster. The SCM (or VCS if you prefer) makes all the difference.

What also helps is having a test suite you can rely on. First, the test suite has to be aimed at the internal API level. Second, it has to be easy and fast to run. This is where the choice of test framework dovetails with the SCM choice. "Early and often" is the catch phrase for both.

Testing the API's "contract" is where you should aim your testing framework. As a side note: you are forced to state what that "contract" is, which is a level of internal documentation that often gets forgotten. The contract is what each function takes as input and what the results should be, especially the edge cases. An edge case: given a function "sum" that adds all the elements of an array, what if the function is given an empty array or a nil as input? That is the trivial case. Then there is the out-of-bounds case, like each element of an IP address (dot-quad) being between 0 (inclusive) and 256 (exclusive). Correctness should be tested if you can. Most of the time, correctness can't be tested without recreating the logic of the function or testing pre-canned input and results. But that leads to who-tests-the-tester (my favorite quote: "Quis custodiet ipsos custodes?").
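Pinning down the two examples above as code (these particular contract choices, nil-as-empty and so on, are mine for illustration):

```ruby
# Contract: sum adds the elements of an array.
# Edge cases: an empty array sums to 0; nil is treated as empty.
def sum(arr)
  return 0 if arr.nil?
  arr.reduce(0, :+)
end

# Out-of-bounds case: a dot-quad octet is an integer in [0, 256).
def valid_octet?(n)
  n.is_a?(Integer) && n >= 0 && n < 256
end

sum([1, 2, 3])     # normal case: 6
sum([])            # edge case: 0
sum(nil)           # edge case: 0
valid_octet?(255)  # in bounds
valid_octet?(256)  # out of bounds
```

Writing the edge cases down like this is exactly the internal documentation that otherwise gets forgotten.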

Another variation of this "contract" is how it applies to the methods of an object. Methods have input and return results, but they may also alter the object's state. Input and result have to be tested as above: edge cases and out-of-bounds cases. But internal state? One internal state to test is internal consistency. Again there are edge cases: initialized (trivial) states and known inconsistent states.
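A small made-up example of what "internal consistency" means as part of a method's contract: here the invariant is that count and total must agree after every call.

```ruby
# The contract covers not just add's input/result but the invariant the
# object must keep afterwards: count >= 0, and zero count means zero total.
class RunningAverage
  attr_reader :count, :total

  def initialize
    @count = 0     # the initialized (trivial) state is consistent by definition
    @total = 0.0
  end

  def add(x)
    @total += x
    @count += 1
    self
  end

  def average
    return nil if @count.zero?   # edge case: no samples yet
    @total / @count
  end

  def consistent?
    @count >= 0 && (@count.zero? ? @total.zero? : true)
  end
end

avg = RunningAverage.new
avg.add(2).add(4)
```

A state test resumes where input/result tests leave off: call the method, then assert the invariant still holds.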

RSpec is a good Ruby-oriented tool set built explicitly to test the contracts in your APIs. Finding equivalent API test frameworks for other languages is left as an exercise for the reader :)

Thursday, July 14, 2011

Little Idiom of Ruby I like

h = Hash.new { |h,k| h[k] = [] }
h['foo'] << "a"
h['bar'].push "b"
puts h.inspect
outputs {"foo"=>["a"], "bar"=>["b"]}. In other words, we declare a hash with a constructor block that sets each new key to an empty array. Otherwise, we would have to test each hash key to see if it was already initialized to an array, and if not, set that key's value to an empty array ourselves. It's a common thing to do, but Ruby makes it easy and automagic.

I suppose in Perl you can rely on autovivification:

push @{$h{'foo'}}, "a";
But as you can imagine, the Ruby idiom can be generalized to more complicated initializations.
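For instance, the same idiom gives you Perl-style autovivification to any depth, or a default of anything you like:

```ruby
# A hash whose default proc builds another hash with the same default
# proc, so intermediate keys spring into existence on first touch.
deep = Hash.new { |h, k| h[k] = Hash.new(&h.default_proc) }
deep[:a][:b][:c] = 1
# deep == { a: { b: { c: 1 } } }

# Or default every new key to 0 for a counter:
counts = Hash.new(0)
%w[x y x].each { |w| counts[w] += 1 }
# counts == { "x" => 2, "y" => 1 }
```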

TextMate clone for winbloze

For anyone who may care, there is a TextMate act-a-like called E Text Editor. It supports TextMate bundles and key strokes. I haven't used it, but I imagine it must be manna from heaven if you are a TextMate user banished to the wilds of Microsoft-land.

Wednesday, July 13, 2011

iTerm2 Does Not Suck

Ringing endorsement eh?

Well, I've been using iTerm2 for a while instead of the default Terminal. Don't confuse iTerm2 with iTerm. While I am confused about that myself, I haven't spent any time figuring out the diff.

The first problem with iTerm2 was the colors. It has a very good color palette (Mac OS X builtin, I presume) and an option in the Pref panel that will send you off to a page of preset palettes. I was happy to find one that approximated the Mac OS X Terminal colors I had come to like. With that color palette installed I moved on to my next issue.

AntiqueWhite background with black foreground. On a PC running Linux, AntiqueWhite out of rgb.txt was fine. I entered the same RGB codes for AntiqueWhite into Terminal's Prefs and was happy. The same codes in iTerm2's Prefs were not so easy to enter. But I selected the Emacs palette, and AntiqueWhite2 seemed to give me the happys.

I am transitioning from Emacs and screen. Even my screen command control character is C-o, not the default C-a, because, duh, C-a is beginning-of-line in Emacs. But screen doesn't do top-bottom splits. After a friend suggested iTerm2 I gave it a look, then went back to Terminal/screen.

The big change came when I was writing very wide log lines to the terminal. I needed a terminal with full screen width. Given that screen wouldn't give me top-bottom splits, I tried iTerm2 again. The commands were easy to learn: ⌘-t for new tabs; ⌘-[ and ⌘-] to go back and forth between panes; and ⌘-⇧-[ and ⌘-⇧-] to go back and forth between tabs.

The scrolling is easy: just mouse scroll up/down in the window; no worries about the lack of scroll bar.

All in all, I've definitely moved from emacs/screen to TextMate/iTerm2, and I think I am more productive because of all the little things I gain. I just feel like all the key strokes programmed into the nerves of my hands are being wasted. I don't even remember a lot of the emacs commands. I just pull up emacs, do the command, and look at what my hands do. Then I can say "oh yeah, that's C-space to highlight the region, C-x r t, blah blah blah".

NOTE: the unicode characters above, ⌘ and ⇧, may not display correctly on non-Mac OS's. I found them on this page.