Looks cool! Having parsed way too much text myself, there are a few things I'm missing. Right now there doesn't seem to be much in the way of error handling or missing-value handling (I noticed none in the test case, at least). To make this universally applicable (which would be the goal for o.a.c.lib, as opposed to contrib) we'd need a bit more support for dealing with crappy data.
At work we increment a separate counter for each field that has an invalid value, plus a different counter for records that are completely broken. This helps a lot with monitoring data streams over time. Also, my experience with Java 5 (I never re-measured this) was that throwing multiple exceptions per record when dealing with crappy data significantly slows down processing, even in situations where you'd think I/O should totally dominate. I've seen 600% increases in runtime in pathological situations (throwing exceptions was fast in Java 5, but creating the stack traces wasn't).
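To make the counter idea concrete, here is a rough sketch of what I mean, not a proposal for the actual API. Everything in it is made up for illustration: the CountingParser class, the "name,age" record layout, the counter names, and the default values. It validates fields up front instead of catching exceptions, so bad data never pays for a stack trace.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: parse "name,age" lines, counting bad fields instead of throwing. */
public class CountingParser {
  // Counter names below are invented for this example.
  private final Map<String, Long> counters = new HashMap<>();

  private void increment(String counter) {
    counters.merge(counter, 1L, Long::sum);
  }

  public long getCount(String counter) {
    return counters.getOrDefault(counter, 0L);
  }

  /** True if s is 1 to 9 ASCII digits, so Integer.parseInt cannot throw on it. */
  private static boolean looksLikeInt(String s) {
    if (s.isEmpty() || s.length() > 9) {
      return false;
    }
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < '0' || c > '9') {
        return false;
      }
    }
    return true;
  }

  /**
   * Parses one line. Returns {name, age} with defaults substituted for
   * invalid fields, or null if the record is completely broken.
   */
  public Object[] parse(String line) {
    String[] parts = line.split(",", -1);
    if (parts.length != 2) {
      increment("RECORD_BROKEN");
      return null;
    }
    String name = parts[0].trim();
    if (name.isEmpty()) {
      increment("NAME_INVALID");
      name = "unknown"; // assumed default
    }
    String ageText = parts[1].trim();
    int age = -1; // assumed default for a missing or invalid age
    if (looksLikeInt(ageText)) {
      age = Integer.parseInt(ageText);
    } else {
      increment("AGE_INVALID");
    }
    return new Object[] {name, age};
  }
}
```

In a real job the map would presumably be replaced by the job's own counters so the numbers show up in monitoring; the point is only that each field reports its own failures and a broken record is counted once and skipped.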
A few things from the nitpicking category: I'd move the inner classes to their own files to make things easier to read, and maybe move the implementations to an Extractors class (Guava style); the private stuff could be made package private. We could also use a package-info.java file for the javadocs. Finally, the CRUNCH-97 marker is missing from the commit messages (you can squash all three commits together using "git rebase -i", which lets you edit the messages, too).