It would be nice to have an avro-tool that picks only some records from Avro files.
I implemented a new avro-tool, cat, which takes a list of Avro files with identical schemas and concatenates them into a single file. It has options to discard the first n records, to limit the output size, and to collect records at a given sample rate.
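To make the interplay of the three options concrete, here is a minimal sketch of the selection logic in Python. It is not the tool's actual code. The function name `cat_records` and its parameters are hypothetical. Plain Python lists stand in for Avro files. The sketch assumes that the limit counts output records, that the offset is applied before sampling, and that a sample rate in (0, 1] keeps records at evenly spaced intervals rather than at random.

```python
from itertools import chain


def cat_records(inputs, offset=0, limit=None, samplerate=1.0):
    """Yield records from several record iterables as one stream.

    Hypothetical helper: offset skips the first records of the combined
    stream, samplerate keeps roughly that fraction of the remainder, and
    limit caps the number of records emitted.
    """
    if not 0.0 < samplerate <= 1.0:
        raise ValueError("samplerate must be in (0, 1]")
    skipped = 0
    emitted = 0
    credit = 0.0  # accumulates samplerate; a record is kept each time it reaches 1
    for record in chain.from_iterable(inputs):
        if limit is not None and emitted >= limit:
            return
        if skipped < offset:
            skipped += 1
            continue
        credit += samplerate
        if credit >= 1.0:
            credit -= 1.0
            emitted += 1
            yield record


# Two "files" with the same schema (here simply integers).
files = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
print(list(cat_records(files, offset=2, limit=3, samplerate=0.5)))  # [3, 5, 7]
```

The accumulator keeps the sample deterministic, so repeated runs over the same input yield the same records. A random draw per record would be an equally valid reading of "sample rate".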
This tool allows a quicker peek into large Avro files, e.g.:
The tool accepts multiple input files or folders; when a folder is given, all files inside it are used as input.
This tool uses the Hadoop FileSystem API to handle files from any supported filesystem.
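The folder handling described above can be sketched as follows. This is only an illustration: it uses Python's `pathlib` on the local filesystem as a stand-in for the Hadoop FileSystem API, and the helper name `expand_inputs` is hypothetical. It also assumes that folders are expanded one level deep (no recursion) and that files are taken in sorted order, neither of which is stated in the description.

```python
from pathlib import Path


def expand_inputs(paths):
    """Return a flat list of input files.

    Hypothetical helper: a plain file is kept as-is; a folder is replaced
    by the regular files directly inside it (assumed non-recursive, sorted).
    """
    files = []
    for path in map(Path, paths):
        if path.is_dir():
            files.extend(sorted(child for child in path.iterdir() if child.is_file()))
        else:
            files.append(path)
    return files
```

With the real FileSystem API the same expansion would run against whatever filesystem the path's scheme resolves to, rather than the local disk.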