If you're dealing with bigger Avro files (>100 MB), it would be useful to have a way to quickly count the number of records contained in the file.
With the current state of avro-tools, the only way to achieve this (to my knowledge) is to dump the data to JSON and count the records. For bigger files this can take a while, due to the serialization overhead and because every record has to be deserialized.
I added a new tool that is optimized for counting records: it does not deserialize the records, and instead reads only the record count stored at the start of each block.
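The block-count trick can be sketched directly against the Avro container file specification: each data block starts with a zigzag-varint record count and a byte size, so a counter only has to read those two longs and seek past the payload. The following is a minimal, hypothetical Python illustration of that idea (not the Java tool from this patch); it assumes a well-formed container file and skips the metadata map rather than interpreting it.

```python
import io

def read_long(f):
    """Decode one zigzag varint-encoded Avro 'long'."""
    shift, acc = 0, 0
    while True:
        b = f.read(1)[0]
        acc |= (b & 0x7F) << shift
        if not (b & 0x80):
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1)

def count_records(data: bytes) -> int:
    """Count records in an Avro container file by summing the per-block
    record counts, seeking over the serialized data instead of reading it."""
    f = io.BytesIO(data)
    if f.read(4) != b"Obj\x01":
        raise ValueError("not an Avro container file")
    # Skip the file metadata map (a series of key/value blocks, 0-terminated).
    while True:
        n = read_long(f)
        if n == 0:
            break
        if n < 0:              # negative count is followed by a byte size
            read_long(f)
            n = -n
        for _ in range(n):
            f.seek(read_long(f), 1)   # skip key bytes
            f.seek(read_long(f), 1)   # skip value bytes
    f.read(16)                 # file sync marker
    total = 0
    while True:
        if not f.read(1):      # EOF check
            break
        f.seek(-1, 1)
        total += read_long(f)          # record count for this block
        f.seek(read_long(f), 1)        # skip the serialized records
        f.read(16)                     # skip the block's sync marker
    return total
```

Because only two varints per block are decoded, the cost is proportional to the number of blocks rather than the number of records, which is what makes this approach much faster than dumping to JSON.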
The tool uses the HDFS API, so it can handle files from any supported filesystem. I added the unit tests to the already existing TestDataFileTools, since it provides convenient utility functions that I could reuse for my test scenarios.