Currently the parquet-mr RecordReader/ParquetFileReader exposes APIs to read a Parquet file in columnar fashion or record by record.
It would be great to extend them to also support a rowPosition API that returns the position of the current record in the Parquet file.
The rowPosition can be used as a unique row identifier within a file to mark a row. This can be useful for creating an index (e.g. a B+ tree) over a Parquet file or a Parquet table (e.g. in Spark/Hive).
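As a rough illustration of what such a position could mean, the sketch below assumes that rowPosition is the zero-based ordinal of a row in the file, obtained by adding the row counts of all preceding row groups to the row's index within its own row group. The class `RowPositionSketch` and its methods are hypothetical names made up for this example; they are not part of parquet-mr, and the per-row-group counts are passed in as a plain array rather than read from a file footer.

```java
// Hypothetical helper, not a parquet-mr API: illustrates one possible
// definition of rowPosition under the assumption stated above.
final class RowPositionSketch {

    // Returns, for each row group, the file-level position of its first row.
    // rowCounts[i] is assumed to be the number of rows in row group i.
    static long[] rowGroupOffsets(long[] rowCounts) {
        long[] offsets = new long[rowCounts.length];
        long running = 0L;
        for (int i = 0; i < rowCounts.length; i++) {
            offsets[i] = running;      // rows that precede row group i
            running += rowCounts[i];
        }
        return offsets;
    }

    // Returns the zero-based position in the file of the row that sits at
    // indexInGroup (also zero-based) inside the given row group.
    static long rowPosition(long[] rowCounts, int rowGroup, long indexInGroup) {
        if (rowGroup < 0 || rowGroup >= rowCounts.length) {
            throw new IllegalArgumentException("no such row group: " + rowGroup);
        }
        if (indexInGroup < 0 || indexInGroup >= rowCounts[rowGroup]) {
            throw new IllegalArgumentException("index outside row group: " + indexInGroup);
        }
        return rowGroupOffsets(rowCounts)[rowGroup] + indexInGroup;
    }
}
```

Under this assumption, a file with row groups of 100, 250 and 50 rows would give `rowPosition(counts, 1, 10)` the value 110. An index over the file could then store such positions as row identifiers.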
Multiple projects in the Parquet ecosystem can benefit from such functionality: