Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 2.5.0
None
None
Description
Provide a utility to determine HDFS file formats and compression types, akin to Linux's file utility.
There is no easy way to do this today, short of downloading a file and running Linux's file utility on it, which provides only limited insight. Even then, Linux's magic file contains no entries for the leading bytes of Hadoop's common file formats, for example 'S', 'E', 'Q' for SequenceFiles or 'P', 'A', 'R', '1' for Parquet.
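
As a rough illustration of the idea, the sketch below matches a file's leading bytes against a small table of magic numbers. It is not a proposed implementation: the names `MAGIC_NUMBERS`, `detect_format`, and `detect_local_file` are hypothetical, the table goes beyond the two formats named above by assuming the commonly documented prefixes for ORC, Avro object container files, gzip, and bzip2, and it reads from a local path rather than from HDFS.

```python
# Hypothetical magic-number table: (leading bytes, format name).
# Only SequenceFile and Parquet come from this issue; the other
# entries are assumptions added for illustration.
MAGIC_NUMBERS = [
    (b"PAR1", "Parquet"),
    (b"SEQ", "SequenceFile"),
    (b"ORC", "ORC"),
    (b"Obj\x01", "Avro object container"),
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
]


def detect_format(header: bytes) -> str:
    """Return the name of the first format whose magic bytes prefix `header`."""
    for magic, name in MAGIC_NUMBERS:
        if header.startswith(magic):
            return name
    return "unknown"


def detect_local_file(path: str, read_len: int = 8) -> str:
    """Read the first few bytes of a local file and classify them.

    A real utility would read the leading bytes through an HDFS client
    instead of the local filesystem.
    """
    with open(path, "rb") as f:
        return detect_format(f.read(read_len))
```

For example, `detect_format(b"SEQ\x06")` returns `"SequenceFile"` and `detect_format(b"PAR1")` returns `"Parquet"`. Only a handful of leading bytes are needed, so a utility built this way would not have to download the whole file.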