Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Description
When Trino writes Puffin statistics for a column, it also stores the NDV (number of distinct values) estimate as an "ndv" property in the "statistics" section of the metadata.json file, in addition to the Theta sketch in the Puffin file. When we only read the stats and do not write or update them, reading this property (when it is present) is enough, so there is no need to open the Puffin file and deserialize the sketch.
An example of the "statistics" section:
"statistics" : [ {
  "snapshot-id" : 1226095104912303892,
  "statistics-path" : "hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/metadata/20240829_112839_00004_p6sck-7f433a45-607b-4561-89a3-fc4c58ef60d9.stats",
  "file-size-in-bytes" : 306,
  "file-footer-size-in-bytes" : 257,
  "blob-metadata" : [ {
    "type" : "apache-datasketches-theta-v1",
    "snapshot-id" : 1226095104912303892,
    "sequence-number" : 4,
    "fields" : [ 1 ],
    "properties" : {
      "ndv" : "2"
    }
  } ]
} ]
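The read path described above can be sketched roughly as follows: scan the "statistics" section for a given snapshot and collect the "ndv" property of each Theta-sketch blob, keyed by field ID. This is a minimal Python sketch that assumes the metadata.json content is available as a string. The helper name ndv_from_metadata and the choice to skip blobs covering more than one field are hypothetical illustrations, not Impala's implementation. Fields missing from the result would still need the Theta sketch from the Puffin file, which this example does not read.

```python
import json
from typing import Dict

# Blob type used for Theta sketches in the example "statistics" section.
THETA_BLOB_TYPE = "apache-datasketches-theta-v1"


def ndv_from_metadata(metadata_json: str, snapshot_id: int) -> Dict[int, int]:
    """Hypothetical helper: map field ID -> NDV using only the "ndv" blob
    properties in metadata.json. Fields without the property are left out;
    for those a reader would have to fall back to the Puffin file."""
    metadata = json.loads(metadata_json)
    ndvs: Dict[int, int] = {}
    for stats_file in metadata.get("statistics", []):
        # Only consider statistics files written for the requested snapshot.
        if stats_file.get("snapshot-id") != snapshot_id:
            continue
        for blob in stats_file.get("blob-metadata", []):
            if blob.get("type") != THETA_BLOB_TYPE:
                continue
            ndv = blob.get("properties", {}).get("ndv")
            fields = blob.get("fields", [])
            # Assumption: only single-field blobs map cleanly to one column NDV.
            if ndv is None or len(fields) != 1:
                continue
            # The property value is stored as a string, e.g. "2".
            ndvs[fields[0]] = int(ndv)
    return ndvs


# Trimmed copy of the example above (unused keys removed), wrapped in an object.
EXAMPLE_METADATA = """
{"statistics": [{"snapshot-id": 1226095104912303892,
                 "blob-metadata": [{"type": "apache-datasketches-theta-v1",
                                    "snapshot-id": 1226095104912303892,
                                    "sequence-number": 4,
                                    "fields": [1],
                                    "properties": {"ndv": "2"}}]}]}
"""

print(ndv_from_metadata(EXAMPLE_METADATA, 1226095104912303892))  # {1: 2}
```

If no blob for the snapshot carries the property, the result is empty and the caller would have to read the sketch itself.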
Issue Links
- is related to: IMPALA-13588 Update Puffin reading doc after IMPALA-13370 (In Progress)