
Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • None
    • Component/s: Frontend

    Description

      When Trino writes Puffin statistics for a column, it stores the column's NDV (number of distinct values) in two places: as a Theta sketch in the Puffin file, and as the "ndv" property of the corresponding blob metadata entry in the "statistics" section of the metadata.json file. When we only read the stats (rather than writing or updating them), it is enough to read this property if it is present, without opening the Puffin file and deserializing the sketch.

      An example of the "statistics" section:

      "statistics" : [ {
          "snapshot-id" : 1226095104912303892,
          "statistics-path" : "hdfs://localhost:20500/test-warehouse/iceberg_test/hadoop_catalog/ice/iceberg_query_metadata/metadata/20240829_112839_00004_p6sck-7f433a45-607b-4561-89a3-fc4c58ef60d9.stats",
          "file-size-in-bytes" : 306,
          "file-footer-size-in-bytes" : 257,
          "blob-metadata" : [ {
            "type" : "apache-datasketches-theta-v1",
            "snapshot-id" : 1226095104912303892,
            "sequence-number" : 4,
            "fields" : [ 1 ],
            "properties" : {
              "ndv" : "2"
            }
          } ]
        } ]
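      A minimal Java sketch of the read path described above, assuming the "statistics" section has already been parsed into objects. The record types `StatisticsFile` and `BlobMetadata` and the method `readNdv` are hypothetical illustrations that mirror the JSON keys; they are not Impala's actual implementation. (Iceberg's Java API exposes comparable data through `Table.statisticsFiles()`, `StatisticsFile.blobMetadata()` and `BlobMetadata.properties()`.)

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

/**
 * Hypothetical sketch: read the NDV that Trino stores in the "properties" of a
 * Theta-sketch blob in metadata.json, so the Puffin file need not be opened
 * when statistics are only being read. The records below are simplified
 * mirrors of the JSON "statistics" section, not Impala or Iceberg classes.
 */
public class NdvFromStatsSketch {
  static final String THETA_BLOB_TYPE = "apache-datasketches-theta-v1";
  static final String NDV_PROPERTY = "ndv";

  /** One entry of "blob-metadata" (hypothetical, trimmed to needed fields). */
  record BlobMetadata(String type, long snapshotId, List<Integer> fields,
      Map<String, String> properties) {}

  /** One entry of the top-level "statistics" array (hypothetical). */
  record StatisticsFile(long snapshotId, List<BlobMetadata> blobMetadata) {}

  /**
   * Returns the NDV for the column with the given Iceberg field id, taken from
   * the "ndv" property of a single-column Theta blob for the given snapshot.
   * Returns empty if no such blob exists or the property is missing or not a
   * number; the caller would then fall back to decoding the Puffin sketch.
   */
  static OptionalLong readNdv(List<StatisticsFile> statistics, long snapshotId,
      int fieldId) {
    for (StatisticsFile file : statistics) {
      if (file.snapshotId() != snapshotId) continue;
      for (BlobMetadata blob : file.blobMetadata()) {
        // Only a Theta blob covering exactly this one column carries its NDV.
        if (!THETA_BLOB_TYPE.equals(blob.type())) continue;
        if (!blob.fields().equals(List.of(fieldId))) continue;
        String ndv = blob.properties().get(NDV_PROPERTY);
        if (ndv == null) return OptionalLong.empty();
        try {
          return OptionalLong.of(Long.parseLong(ndv.trim()));
        } catch (NumberFormatException e) {
          return OptionalLong.empty();
        }
      }
    }
    return OptionalLong.empty();
  }

  public static void main(String[] args) {
    // Values taken from the example "statistics" section above.
    long snapshot = 1226095104912303892L;
    BlobMetadata blob = new BlobMetadata(THETA_BLOB_TYPE, snapshot,
        List.of(1), Map.of(NDV_PROPERTY, "2"));
    StatisticsFile stats = new StatisticsFile(snapshot, List.of(blob));
    System.out.println(readNdv(List.of(stats), snapshot, 1));
  }
}
```

      Returning an empty value instead of throwing keeps the fast path optional: when the property is absent, the caller can still fall back to reading the Theta sketch from the Puffin file referenced by "statistics-path".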


            People

              Assignee: Daniel Becker
              Reporter: Daniel Becker
