Parquet / PARQUET-42

Add HyperLogLog / CountMinSketch to parquet statistics

Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: parquet-mr
    • Labels: None

    Description

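      HyperLogLog (HLL) and count-min sketch (CMS) statistics for row groups could help with query planning (getting a sense of data skew) and with cheaply estimating approximate distinct counts. Both sketches are mergeable: the merge operation is commutative and associative, so sketches computed per row group can be combined across row groups (unlike an exact distinct count, for example, which cannot simply be added up when values repeat across row groups).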
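
      To illustrate why per-row-group sketches can be combined, here is a minimal Python sketch of both structures. It is not parquet-mr code and not a proposed file format: the class names, the `p`, `width`, and `depth` parameters, and the use of BLAKE2b as the hash are all assumptions made for the example. Merging HLL registers takes the element-wise maximum, and merging CMS tables takes the element-wise sum, so the merged sketch is identical to one built over all the values at once.

```python
import hashlib
import math


def _hash64(value, seed=0):
    """64-bit hash of a value; BLAKE2b is an arbitrary choice for this example."""
    data = f"{seed}:{value}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


class HyperLogLog:
    """Toy HyperLogLog with 2**p registers (hypothetical, for illustration)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value):
        h = _hash64(value)
        idx = h >> (64 - self.p)                   # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        assert self.p == other.p, "sketches must share the same precision"
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)      # bias constant, valid for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:          # small-range correction
            return self.m * math.log(self.m / zeros)
        return raw


class CountMinSketch:
    """Toy count-min sketch (hypothetical, for illustration)."""

    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def add(self, value, count=1):
        for row in range(self.depth):
            col = _hash64(value, seed=row + 1) % self.width
            self.table[row][col] += count

    def merge(self, other):
        assert (self.width, self.depth) == (other.width, other.depth)
        merged = CountMinSketch(self.width, self.depth)
        merged.table = [[a + b for a, b in zip(r1, r2)]
                        for r1, r2 in zip(self.table, other.table)]
        return merged

    def estimate(self, value):
        # Collisions can only inflate counters, so the minimum is an upper bound
        # on the true count that is never below it.
        return min(self.table[row][_hash64(value, seed=row + 1) % self.width]
                   for row in range(self.depth))


def build(sketch_cls, values):
    """Build one sketch over an iterable of values."""
    sketch = sketch_cls()
    for v in values:
        sketch.add(v)
    return sketch


# Two made-up "row groups" whose values overlap (10,000 distinct values overall).
row_group_a = [f"user{i}" for i in range(0, 6000)] + ["hot"] * 500
row_group_b = [f"user{i}" for i in range(4000, 10000)] + ["hot"] * 300
```

      With these definitions, `build(HyperLogLog, row_group_a).merge(build(HyperLogLog, row_group_b)).estimate()` approximates the distinct count of the union, whereas adding the two per-row-group distinct counts would double-count the overlapping values. The same pattern with `CountMinSketch` gives a combined frequency estimate for a skewed value such as `"hot"`.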

          People

            Assignee: Unassigned
            Reporter: Alex Levenson (alexlevenson)
            Votes: 2
            Watchers: 10

            Dates

              Created:
              Updated: