Parquet / PARQUET-281

Statistic and Filter need a mechanism to get customized comparator from high layer user


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      As discussed in HIVE-10254, a higher-layer user such as Hive may need to supply a custom comparator to Parquet, to be used when generating statistics on write and when applying filters on read.

      The problem, using the Hive Decimal type as an example:
      Decimal in Hive is mapped to Binary in Parquet. When predicates and statistics are used to filter values, comparing Binary values in Parquet does not reflect the correct ordering of the Decimal values in Hive. This type mapping causes two problems:
      1. When writing a Decimal column, Binary.compareTo() is used to determine and set the column statistics (min, max). The resulting statistics are not correct from a Decimal perspective.
      2. When reading with a Predicate (or a Filter), the expected Decimal value is converted to Binary, and Binary.compareTo() is used to compare it with the column statistics. Both sides are compared as Binary, so the result is also wrong.
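
The ordering mismatch can be illustrated with a small standalone sketch. It does not use Parquet itself: compareBytes is a hypothetical stand-in that orders byte arrays as unsigned, lexicographic sequences, which may differ from what Binary.compareTo() actually does, and encode assumes the Decimal is stored as the big-endian two's-complement bytes of its unscaled value.

```java
import java.math.BigDecimal;

class DecimalByteOrderDemo {
    // Hypothetical stand-in for a byte-wise comparison:
    // unsigned, lexicographic, shorter array first on a tie.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Assumed encoding: big-endian two's complement of the unscaled value.
    static byte[] encode(BigDecimal d) {
        return d.unscaledValue().toByteArray();
    }

    public static void main(String[] args) {
        BigDecimal neg = new BigDecimal("-1.00"); // unscaled -100 -> 0x9C
        BigDecimal pos = new BigDecimal("1.00");  // unscaled  100 -> 0x64

        // Decimal ordering: -1.00 sorts before 1.00.
        System.out.println("decimal order, neg < pos: " + (neg.compareTo(pos) < 0));
        // Byte-wise ordering: 0x9C sorts after 0x64, so the order is reversed.
        System.out.println("byte order, neg > pos: " + (compareBytes(encode(neg), encode(pos)) > 0));
    }
}
```

Under these assumptions a min/max computed byte-wise over the column would record 1.00 as the minimum and -1.00 as the maximum, which is the kind of incorrect statistic described in point 1.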

      We could add an interface for a custom comparator, which a higher-level user such as Hive would provide to Parquet, since Hive knows how to decode the Binary to a Decimal and compare it. Parquet could then switch between the custom and the original comparison method.
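
One possible shape for such a mechanism is sketched below. This is only an illustration of the idea, not an existing Parquet API: PluggableMinMax, BYTEWISE, decimalComparator and mightContain are hypothetical names, values are modelled as plain byte[] rather than Parquet's Binary, and the Decimal encoding is again assumed to be the big-endian two's-complement unscaled value with a known scale.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Comparator;

class PluggableMinMax {
    // Default ordering when no custom comparator is supplied (hypothetical):
    // unsigned, lexicographic byte comparison.
    static final Comparator<byte[]> BYTEWISE = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    // Comparator a caller such as Hive might supply: decode, then compare as decimals.
    static Comparator<byte[]> decimalComparator(int scale) {
        return (a, b) -> new BigDecimal(new BigInteger(a), scale)
                .compareTo(new BigDecimal(new BigInteger(b), scale));
    }

    private final Comparator<byte[]> comparator;
    private byte[] min;
    private byte[] max;

    // Passing null falls back to the original byte-wise comparison.
    PluggableMinMax(Comparator<byte[]> custom) {
        this.comparator = (custom != null) ? custom : BYTEWISE;
    }

    // Write path: update min/max with the chosen ordering.
    void update(byte[] value) {
        if (min == null || comparator.compare(value, min) < 0) {
            min = value;
        }
        if (max == null || comparator.compare(value, max) > 0) {
            max = value;
        }
    }

    // Read path: could a chunk with these statistics contain the value?
    boolean mightContain(byte[] value) {
        return min != null
                && comparator.compare(value, min) >= 0
                && comparator.compare(value, max) <= 0;
    }

    byte[] getMin() { return min; }
    byte[] getMax() { return max; }
}
```

In this sketch the same comparator is used on both the write path (update) and the read path (mightContain), so the statistics and the predicate check agree on one ordering. A real design would also have to decide how a reader learns which ordering was used when the statistics were written.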

          People

            dongc Dong Chen
