Apache Hudi / HUDI-7829

Storage partition stats index does not take effect in data skipping

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: spark-sql
    • Labels: None

    Description

      The partition stats index does not take effect: the current implementation does not seem to achieve partition filtering during data skipping.

      • First

      In the attached screenshot, I changed the filter in the unit test so that it triggers the partition stats index.

      partition_stats does not store fileName, so when the `CSI` (column stats index) logic is reused, it throws a NullPointerException on the group-by key.

      This in turn causes the other indexes to be skipped.
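The failure mode described above can be illustrated with a minimal sketch. It is not Hudi's actual code: `StatsRow`, `group_key`, and `group_stats` are hypothetical names, and the assumption is simply that file-level stats rows carry a file name while partition-level rows do not. Grouping blindly by file name breaks on the partition-level rows; a null-safe key that falls back to the partition avoids that.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class StatsRow:
    """Hypothetical, simplified stand-in for one stats record."""
    partition: str
    file_name: Optional[str]  # assumed to be None for partition-level stats
    column: str
    min_value: int
    max_value: int


def group_key(row: StatsRow) -> str:
    # File-level rows are keyed by file name; partition-level rows have
    # no file name, so fall back to the partition path instead.
    return row.file_name if row.file_name is not None else row.partition


def group_stats(rows: List[StatsRow]) -> Dict[str, List[StatsRow]]:
    # Group rows by the null-safe key so a missing file name never
    # becomes the grouping key.
    grouped: Dict[str, List[StatsRow]] = defaultdict(list)
    for row in rows:
        grouped[group_key(row)].append(row)
    return dict(grouped)
```

With this key, a file-level row and a partition-level row for the same partition land in separate groups instead of failing on a missing file name.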

      • Second

      I also have a question. I am not sure whether this PR is meant to prune partitions the way a physical partition column does, that is, to use the min/max values of other fields to decide for which physical partitions the file slices are listed, or whether it is meant to filter file names the way `CSI` and `RLI` do.
      Thanks.
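To make the first interpretation in the question concrete, here is a minimal sketch of partition pruning driven by per-partition min/max values of a non-partition field. It only illustrates the idea and does not reflect how Hudi implements it: `prune_partitions`, `list_file_slices`, the partition names, and the file names are all hypothetical.

```python
from typing import Dict, List, Tuple


def prune_partitions(
    partition_stats: Dict[str, Tuple[int, int]], lower: int, upper: int
) -> List[str]:
    """Keep partitions whose [min, max] range overlaps [lower, upper]."""
    return sorted(
        partition
        for partition, (min_value, max_value) in partition_stats.items()
        if max_value >= lower and min_value <= upper
    )


def list_file_slices(
    files_by_partition: Dict[str, List[str]], partitions: List[str]
) -> List[str]:
    """List files only for the partitions that survived pruning."""
    return [f for p in partitions for f in files_by_partition.get(p, [])]


# Hypothetical per-partition (min, max) of some non-partition column.
stats = {
    "2024-06-01": (0, 99),
    "2024-06-02": (100, 199),
    "2024-06-03": (200, 299),
}
files = {
    "2024-06-01": ["a.parquet"],
    "2024-06-02": ["b.parquet"],
    "2024-06-03": ["c.parquet", "d.parquet"],
}

# Filter: 150 <= col <= 250, so the first partition can be skipped.
candidates = prune_partitions(stats, 150, 250)
slices = list_file_slices(files, candidates)
```

Under the second interpretation, the index would instead have to return candidate file names directly, which requires file names to be stored in the index records.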

      Attachments

        1. image-2024-06-05-16-32-02-293.png
          1.09 MB
          KnightChess
        2. image-2024-06-05-16-31-44-871.png
          877 kB
          KnightChess
        3. image-2024-06-05-16-30-50-503.png
          963 kB
          KnightChess

            People

              Assignee: Sagar Sumit (codope)
              Reporter: KnightChess