Apache Hudi / HUDI-3717

Avoid double-listing w/in BaseHoodieTableFileIndex

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels: None
    • 3

    Description

      Currently, `BaseHoodieTableFileIndex::loadPartitionPathFiles` essentially performs the file listing twice:

      • Once when `getAllQueryPartitionPaths` is invoked
      • Second time when `getFilesInPartitions` is invoked
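      To make the two steps above concrete, here is a minimal Java sketch of the call pattern. It assumes the metadata table (MT) can be modelled as a single lookup that returns partition paths mapped to their files. `CountingMetadataTable`, `fetchListing`, and the simplified `getAllQueryPartitionPaths`, `getFilesInPartitions` and `loadPartitionPathFiles` methods are hypothetical stand-ins that only mirror the names in this issue; they are not Hudi's actual signatures. The counter shows that each step issues its own MT query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of the current flow: each step that needs a listing
// queries the metadata table (MT) on its own. None of these are Hudi APIs.
public class DoubleListingSketch {

    // Stand-in for the MT; counts how many times it is queried.
    static class CountingMetadataTable {
        private final Map<String, List<String>> filesByPartition;
        int queries = 0;

        CountingMetadataTable(Map<String, List<String>> filesByPartition) {
            this.filesByPartition = filesByPartition;
        }

        // One MT query returning the partition -> files listing.
        Map<String, List<String>> fetchListing() {
            queries++;
            return filesByPartition;
        }
    }

    // Step 1: resolve partition paths (in this model, one MT query).
    static List<String> getAllQueryPartitionPaths(CountingMetadataTable mt) {
        return new ArrayList<>(mt.fetchListing().keySet());
    }

    // Step 2: resolve files in those partitions (a second MT query).
    static List<String> getFilesInPartitions(CountingMetadataTable mt, List<String> partitions) {
        Map<String, List<String>> listing = mt.fetchListing();
        List<String> files = new ArrayList<>();
        for (String partition : partitions) {
            files.addAll(listing.getOrDefault(partition, List.of()));
        }
        return files;
    }

    // Mirrors the shape described in the issue: two steps, two MT queries.
    static List<String> loadPartitionPathFiles(CountingMetadataTable mt) {
        List<String> partitions = getAllQueryPartitionPaths(mt);
        return getFilesInPartitions(mt, partitions);
    }

    public static void main(String[] args) {
        CountingMetadataTable mt = new CountingMetadataTable(
            Map.of("2022/03/25", List.of("f1.parquet", "f2.parquet")));
        List<String> files = loadPartitionPathFiles(mt);
        // Prints "2 files, MT queries: 2"
        System.out.println(files.size() + " files, MT queries: " + mt.queries);
    }
}
```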

      While this does not result in the files being listed twice on the file system (because of `FileStatusCache`, if one is present), it does lead to the metadata table (MT) being queried twice (see the attached screenshots).
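      A minimal sketch of one way to avoid the second MT query, assuming the MT can be modelled as a single lookup that returns partition paths mapped to their files: the first response is memoized, and both steps read from it. `CountingMetadataTable`, `MemoizedListing`, and this simplified `loadPartitionPathFiles` are hypothetical names for illustration only; this is not necessarily how the issue was fixed in Hudi.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: query the MT once and reuse the result for both steps.
public class SingleListingSketch {

    // Stand-in for the MT; counts how many times it is queried. Not a Hudi API.
    static class CountingMetadataTable {
        private final Map<String, List<String>> filesByPartition;
        int queries = 0;

        CountingMetadataTable(Map<String, List<String>> filesByPartition) {
            this.filesByPartition = filesByPartition;
        }

        Map<String, List<String>> fetchListing() {
            queries++;
            return filesByPartition;
        }
    }

    // Memoizes the first response so later reads do not re-query the source.
    static final class MemoizedListing {
        private final Supplier<Map<String, List<String>>> source;
        private Map<String, List<String>> cached; // null until the first get()

        MemoizedListing(Supplier<Map<String, List<String>>> source) {
            this.source = source;
        }

        Map<String, List<String>> get() {
            if (cached == null) {
                cached = source.get();
            }
            return cached;
        }
    }

    static List<String> loadPartitionPathFiles(CountingMetadataTable mt) {
        MemoizedListing listing = new MemoizedListing(mt::fetchListing);
        // Step 1: partition paths come from the memoized listing (one MT query).
        List<String> partitions = new ArrayList<>(listing.get().keySet());
        // Step 2: files are read from the same memoized listing (no new query).
        List<String> files = new ArrayList<>();
        for (String partition : partitions) {
            files.addAll(listing.get().getOrDefault(partition, List.of()));
        }
        return files;
    }

    public static void main(String[] args) {
        CountingMetadataTable mt = new CountingMetadataTable(
            Map.of("2022/03/25", List.of("f1.parquet", "f2.parquet")));
        List<String> files = loadPartitionPathFiles(mt);
        // Prints "2 files, MT queries: 1"
        System.out.println(files.size() + " files, MT queries: " + mt.queries);
    }
}
```

      The memo here is scoped to a single `loadPartitionPathFiles` call and is not thread-safe; a shared or longer-lived cache would also need invalidation when new commits change the listing.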

      Attachments

        1. Screen Shot 2022-03-25 at 7.14.20 PM.png
          581 kB
          Alexey Kudinkin
        2. Screen Shot 2022-03-25 at 7.05.43 PM.png
          606 kB
          Alexey Kudinkin
        3. Screen Shot 2022-03-25 at 7.05.09 PM.png
          617 kB
          Alexey Kudinkin

            People

              Assignee: Alexey Kudinkin
              Reporter: Alexey Kudinkin
              Votes: 0
              Watchers: 1