Apache Hudi / HUDI-2550

Add support to configure the number of small files to consider with MOR


Details

    Description

      It looks like in MOR, when the index in use cannot index log files (which is the case for all out-of-the-box indexes in Hudi), we only choose the smallest parquet file for every commit. The idea is that, over time, every file will grow until it is full. In other words, only one small file is bin packed per commit, even though there could be more.
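      A minimal sketch of the behavior described above, assuming a simplified model in which each file group is represented only by the size of its latest base (parquet) file. The class, record, and method names are hypothetical and are not Hudi's API, and the thresholds are illustrative values rather than Hudi defaults. The capacity estimate is likewise a simplified assumption of how incoming records are bin packed into the chosen file.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class SmallestBaseFileSketch {

    // Hypothetical stand-in for a file group's latest base (parquet) file; not a Hudi class.
    record BaseFile(String fileId, long sizeBytes) {}

    // Behavior as described in the issue: of the base files below the small-file limit,
    // only the single smallest one is chosen for this commit.
    static Optional<BaseFile> pickSmallest(List<BaseFile> files, long smallFileLimitBytes) {
        return files.stream()
                .filter(f -> f.sizeBytes() < smallFileLimitBytes)
                .min(Comparator.comparingLong(BaseFile::sizeBytes));
    }

    // Simplified capacity estimate (assumption): how many incoming records fit into the
    // chosen file before it reaches maxFileSizeBytes; the rest would go to new file groups.
    static long recordsThatFit(BaseFile file, long maxFileSizeBytes, long avgRecordSizeBytes) {
        return Math.max(0L, (maxFileSizeBytes - file.sizeBytes()) / avgRecordSizeBytes);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<BaseFile> files = List.of(
                new BaseFile("fg-1", 40 * mb),
                new BaseFile("fg-2", 10 * mb),
                new BaseFile("fg-3", 90 * mb),
                new BaseFile("fg-4", 25 * mb));
        // Illustrative thresholds only: 100 MB small-file limit, 120 MB max file size, 1 KB records.
        pickSmallest(files, 100 * mb).ifPresent(f -> System.out.println(
                f.fileId() + " can absorb " + recordsThatFit(f, 120 * mb, 1024) + " records"));
        // prints: fg-2 can absorb 112640 records
    }
}
```

      In this example fg-1, fg-3, and fg-4 are also below the small-file limit, but they are left untouched in this commit; only fg-2 receives new records.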

      source link

       

      We can add a config that controls the total number of files considered as small files for a MOR table when an index that cannot index log files is used.

      We can leave the default value at 1 (the current behavior), but the setting should be flexible for interested users.
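      A minimal sketch of the proposed knob, under the same simplified model: instead of only the smallest file, up to a configurable number of small files are considered per commit, smallest first. The names `pickSmallFiles` and `DEFAULT_MAX_SMALL_FILE_CANDIDATES` are hypothetical illustrations, not an existing Hudi config or API.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class SmallFileCandidatesSketch {

    // Hypothetical stand-in for a file group's latest base (parquet) file; not a Hudi class.
    record BaseFile(String fileId, long sizeBytes) {}

    // Hypothetical default that keeps the current behavior: one small file per commit.
    static final int DEFAULT_MAX_SMALL_FILE_CANDIDATES = 1;

    // Returns up to maxCandidates base files below the small-file limit, smallest first.
    static List<BaseFile> pickSmallFiles(List<BaseFile> files, long smallFileLimitBytes, int maxCandidates) {
        if (maxCandidates < 1) {
            throw new IllegalArgumentException("maxCandidates must be >= 1");
        }
        return files.stream()
                .filter(f -> f.sizeBytes() < smallFileLimitBytes)
                .sorted(Comparator.comparingLong(BaseFile::sizeBytes))
                .limit(maxCandidates)
                .collect(Collectors.toList());
    }

    // Helper to print just the file ids of the selected candidates.
    static List<String> ids(List<BaseFile> files) {
        return files.stream().map(BaseFile::fileId).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<BaseFile> files = List.of(
                new BaseFile("fg-1", 40 * mb),
                new BaseFile("fg-2", 10 * mb),
                new BaseFile("fg-3", 90 * mb),
                new BaseFile("fg-4", 25 * mb));
        long smallFileLimit = 100 * mb; // illustrative value only
        System.out.println(ids(pickSmallFiles(files, smallFileLimit, DEFAULT_MAX_SMALL_FILE_CANDIDATES))); // [fg-2]
        System.out.println(ids(pickSmallFiles(files, smallFileLimit, 3))); // [fg-2, fg-4, fg-1]
    }
}
```

      Sorting ascending by size means a limit of 1 reproduces the current single-smallest-file behavior, while larger values let one commit bin pack into several small files at once.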

       

      Original issue

      https://github.com/apache/hudi/issues/3676 

       

       

          People

            Alexey Kudinkin (alexey.kudinkin)
            sivabalan narayanan (shivnarayan)
