Apache Drill / DRILL-3735

Directory pruning is not happening when number of files is larger than 64k


Details

    Description

      When the number of files exceeds the 64k limit, directory pruning does not happen.
      We need to raise this limit to handle most use cases.

      My proposal is to separate the code for directory pruning from the code for partition pruning.
      Say a parent directory contains 100 directories and 1 million files.
      If a query only touches files in one directory, Drill should first read the 100 directories and narrow them down to the matching directory, and only then read the file paths in that directory into memory and do the remaining work.

      Currently, Drill first reads the file paths of all 1 million files into memory and only then does directory pruning or partition pruning. This is neither performance-efficient nor memory-efficient, and it does not scale.
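
      The proposed two-phase approach could look roughly like the sketch below. It is only an illustration on a local file system using java.nio, not Drill's actual planner code: the class name TwoPhasePruning, the methods pruneDirectories and listFiles, and the directory-name predicate are all hypothetical names introduced here. Phase 1 lists only the immediate subdirectories and filters them; phase 2 reads file paths only from the directories that survived.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration of directory pruning before file listing.
public class TwoPhasePruning {

    // Phase 1: read only the immediate subdirectories of the parent and
    // keep those whose name satisfies the filter (e.g. dir0 = '1995').
    static List<Path> pruneDirectories(Path parent, Predicate<String> dirFilter)
            throws IOException {
        List<Path> selected = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(parent)) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry)
                        && dirFilter.test(entry.getFileName().toString())) {
                    selected.add(entry);
                }
            }
        }
        return selected;
    }

    // Phase 2: read file paths only from the directories that survived
    // phase 1, so files in pruned directories are never held in memory.
    static List<Path> listFiles(List<Path> dirs) throws IOException {
        List<Path> files = new ArrayList<>();
        for (Path dir : dirs) {
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                for (Path entry : entries) {
                    if (Files.isRegularFile(entry)) {
                        files.add(entry);
                    }
                }
            }
        }
        return files;
    }
}
```

      With this split, a query that filters on a single directory name calls pruneDirectories once over the 100 directory entries and then listFiles on the one match, so the number of file paths held in memory is bounded by the size of the selected directory rather than by the whole tree.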


          People

            amansinha100 Aman Sinha
            haozhu Hao Zhu
