Apache Drill / DRILL-3901

Performance regression when running EXPLAIN on COUNT(*) over 100K files

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.2.0
    • Component/s: None
    • Labels:
      None

      Description

      We are seeing a performance regression when running EXPLAIN on a SELECT COUNT(*) query over 100K files in a flat directory (no subdirectories) on the latest master branch, compared to a run done on Sept 26. Some initial details (I will have more later):

      master branch on Sept 26
         No metadata cache: 71.452 secs
         With metadata cache: 15.804 secs
      
      Latest master branch 
         No metadata cache: 110 secs
         With metadata cache: 32 secs
      

      So both cases show a regression: roughly 1.5x slower without the metadata cache and roughly 2x slower with it.

      Mehant Baid and I took an initial look at this and it appears we might be doing the directory expansion twice.
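
      To illustrate the suspicion above, here is a minimal sketch of why expanding a directory twice during planning roughly doubles the listing cost, and how reusing the first listing avoids it. This is not Drill's code: `FileSelection`, `explain_count`, and `demo` are hypothetical names invented for this illustration, and the two "phases" are an assumption about where the repeated expansion might occur.

```python
import os
import tempfile


class FileSelection:
    """Hypothetical stand-in for a planner's view of a directory (not Drill's API)."""

    def __init__(self, root, cache=True):
        self.root = root
        self.cache = cache
        self.expansions = 0  # number of times the directory was actually listed
        self._files = None

    def expand(self):
        # Reuse the earlier listing when caching is enabled.
        if self.cache and self._files is not None:
            return self._files
        self.expansions += 1
        self._files = sorted(e.path for e in os.scandir(self.root) if e.is_file())
        return self._files


def explain_count(selection):
    # Two assumed planning phases, each of which asks for the file list.
    selection.expand()           # e.g. while validating the table
    files = selection.expand()   # e.g. while building the scan
    return len(files)


def demo(num_files=100):
    # Build a flat directory of empty files and plan against it both ways.
    with tempfile.TemporaryDirectory() as root:
        for i in range(num_files):
            open(os.path.join(root, f"part-{i}.parquet"), "w").close()
        uncached = FileSelection(root, cache=False)
        cached = FileSelection(root, cache=True)
        return (explain_count(uncached), uncached.expansions,
                explain_count(cached), cached.expansions)
```

      With 100K files, each extra expansion repeats the full directory listing, so removing the duplicate pass should matter most in the no-metadata-cache case.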

            People

            • Assignee:
              Aman Sinha (amansinha100)
              Reporter:
              Aman Sinha (amansinha100)
            • Votes:
              0
              Watchers:
              4
