Apache Drill / DRILL-3901

Performance regression with doing Explain of COUNT(*) over 100K files

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.2.0
    • Component/s: None
    • Labels: None

    Description

      We are seeing a performance regression when running an EXPLAIN of SELECT COUNT(*) over 100K files in a flat directory (no subdirectories) on the latest master branch, compared to a run done against master on Sept 26. Some initial details (I will have more later):

      master branch on Sept 26
         No metadata cache: 71.452 secs
         With metadata cache: 15.804 secs
      
      Latest master branch 
         No metadata cache: 110 secs
         With metadata cache: 32 secs
      

      Both cases show a regression: roughly 1.5x slower without the metadata cache and roughly 2x slower with it.
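The size of the regression can be worked out directly from the timings above. The short Python sketch below does only that arithmetic; the helper name `slowdown` and the dictionary layout are illustrative choices, not anything taken from Drill.

```python
# Timings copied from the report above, in seconds: (Sept 26 master, latest master).
timings = {
    "No metadata cache": (71.452, 110.0),
    "With metadata cache": (15.804, 32.0),
}


def slowdown(before, after):
    """Return (ratio, percent increase) of `after` relative to `before`."""
    ratio = after / before
    return ratio, (ratio - 1.0) * 100.0


for case, (before, after) in timings.items():
    ratio, pct = slowdown(before, after)
    print(f"{case}: {ratio:.2f}x slower (+{pct:.0f}%)")

# Prints:
# No metadata cache: 1.54x slower (+54%)
# With metadata cache: 2.02x slower (+102%)
```

The cached case regresses proportionally more than the uncached one, even though its absolute increase (about 16 secs) is smaller than the uncached increase (about 39 secs).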

      mehant and I took an initial look at this, and it appears we might be doing the directory expansion twice.
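To illustrate why a repeated directory expansion would matter at this scale, here is a minimal Python sketch. It is not Drill's code and does not reflect how Drill's planner is structured; `CountingLister`, `expand`, `expand_cached`, and `make_flat_dir` are hypothetical names invented for this example. It only shows that expanding the same flat directory twice doubles the filesystem listings, and that memoizing the first result avoids the second listing.

```python
import os
import tempfile


class CountingLister:
    """Expands a flat directory into file paths and counts filesystem listings."""

    def __init__(self):
        self.listings = 0  # number of times the directory was actually scanned
        self._cache = {}   # directory -> previously computed file list

    def expand(self, directory):
        # Every call scans the directory again, however often it was seen before.
        self.listings += 1
        return sorted(e.path for e in os.scandir(directory) if e.is_file())

    def expand_cached(self, directory):
        # Scan only on the first request; later requests reuse the stored list.
        if directory not in self._cache:
            self._cache[directory] = self.expand(directory)
        return self._cache[directory]


def make_flat_dir(n):
    """Create a temporary flat directory holding n empty files (hypothetical data)."""
    directory = tempfile.mkdtemp()
    for i in range(n):
        open(os.path.join(directory, f"file_{i}.dat"), "w").close()
    return directory


demo_dir = make_flat_dir(100)

uncached = CountingLister()
uncached.expand(demo_dir)  # first expansion
uncached.expand(demo_dir)  # second, redundant expansion
print("listings without caching:", uncached.listings)

cached = CountingLister()
cached.expand_cached(demo_dir)
cached.expand_cached(demo_dir)  # served from the stored list
print("listings with caching:", cached.listings)
```

With 100K files, each redundant listing repeats the full cost of enumerating the directory, which would be consistent with the slowdown described above if the suspicion turns out to be correct.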

          People

            Assignee: Aman Sinha (amansinha100)
            Reporter: Aman Sinha (amansinha100)
            Votes: 0
            Watchers: 4
