Apache Drill / DRILL-7069

Poor performance of transformBinaryInMetadataCache

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.15.0
    • Fix Version/s: 1.16.0
    • Component/s: Metadata

    Description

      The performance of the method transformBinaryInMetadataCache scales poorly as the number of underlying files, row-groups and columns in a table grows. This method is invoked during the planning of every query that uses the table.

      A test on a table with 219 directories (20 files each), one row-group per file, and 94 columns measured about 1340 milliseconds.

      The main culprits are the version checks, which take place in every iteration (about 400k times in the example above) and involve constructing 6 MetadataVersion objects (and possibly triggering garbage collections).

      Removing the version checks from the loops reduced this method's time on the above test to about 250 milliseconds.
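
      The idea of moving a loop-invariant version check out of the loops can be sketched as follows. This is a minimal illustration, not Drill's actual code: the class VersionCheckHoisting, its nested Version class, the needsTransform rule and the "3.0" threshold are all hypothetical stand-ins, and the construction counter exists only to make the allocation difference visible.

```java
// Hypothetical sketch; none of these names come from the Drill code base.
final class VersionCheckHoisting {

    // Stand-in for a parsed metadata version. Every construction is counted
    // so the two loop variants below can be compared.
    static final class Version implements Comparable<Version> {
        static long constructed = 0;
        final int major;
        final int minor;

        Version(String text) {
            constructed++;
            String[] parts = text.split("\\.");
            major = Integer.parseInt(parts[0]);
            minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        }

        @Override
        public int compareTo(Version other) {
            int byMajor = Integer.compare(major, other.major);
            return byMajor != 0 ? byMajor : Integer.compare(minor, other.minor);
        }
    }

    // Assumed rule for illustration only: versions >= 3.0 need the transform.
    // Each call allocates two Version objects.
    static boolean needsTransform(String version) {
        return new Version(version).compareTo(new Version("3.0")) >= 0;
    }

    // Slow variant: the check (and its allocations) runs for every column of every file.
    static int transformCheckingEveryIteration(String version, int files, int columns) {
        int transformed = 0;
        for (int f = 0; f < files; f++) {
            for (int c = 0; c < columns; c++) {
                if (needsTransform(version)) {
                    transformed++; // placeholder for the per-column work
                }
            }
        }
        return transformed;
    }

    // Fast variant: the version does not change inside the loops,
    // so the check is evaluated once, before they start.
    static int transformCheckingOnce(String version, int files, int columns) {
        if (!needsTransform(version)) {
            return 0;
        }
        int transformed = 0;
        for (int f = 0; f < files; f++) {
            for (int c = 0; c < columns; c++) {
                transformed++; // same placeholder work, no version objects created
            }
        }
        return transformed;
    }
}
```

      With the table shape from the test above (219 * 20 files, 94 columns), both variants visit the same 411,720 file/column pairs, but the second one constructs its version objects only once instead of once per pair.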

       

          People

            Assignee: Boaz Ben-Zvi
            Reporter: Boaz Ben-Zvi
            Vova Vysotskyi
            Votes: 0
            Watchers: 3
