IMPALA-6567

Dataload performance regression due to slow invalidate metadata

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.12.0
    • Fix Version/s: Impala 2.12.0
    • Component/s: Frontend
    • Labels: None

    Description

      In recent GVO builds, the functional dataload intermittently takes almost 2 hours; it used to take ~30-35 minutes:

      02:12:15 Loading TPC-DS data (logging to /home/ubuntu/Impala/logs/data_loading/load-tpcds.log)...
      02:34:27 Loading workload 'tpch' using exploration strategy 'core' OK (Took: 22 min 12 sec)
      02:34:35 Loading workload 'tpcds' using exploration strategy 'core' OK (Took: 22 min 20 sec)
      04:11:40 Loading workload 'functional-query' using exploration strategy 'exhaustive' OK (Took: 119 min 25 sec)
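
To spot this kind of regression automatically, the "Took: N min M sec" suffix of each log line can be parsed and compared against a limit. The following is a minimal sketch, not part of the Impala build scripts: the function names are hypothetical, and the 35-minute threshold is an assumption taken from the "~30-35 minutes" figure above.

```python
import re

# Log lines copied from the description above.
LOG = """\
02:34:27 Loading workload 'tpch' using exploration strategy 'core' OK (Took: 22 min 12 sec)
02:34:35 Loading workload 'tpcds' using exploration strategy 'core' OK (Took: 22 min 20 sec)
04:11:40 Loading workload 'functional-query' using exploration strategy 'exhaustive' OK (Took: 119 min 25 sec)
"""

# Captures the workload name and the reported minutes and seconds.
PATTERN = re.compile(
    r"Loading workload '(?P<workload>[^']+)' .*"
    r"\(Took: (?P<min>\d+) min (?P<sec>\d+) sec\)"
)


def workload_seconds(log_text):
    """Map each workload name to its reported load time in seconds."""
    durations = {}
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            seconds = int(match["min"]) * 60 + int(match["sec"])
            durations[match["workload"]] = seconds
    return durations


def slow_workloads(durations, threshold_seconds):
    """Return the names of workloads that exceeded the threshold, sorted."""
    return sorted(
        name for name, secs in durations.items() if secs > threshold_seconds
    )


if __name__ == "__main__":
    durations = workload_seconds(LOG)
    # Assumed limit: 35 minutes, the upper end of the previous load time.
    print(slow_workloads(durations, 35 * 60))
```

With the three lines above, only the functional-query workload exceeds the assumed limit.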
      

       

      This has happened on multiple runs (including some in progress):

      https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/1370/

      https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/1382/

      https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/1383/ (missing some logs due to abort)

      https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/1384/ (in progress)

      https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/1385/ (in progress)

       

      Dataload generates a SQL script that issues an "invalidate metadata ${tablename}" command for each table it creates. A single invocation of this script contains 830 such calls (see IMPALA-6386 for why we invalidate at the table level). Even so, this script should execute very quickly.
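
For illustration, a script of this shape could be produced as follows. This is a minimal sketch, not the actual dataload code: the function name is hypothetical and the table names are only examples.

```python
def build_invalidate_script(table_names):
    """Return a SQL script with one table-level invalidate per table.

    Hypothetical helper for illustration; the real dataload scripts may
    build their SQL differently.
    """
    return "".join(
        "invalidate metadata {0};\n".format(name) for name in table_names
    )


if __name__ == "__main__":
    # Example table names only; the real script covers 830 tables.
    print(build_invalidate_script(
        ["functional.alltypes", "functional.alltypessmall"]
    ))
```

Each statement targets a single table, so the script grows linearly with the number of tables created by dataload.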

      The impalad.INFO from the 1370 run shows that this script is taking a long time. The first invalidate metadata for functional tables is at 2:41 and the last invalidate metadata for this run of the invalidate script is at 3:17, so the first run takes about 36 minutes.

      The invalidate script runs twice. The second run begins at 3:19 and finishes at 4:11, about 52 minutes.
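
The timestamps above imply a rough per-statement cost. The sketch below works through the arithmetic; it assumes each run issues all 830 statements, that the timestamps fall on the same day, and that minute resolution is good enough. The function names are hypothetical.

```python
from datetime import datetime

# From the description: 830 invalidate statements per invocation.
# Assumption: both runs issue the same 830 statements.
STATEMENTS_PER_RUN = 830

# (first invalidate, last invalidate) for each run, as reported above.
RUNS = [("2:41", "3:17"), ("3:19", "4:11")]


def minutes_between(start, end):
    """Elapsed minutes between two same-day H:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


def seconds_per_statement(start, end, statements=STATEMENTS_PER_RUN):
    """Average wall-clock seconds spent per invalidate statement."""
    return minutes_between(start, end) * 60 / statements


if __name__ == "__main__":
    for start, end in RUNS:
        print(
            start,
            end,
            minutes_between(start, end),
            round(seconds_per_statement(start, end), 2),
        )
```

Under these assumptions the two runs average roughly 2.6 and 3.8 seconds per statement, and together they take about 88 minutes of the 119-minute functional-query load.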

       

          People

            Assignee: Alexander Behm (alex.behm)
            Reporter: Joe McDonnell (joemcdonnell)