IMPALA-4475

Compress ExecPlanFragment before shipping it to worker nodes to reduce network traffic

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 2.6.0
    • Fix Version/s: None
    • Component/s: Distributed Exec

    Description

      Sending the ExecPlanFragment to remote nodes dominates query startup time on clusters larger than 100 nodes. The size of the ExecPlanFragment grows with the number of tables and with the number of blocks and partitions in those tables.

      On large clusters this limits query throughput.
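The idea in the title can be illustrated with a minimal sketch. It assumes a JSON stand-in for the serialized fragment (the real ExecPlanFragment is a Thrift structure) and uses zlib purely as an example codec. The helper names `build_fake_fragment`, `compress_payload`, and `decompress_payload` are hypothetical and not part of Impala. The point is only that payloads made up of many similar partition and block descriptors tend to compress well, so the sender could ship fewer bytes per node.

```python
import json
import zlib


def build_fake_fragment(num_partitions: int) -> bytes:
    """Build a stand-in payload (hypothetical; the real fragment is Thrift, not JSON)."""
    partitions = [
        {
            "id": i,
            "path": f"/warehouse/store_returns/part={i}",
            "blocks": [i * 3, i * 3 + 1, i * 3 + 2],
        }
        for i in range(num_partitions)
    ]
    return json.dumps({"fragment": "scan", "partitions": partitions}).encode("utf-8")


def compress_payload(payload: bytes, level: int = 6) -> bytes:
    """Compress the serialized fragment on the coordinator before sending it."""
    return zlib.compress(payload, level)


def decompress_payload(blob: bytes) -> bytes:
    """Restore the serialized fragment on the receiving node."""
    return zlib.decompress(blob)


payload = build_fake_fragment(10_000)
blob = compress_payload(payload)
restored = decompress_payload(blob)
ratio = len(blob) / len(payload)
print(f"raw={len(payload)} bytes, compressed={len(blob)} bytes, ratio={ratio:.2f}")
```

Whether the saved network time outweighs the extra CPU spent compressing on the coordinator and decompressing on each node would have to be measured on a real cluster. The codec and compression level here are illustrative choices only.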

      From TPC-DS Q11 on a 1K-node cluster:

          Query Timeline: 5m6s
             - Query submitted: 75.256us (75.256us)
             - Planning finished: 1s580ms (1s580ms)
             - Submit for admission: 2s376ms (795.652ms)
             - Completed admission: 2s377ms (1.512ms)
             - Ready to start 15993 fragment instances: 2s458ms (80.378ms)
             - First dynamic filter received: 2m35s (2m33s)
             - All 15993 fragment instances started: 2m35s (40.934ms)
             - Rows available: 4m53s (2m17s)
             - First row fetched: 4m53s (176.254ms)
             - Unregister query: 4m58s (4s828ms)
           - ComputeScanRangeAssignmentTimer: 600.086ms
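To put the timeline above in proportion, the following sketch converts the profile's duration strings to seconds and estimates how much of the query's wall-clock time elapsed between "Ready to start 15993 fragment instances" and "All 15993 fragment instances started". Treating that interval as the fragment startup cost is an assumption made here, and `parse_duration` is a hypothetical helper, not an Impala utility. The profile rounds large values to whole seconds, so the result is approximate.

```python
import re

# Unit suffixes as they appear in the timeline, mapped to seconds.
UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}


def parse_duration(text: str) -> float:
    """Convert strings such as '2m35s', '1s580ms' or '75.256us' to seconds."""
    total = 0.0
    # Two-letter units are listed first so 'ms' is not misread as 'm'.
    for value, unit in re.findall(r"(\d+(?:\.\d+)?)(ms|us|ns|h|m|s)", text):
        total += float(value) * UNIT_SECONDS[unit]
    return total


ready_to_start = parse_duration("2s458ms")   # Ready to start 15993 fragment instances
all_started = parse_duration("2m35s")        # All 15993 fragment instances started
query_total = parse_duration("5m6s")         # Query Timeline total

startup_seconds = all_started - ready_to_start
startup_share = startup_seconds / query_total
print(f"fragment startup: {startup_seconds:.1f}s of {query_total:.0f}s ({startup_share:.0%})")
```

Under these assumptions, starting the fragment instances accounts for roughly half of the query's total time.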
      

      Attachments

        1. count_store_returns.txt.zip
          753 kB
          Mostafa Mokhtar
        2. slow_query_start_250K_partitions_134nodes.txt
          714 kB
          Mostafa Mokhtar

            People

              Assignee: Unassigned
              Reporter: Mostafa Mokhtar (mmokhtar)
              Votes: 0
              Watchers: 8
