IMPALA / IMPALA-9660

Distributed codegen


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Distributed Exec

    Description

      Another potential extension of IMPALA-5444 is to distribute the codegen work for different fragments across different backends. Today, each fragment generates the same code on every backend it is scheduled to run on. This work is mostly redundant; the exception is scan nodes, where different scan ranges may correspond to different file formats and therefore need different code. It would be better to consolidate the code generation work items across the backends and avoid the redundant work. The codegen for a fragment (or for an individual exec node, if we allow multiple LLVM modules per fragment so that the exec nodes in a fragment can be code-generated in parallel) could be assigned to a backend, and the compiled code could then be shipped to the other Impalad backends once it is ready. This raises security concerns, since the receiving backends have to trust the binary shipped to them. We would also need to account for the latency of shipping the code. However, this could save a large amount of CPU time for queries with many fragments running on a large cluster.
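      A minimal sketch of the planning step, under stated assumptions: it is not Impala's scheduler or any existing API, and the names `FragmentInstance` and `plan_codegen` and the backend names are hypothetical. It treats each (fragment, file format) pair as one codegen work item, so identical fragments are compiled once while scans over different file formats still get separate code. It assumes the compiling backend should be one that also runs the fragment, and picks the least-loaded such backend; every other backend running that fragment receives the shipped code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FragmentInstance:
    """Hypothetical description of one fragment instance placed on a backend."""
    fragment_id: int
    backend: str
    # Only meaningful for scans: different file formats need different code.
    file_format: Optional[str] = None


def plan_codegen(instances):
    """Return (compile_on, ship_to) for each unique codegen work item.

    compile_on[item] is the backend that compiles the item; ship_to[item]
    lists the other backends that receive the compiled code.
    """
    # Group backends by work item; duplicates of the same item share one compile.
    consumers = defaultdict(set)
    for inst in instances:
        consumers[(inst.fragment_id, inst.file_format)].add(inst.backend)

    load = defaultdict(int)  # number of compile jobs assigned to each backend
    compile_on, ship_to = {}, {}
    # Deterministic iteration order; None formats sort as "".
    for item in sorted(consumers, key=lambda k: (k[0], k[1] or "")):
        # Least-loaded backend among those that need this code, ties by name.
        chosen = min(consumers[item], key=lambda b: (load[b], b))
        load[chosen] += 1
        compile_on[item] = chosen
        ship_to[item] = sorted(consumers[item] - {chosen})
    return compile_on, ship_to


if __name__ == "__main__":
    instances = [
        FragmentInstance(1, "be1"), FragmentInstance(1, "be2"),
        FragmentInstance(1, "be3"), FragmentInstance(2, "be1"),
        FragmentInstance(2, "be2"),
    ]
    compile_on, ship_to = plan_codegen(instances)
    # 5 fragment instances, but only 2 compilations.
    print(len(instances), "instances,", len(compile_on), "compilations")
    for item in compile_on:
        print(item, "compiled on", compile_on[item], "shipped to", ship_to[item])
```

      Keying work items by exec node instead of by fragment would correspond to the finer-grained, multi-module variant mentioned above. The sketch deliberately ignores shipping latency and the trust question; a real design would have to weigh compile cost against transfer time and authenticate the shipped code.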

People

    • Daniel Becker
    • Tim Armstrong
    • Votes: 0
    • Watchers: 3
