IMPALA-8005

Randomize partitioning exchange destinations


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 3.1.0
    • Fix Version/s: None
    • Component/s: Distributed Exec
    • Labels:

      Description

      Currently, the sender uses the same hash seed for every partitioning exchange, regardless of the query. For a table whose shuffling keys have a skewed distribution, multiple queries that exchange on the same keys will all hash the heavy keys to the same destination fragment instances running on a particular host, potentially overloading that host.
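To illustrate the problem, here is a minimal C++ sketch of a sender that picks a destination by hashing the shuffling key with one fixed seed. The seeded FNV-1a hash, the constant `kFixedSeed`, and the helpers `Destination` and `RowsPerDestination` are hypothetical stand-ins, not Impala's actual exchange code. Because nothing in the mapping depends on the query, every query that shuffles the same skewed data sends the hot key's rows to the same destination index.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical seeded 64-bit FNV-1a hash; not Impala's actual exchange hash.
uint64_t SeededHash(const std::string& key, uint64_t seed) {
  uint64_t h = 14695981039346656037ULL ^ seed;  // FNV offset basis mixed with the seed
  for (unsigned char c : key) {
    h ^= c;
    h *= 1099511628211ULL;  // FNV 64-bit prime
  }
  return h;
}

// Hypothetical process-wide constant: every query hashes with this same seed.
constexpr uint64_t kFixedSeed = 0x9e3779b97f4a7c15ULL;

// Map a shuffling key to a destination fragment instance index.
int Destination(const std::string& key, int num_destinations) {
  assert(num_destinations > 0);
  return static_cast<int>(SeededHash(key, kFixedSeed) %
                          static_cast<uint64_t>(num_destinations));
}

// Count how many rows each destination receives for one query's input.
std::vector<int> RowsPerDestination(const std::vector<std::string>& keys,
                                    int num_destinations) {
  std::vector<int> counts(num_destinations, 0);
  for (const auto& k : keys) counts[Destination(k, num_destinations)]++;
  return counts;
}
```

For example, with 90 rows of `hot_key` and 10 other keys spread over 16 destinations, the destination at index `Destination("hot_key", 16)` receives at least 90 rows, and two separate calls to `RowsPerDestination` (standing in for two queries) produce identical per-destination counts.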

      We should consider seeding the hash function with the query id or other query-specific information, so that different queries send the same keys to different destinations. Thanks to Todd Lipcon for pointing out this problem.
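One way to act on this suggestion, sketched under stated assumptions: derive a seed from the query id and mix it into the hash of each key. Here `QueryId` is a hypothetical stand-in for a 128-bit query id split into two 64-bit halves, keys are assumed to be 64-bit integers, and the splitmix64 finalizer is used as the mixing function; none of these names come from Impala's code. All senders of one query, including both sides of a partitioned join, must compute the same seed so that equal keys still meet at the same destination; only the mapping across queries changes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 128-bit query id split into two 64-bit halves.
struct QueryId {
  uint64_t hi;
  uint64_t lo;
};

// splitmix64 finalizer: a bijective mix that scrambles all 64 bits.
uint64_t Mix64(uint64_t x) {
  x += 0x9e3779b97f4a7c15ULL;
  x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
  x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
  return x ^ (x >> 31);
}

// Derive one seed per query; every sender of the query must use the same id.
uint64_t SeedFromQueryId(const QueryId& id) {
  return Mix64(id.hi ^ Mix64(id.lo));
}

// Pick a destination for a 64-bit key under the query's seed (hypothetical helper).
int Destination(uint64_t key, const QueryId& id, int num_destinations) {
  assert(num_destinations > 0);
  uint64_t h = Mix64(key ^ SeedFromQueryId(id));
  return static_cast<int>(h % static_cast<uint64_t>(num_destinations));
}
```

Within one query the mapping is deterministic, so repeated calls with the same `QueryId` return the same destination. Across queries the seed changes, so a hot key such as `42` is likely to land on different destinations (and therefore different hosts) for different query ids.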


            People

            • Assignee: Anurag Mantripragada (anuragmantri)
              Reporter: Michael Ho (kwho)
            • Votes: 0
              Watchers: 3
