HIVE-16282

Semijoin: Disable slow-start for the bloom filter aggregate task

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 3.0.0
    • Component/s: None
    • Labels: None

    Description

      Slow-start on the bloom filter aggregate vertex creates a scheduling problem: it causes more pre-emption than is useful.

      When the bloom filter vertices are arranged as follows:

      Map 1 (10 tasks) -> Reducer 2 (1 task) -> Map 3 (100 tasks)

      Map 3 and Map 1 are immediately active since Reducer 2 -> Map 3 is a broadcast edge.

      Once 3 tasks in Map 1 finish, the engine kills one active task from Map 3 to make room for Reducer 2.
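
The "3 tasks" figure above is consistent with a slow-start threshold of 25% of the 10 Map 1 tasks. The sketch below is a simplified model of that arithmetic, not the engine's scheduler code. It assumes the single Reducer 2 task is scheduled as soon as a minimum fraction of its source tasks has finished; the function names and the 0.25 value are illustrative assumptions, not taken from the patch.

```python
import math


def tasks_before_reducer_starts(num_source_tasks: int, min_src_fraction: float) -> int:
    """Source tasks that must finish before the reducer is scheduled.

    Simplified model (assumption): the reducer is scheduled once the
    completed fraction of source tasks reaches min_src_fraction.
    """
    if not 0.0 <= min_src_fraction <= 1.0:
        raise ValueError("min_src_fraction must be in [0, 1]")
    return math.ceil(num_source_tasks * min_src_fraction)


def outstanding_inputs_at_start(num_source_tasks: int, min_src_fraction: float) -> int:
    """Source tasks still unfinished at the moment the reducer is scheduled.

    The reducer holds a slot while it waits for these, so a large value
    means the pre-empted downstream task gave up its slot too early.
    """
    started_after = tasks_before_reducer_starts(num_source_tasks, min_src_fraction)
    return num_source_tasks - started_after


if __name__ == "__main__":
    map1_tasks = 10  # Map 1 task count from the description
    # 0.25 is an assumed slow-start fraction; 1.0 models slow-start disabled.
    for fraction in (0.25, 1.0):
        start = tasks_before_reducer_starts(map1_tasks, fraction)
        waiting = outstanding_inputs_at_start(map1_tasks, fraction)
        print(f"fraction={fraction}: reducer scheduled after {start} tasks, "
              f"{waiting} inputs still outstanding")
```

Under these assumptions, a fraction of 0.25 schedules Reducer 2 after 3 of the 10 Map 1 tasks, while 7 of its inputs are still outstanding. The Map 3 task killed to make room therefore loses its slot to a reducer that cannot finish yet. A fraction of 1.0, which models slow-start being disabled, schedules the reducer only when all 10 inputs are available.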

      Attachments

        1. HIVE-16282.5.patch (28 kB, Deepak Jaiswal)
        2. HIVE-16282.4.patch (28 kB, Deepak Jaiswal)
        3. HIVE-16282.3.patch (28 kB, Deepak Jaiswal)
        4. HIVE-16282.2.patch (9 kB, Deepak Jaiswal)
        5. HIVE-16282.1.patch (9 kB, Deepak Jaiswal)
        6. extended plan.rtf (18 kB, Deepak Jaiswal)

          People

            Assignee: djaiswal Deepak Jaiswal
            Reporter: gopalv Gopal Vijayaraghavan
            Votes: 0
            Watchers: 5
