Hive / HIVE-7159

For inner joins, push an 'is not null' predicate to the join sources for every non-nullSafe join condition

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels: None

    Description

      A join B on A.x = B.y
      can be transformed to
      (A where x is not null) join (B where y is not null) on A.x = B.y
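
      The equivalence of the two forms can be checked on a small data set. The sketch below is only an illustration and does not use Hive: it runs both queries through Python's built-in sqlite3 module, on the assumption that SQLite's inner-join semantics for a plain `=` condition match the non-nullSafe case described here. The table contents, the aliases `fa` and `fb`, and the function name `join_variants` are made up for the example.

```python
import sqlite3


def join_variants(rows_a, rows_b):
    """Run the original join and the null-filtered rewrite; return both results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE A (x INTEGER)")
    conn.execute("CREATE TABLE B (y INTEGER)")
    conn.executemany("INSERT INTO A VALUES (?)", [(v,) for v in rows_a])
    conn.executemany("INSERT INTO B VALUES (?)", [(v,) for v in rows_b])

    # Original form: A join B on A.x = B.y
    original = conn.execute(
        "SELECT A.x, B.y FROM A JOIN B ON A.x = B.y ORDER BY A.x, B.y"
    ).fetchall()

    # Rewritten form: filter null keys out of each source before joining.
    rewritten = conn.execute(
        "SELECT fa.x, fb.y "
        "FROM (SELECT x FROM A WHERE x IS NOT NULL) fa "
        "JOIN (SELECT y FROM B WHERE y IS NOT NULL) fb ON fa.x = fb.y "
        "ORDER BY fa.x, fb.y"
    ).fetchall()

    conn.close()
    return original, rewritten


# None is stored as SQL NULL; null keys never match under a plain '='.
original, rewritten = join_variants([1, 2, None, None], [2, None, 3, 2])
print(original == rewritten)  # True
```

      Both queries return the same rows because a null key cannot satisfy `A.x = B.y`, so dropping those rows early only reduces the input to the join.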

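      In an inner join, a row whose key is null can never satisfy a non-nullSafe equality condition, so filtering such rows out before the join does not change the result. Apart from avoiding the shuffle of null-keyed rows, this also avoids reduce-side skew when the data contains many null values.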
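
      The skew effect can be illustrated with a toy model of a shuffle. This is a sketch under stated assumptions, not Hive's actual partitioner: it assumes keys are assigned to reducers by `key % num_reducers` and that every null key lands on reducer 0. The function name `reducer_load` and the sample data are hypothetical.

```python
from collections import Counter


def reducer_load(keys, num_reducers, drop_nulls):
    """Count how many rows each reducer receives in a toy hash shuffle."""
    load = Counter()
    for key in keys:
        if key is None:
            if drop_nulls:
                continue  # the 'is not null' filter removes the row before the shuffle
            reducer = 0  # assumption: all null keys go to the same reducer
        else:
            reducer = key % num_reducers
        load[reducer] += 1
    return [load[r] for r in range(num_reducers)]


# 8 rows with distinct integer keys plus 92 rows with a null key.
keys = list(range(8)) + [None] * 92

print(reducer_load(keys, 4, drop_nulls=False))  # [94, 2, 2, 2]
print(reducer_load(keys, 4, drop_nulls=True))   # [2, 2, 2, 2]
```

      Without the filter, one reducer receives almost all rows even though none of the null-keyed rows can produce join output; with the filter, the load is even and the shuffle carries only rows that can match.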

      Thanks to gopalv for the analysis and coming up with the solution.

      Attachments

        1. HIVE-7159.1.patch
          3 kB
          Harish Butani
        2. HIVE-7159.10.patch
          3.39 MB
          Gunther Hagleitner
        3. HIVE-7159.11.patch
          3.39 MB
          Gunther Hagleitner
        4. HIVE-7159.2.patch
          3 kB
          Gunther Hagleitner
        5. HIVE-7159.3.patch
          2.43 MB
          Gunther Hagleitner
        6. HIVE-7159.4.patch
          2.41 MB
          Gunther Hagleitner
        7. HIVE-7159.5.patch
          2.79 MB
          Gunther Hagleitner
        8. HIVE-7159.6.patch
          3.09 MB
          Gunther Hagleitner
        9. HIVE-7159.7.patch
          3.09 MB
          Harish Butani
        10. HIVE-7159.8.patch
          3.34 MB
          Harish Butani
        11. HIVE-7159.9.patch
          3.34 MB
          Harish Butani
        12. HIVE-7159.addendum.patch
          7 kB
          Gunther Hagleitner

            People

              Assignee: Harish Butani (rhbutani)
              Reporter: Harish Butani (rhbutani)
              Votes: 0
              Watchers: 8
