  1. Hive
  2. HIVE-3667 Umbrella jira for Correlation Optimizer
  3. HIVE-5697

Correlation Optimizer may generate wrong plans for cases involving outer join


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.0, 0.13.0
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels: None

    Description

      For example,

      select x.key, y.value, count(*) from src x right outer join src1 y on (x.key=y.key and x.value=y.value) group by x.key, y.value; 
      

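      The correlation optimizer determines that a single MR job is enough for this query. However, the group-by keys come from both the left and the right table of the right outer join. For every row of y without a match, x.key is NULL, so x.key is not equivalent to the join key y.key, and rows that share a group-by key are not guaranteed to be aggregated together in that single job.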

      This produces a wrong result such as:

      NULL		4
      NULL	val_165	1
      NULL	val_193	1
      NULL	val_265	1
      NULL	val_27	1
      NULL	val_409	1
      NULL	val_484	1
      NULL		1
      146	val_146	2
      150	val_150	1
      213	val_213	2
      NULL		1
      238	val_238	2
      255	val_255	2
      273	val_273	3
      278	val_278	2
      311	val_311	3
      NULL		1
      401	val_401	5
      406	val_406	4
      66	val_66	1
      98	val_98	2
      

      Rows in which both x.key and y.value are null may not be grouped together, as the repeated NULL rows with a blank value in the result above show.
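
      The failure mode can be illustrated with a small simulation. This is a simplified sketch, not Hive code: it assumes the single-job plan aggregates rows in join-key order and closes a group whenever the group-by key changes. The table contents and the function names (right_outer_join, single_job_counts, two_job_counts) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import groupby

# Hypothetical toy rows (key, value) standing in for src (x) and src1 (y).
x_rows = [("146", "val_146"), ("146", "val_146"), ("150", "val_150")]
y_rows = [("128", ""), ("146", "val_146"), ("150", "val_150"),
          ("224", ""), ("369", "")]


def right_outer_join(x_rows, y_rows):
    """x RIGHT OUTER JOIN y ON key and value.

    Returns tuples (join_key, x_key, y_value); x_key is None when a row
    of y has no matching row in x.
    """
    out = []
    for y_key, y_value in y_rows:
        matches = [x for x in x_rows if x == (y_key, y_value)]
        if matches:
            for x_key, _ in matches:
                out.append(((y_key, y_value), x_key, y_value))
        else:
            out.append(((y_key, y_value), None, y_value))
    return out


def single_job_counts(joined):
    """Assumed single-job behaviour: rows arrive sorted by the JOIN key and
    a group is closed as soon as the GROUP BY key (x.key, y.value) changes."""
    ordered = sorted(joined, key=lambda row: row[0])
    return [(group_key, sum(1 for _ in rows))
            for group_key, rows in groupby(ordered, key=lambda r: (r[1], r[2]))]


def two_job_counts(joined):
    """Reference behaviour: re-shuffle on the GROUP BY key before counting,
    as a separate second job would."""
    return dict(Counter((row[1], row[2]) for row in joined))


joined = right_outer_join(x_rows, y_rows)
print(single_job_counts(joined))
print(two_job_counts(joined))
```

      In this toy run the group (None, '') is emitted twice by single_job_counts, because the unmatched rows of y have different join keys and are therefore not adjacent, while two_job_counts reports it once with the combined count.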

      Attachments

        1. HIVE-5697.1.patch
          2 kB
          Yin Huai
        2. HIVE-5697.2.patch
          18 kB
          Yin Huai


          People

            Assignee: Yin Huai (yhuai)
            Reporter: Yin Huai (yhuai)