Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-16241

When PTF, explode and join are used together, result is duplicated

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 2.1.0
    • None
    • None

    Description

      Example is the bellow. Each subquery 'key' column is unique. But when they are joined on 'key' column, a result is duplicated.

      CREATE TABLE test (
        key   STRING,
        type  STRING,
        value INT
      ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
      
      LOAD DATA LOCAL INPATH '/tmp/test.gz' OVERWRITE INTO TABLE test;
      
      SELECT * FROM test;
      
      A	type1	30000
      B	type2	20000
      C	type2	5000
      
      SELECT l.*
      FROM (
        SELECT * FROM test LATERAL VIEW explode(ARRAY(key)) e AS dammy
      ) l JOIN (
          SELECT key, rank() OVER (PARTITION BY type ORDER BY value DESC) rank 
          FROM test
      ) r ON l.key = r.key
      
      A	type1	30000	A
      A	type1	30000	A
      B	type2	20000	B
      B	type2	20000	B
      C	type2	5000	C
      C	type2	5000	C
      

      Attachments

        Activity

          People

            Unassigned Unassigned
            iijima_satoshi Satoshi Iijima
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: