Hive
  1. Hive
  2. HIVE-3945

union all datatype do not match may result wrong result

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.9.0
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels:
      None

      Description

      hive (default)> desc src;
      key string
      value string

      select key, value FROM
      ( select 'key' as key, 'value' as value – datatype: string, string
      from src s1 limit 1
      UNION ALL
      select s2.key as key, sum(s2.value) as value – datatype: strung, double
      from src s2 group by s2.key
      ) unionsrc;
      this query exec normally but has wrong result:
      key 2.4081029415476845E-282 – expected is 'value'
      35.0
      100 100.0
      48 0.0

      and sometimes when the string title is too long it may case ArrayIndexOutOfBoundsException:

      Caused by: java.lang.ArrayIndexOutOfBoundsException
      at java.lang.System.arraycopy(Native Method)
      at org.apache.hadoop.io.Text.set(Text.java:205)
      at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryString.init(LazyBinaryString.java:48)
      at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:216)
      at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:197)
      at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:61)
      at org.apache.hadoop.hive.ql.exec.UnionOperator.processOp(UnionOperator.java:125)
      at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
      at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531)

        Activity

        Hide
        Brock Noland added a comment -

        caofangkun can you clarify when you see the ArrayIndexOutOfBoundsException above?

        Show
        Brock Noland added a comment - caofangkun can you clarify when you see the ArrayIndexOutOfBoundsException above?
        Hide
        caofangkun added a comment -

        Hi Frankline Jose S
        "cast(sum(b.value) as String) as value" is a walkaround way to solve the problem.
        but do you think it's necessary that we'd better throw an exception like below in this situation ?
        Exception like:
        FAILED: Error in semantic analysis: Schema of both sides of union should match

        Show
        caofangkun added a comment - Hi Frankline Jose S "cast(sum(b.value) as String) as value" is a walkaround way to solve the problem. but do you think it's necessary that we'd better throw an exception like below in this situation ? Exception like: FAILED: Error in semantic analysis: Schema of both sides of union should match
        Hide
        Frankline Jose S added a comment -

        Hi Caofangkun,

        Query is slightly modified in ( alias b )table, typecasted summation value to String cast(sum(b.value) as String) as value

        Jose (key) - 2.1700977105416958E-287 (value)
        <==> After <==>
        Jose (key) - jose (value)

        table :- (key String, value String )
        input :-
        Jose - jose
        Arun - arun
        10 - 10
        10 - 100
        10.20 - 10.20
        33 - 35
        33 - 5

        • 22
          Query :-
          select c.key, c.value from (
          select a.key, a.value from uniontbl a limit 3
          union all
          select b.key, cast(sum(b.value) as String) as value from uniontbl b
          group by b.key )c

        output:-
        22.0
        10 110.0
        10.20 10.2
        33 40.0
        Arun 0.0
        Jose 0.0 – above records belongs to alias b ( b.key, sum( b.value) )

        Jose jose – below records belongs to alias a ( a.key, a.value )
        Arun arun
        10 10

        Show
        Frankline Jose S added a comment - Hi Caofangkun, Query is slightly modified in ( alias b )table, typecasted summation value to String cast(sum(b.value) as String) as value Jose (key) - 2.1700977105416958E-287 (value) <==> After <==> Jose (key) - jose (value) table :- (key String, value String ) input :- Jose - jose Arun - arun 10 - 10 10 - 100 10.20 - 10.20 33 - 35 33 - 5 22 Query :- select c.key, c.value from ( select a.key, a.value from uniontbl a limit 3 union all select b.key, cast(sum(b.value) as String) as value from uniontbl b group by b.key )c output:- 22.0 10 110.0 10.20 10.2 33 40.0 Arun 0.0 Jose 0.0 – above records belongs to alias b ( b.key, sum( b.value) ) Jose jose – below records belongs to alias a ( a.key, a.value ) Arun arun 10 10
        Hide
        caofangkun added a comment -

        Hi Frankline Jose S
        the problem is this query before UNION ALL clause is like this :

        select 'username' as key, 'age' as value from ...

        see the column values are Fixed strings .

        Example
        select key, value FROM
        ( select 'USERNAME' as key, 'AGE' as value – Hi Frankline Jose S , take a look at this
        from src s1 limit 1
        UNION ALL
        select s2.key as key, sum(s2.value) as value – datatype: strung, double
        from src s2 group by s2.key
        ) unionsrc;

        Show
        caofangkun added a comment - Hi Frankline Jose S the problem is this query before UNION ALL clause is like this : select 'username' as key, 'age' as value from ... see the column values are Fixed strings . Example select key, value FROM ( select 'USERNAME' as key, 'AGE' as value – Hi Frankline Jose S , take a look at this from src s1 limit 1 UNION ALL select s2.key as key, sum(s2.value) as value – datatype: strung, double from src s2 group by s2.key ) unionsrc;
        Hide
        Frankline Jose S added a comment -

        Hi, I just tried the following queries, Seems to work fine

        // Union all - combine the result-set of two or more select statement.
        Union-all allow duplicate values while matches in both tables

        Input
        33 35
        55 55.62
        20 44
        55 100
        44 44
        33 33
        33 55
        44 44
        33
        55 55
        55

        Query :>

        select c.key, c.value from (
        select a.key, a.value from uniontbl a where a.key='33'
        union all
        select b.key, sum( b.value) as value from uniontbl b where b.key='33'
        group by b.key
        )c;

        o/p:
        33 35.0
        33 33.0
        33 55.0
        33 NULL --> current and above line blongs to alia a ( a.key, a.value )
        33 123.0 --> belongs to alias b ( b.key, sum( b.value) )

        select c.key, c.value from (
        select a.key, a.value from uniontbl a where a.key='55'
        union all
        select b.key, sum( b.value) as value from uniontbl b where b.key='55'
        group by b.key )c

        o/p
        55 55.62
        55 100.0
        55 55.0 --> current and above line blongs to alia a ( a.key, a.value )
        55 210.62 --> belongs to alias b ( b.key, sum( b.value) )

        Can you give me an example scenario for this, in case I have misunderstood the problem.

        Show
        Frankline Jose S added a comment - Hi, I just tried the following queries, Seems to work fine // Union all - combine the result-set of two or more select statement. Union-all allow duplicate values while matches in both tables Input 33 35 55 55.62 20 44 55 100 44 44 33 33 33 55 44 44 33 55 55 55 Query :> select c.key, c.value from ( select a.key, a.value from uniontbl a where a.key='33' union all select b.key, sum( b.value) as value from uniontbl b where b.key='33' group by b.key )c; o/p: 33 35.0 33 33.0 33 55.0 33 NULL --> current and above line blongs to alia a ( a.key, a.value ) 33 123.0 --> belongs to alias b ( b.key, sum( b.value) ) select c.key, c.value from ( select a.key, a.value from uniontbl a where a.key='55' union all select b.key, sum( b.value) as value from uniontbl b where b.key='55' group by b.key )c o/p 55 55.62 55 100.0 55 55.0 --> current and above line blongs to alia a ( a.key, a.value ) 55 210.62 --> belongs to alias b ( b.key, sum( b.value) ) Can you give me an example scenario for this, in case I have misunderstood the problem.

          People

          • Assignee:
            Unassigned
            Reporter:
            caofangkun
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:

              Development