Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: vectorization-branch
    • Fix Version/s: vectorization-branch, 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      Text and Orc returning: 1.4499999
      Vectorized Orc Returning: 0.1

      drop table LINEITEM_ORC;
      create external table LINEITEM_ORC(L_DISCOUNT float ) 
      ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
                  STORED AS
                      INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.CommonOrcInputFormat'
                      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
      
      SELECT max(l_discount)
      FROM   LINEITEM_ORC;
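
      A hedged sketch of how the two results above might be compared in a single session. It assumes the hive.vectorized.execution.enabled property as it exists in Hive 0.13.0; the vectorization-branch build used for this report may have relied on a different switch, and the report does not show which settings were in effect.

      ```sql
      -- Assumption: vectorization is toggled with this property (Hive 0.13.0 name).
      -- Non-vectorized run, reported above as matching the text table.
      SET hive.vectorized.execution.enabled=false;
      SELECT max(l_discount) FROM LINEITEM_ORC;

      -- Vectorized run, reported above as returning a different maximum.
      SET hive.vectorized.execution.enabled=true;
      SELECT max(l_discount) FROM LINEITEM_ORC;
      ```

      If the two SELECT statements disagree on the same table, the difference comes from the vectorized execution path rather than from the data.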
      
      1. hive-4686.patch.1.txt
        50 kB
        Remus Rusanu
      2. max_data.zip
        4.31 MB
        Tony Murphy

          Activity

          Tony Murphy added a comment -

          I can provide a repro environment if necessary.

          Remus Rusanu added a comment -

          Similar to HIVE-4612. The emitted aggregate values for MIN/MAX are incorrect. They should be the same writable type as the input expression.

          Remus Rusanu added a comment -

          Not sure about the 1.4999 vs. 0.1 in the comment, but here is the data:

          In the txt file, using awk, I get:
          [root@sandbox ~]# cat /tmp/data.txt | awk '
              NR == 1 { max=$1; min=$1; sum=0 }
              { if ($1>max) max=$1; if ($1<min) min=$1; sum+=$1; }
              END { printf "Min: %f\tMax: %f\tAverage: %f\tSum: %f\tNR: %d\n", min, max, sum/NR, sum, NR }
          '
          Min: 0.000000 Max: 0.100000 Average: 0.049999 Sum: 300057.330001 NR: 6001215

          txt table:
          select min(discount), max(discount), count(discount) from hive_4686_txt;
          0.0 0.1 6001215 300057.3304087687

          orc table:
          0.0 0.1 6001215 300057.3304087687

          vectorized orc, no patch:
          0.0 0.0 63 -3.1554556579462614E-31

          vectorized orc, patch applied:
          0.0 0.1 6001215 300057.3304087687

          Given that the patch gives the same results as txt, non-vectorized orc, and awk, I assume the fix is correct. Perhaps the data is a different subset from what you tested with, Tony?
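
          The result rows above have four columns, while the query as quoted lists only three aggregates. A sketch of a query that would produce four columns, assuming the fourth is sum(discount); that value is close to the awk Sum, but the comment does not state it:

          ```sql
          -- Assumption: the fourth output column is sum(discount),
          -- which the quoted query does not include.
          SELECT min(discount), max(discount), count(discount), sum(discount)
          FROM hive_4686_txt;
          ```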

          Remus Rusanu added a comment -

          https://reviews.apache.org/r/11836/

          Ashutosh Chauhan added a comment -

          Committed to branch. Thanks, Remus!


            People

            • Assignee:
              Remus Rusanu
              Reporter:
              Tony Murphy
            • Votes:
              0
              Watchers:
              3
