Hive / HIVE-4730

Join on more than 2^31 records on single reducer failed (wrong results)


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.7.1, 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels: None

      Description

      A join on more than 2^31 rows produces wrong results. For example:

      Create table small_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';
      Create table big_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';

      Load 1 row into small_table (the value 1).
      Load 2149580800 rows into big_table, all with the same value (1 in this case).

      create table output as select a.p1 from big_table a join small_table b on (a.p1=b.p1);

      select count(*) from output; returns only 1, although every one of the 2149580800 big_table rows matches the single small_table row.

      The reducer syslog:
      ...
      2013-06-13 17:20:59,254 INFO ExecReducer: ExecReducer: processing 2147000000 rows: used memory = 32925960
      2013-06-13 17:21:00,745 INFO ExecReducer: ExecReducer: processing 2148000000 rows: used memory = 12815184
      2013-06-13 17:21:02,205 INFO ExecReducer: ExecReducer: processing 2149000000 rows: used memory = 26684552 <-- looks like a wrong value
      ...
      2013-06-13 17:21:04,062 INFO ExecReducer: ExecReducer: processed 2149580801 rows: used memory = 17715896
      2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 finished. closing...
      2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarded 1 rows
      2013-06-13 17:21:05,791 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
      2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 finished. closing...
      2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarded 1 rows
      2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 finished. closing...
      2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 forwarded 0 rows
      2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:1
      2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 Close done
      2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 Close done
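The report does not say which variable overflowed. However, the reducer processed 2149580801 rows (2149580800 from big_table plus 1 from small_table), which exceeds Integer.MAX_VALUE (2147483647), while the JoinOperator forwarded only 1 row. That pattern is consistent with a row count held in a 32-bit Java int that wraps around. The Java sketch below only illustrates that arithmetic. RowCountOverflowDemo and its methods are hypothetical and are not Hive's JoinOperator code or the code changed by the attached patches.

```java
// Hypothetical illustration only: RowCountOverflowDemo is not a Hive class.
// It compares an int row counter with a long row counter when both must count
// the 2149580800 big_table rows from the reproduction steps.
public class RowCountOverflowDemo {

    // Row count from the report: 2^31 + 2^21, just past Integer.MAX_VALUE.
    static final long BIG_TABLE_ROWS = 2_149_580_800L;

    // Value of an int counter after n increments starting from 0. Java int
    // arithmetic wraps modulo 2^32, so this equals narrowing n to int.
    static int intCounterAfter(long n) {
        return (int) n;
    }

    // A long counter can hold counts up to about 9.2e18, so it simply holds n.
    static long longCounterAfter(long n) {
        return n;
    }

    public static void main(String[] args) {
        int c = Integer.MAX_VALUE;
        c++; // wraps silently to Integer.MIN_VALUE
        System.out.println("MAX_VALUE + 1 as int = " + c);                       // -2147483648
        System.out.println("int counter  = " + intCounterAfter(BIG_TABLE_ROWS));  // -2145386496
        System.out.println("long counter = " + longCounterAfter(BIG_TABLE_ROWS)); // 2149580800

        // Fail-fast alternative: Math.toIntExact throws instead of wrapping.
        try {
            Math.toIntExact(BIG_TABLE_ROWS);
        } catch (ArithmeticException e) {
            System.out.println("Math.toIntExact threw ArithmeticException");
        }
    }
}
```

Widening such a counter to long removes the 2^31 limit. Where an int is unavoidable, Math.toIntExact turns a silent wrong result into an exception.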

        Attachments

        1. HIVE-4730.D11283.1.patch
          4 kB
          Phabricator
        2. HIVE-4730.D11283.2.patch
          3 kB
          Phabricator


              People

              • Assignee: Navis Ryu (navis)
              • Reporter: Gabi Kazav (gabik)
              • Votes: 0
              • Watchers: 7
