HIVE-3218

Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.10.0
    • Component/s: Query Processor
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      drop table hive_test_smb_bucket1;
      drop table hive_test_smb_bucket2;
      
      create table hive_test_smb_bucket1 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
      create table hive_test_smb_bucket2 (key int, value string) partitioned by (ds string) clustered by (key) sorted by (key) into 2 buckets;
      
      set hive.enforce.bucketing = true;
      set hive.enforce.sorting = true;
      
      insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-14') select key, value from src;
      insert overwrite table hive_test_smb_bucket1 partition (ds='2010-10-15') select key, value from src;
      insert overwrite table hive_test_smb_bucket2 partition (ds='2010-10-15') select key, value from src;
      
      
      set hive.optimize.bucketmapjoin = true;
      set hive.optimize.bucketmapjoin.sortedmerge = true;
      set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
      
      SELECT /*+ MAPJOIN(b) */ * FROM hive_test_smb_bucket1 a JOIN hive_test_smb_bucket2 b ON a.key = b.key;
      

      which produces the following bucket join context:

      Alias Bucket Output File Name Mapping:
              hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0 0
              hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0 1
              hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0 0
              hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0 1
      

      and fails with the following exception:

      java.lang.RuntimeException: Hive Runtime Error while closing operators
      	at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:226)
      	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
      	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
      	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:416)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
      	at org.apache.hadoop.mapred.Child.main(Child.java:264)
      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_task_tmp.-ext-10001/_tmp.000001_0 to: hdfs://localhost:9000/tmp/hive-navis/hive_2012-06-29_22-17-49_574_6018646381714861925/_tmp.-ext-10001/000001_0
      	at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.commit(FileSinkOperator.java:198)
      	at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.access$300(FileSinkOperator.java:100)
      	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:717)
      	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
      	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
      	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
      	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
      	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
      	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
      	at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
      	... 8 more
      
      1. HIVE-3218.1.patch.txt
        142 kB
        Navis
      2. hive.3218.2.patch
        166 kB
        Namit Jain

        Issue Links

          Activity

          Transition                    Time In Source Status  Execution Times  Last Executer     Last Execution Date
          Patch Available -> Open       19d 15h 52m            3                Namit Jain        27/Jul/12 07:08
          Open -> Patch Available       10d 12h 46m            4                Namit Jain        30/Jul/12 06:13
          Patch Available -> Resolved   2h 9m                  1                Namit Jain        30/Jul/12 08:22
          Resolved -> Closed            164d 12h 31m           1                Ashutosh Chauhan  10/Jan/13 19:53
          Ashutosh Chauhan made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Ashutosh Chauhan added a comment -

          This issue is fixed and released as part of the 0.10.0 release. If you find an issue which seems to be related to this one, please create a new JIRA and link it to this one.

          Hudson added a comment -

          Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
          HIVE-3218 Stream table of SMBJoin/BucketMapJoin with two or more
          partitions is not handled properly (Navis via namit) (Revision 1367012)

          Result = ABORTED
          namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1367012
          Files :

          • /hive/trunk/data/files/srcsortbucket1outof4.txt
          • /hive/trunk/data/files/srcsortbucket2outof4.txt
          • /hive/trunk/data/files/srcsortbucket3outof4.txt
          • /hive/trunk/data/files/srcsortbucket4outof4.txt
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/BucketMatcher.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DefaultBucketMatcher.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecMapperContext.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapredLocalTask.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketMapJoinOptimizer.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedMergeBucketMapJoinOptimizer.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MapJoinResolver.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/BucketMapJoinContext.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/HashTableSinkDesc.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java
          • /hive/trunk/ql/src/test/queries/clientpositive/bucketcontext_1.q
          • /hive/trunk/ql/src/test/queries/clientpositive/bucketcontext_2.q
          • /hive/trunk/ql/src/test/queries/clientpositive/bucketcontext_3.q
          • /hive/trunk/ql/src/test/queries/clientpositive/bucketcontext_4.q
          • /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_1.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_2.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_3.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_4.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin1.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin2.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin3.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin5.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin_negative2.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/stats11.q.out
          Navis made changes -
          Fix Version/s 0.10.0 [ 12320745 ]
          Kevin Wilfong made changes -
          Link This issue breaks HIVE-3429 [ HIVE-3429 ]
          Hudson added a comment -

          Integrated in Hive-trunk-h0.21 #1575 (See https://builds.apache.org/job/Hive-trunk-h0.21/1575/)
          HIVE-3218 Stream table of SMBJoin/BucketMapJoin with two or more
          partitions is not handled properly (Navis via namit) (Revision 1367012)

          Result = FAILURE
          namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1367012
          Files :

          • /hive/trunk/data/files/srcsortbucket1outof4.txt
          • /hive/trunk/data/files/srcsortbucket2outof4.txt
          • /hive/trunk/data/files/srcsortbucket3outof4.txt
          • /hive/trunk/data/files/srcsortbucket4outof4.txt
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/BucketMatcher.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DefaultBucketMatcher.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecMapperContext.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapredLocalTask.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketMapJoinOptimizer.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedMergeBucketMapJoinOptimizer.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MapJoinResolver.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/BucketMapJoinContext.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/HashTableSinkDesc.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java
          • /hive/trunk/ql/src/test/queries/clientpositive/bucketcontext_1.q
          • /hive/trunk/ql/src/test/queries/clientpositive/bucketcontext_2.q
          • /hive/trunk/ql/src/test/queries/clientpositive/bucketcontext_3.q
          • /hive/trunk/ql/src/test/queries/clientpositive/bucketcontext_4.q
          • /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_1.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_2.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_3.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_4.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin1.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin2.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin3.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin5.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin_negative2.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/stats11.q.out
          Namit Jain made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Resolution Fixed [ 1 ]
          Namit Jain added a comment -

          Committed. Thanks Navis

          Namit Jain made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Namit Jain made changes -
          Attachment hive.3218.2.patch [ 12538305 ]
          Namit Jain added a comment -

          +1

          Navis added a comment -

          @Namit Jain, I updated the test file just now.

          Namit Jain added a comment -

          @Navis, can you update the test file?
          Let us try to get this in.

          Namit Jain made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Namit Jain added a comment -

          minor comments on phabricator

          Navis added a comment -

          addressed comments

          Namit Jain added a comment -

          comments on phabricator

          Namit Jain added a comment -

          I am missing something:

          srcbucket20.txt=[srcbucket20.txt]
          srcbucket21.txt=[srcbucket21.txt]
          srcbucket22.txt=[srcbucket20.txt]
          srcbucket23.txt=[srcbucket21.txt]
          ds=2008-04-09/srcbucket20.txt=[srcbucket20.txt]
          ds=2008-04-09/srcbucket21.txt=[srcbucket21.txt]
          ds=2008-04-09/srcbucket22.txt=[srcbucket20.txt]
          ds=2008-04-09/srcbucket23.txt=[srcbucket21.txt]

          The mapping is:

          small table alias -> big table file name -> list of small table file names

          Shouldn't the big table file name and the small table file names be fully qualified?
          In the above example, the big table file name is srcbucket20.txt in some entries and ds=2008-04-09/srcbucket20.txt in others. Why is it sometimes qualified by partition name and sometimes not?

          Similarly, shouldn't the small table file name be fully qualified?
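
The listing in this comment (four big-table bucket files mapped onto two small-table bucket files) is consistent with a modulo-based bucket match. The following is only a minimal sketch of that idea, not the code from the patch: the function name `match_buckets` is hypothetical, and it assumes non-empty file lists ordered by bucket number, with one bucket count being a multiple of the other.

```python
def match_buckets(big_files, small_files):
    """Map each big-table bucket file to the small-table bucket files it joins with.

    Hypothetical helper: assumes both lists are non-empty, ordered by bucket
    number, and that one bucket count is a multiple of the other.
    """
    big_n, small_n = len(big_files), len(small_files)
    if big_n % small_n != 0 and small_n % big_n != 0:
        raise ValueError("bucket counts must be multiples of each other")
    mapping = {}
    for i, big in enumerate(big_files):
        if big_n >= small_n:
            # Several big buckets share one small bucket.
            mapping[big] = [small_files[i % small_n]]
        else:
            # One big bucket reads several small buckets.
            mapping[big] = [s for j, s in enumerate(small_files) if j % big_n == i]
    return mapping


small = ["srcbucket20.txt", "srcbucket21.txt"]
unqualified = match_buckets(
    ["srcbucket20.txt", "srcbucket21.txt", "srcbucket22.txt", "srcbucket23.txt"],
    small,
)
# Keys qualified by partition name, as in the second half of the listing.
qualified = match_buckets(
    ["ds=2008-04-09/srcbucket2%d.txt" % n for n in range(4)],
    small,
)
for big_file, small_list in unqualified.items():
    print(big_file, "->", small_list)
```

If the big table has more than one partition, unqualified keys such as `srcbucket20.txt` from different partitions would overwrite each other in a dictionary like this one, which is one reason to qualify the big-table file names.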

          Namit Jain added a comment -

          A M data/files/srcsbucket20.txt (118 lines) - -
          A M data/files/srcsbucket21.txt (120 lines) - -
          A M data/files/srcsbucket22.txt (124 lines) - -
          A M data/files/srcsbucket23.txt (138 lines)

          Are these files sorted and bucketed?

          If yes, can you change the names of these files to

          srcsortbucket1outof4.txt
          srcsortbucket2outof4.txt ..

          These files might be used for a lot of tests, so it might be a good idea to be clear about the names of these files.
          Sorry about being picky on the name of these files.

          Navis made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Navis added a comment -

          Rebased to trunk and added more comments

          Namit Jain made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Namit Jain added a comment -

          comments

          Navis added a comment -

          For queries handling many partitions with many buckets, it may be necessary to use option 3 in parallel as well. I'm considering that for a separate issue.

          Namit Jain added a comment -

          I was thinking about the approaches 1,2,3.

          2 seems better, since 3 would mean 1 mapper would be processing multiple files.
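
The trade-off mentioned here can be illustrated with the bucket files from the issue description. This is a rough sketch of option 3 (combining files with the same bucket ID), not the behaviour of BucketizedHiveInputFormat: the helper names `bucket_id` and `combine_by_bucket` are hypothetical, and the bucket number is assumed to be the leading digits of file names such as `000001_0`.

```python
import posixpath
from collections import defaultdict

# Stream-table bucket files taken from the issue description.
BUCKET_FILES = [
    "hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0",
    "hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0",
    "hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0",
    "hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0",
]


def bucket_id(path):
    """Parse the bucket number from a file name like '000001_0' (assumed naming)."""
    return int(posixpath.basename(path).split("_")[0])


def combine_by_bucket(paths):
    """Hypothetical option 3: one input group (one mapper) per bucket ID."""
    groups = defaultdict(list)
    for path in paths:
        groups[bucket_id(path)].append(path)
    return dict(groups)


groups = combine_by_bucket(BUCKET_FILES)
for bid, files in sorted(groups.items()):
    print("bucket", bid, "->", len(files), "files for one mapper")
```

With two partitions of two buckets each, this grouping yields two mappers that each read two files, whereas keeping one file per mapper (as option 2 does) yields four mappers.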

          Navis made changes -
          Link This issue blocks HIVE-3171 [ HIVE-3171 ]
          Navis made changes -
          Attachment HIVE-3218.1.patch.txt [ 12534088 ]
          Navis made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Navis made changes -
          Attachment HIVE-3218.1.patch.txt [ 12534484 ]
          Navis added a comment -

          For BucketMapJoin it just returns an invalid result, which is worse than the SMBJoin case. (The bucketmapjoin5.q test is broken.)

          Navis made changes -
          Summary: "When big table has two or more partitions on SMBJoin it fails at runtime" -> "Stream table of SMBJoin/BucketMapJoin with two or more partitions is not handled properly"
          Priority: Minor [ 4 ] -> Critical [ 2 ]
          Navis made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Navis added a comment -

          This happens with bucket mapjoin, too. It would be better to fix it along with smb join.

          Navis made changes -
          Attachment HIVE-3218.1.patch.txt [ 12534088 ]
          Navis made changes -
          Field Original Value New Value
          Status Open [ 1 ] Patch Available [ 10002 ]
          Navis added a comment -

          https://reviews.facebook.net/D3933

          Passed all ql/clientpositive tests. Complete test result will be updated next week.

          Navis added a comment -

          This is caused by each partition of the stream table producing the same task ID. Three options are possible:
          1. Do not allow this
          2. Augment the task ID with the partition spec
          3. Combine files with the same bucket ID in BucketizedHiveInputFormat

          I've implemented option 2 and am testing it, but option 3 seems safer (and easier).

          I think this can possibly happen with bucket mapjoin as well.
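
The duplicated task ID can be reproduced on paper with the four bucket files from the description. This is a minimal sketch, not the patch itself: the helper names and the `<partition spec>_<file name>` output format are assumptions made for illustration only.

```python
import posixpath

# Stream-table bucket files taken from the issue description.
BUCKET_FILES = [
    "hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000000_0",
    "hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-14/000001_0",
    "hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000000_0",
    "hdfs://localhost:9000/user/hive/warehouse/hive_test_smb_bucket1/ds=2010-10-15/000001_0",
]


def naive_output_name(path):
    """Hypothetical: name the output after the bucket file alone."""
    return posixpath.basename(path)


def partition_aware_output_name(path):
    """Hypothetical option 2: prefix the file name with the partition spec."""
    partition_spec = posixpath.basename(posixpath.dirname(path))  # e.g. "ds=2010-10-14"
    return partition_spec + "_" + posixpath.basename(path)


def find_collisions(paths, name_fn):
    """Return the output names that more than one input file would write to."""
    owners = {}
    for path in paths:
        owners.setdefault(name_fn(path), []).append(path)
    return {name: ps for name, ps in owners.items() if len(ps) > 1}


naive = find_collisions(BUCKET_FILES, naive_output_name)
fixed = find_collisions(BUCKET_FILES, partition_aware_output_name)
print("colliding names without partition spec:", sorted(naive))
print("colliding names with partition spec:", sorted(fixed))
```

In the naive scheme, both partitions claim `000000_0` and `000001_0`, which matches the failed rename to `000001_0` in the stack trace; adding the partition spec makes every output name distinct.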

          Navis created issue -

            People

            • Assignee:
              Navis
              Reporter:
              Navis
            • Votes:
              0
              Watchers:
              4
