HIVE-6968

list bucketing feature does not update the location map for unpartitioned tables

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.0, 0.12.0, 0.13.0, 0.14.0
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels: None

      Description

      The list bucketing feature maintains a map in the metastore from skewed column values to their storage locations. This map is not updated for unpartitioned tables; for partitioned tables it is updated properly. To reproduce the issue:

      hive> set hive.mapred.supports.subdirectories=true;
      hive> set mapred.input.dir.recursive=true;

      hive> create table t(col1 string, col2 string);
      hive> load data local inpath '/home/hadoop/a.txt' into table t;
      hive> select * from t;
      OK
      1	a
      2	b
      3	c
      4	a
      5	b
      6	a
      
      hive> create table t1(r1 string, r2 string) skewed by (r2) on ('a') stored as directories;
      hive> insert into table t1 select * from t;
      hive> desc extended t1;
      OK
      r1                  	string              	                    
      r2                  	string              	                    
      	 	 
      Detailed Table Information	Table(tableName:t1, dbName:default, owner:pjayachandran, createTime:1398295903, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:r1, type:string, comment:null), FieldSchema(name:r2, type:string, comment:null)], location:file:/app/warehouse/t1, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[r2], skewedColValues:[[a]], skewedColValueLocationMaps:{}), storedAsSubDirectories:true), partitionKeys:[], parameters:{numFiles=6, COLUMN_STATS_ACCURATE=true, transient_lastDdlTime=1398297887, numRows=6, totalSize=72, rawDataSize=18}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)	
      Time taken: 0.119 seconds, Fetched: 4 row(s)
      

      As seen in the describe output, skewedColValueLocationMaps is empty ({}) even though rows with r2 = 'a' were inserted into the skewed table.
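The check described above can be scripted. The following is a minimal Python sketch, not part of the original report or the patch: it pulls the `skewedColValueLocationMaps` field out of captured `desc extended` output and reports whether it is empty. The function names are hypothetical, and the regular expression assumes the map is printed on one line with no nested braces, as in the output shown in this issue.

```python
import re


def skewed_location_map(desc_output: str) -> str:
    """Return the raw text between the braces of skewedColValueLocationMaps.

    Assumes the field appears as 'skewedColValueLocationMaps:{...}' with no
    nested braces, which matches the describe output in this report.
    """
    match = re.search(r"skewedColValueLocationMaps:\{(.*?)\}", desc_output)
    if match is None:
        raise ValueError("skewedColValueLocationMaps not found in output")
    return match.group(1)


def is_location_map_empty(desc_output: str) -> bool:
    """True when the location map holds no entries (the symptom of this bug)."""
    return skewed_location_map(desc_output).strip() == ""


# Fragment copied from the 'desc extended t1' output in this report.
sample = (
    "skewedInfo:SkewedInfo(skewedColNames:[r2], skewedColValues:[[a]], "
    "skewedColValueLocationMaps:{}), storedAsSubDirectories:true)"
)
print(is_location_map_empty(sample))  # prints True
```

Running the same check against the output for a partitioned skewed table should, per the description, report a non-empty map.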

      1. HIVE-6968.1.patch
        24 kB
        Prasanth Jayachandran
      2. HIVE-6968.2.patch
        24 kB
        Prasanth Jayachandran

          Activity

          Prasanth Jayachandran added a comment -

          Attaching RB link

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12641642/HIVE-6968.1.patch

          ERROR: -1 due to 41 failed/errored test(s), 5419 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_numeric
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_map_skew
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_1
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_skew_1
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_infer_bucket_sort_list_bucket
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_14
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_6
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_7
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_8
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_test_outer
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nullgroup3
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_createas1
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join4
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_select_dummy_source
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_symlink_text_input_format
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_truncate_column_list_bucket
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_current_database
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_1
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_10
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_12
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_13
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_14
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_19
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_2
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_20
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_21
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_22
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_23
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_24
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_4
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_5
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_7
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_8
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_9
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketizedhiveinputformat
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_dynamic_partitions_with_whitelist
          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_partialscan_autogether
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/32/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/32/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 41 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12641642

          Prasanth Jayachandran added a comment -

          The test failures are not related to this patch.

          Ashutosh Chauhan added a comment -

          +1

          Prasanth Jayachandran added a comment -

          Refreshed the patch against trunk and fixed the diff after the HIVE-6979 changes.

          Prasanth Jayachandran added a comment -

          Committed to trunk.

          Thejas M Nair added a comment -

          This has been fixed in the 0.14 release. Please open a new JIRA if you see any issues.


            People

            • Assignee: Prasanth Jayachandran
            • Reporter: Prasanth Jayachandran
            • Votes: 0
            • Watchers: 3
