Details
Description
Create a text formatted table with a int column partitioned by a string column.
After replacing the columns of int with string and inserting a new row with the existing partition ('horton' in this case), null is inserted as a value in the altered column:
0: jdbc:hive2://ts-0531-1.openstacklocal:2181> drop table if exists test; No rows affected (0.249 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> create table test (i int) partitioned by (s string); No rows affected (0.116 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> insert into table test partition (s) values(1, 'horton'); INFO : Session is already open INFO : Dag name: insert into table test partition...'horton')(Stage-1) INFO : INFO : Status: Running (Executing on YARN cluster with App id application_1464727816747_0759) INFO : Map 1: 0/1 INFO : Map 1: 0(+1)/1 INFO : Map 1: 0(+1)/1 INFO : Map 1: 1/1 INFO : Loading data to table default.test partition (s=null) from hdfs://ts-0531-5.openstacklocal:8020/apps/hive/warehouse/test/.hive-staging_hive_2016-06-03_21-09-46_619_6747668726633461908-940/-ext-10000 INFO : Time taken for load dynamic partitions : 122 INFO : Loading partition {s=horton} INFO : Time taken for adding to write entity : 0 INFO : Partition default.test{s=horton} stats: [numFiles=1, numRows=1, totalSize=2, rawDataSize=1] No rows affected (8.301 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> select * from test; +---------+---------+--+ | test.i | test.s | +---------+---------+--+ | 1 | horton | +---------+---------+--+ 1 row selected (0.184 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> alter table test replace columns (i string); No rows affected (0.138 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> insert into table test partition (s) values ('horton', 'horton'); INFO : Session is already open INFO : Dag name: insert into table test partition...'horton')(Stage-1) INFO : INFO : Map 1: 1/1 INFO : Loading data to table default.test partition (s=null) from hdfs://ts-0531-5.openstacklocal:8020/apps/hive/warehouse/test/.hive-staging_hive_2016-06-03_21-09-55_295_8631413609124947877-940/-ext-10000 INFO : Time taken for load dynamic partitions : 190 INFO : Loading partition {s=horton} INFO : Time taken for adding to write entity : 0 INFO : Partition default.test{s=horton} stats: [numFiles=2, numRows=2, totalSize=9, rawDataSize=7] No rows affected (1.35 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> select * from test; +---------+---------+--+ | test.i | test.s | +---------+---------+--+ | 1 | horton | | NULL | horton | +---------+---------+--+ 2 rows selected (0.08 seconds)
The below is explain of insertion:
0: jdbc:hive2://ts-0531-1.openstacklocal:2181> explain insert into table test partition (s) values ('horton', 'horton'); +-------------------------------------------------------------------------------------------------------+--+ | Explain | +-------------------------------------------------------------------------------------------------------+--+ | STAGE DEPENDENCIES: | | Stage-1 is a root stage | | Stage-2 depends on stages: Stage-1 | | Stage-0 depends on stages: Stage-2 | | Stage-3 depends on stages: Stage-0 | | | | STAGE PLANS: | | Stage: Stage-1 | | Tez | | DagId: hive_20160603211130_4262d739-5bc1-4be0-95c3-2b666f5db7b8:1323 | | Vertices: | | Map 1 | | Map Operator Tree: | | TableScan | | alias: values__tmp__table__17 | | Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE | | Select Operator | | expressions: tmp_values_col1 (type: string), tmp_values_col2 (type: string) | | outputColumnNames: _col0, _col1 | | Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE | | File Output Operator | | compressed: false | | Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE | | table: | | input format: org.apache.hadoop.mapred.TextInputFormat | | output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | | serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | | name: default.test | | | | Stage: Stage-2 | | Dependency Collection | | | | Stage: Stage-0 | | Move Operator | | tables: | | partition: | | s | | replace: false | | table: | | input format: org.apache.hadoop.mapred.TextInputFormat | | output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | | serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | | name: default.test | | | | Stage: Stage-3 | | Stats-Aggr Operator | | | +-------------------------------------------------------------------------------------------------------+--+
The issue is NOT seen with orc table.
Also I tried with another table partitioned by timestamp, but the issue was NOT seen:
0: jdbc:hive2://ts-0531-1.openstacklocal:2181> drop table if exists test; No rows affected (0.266 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> create table test (i int) partitioned by (ts timestamp); No rows affected (0.16 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> insert into table test partition (ts) values (1, '2034-08-04 17:42:59.0'); INFO : Session is already open INFO : Dag name: insert into table test partit...17:42:59.0')(Stage-1) INFO : INFO : Status: Running (Executing on YARN cluster with App id application_1464727816747_0760) INFO : Map 1: 0/1 INFO : Map 1: 0(+1)/1 INFO : Map 1: 1/1 INFO : Loading data to table default.test partition (ts=null) from hdfs://ts-0531-5.openstacklocal:8020/apps/hive/warehouse/test/.hive-staging_hive_2016-06-03_21-44-11_737_2024697703048602688-940/-ext-10000 INFO : Time taken for load dynamic partitions : 170 INFO : Loading partition {ts=2034-08-04 17:42:59.0} INFO : Time taken for adding to write entity : 1 INFO : Partition default.test{ts=2034-08-04 17:42:59.0} stats: [numFiles=1, numRows=1, totalSize=2, rawDataSize=1] No rows affected (5.029 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> select * from test; +---------+------------------------+--+ | test.i | test.ts | +---------+------------------------+--+ | 1 | 2034-08-04 17:42:59.0 | +---------+------------------------+--+ 1 row selected (0.158 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> alter table test replace columns (s string); No rows affected (0.14 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> insert into table test partition (ts) values ('2', '2034-08-04 17:42:59.0'); INFO : Session is already open INFO : Dag name: insert into table test partit...17:42:59.0')(Stage-1) INFO : INFO : Status: Running (Executing on YARN cluster with App id application_1464727816747_0760) INFO : Map 1: 0/1 INFO : Map 1: 0(+1)/1 INFO : Map 1: 0/1 INFO : Map 1: 1/1 INFO : Loading data to table default.test partition (ts=null) from hdfs://ts-0531-5.openstacklocal:8020/apps/hive/warehouse/test/.hive-staging_hive_2016-06-03_21-44-45_488_1120953773256187058-940/-ext-10000 INFO : Time taken for load dynamic partitions : 254 INFO : Loading partition {ts=2034-08-04 17:42:59.0} INFO : Time taken for adding to write entity : 1 INFO : Partition default.test{ts=2034-08-04 17:42:59.0} stats: [numFiles=2, numRows=2, totalSize=4, rawDataSize=2] No rows affected (7.127 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> select * from test; +---------+------------------------+--+ | test.s | test.ts | +---------+------------------------+--+ | 1 | 2034-08-04 17:42:59.0 | | 2 | 2034-08-04 17:42:59.0 | +---------+------------------------+--+ 2 rows selected (0.115 seconds)