Details
Description
How the problem happens:
- Create a non-ACID table
- Insert row one before converting the table from non-ACID to ACID
- Insert row two after converting the table from non-ACID to ACID
- Both rows can be retrieved before MAJOR compaction
- After MAJOR compaction, row one is lost
hive> USE acidtest;
OK
Time taken: 0.77 seconds
hive> CREATE TABLE t1 (nationkey INT, name STRING, regionkey INT, comment STRING)
    > CLUSTERED BY (regionkey) INTO 2 BUCKETS
    > STORED AS ORC;
OK
Time taken: 0.179 seconds
hive> DESC FORMATTED t1;
OK
# col_name              data_type               comment
nationkey               int
name                    string
regionkey               int
comment                 string
# Detailed Table Information
Database:               acidtest
Owner:                  wzheng
CreateTime:             Mon Dec 14 15:50:40 PST 2015
LastAccessTime:         UNKNOWN
Retention:              0
Location:               file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
Table Type:             MANAGED_TABLE
Table Parameters:
    transient_lastDdlTime   1450137040
# Storage Information
SerDe Library:          org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:            org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed:             No
Num Buckets:            2
Bucket Columns:         [regionkey]
Sort Columns:           []
Storage Desc Params:
    serialization.format    1
Time taken: 0.198 seconds, Fetched: 28 row(s)
hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db;
Found 1 items
drwxr-xr-x - wzheng staff 68 2015-12-14 15:50 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1
hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
hive> INSERT INTO TABLE t1 VALUES (1, 'USA', 1, 'united states');
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2015-12-14 15:51:58,070 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local73977356_0001
Loading data to table acidtest.t1
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 2.825 seconds
hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
Found 2 items
-rwxr-xr-x 1 wzheng staff 112 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000000_0
-rwxr-xr-x 1 wzheng staff 472 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000001_0
hive> SELECT * FROM t1;
OK
1 USA 1 united states
Time taken: 0.434 seconds, Fetched: 1 row(s)
hive> ALTER TABLE t1 SET TBLPROPERTIES ('transactional' = 'true');
OK
Time taken: 0.071 seconds
hive> DESC FORMATTED t1;
OK
# col_name              data_type               comment
nationkey               int
name                    string
regionkey               int
comment                 string
# Detailed Table Information
Database:               acidtest
Owner:                  wzheng
CreateTime:             Mon Dec 14 15:50:40 PST 2015
LastAccessTime:         UNKNOWN
Retention:              0
Location:               file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
Table Type:             MANAGED_TABLE
Table Parameters:
    COLUMN_STATS_ACCURATE   false
    last_modified_by        wzheng
    last_modified_time      1450137141
    numFiles                2
    numRows                 -1
    rawDataSize             -1
    totalSize               584
    transactional           true
    transient_lastDdlTime   1450137141
# Storage Information
SerDe Library:          org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:            org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed:             No
Num Buckets:            2
Bucket Columns:         [regionkey]
Sort Columns:           []
Storage Desc Params:
    serialization.format    1
Time taken: 0.049 seconds, Fetched: 36 row(s)
hive> set hive.support.concurrency=true;
hive> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
hive> set hive.compactor.initiator.on=true;
hive> set hive.compactor.worker.threads=5;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
Found 2 items
-rwxr-xr-x 1 wzheng staff 112 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000000_0
-rwxr-xr-x 1 wzheng staff 472 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000001_0
hive> INSERT INTO TABLE t1 VALUES (2, 'Canada', 1, 'maple leaf');
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2015-12-14 15:54:18,943 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local1674014367_0002
Loading data to table acidtest.t1
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 1.995 seconds
hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
Found 3 items
-rwxr-xr-x 1 wzheng staff 112 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000000_0
-rwxr-xr-x 1 wzheng staff 472 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000001_0
drwxr-xr-x - wzheng staff 204 2015-12-14 15:54 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/delta_0000007_0000007_0000
hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/delta_0000007_0000007_0000;
Found 2 items
-rw-r--r-- 1 wzheng staff 214 2015-12-14 15:54 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/delta_0000007_0000007_0000/bucket_00000
-rw-r--r-- 1 wzheng staff 797 2015-12-14 15:54 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/delta_0000007_0000007_0000/bucket_00001
hive> SELECT * FROM t1;
OK
1 USA 1 united states
2 Canada 1 maple leaf
Time taken: 0.1 seconds, Fetched: 2 row(s)
hive> ALTER TABLE t1 COMPACT 'MAJOR';
Compaction enqueued.
OK
Time taken: 0.026 seconds
hive> show compactions;
OK
Database Table Partition Type State Worker Start Time
Time taken: 0.022 seconds, Fetched: 1 row(s)
hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/;
Found 3 items
-rwxr-xr-x 1 wzheng staff 112 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000000_0
-rwxr-xr-x 1 wzheng staff 472 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000001_0
drwxr-xr-x - wzheng staff 204 2015-12-14 15:55 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/base_0000007
hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/base_0000007;
Found 2 items
-rw-r--r-- 1 wzheng staff 222 2015-12-14 15:55 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/base_0000007/bucket_00000
-rw-r--r-- 1 wzheng staff 802 2015-12-14 15:55 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/base_0000007/bucket_00001
hive> select * from t1;
OK
2 Canada 1 maple leaf
Time taken: 0.396 seconds, Fetched: 1 row(s)
hive> select count(*) from t1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2015-12-14 15:56:20,277 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local1720993786_0003
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
1
Time taken: 1.623 seconds, Fetched: 1 row(s)
Note: the cleanup doesn't kick in because the compaction has already failed by that point. The cleanup itself doesn't have any problem (at least none that we know of for this case).
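The directory listings in the session can be checked mechanically. Below is a minimal Python sketch, not part of Hive, that sorts the entry names of a table directory into original bucket files, delta directories and base directories, and reports when a base directory sits next to leftover original files, which is the layout seen after the MAJOR compaction in this report. The function names and the regular expressions are assumptions made for illustration, inferred only from the names that appear in the listings (`000000_0`, `delta_0000007_0000007_0000`, `base_0000007`).

```python
import re

# Name patterns inferred from the listings in this report; treating them as
# the general layout is an assumption of this sketch.
ORIGINAL_FILE = re.compile(r"^\d{6}_\d+$")              # e.g. 000000_0
DELTA_DIR = re.compile(r"^delta_\d+_\d+(?:_\d+)?$")     # e.g. delta_0000007_0000007_0000
BASE_DIR = re.compile(r"^base_\d+$")                    # e.g. base_0000007


def classify(names):
    """Group directory entry names into original/delta/base/other."""
    groups = {"original": [], "delta": [], "base": [], "other": []}
    for name in names:
        if ORIGINAL_FILE.match(name):
            groups["original"].append(name)
        elif DELTA_DIR.match(name):
            groups["delta"].append(name)
        elif BASE_DIR.match(name):
            groups["base"].append(name)
        else:
            groups["other"].append(name)
    return groups


def base_with_leftover_originals(names):
    """True when a base directory coexists with pre-conversion bucket files."""
    groups = classify(names)
    return bool(groups["base"]) and bool(groups["original"])


if __name__ == "__main__":
    before = ["000000_0", "000001_0", "delta_0000007_0000007_0000"]
    after = ["000000_0", "000001_0", "base_0000007"]
    print(base_with_leftover_originals(before))
    print(base_with_leftover_originals(after))
```

This layout alone does not prove data loss; it only marks a table where comparing row counts before and after compaction, as done in the session, is worthwhile.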
Attachments
Issue Links
- is related to
  - HIVE-14366 Conversion of a Non-ACID table to an ACID table produces non-unique primary keys (Closed)
  - HIVE-16177 non Acid to acid conversion doesn't handle _copy_N files (Closed)
- relates to
  - HIVE-13961 ACID: Major compaction fails to include the original bucket files if there's no delta directory (Closed)