[HIVE-12724] ACID: Major compaction fails to include the original bucket files into MR job - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 1.3.0, 2.0.0
Fix Version/s: 1.3.0, 2.0.0
Component/s: Hive, Transactions
Labels: None
Target Version/s: 1.3.0, 2.0.0

Description

How the problem happens:

Create a non-ACID table.
Before the non-ACID to ACID conversion, insert row one.
After the non-ACID to ACID conversion, insert row two.
Both rows can be retrieved before MAJOR compaction.

After MAJOR compaction, row one is lost.
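The steps above can be modeled with a small Python sketch. It is only a toy model of the reported behavior, not Hive's compactor code: the function `major_compact` and its `include_originals` flag are hypothetical names introduced for illustration. The two rows are the ones inserted in the session below.

```python
# Toy model of the reported behavior. This is NOT Hive's compactor code;
# major_compact and include_originals are hypothetical names.

# Row written before the table was converted to ACID (lives in original bucket files).
ORIGINAL_ROWS = [(1, "USA", 1, "united states")]
# Row written after the conversion (lives in a delta directory).
DELTA_ROWS = [(2, "Canada", 1, "maple leaf")]


def major_compact(original_rows, delta_rows, include_originals):
    """Return the rows the new base directory would hold after compaction."""
    rows = list(delta_rows)
    if include_originals:
        # Expected behavior: original bucket files are part of the compaction input.
        rows = list(original_rows) + rows
    return rows


expected = major_compact(ORIGINAL_ROWS, DELTA_ROWS, include_originals=True)
observed = major_compact(ORIGINAL_ROWS, DELTA_ROWS, include_originals=False)

# Rows that disappear when the original bucket files are left out of the input.
lost = [row for row in expected if row not in observed]
print(lost)  # [(1, 'USA', 1, 'united states')]
```

Under this model, leaving the original bucket files out of the compaction input drops exactly the row that was inserted before the conversion, which matches the symptom described here.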

hive> USE acidtest;
OK
Time taken: 0.77 seconds
hive> CREATE TABLE t1 (nationkey INT, name STRING, regionkey INT, comment STRING)
    > CLUSTERED BY (regionkey) INTO 2 BUCKETS
    > STORED AS ORC;
OK
Time taken: 0.179 seconds
hive> DESC FORMATTED t1;
OK
# col_name            	data_type           	comment

nationkey           	int
name                	string
regionkey           	int
comment             	string

# Detailed Table Information
Database:           	acidtest
Owner:              	wzheng
CreateTime:         	Mon Dec 14 15:50:40 PST 2015
LastAccessTime:     	UNKNOWN
Retention:          	0
Location:           	file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
Table Type:         	MANAGED_TABLE
Table Parameters:
	transient_lastDdlTime	1450137040

# Storage Information
SerDe Library:      	org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:        	org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:       	org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed:         	No
Num Buckets:        	2
Bucket Columns:     	[regionkey]
Sort Columns:       	[]
Storage Desc Params:
	serialization.format	1
Time taken: 0.198 seconds, Fetched: 28 row(s)
hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db;
Found 1 items
drwxr-xr-x   - wzheng staff         68 2015-12-14 15:50 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1
hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
hive> INSERT INTO TABLE t1 VALUES (1, 'USA', 1, 'united states');
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2015-12-14 15:51:58,070 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local73977356_0001
Loading data to table acidtest.t1
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 2.825 seconds
hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
Found 2 items
-rwxr-xr-x   1 wzheng staff        112 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000000_0
-rwxr-xr-x   1 wzheng staff        472 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000001_0
hive> SELECT * FROM t1;
OK
1	USA	1	united states
Time taken: 0.434 seconds, Fetched: 1 row(s)
hive> ALTER TABLE t1 SET TBLPROPERTIES ('transactional' = 'true');
OK
Time taken: 0.071 seconds
hive> DESC FORMATTED t1;
OK
# col_name            	data_type           	comment

nationkey           	int
name                	string
regionkey           	int
comment             	string

# Detailed Table Information
Database:           	acidtest
Owner:              	wzheng
CreateTime:         	Mon Dec 14 15:50:40 PST 2015
LastAccessTime:     	UNKNOWN
Retention:          	0
Location:           	file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
Table Type:         	MANAGED_TABLE
Table Parameters:
	COLUMN_STATS_ACCURATE	false
	last_modified_by    	wzheng
	last_modified_time  	1450137141
	numFiles            	2
	numRows             	-1
	rawDataSize         	-1
	totalSize           	584
	transactional       	true
	transient_lastDdlTime	1450137141

# Storage Information
SerDe Library:      	org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:        	org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:       	org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed:         	No
Num Buckets:        	2
Bucket Columns:     	[regionkey]
Sort Columns:       	[]
Storage Desc Params:
	serialization.format	1
Time taken: 0.049 seconds, Fetched: 36 row(s)
hive> set hive.support.concurrency=true;
hive> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
hive> set hive.compactor.initiator.on=true;
hive> set hive.compactor.worker.threads=5;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
Found 2 items
-rwxr-xr-x   1 wzheng staff        112 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000000_0
-rwxr-xr-x   1 wzheng staff        472 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000001_0
hive> INSERT INTO TABLE t1 VALUES (2, 'Canada', 1, 'maple leaf');
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2015-12-14 15:54:18,943 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local1674014367_0002
Loading data to table acidtest.t1
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 1.995 seconds
hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
Found 3 items
-rwxr-xr-x   1 wzheng staff        112 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000000_0
-rwxr-xr-x   1 wzheng staff        472 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000001_0
drwxr-xr-x   - wzheng staff        204 2015-12-14 15:54 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/delta_0000007_0000007_0000
hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/delta_0000007_0000007_0000;
Found 2 items
-rw-r--r--   1 wzheng staff        214 2015-12-14 15:54 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/delta_0000007_0000007_0000/bucket_00000
-rw-r--r--   1 wzheng staff        797 2015-12-14 15:54 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/delta_0000007_0000007_0000/bucket_00001
hive> SELECT * FROM t1;
OK
1	USA	1	united states
2	Canada	1	maple leaf
Time taken: 0.1 seconds, Fetched: 2 row(s)
hive> ALTER TABLE t1 COMPACT 'MAJOR';
Compaction enqueued.
OK
Time taken: 0.026 seconds
hive> show compactions;
OK
Database	Table	Partition	Type	State	Worker	Start Time
Time taken: 0.022 seconds, Fetched: 1 row(s)
hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/;
Found 3 items
-rwxr-xr-x   1 wzheng staff        112 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000000_0
-rwxr-xr-x   1 wzheng staff        472 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000001_0
drwxr-xr-x   - wzheng staff        204 2015-12-14 15:55 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/base_0000007
hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/base_0000007;
Found 2 items
-rw-r--r--   1 wzheng staff        222 2015-12-14 15:55 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/base_0000007/bucket_00000
-rw-r--r--   1 wzheng staff        802 2015-12-14 15:55 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/base_0000007/bucket_00001
hive> select * from t1;
OK
2	Canada	1	maple leaf
Time taken: 0.396 seconds, Fetched: 1 row(s)
hive> select count(*) from t1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2015-12-14 15:56:20,277 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local1720993786_0003
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
1
Time taken: 1.623 seconds, Fetched: 1 row(s)

Note: the cleanup does not kick in because the compaction has already failed. The cleanup itself has no known problem in this case.
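The listings above show three kinds of entries under the table directory: original bucket files (`000000_0`, `000001_0`), a delta directory (`delta_0000007_0000007_0000`), and, after compaction, a base directory (`base_0000007`). The following Python sketch classifies entries by name to show which ones a major compaction would need to read so that no row is dropped. The regular expressions are assumptions inferred only from the names in this session; they are not Hive's own matching rules, and they do not cover the `_copy_N` files mentioned in HIVE-16177. The helper names `classify` and `expected_major_inputs` are hypothetical.

```python
import re

# Patterns inferred from the names in the listings; assumptions, not Hive's rules.
ORIGINAL_FILE = re.compile(r"^\d{6}_\d+$")           # e.g. 000001_0
DELTA_DIR = re.compile(r"^delta_\d+_\d+(_\d+)?$")    # e.g. delta_0000007_0000007_0000
BASE_DIR = re.compile(r"^base_\d+$")                 # e.g. base_0000007


def classify(name):
    """Label one entry of the table directory by its name alone."""
    if ORIGINAL_FILE.match(name):
        return "original"
    if DELTA_DIR.match(name):
        return "delta"
    if BASE_DIR.match(name):
        return "base"
    return "other"


def expected_major_inputs(entries):
    """Entries a major compaction would need to read to keep every row."""
    return [entry for entry in entries if classify(entry) != "other"]


# Table directory contents just before ALTER TABLE t1 COMPACT 'MAJOR'.
before = ["000000_0", "000001_0", "delta_0000007_0000007_0000"]

for name in before:
    print(name, "->", classify(name))

print(expected_major_inputs(before))
```

For the listing taken before compaction, all three entries are expected inputs. The session shows that only the delta contents reached `base_0000007`, so the row stored in the original bucket files is missing from later reads.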

Attachments

HIVE-12724.1.patch            21/Dec/15 23:56    4 kB     Wei Zheng
HIVE-12724.2.patch            23/Dec/15 23:39    12 kB    Wei Zheng
HIVE-12724.3.patch            07/Jan/16 18:32    13 kB    Wei Zheng
HIVE-12724.branch-1.patch     13/Jan/16 20:49    13 kB    Wei Zheng
HIVE-12724.ADDENDUM.1.patch   14/Jan/16 19:18    2 kB     Wei Zheng
HIVE-12724.branch-1.2.patch   14/Jan/16 19:35    15 kB    Wei Zheng
HIVE-12724.4.patch            14/Jan/16 23:19    2 kB     Wei Zheng

Issue Links

is related to

HIVE-14366 Conversion of a Non-ACID table to an ACID table produces non-unique primary keys (Closed)
HIVE-16177 non Acid to acid conversion doesn't handle _copy_N files (Closed)

relates to

HIVE-13961 ACID: Major compaction fails to include the original bucket files if there's no delta directory (Closed)
