[HIVE-18429] Compaction should handle a case when it produces no output - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.0.0
Fix Version/s: 3.0.0
Component/s: Transactions
Labels: None
Target Version/s: 3.0.0

Description

Suppose we start with empty delta_8_8 and delta_9_9, and compaction runs.
It currently produces an MR job with 0 splits, so CompactorMR.TMP_LOCATION never gets created. This causes CompactorOutputCommitter.commitJob() to fail when it executes
FileStatus[] contents = fs.listStatus(tmpLocation);
because tmpLocation doesn't exist.

If the compactor fails to produce delta_8_9 here, it will be unable to do any further compaction unless a new delta containing data is created.

If the number of empty deltas is greater than HiveConf.ConfVars.COMPACTOR_MAX_NUM_DELTA, compaction will not be able to proceed at all.

It should produce a delta_8_9 in this case even if it's empty.
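
To illustrate the requested behavior, here is a minimal sketch of a commit step that tolerates a temp directory that was never created. It is not the code from the attached patches: it uses java.nio.file as a stand-in for the Hadoop FileSystem API so that it runs on its own, and the class name, method signature, and the temp directory name in main are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for the compactor's commit step (not Hive code).
class EmptyCompactionCommit {

    /**
     * Moves compaction output from tmpLocation into finalLocation.
     * If tmpLocation was never created (the job had 0 splits), an empty
     * finalLocation is still created so the compacted range exists on disk.
     * Returns the number of entries moved.
     */
    public static int commitJob(Path tmpLocation, Path finalLocation) throws IOException {
        // Always create the output directory, even when there is nothing to move.
        Files.createDirectories(finalLocation);
        if (!Files.exists(tmpLocation)) {
            // Listing a missing directory is what fails in the report; skip it.
            return 0;
        }
        // Snapshot the children first so we do not move entries while listing.
        List<Path> children;
        try (Stream<Path> listing = Files.list(tmpLocation)) {
            children = listing.collect(Collectors.toList());
        }
        for (Path child : children) {
            Files.move(child, finalLocation.resolve(child.getFileName()));
        }
        // The temp directory is empty now and can be removed.
        Files.delete(tmpLocation);
        return children.size();
    }

    public static void main(String[] args) throws IOException {
        Path partition = Files.createTempDirectory("partition");
        // Hypothetical temp dir name; it is deliberately never created.
        Path tmp = partition.resolve("_tmp_never_created");
        Path delta = partition.resolve("delta_8_9");
        int moved = commitJob(tmp, delta);
        System.out.println("moved=" + moved + ", deltaExists=" + Files.isDirectory(delta));
    }
}
```

With the missing temp directory above, the sketch still leaves an empty delta_8_9 behind instead of throwing, which is the outcome this issue asks for.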

The error (in the log of the standalone metastore process) would look like this:

2017-12-27 17:19:28,850 ERROR CommitterEvent Processor #1 org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Could not commit job
java.io.FileNotFoundException: File hdfs://OTCHaaS/apps/hive/warehouse/momi.db/sensor_data/babyid=5911806ebf69640100004257/_tmp_b4c5a3f3-44e5-4d45-86af-5b773bf0fc96 does not exist.
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:923)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:114)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:985)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:981)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:992)
at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorOutputCommitter.commitJob(CompactorMR.java:785)
at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:291)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:285)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Attachments

HIVE-18429.03.patch (12/Jan/18 01:21, 7 kB, Eugene Koifman)
HIVE-18429.02.patch (11/Jan/18 02:44, 7 kB, Eugene Koifman)
HIVE-18429.01.patch (11/Jan/18 02:23, 6 kB, Eugene Koifman)

People

Assignee: Eugene Koifman

Reporter: Eugene Koifman

Votes: 0

Watchers: 4

Dates

Created: 10/Jan/18 23:45

Updated: 23/May/18 02:13

Resolved: 16/Jan/18 20:48
