Details

Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version/s: 1.5.0
Fix Version/s: None
Component/s: None
Environment:
HBase Version: 1.2.0-cdh5.11.1 (the line that deletes the file still exists)
hadoop version:
Hadoop 2.6.0-cdh5.11.1
Subversion http://github.com/cloudera/hadoop -r b581c269ca3610c603b6d7d1da0d14dfb6684aa3
From source with checksum c6cbc4f20a8a571dd7c9f743984da1
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.11.1.jar
Labels: Patch, Important
Description
Hi team, we have a MapReduce job that uses the bulk load option instead of direct puts to import data, e.g.,
HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
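For context, here is a minimal sketch of how such a job is wired up (the table name example_table, the paths, and TsvMapper are placeholders for this report, not our actual job):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {

  // Hypothetical mapper: turns a tab-separated "rowkey<TAB>value" line into a Put.
  public static class TsvMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      String[] fields = value.toString().split("\t", 2);
      byte[] row = Bytes.toBytes(fields[0]);
      Put put = new Put(row);
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(fields[1]));
      ctx.write(new ImmutableBytesWritable(row), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "bulkload-example");
    job.setJarByClass(BulkLoadDriver.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setMapperClass(TsvMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HFiles are written here

    TableName name = TableName.valueOf("example_table");
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(name);
         RegionLocator locator = conn.getRegionLocator(name)) {
      // Configures the reducer, TotalOrderPartitioner and the partitions file for the job.
      HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }
}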
However, we have been running into a situation where the partitions file is deleted by the termination of the JVM process: the JVM process kicks off the MapReduce job, but it is also waiting to run the `configureIncrementalLoad` that executes `configurePartitioner`. The tasks then fail with:
Error: java.lang.IllegalArgumentException: Can't read partitions file at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
We think line #827 of HFileOutputFormat2 could be the root cause:
fs.deleteOnExit(partitionsPath);
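To illustrate the semantics at play (a standalone sketch using plain Hadoop FileSystem calls, not HBase code; the path is made up): deleteOnExit ties the lifetime of the partitions file to the client-side FileSystem/JVM rather than to the MapReduce job, so the file can disappear before the tasks ever read it.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeleteOnExitDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path partitionsPath = new Path("/tmp/partitions_demo"); // made-up path

    try (FSDataOutputStream out = fs.create(partitionsPath)) {
      out.writeUTF("split points would go here");
    }

    // The file is removed when this client's FileSystem is closed (normally at JVM exit),
    // regardless of whether the MapReduce tasks have read it yet. If the client JVM goes
    // away first, TotalOrderPartitioner.setConf in the tasks fails with
    // "Can't read partitions file".
    fs.deleteOnExit(partitionsPath);
  }
}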
We have created a custom HFileOutputFormat that does not delete the partitions file, and this fixed the problem on our cluster. We propose adding a cleanup method that deletes the partitions file once all the mappers have finished (see the sketch below).
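For reference, a rough sketch of the shape of cleanup we have in mind (illustrative only, not a patch; the class and method names are made up, while TotalOrderPartitioner.getPartitionFile is the standard accessor for the configured partitions path): the driver deletes the partitions file only after the job has completed, instead of relying on deleteOnExit.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

public final class PartitionsFileCleanup {
  private PartitionsFileCleanup() {}

  /**
   * Runs the job and removes the TotalOrderPartitioner partitions file afterwards,
   * once every mapper has already read it.
   */
  public static boolean runThenCleanup(Job job) throws Exception {
    boolean success = job.waitForCompletion(true);
    Configuration conf = job.getConfiguration();
    Path partitionsPath = new Path(TotalOrderPartitioner.getPartitionFile(conf));
    FileSystem fs = partitionsPath.getFileSystem(conf);
    fs.delete(partitionsPath, false); // safe now: no task will need it again
    return success;
  }
}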