HBase / HBASE-21211

Can't Read Partitions File - Partitions File deleted


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.5.0
    • Fix Version/s: None
    • Component/s: None
    • Flags: Patch, Important
      Source URL:
      https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java#L827


      Steps to reproduce (there are multiple ways to reproduce this problem):
      Requirements:
      A MapReduce job uses `HFileOutputFormat2.configureIncrementalLoad(job, table, locator);` for bulk loading instead of direct puts to HBase.
      Steps:
      1. Start the MapReduce job from the terminal.
      2. Press Control + C or cancel the terminal session. (It is not practical to leave the terminal session running the MapReduce job, especially on a dev machine.)
      3. Any mapper that starts after this point fails with:
         `Can't read partitions file at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)`
      The process that kicks off the MapReduce job performs the delete. This is expected, because `FileSystem.deleteOnExit` deletes the file when the FileSystem is closed.
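      A small standalone demonstration of this `FileSystem.deleteOnExit` behavior (our own example, not HBase code; the path is arbitrary):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class DeleteOnExitDemo {
        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          FileSystem fs = FileSystem.get(conf);
          Path p = new Path("/tmp/partitions_demo");
          fs.create(p).close();             // the file exists now
          fs.deleteOnExit(p);               // schedule deletion for FileSystem close / JVM exit
          System.out.println(fs.exists(p)); // prints: true
          fs.close();                       // the file is deleted here
          // Any process that still needs the file after this point can no longer read it,
          // which is what happens to mappers launched after the submitting JVM exits.
        }
      }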


    Description

      Hi team, we have a MapReduce job that uses the bulk load option instead of direct puts to import data, e.g.:

      HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
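      In full, such a job setup might look roughly like the following sketch (the driver class name, table name, and output path are placeholders we chose, not taken from the actual job):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.TableName;
      import org.apache.hadoop.hbase.client.Connection;
      import org.apache.hadoop.hbase.client.ConnectionFactory;
      import org.apache.hadoop.hbase.client.RegionLocator;
      import org.apache.hadoop.hbase.client.Table;
      import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class BulkLoadDriver {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          TableName name = TableName.valueOf("my_table");
          boolean ok;
          try (Connection conn = ConnectionFactory.createConnection(conf);
               Table table = conn.getTable(name);
               RegionLocator locator = conn.getRegionLocator(name)) {
            Job job = Job.getInstance(conf, "bulk-load");
            job.setJarByClass(BulkLoadDriver.class);
            // Mapper/input setup omitted; the mapper must emit ImmutableBytesWritable keys
            // with Put or KeyValue values for HFileOutputFormat2.
            FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles"));
            // Writes the partitions file and registers it with fs.deleteOnExit in this JVM.
            HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
            // If this JVM exits (e.g. Ctrl+C) before all mappers have run, the partitions
            // file disappears and later mappers fail in TotalOrderPartitioner.setConf.
            ok = job.waitForCompletion(true);
          }
          System.exit(ok ? 0 : 1);
        }
      }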

      However, we have been running into a situation where the partitions file is deleted when the JVM process that kicked off the MapReduce job terminates. That same process ran `configureIncrementalLoad`, which executes `configurePartitioner` and registers the partitions file for deletion on exit.

       

      Error: java.lang.IllegalArgumentException: Can't read partitions file at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)

       

      We think line #827 of HFileOutputFormat2 could be the root cause:

       

      fs.deleteOnExit(partitionsPath);
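      For reference, a simplified paraphrase of what `configurePartitioner` does around that line (not the verbatim source; `writePartitions` is a private helper in the same class, and details may differ between versions):

      static void configurePartitioner(Job job, List<ImmutableBytesWritable> splitPoints)
          throws IOException {
        Configuration conf = job.getConfiguration();
        FileSystem fs = FileSystem.get(conf);
        // The partitions file is written under hbase.fs.tmp.dir by the submitting process.
        Path partitionsPath = new Path(conf.get("hbase.fs.tmp.dir"),
            "partitions_" + UUID.randomUUID());
        partitionsPath = fs.makeQualified(partitionsPath);
        writePartitions(conf, partitionsPath, splitPoints);
        // Registers the file for deletion when this FileSystem is closed, i.e. when the
        // submitting JVM exits, even though mappers may still need to read it.
        fs.deleteOnExit(partitionsPath);

        job.setPartitionerClass(TotalOrderPartitioner.class);
        TotalOrderPartitioner.setPartitionFile(conf, partitionsPath);
      }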

       

      We have created a custom HFileOutputFormat that does not delete the partitions file, which fixed the problem for our cluster. We propose adding a cleanup method that deletes the partitions file once all the mappers have finished.
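      A rough sketch of what such a cleanup could look like on the submitting side (the class and method names here are hypothetical, not an existing HBase API):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

      public final class PartitionsFileCleanup {
        private PartitionsFileCleanup() {}

        // Runs the job to completion, then deletes the partitions file once no mapper can still need it.
        public static boolean runAndCleanup(Job job) throws Exception {
          boolean success = job.waitForCompletion(true);
          Configuration conf = job.getConfiguration();
          // TotalOrderPartitioner records the partitions file location in the job configuration.
          Path partitionsPath = new Path(TotalOrderPartitioner.getPartitionFile(conf));
          FileSystem fs = partitionsPath.getFileSystem(conf);
          if (fs.exists(partitionsPath)) {
            fs.delete(partitionsPath, false);
          }
          return success;
        }
      }

      With something like this, the driver would call `PartitionsFileCleanup.runAndCleanup(job)` instead of `job.waitForCompletion(true)`, and the `fs.deleteOnExit(partitionsPath)` call would no longer be needed.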

      Attachments

        Activity


          People

            Assignee: Unassigned
            Reporter: gautamkshitij (KSHITIJ GAUTAM)

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated: 1h
                Remaining: 1h
                Logged: Not Specified
