HBase / HBASE-21211

Can't Read Partitions File - Partitions File deleted


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.5.0
    • Fix Version/s: None
    • Component/s: None
    • Flags: Patch, Important
Environment
      Source URL:
      https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java#L827


      Steps to reproduce (one of several ways to hit this):
      Requirements: a MapReduce job that uses `HFileOutputFormat2.configureIncrementalLoad(job, table, locator);` for bulk loading instead of direct puts to HBase.
      Steps:
      1. Start the MapReduce job from the terminal.
      2. Press Control + C or cancel the terminal session. (It is not practical to leave the terminal session open for the duration of the MapReduce job, especially when running from a dev machine.)
      3. Any mapper that starts after this point fails with:
      `Can't read partitions file at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)`
      The process that kicks off the MapReduce job performs the delete. We expect this because `FileSystem.deleteOnExit` deletes the file when the FileSystem is closed. A minimal driver sketch of this setup follows below.
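      For reference, here is a minimal driver sketch of the setup described above. The table name, job name, and class are hypothetical placeholders, and mapper/input configuration is omitted:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.mapreduce.Job;

public class BulkLoadDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    TableName name = TableName.valueOf("my_table"); // hypothetical table
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(name);
         RegionLocator locator = conn.getRegionLocator(name)) {
      Job job = Job.getInstance(conf, "bulk-load-example");
      job.setJarByClass(BulkLoadDriver.class);
      // Mapper, input format, and input/output paths omitted for brevity.
      // This call writes the partitions file and registers it with
      // FileSystem.deleteOnExit in this driver JVM.
      HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
      // Killing this driver (Control + C) after submission closes its
      // FileSystem, which deletes the partitions file while mappers on
      // the cluster may still need to read it.
      job.waitForCompletion(true);
    }
  }
}
```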


    Description

      Hi team, we have a MapReduce job that uses the bulk load option instead of direct puts to import data, e.g.:

      HFileOutputFormat2.configureIncrementalLoad(job, table, locator);

      However, we have been running into a situation where the partitions file is deleted by the termination of the JVM process that kicked off the MapReduce job. That same process runs `configureIncrementalLoad`, which executes `configurePartitioner`, so the partitions file is tied to that process's lifetime.

       

      Error: java.lang.IllegalArgumentException: Can't read partitions file at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
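      This matches the documented semantics of `FileSystem.deleteOnExit`. Here is a minimal sketch (the path is a hypothetical placeholder) of the behavior we believe is at play:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeleteOnExitDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/partitions_demo"); // hypothetical path
    fs.create(p).close(); // file exists; readers succeed
    fs.deleteOnExit(p);   // marked for deletion when this FileSystem closes
    fs.close();           // deleteOnExit fires here and removes the file
    // FileSystem.close() also runs from the client JVM's shutdown hook, so
    // killing the submitting process has the same effect even though the
    // MapReduce job keeps running on the cluster.
  }
}
```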

       

      We think line 827 of `HFileOutputFormat2` could be the root cause:

       

      fs.deleteOnExit(partitionsPath);
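      For context, here is a simplified paraphrase of the `configurePartitioner` logic around that line, based on our reading of the source URL above; consult the linked file for the exact code. The method lives inside `HFileOutputFormat2`, and `writePartitions` is that class's own private helper:

```java
import java.io.IOException;
import java.util.List;
import java.util.UUID;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

// Simplified paraphrase, not the verbatim source.
static void configurePartitioner(Job job, List<ImmutableBytesWritable> splitPoints)
    throws IOException {
  Configuration conf = job.getConfiguration();
  FileSystem fs = FileSystem.get(conf);
  // The partitions file is written under a temporary staging directory...
  Path partitionsPath = new Path(conf.get("hbase.fs.tmp.dir"),
      "partitions_" + UUID.randomUUID());
  fs.makeQualified(partitionsPath);
  writePartitions(conf, partitionsPath, splitPoints); // private helper
  // ...and registered for deletion when the submitter's FileSystem closes.
  // This is the line we believe causes the failure: the file can vanish
  // while mappers on the cluster still need it.
  fs.deleteOnExit(partitionsPath);
  job.setPartitionerClass(TotalOrderPartitioner.class);
  TotalOrderPartitioner.setPartitionFile(conf, partitionsPath);
}
```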

       

      We have created a custom HFileOutputFormat that does not delete the partitions file, and this has fixed the problem on our cluster. We propose adding a cleanup method that deletes the partitions file once all the mappers have finished; a sketch follows below.
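      Here is a minimal sketch, from the driver side, of the kind of cleanup we are proposing. The class and method names are ours; `TotalOrderPartitioner.getPartitionFile` is the standard Hadoop accessor for the configured partitions path:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

public final class PartitionsFileCleanup {
  private PartitionsFileCleanup() {}

  // Hypothetical helper: delete the partitions file explicitly once the
  // job (and therefore every mapper) has finished, instead of relying on
  // FileSystem.deleteOnExit in the submitting JVM.
  public static void deletePartitionsFile(Job job) throws IOException {
    Configuration conf = job.getConfiguration();
    Path partitionsPath = new Path(TotalOrderPartitioner.getPartitionFile(conf));
    FileSystem fs = partitionsPath.getFileSystem(conf);
    fs.delete(partitionsPath, false); // a single file, so non-recursive
  }
}
```

      In the driver this would run after `job.waitForCompletion(true)` returns, so the file outlives every mapper regardless of when the submitting terminal session ends.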


          People

            Assignee: Unassigned
            Reporter: KSHITIJ GAUTAM (gautamkshitij)
            Votes: 1
            Watchers: 1


              Time Tracking

                Original Estimate: 1h
                Remaining Estimate: 1h
                Time Spent: Not specified