HBase / HBASE-7744

Tools creating HFiles (Import, ImportTsv) don't run in local mode

    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: mapreduce
    • Labels:
      None

      Description

      This is mostly a developer pain point.

      HFileOutputFormat#configureIncrementalLoad depends on DistributedCache#createSymlink to find the splits file when configuring TotalOrderPartitioner (TOP). This symlink doesn't work when the job runs in local mode.

      1. HBASE-7744-v1.patch
        3 kB
        Alexandre Normand
      2. HBASE-7744-trunk.patch
        0.9 kB
        Alexandre Normand
      3. HBASE-7744-0.94.patch
        4 kB
        Alexandre Normand
      4. HBASE-7744-0.94.6-v2.patch
        1 kB
        Alexandre Normand

        Issue Links

          Activity

          Nick Dimiduk added a comment -

          Indeed, it looks like we still depend on symlinks in 0.94 but no longer on 0.96/trunk. What do you think about backporting HBASE-4285 to 0.94 as a first step – then there's no need for the localmode check?

          I agree, it would be nice to have a test explicitly for this.

          Alexandre Normand added a comment -

          Sorry for the delay. I spent some time yesterday trying to add a test that would confirm that bulk load is broken in local mode on trunk. It's a hack but it forces mapred.job.tracker to local and sets the filesystem to be local too. All that done, it still seems to me that this was fixed indirectly on trunk by some other commits. Specifically, the key to the 0.94 fix is the fact that this gets executed when running in local mode:

                boolean localMode = "local".equals(conf.get("mapred.job.tracker"));
                if (localMode) {
                  conf.set(TotalOrderPartitioner.PARTITIONER_PATH, partitionsPath.toString());
                }
          

          On trunk, we have a similar behavior except that it's executed in all cases (which is much cleaner, in my opinion):

            static void configurePartitioner(Job job, List<ImmutableBytesWritable> splitPoints)
                throws IOException {
          
              // create the partitions file
              FileSystem fs = FileSystem.get(job.getConfiguration());
              Path partitionsPath = new Path("/tmp", "partitions_" + UUID.randomUUID());
              ...
              job.setPartitionerClass(TotalOrderPartitioner.class);
              TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), partitionsPath);
            }
          

          I'd mark this as already fixed on trunk. I can be convinced to attach my hacked-together test (it basically just extends the existing IntegrationTestBulkLoad while overriding a few configurations to run in local mode) if anyone is curious.

          Can anyone double check me?
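
To make the contrast between the two snippets above concrete, here is a minimal plain-Java sketch. It is not HBase or Hadoop code: a `Map` stands in for the job configuration, `sketch.partitioner.path` is a hypothetical key standing in for `TotalOrderPartitioner.PARTITIONER_PATH`, and `_partition.lst` is assumed to be the default partition file name that the symlink would normally provide. Only `mapred.job.tracker` and the `"local"` check come from the comment above.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for the two configuration strategies; no Hadoop classes are used. */
final class PartitionPathSketch {
  // Key quoted in the comment above.
  static final String JOB_TRACKER_KEY = "mapred.job.tracker";
  // Hypothetical stand-in for TotalOrderPartitioner.PARTITIONER_PATH.
  static final String PARTITIONER_PATH_KEY = "sketch.partitioner.path";
  // Assumed default file name; in a cluster the symlink is expected to provide it.
  static final String DEFAULT_PARTITION_FILE = "_partition.lst";

  /** 0.94-style fix: record the explicit path only when the job tracker is "local". */
  static void configureLocalModeOnly(Map<String, String> conf, String partitionsPath) {
    boolean localMode = "local".equals(conf.get(JOB_TRACKER_KEY));
    if (localMode) {
      conf.put(PARTITIONER_PATH_KEY, partitionsPath);
    }
  }

  /** Trunk-style behavior: always record the explicit path. */
  static void configureAlways(Map<String, String> conf, String partitionsPath) {
    conf.put(PARTITIONER_PATH_KEY, partitionsPath);
  }

  /** The file a partitioner would open: the explicit path if set, else the default name. */
  static String resolvePartitionFile(Map<String, String> conf) {
    return conf.getOrDefault(PARTITIONER_PATH_KEY, DEFAULT_PARTITION_FILE);
  }

  public static void main(String[] args) {
    Map<String, String> local = new HashMap<>();
    local.put(JOB_TRACKER_KEY, "local");
    configureLocalModeOnly(local, "/tmp/partitions_example");
    System.out.println("local mode, 0.94-style: " + resolvePartitionFile(local));

    Map<String, String> cluster = new HashMap<>();
    cluster.put(JOB_TRACKER_KEY, "jobtracker.example:8021"); // hypothetical address
    configureLocalModeOnly(cluster, "/tmp/partitions_example");
    System.out.println("cluster, 0.94-style: " + resolvePartitionFile(cluster));

    configureAlways(cluster, "/tmp/partitions_example");
    System.out.println("cluster, trunk-style: " + resolvePartitionFile(cluster));
  }
}
```

In this sketch the local-mode-only variant leaves a cluster job on the default file name (and so on the symlink), while the always-set variant never relies on it, which is the sense in which the trunk behavior is described as cleaner.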

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12607729/HBASE-7744-trunk.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7510//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7510//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7510//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7510//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7510//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7510//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7510//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7510//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7510//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7510//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7510//console

          This message is automatically generated.

          Alexandre Normand added a comment -

          I'm attaching the patch for trunk. I would have liked to include a test that showed this was broken before the patch but I need a bit more time to familiarize myself with that code first. I'm going to try and take a deeper look at the testing tomorrow.

          Nick Dimiduk added a comment -

          IntegrationTestBulkLoad runs on a minicluster by default, and a real cluster otherwise. Hence, it won't be using the local filesystem. I think your patch is still valid and would be nice for trunk. Will review in a day or two once things on my end settle.

          Alexandre Normand added a comment -

          Ted Yu, I think this might already be fixed on trunk. Elliott wrote hbase-it/src/test/java/org/apache/hadoop/hbase/mapreduce/IntegrationTestBulkLoad.java, which exercises it as far as I can tell. I'll defer the confirmation to Elliott since I'm still pretty unfamiliar with the code base and the integration test setup. Elliott Clark, can you confirm this?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12607636/HBASE-7744-0.94.6-v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7506//console

          This message is automatically generated.

          Ted Yu added a comment -

          @Alex:
          The QA bot wouldn't be able to apply a 0.94 patch on trunk.

          Can you provide a trunk patch?

          Thanks

          Alexandre Normand added a comment -

          Third time's the charm.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12607628/HBASE-7744-v1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7503//console

          This message is automatically generated.

          Alexandre Normand added a comment -

          Sorry, there were remnants of my testing in the first patch. Here's a cleaned up one.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12607625/HBASE-7744-0.94.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7502//console

          This message is automatically generated.

          Alexandre Normand added a comment -

          I applied this patch to https://github.com/cloudera/hbase/tree/cdh4-0.94.6_4.3.0 and successfully ran bulk load tests locally with it. It's a slightly simplified version of the handling done in ImportSequenceFile.

          Nick Dimiduk added a comment -

          Also keep in mind your integration tests should run on a pseudo-distributed and mini-cluster deployment – basically, anything with real DFS.

          Nick Dimiduk added a comment -

          Do let us know how this goes! I'd prefer to see our dependency on the symlink removed, but the hcat approach should work in a pinch. While you're poking around in there, would you mind seeing if you can pass in the actual partitions file path via config objects?

          Thanks Alexandre Normand.

          Alexandre Normand added a comment -

          We'd really like to be able to integration test our jobs that use bulk loading, and this is our current blocker. I'm going to use the approach that Nick Dimiduk referred to in his comment above and post back the results. I'm crossing my fingers that this will be enough to do bulk loading successfully in local mode.

          Nick Dimiduk added a comment -

          HCat's bulk import feature gets around this via some detection logic: https://github.com/apache/hcatalog/blob/branch-0.5/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/ImportSequenceFile.java#L175-L185

            People

            • Assignee:
              Alexandre Normand
              Reporter:
              Nick Dimiduk
            • Votes:
              5
              Watchers:
              8