Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-1028

INode.getPathNames could split more efficiently

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.0
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      INode.getPathnames uses String.split(String) which actually uses the full Java regex implementation. Since we're always splitting on a single char, we could implement a faster one like StringUtils.split() (except without the escape character). This takes a significant amount of CPU during FSImage loading so should be a worthwhile speedup.

      1. HDFS-split.2.patch
        0.8 kB
        Dmytro Molkov
      2. HDFS-split.patch
        2 kB
        Dmytro Molkov

        Issue Links

          Activity

          Hide
          ryan rawson added a comment -

          +1 - tsuna on our staff found that in a string intensive processing mini-app, he was able to speed his app up by a factor of 2 by not using that API. Also PrintWriter is many times faster than PrintStream.

          Show
          ryan rawson added a comment - +1 - tsuna on our staff found that in a string intensive processing mini-app, he was able to speed his app up by a factor of 2 by not using that API. Also PrintWriter is many times faster than PrintStream.
          Hide
          dhruba borthakur added a comment -

          +1 seems like a good idea.

          Show
          dhruba borthakur added a comment - +1 seems like a good idea.
          Hide
          Todd Lipcon added a comment -

          +1 to the body of the patch. Regarding the test change, how long does the test case take now? Can we make a better benchmark that is independent of the unit tests (I assume this change was to show a speed improvement)? I don't think it makes sense to overload the unit tests for the purposes of benchmarking.

          Personally I'd be satisfied to just have simple timings of loading one of your production fsimages with/without the change.

          Show
          Todd Lipcon added a comment - +1 to the body of the patch. Regarding the test change, how long does the test case take now? Can we make a better benchmark that is independent of the unit tests (I assume this change was to show a speed improvement)? I don't think it makes sense to overload the unit tests for the purposes of benchmarking. Personally I'd be satisfied to just have simple timings of loading one of your production fsimages with/without the change.
          Hide
          Dmytro Molkov added a comment -

          Good point, I will remove the changes from the UnitTest. One image I tested it on with 3.5 million files got a speedup of ~18% 36 second vs 30 seconds. We were not testing the change on bigger images yet.

          Show
          Dmytro Molkov added a comment - Good point, I will remove the changes from the UnitTest. One image I tested it on with 3.5 million files got a speedup of ~18% 36 second vs 30 seconds. We were not testing the change on bigger images yet.
          Hide
          Todd Lipcon added a comment -

          Cool, looks good to me!

          Show
          Todd Lipcon added a comment - Cool, looks good to me!
          Hide
          Dmytro Molkov added a comment -

          Submitting to hudson

          Show
          Dmytro Molkov added a comment - Submitting to hudson
          Hide
          dhruba borthakur added a comment -

          +1. code looks good

          Show
          dhruba borthakur added a comment - +1. code looks good
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12443789/HDFS-split.2.patch
          against trunk revision 939918.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/346/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/346/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/346/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/346/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12443789/HDFS-split.2.patch against trunk revision 939918. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/346/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/346/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/346/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/346/console This message is automatically generated.
          Hide
          dhruba borthakur added a comment -

          I just committed this. Thanks Dmytro.

          Show
          dhruba borthakur added a comment - I just committed this. Thanks Dmytro.

            People

            • Assignee:
              Dmytro Molkov
              Reporter:
              Todd Lipcon
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development