  Crunch (Retired) / CRUNCH-614

HFileUtils.writeToHFilesForIncrementalLoad slowed dramatically by copying KeyValue byte array

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.14.0
    • Fix Version/s: 0.15.0
    • Component/s: None
    • Labels: None

    Description

      I raised this issue on the mailing list:
      http://mail-archives.apache.org/mod_mbox/crunch-user/201607.mbox/%3CCANBdsh01qaQRCNdQdtqytP%2BWAhT_NVGHyQAdDS8H%2BPPMfi9bkw%40mail.gmail.com%3E

      A change to HFileUtils makes the compare() method of the KeyValueComparator copy the KeyValue byte array. The change was made with the following commit:

      https://github.com/apache/crunch/commit/a959ee6c7fc400d1f455b0742641c54de1dec0bf#diff-bc76ce0b41704c9c4efbfa1aab53588d

      The change causes HFileUtils.writeToHFilesForIncrementalLoad to be dramatically slower in at least some cases.

      The code changed from using the KeyValue(byte[], int, int) constructor to using KeyValue.create(). KeyValue.create() does a byte array copy. The fix is likely as simple as changing the code back to using the KeyValue constructor.
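
      To illustrate the difference between wrapping and copying, here is a minimal, self-contained sketch. It does not use the HBase or Crunch classes: View, wrap(), copy(), and compare() are hypothetical stand-ins, and the comparison is a plain unsigned lexicographic one rather than the real KeyValue ordering. wrap() plays the role of the KeyValue(byte[], int, int) constructor, and copy() plays the role of KeyValue.create() as described above.

```java
import java.util.Arrays;

public class KeyValueCopySketch {

    /** Hypothetical stand-in for a KeyValue: a view over a region of a byte array. */
    static final class View {
        final byte[] buf;
        final int offset;
        final int length;

        View(byte[] buf, int offset, int length) {
            this.buf = buf;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Wraps the region in place; no bytes are copied. */
    static View wrap(byte[] buf, int offset, int length) {
        return new View(buf, offset, length);
    }

    /** Copies the region into a fresh array before wrapping it. */
    static View copy(byte[] buf, int offset, int length) {
        byte[] fresh = Arrays.copyOfRange(buf, offset, offset + length);
        return new View(fresh, 0, length);
    }

    /** Unsigned lexicographic comparison of the two regions (a simplification). */
    static int compare(View a, View b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a.buf[a.offset + i] & 0xff;
            int y = b.buf[b.offset + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] serialized = {0, 1, 2, 3, 4, 5};

        View wrapped = wrap(serialized, 2, 3);
        View copied = copy(serialized, 2, 3);

        // The wrapped view shares the caller's array; the copied view does not.
        System.out.println(wrapped.buf == serialized); // prints "true"
        System.out.println(copied.buf == serialized);  // prints "false"

        // Both views compare as equal, so the copy adds allocation and
        // copying work without changing the comparison result.
        System.out.println(compare(wrapped, copied));  // prints "0"
    }
}
```

      A sort invokes the comparator many times, so an allocation and copy inside compare() is paid on each of those calls, which is consistent with the slowdown reported here.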

      I will do some testing and attach a PR for the fix.

      Attachments

        1. CRUNCH-614-1.patch
          2 kB
          Ben Roling

          People

            Assignee: Josh Wills (jwills)
            Reporter: Ben Roling (ben.roling)
            Votes: 1
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: