  1. Apache Tez
  2. TEZ-3809

The buffer size allocated for InMemoryMapOutput can be optimized

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved

    Description

      Related jiras: TEZ-3752 and TEZ-3732.

      -When shuffling input to memory, the decompressed length is used to size the InMemoryMapOutput object. However, IFile.Reader's readToMemory reads 4 bytes fewer (the IFile header). These 4 bytes can be trimmed from the allocation and, in an extreme case of 10,000,000 fetches, this saves ~38 MB (TEZ-3732).

      -Memory-to-memory merge sums the sizes of the input InMemoryMapOutput buffers to allocate the new InMemoryMapOutput. However, each input ends with its own two EOF_MARKERs, while the merged output needs only one pair, at the very end.

      -InMemoryWriter wraps the output BoundedByteArrayOutputStream in an IFileOutputStream, which writes a checksum at close. This creates an inconsistency between the primary input buffers, which have no checksum, and the merged buffers, which do. The IFileOutputStream wrapper can be removed to save 4 bytes per merged buffer.

      -InMemoryWriter does not count the two EOF_MARKERs written at close(), so the value returned by getRawLength() is two bytes too small.
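
The byte counts in the four points above can be checked with a small, standalone sketch. It is not taken from the attached patch or from the Tez sources: the class name, method names, and constants are hypothetical, and the constants only restate the figures given in this description (a 4-byte IFile header, a 4-byte checksum, and two 1-byte EOF_MARKERs per buffer).

```java
/**
 * Hypothetical sizing arithmetic for the points in this issue.
 * Nothing here calls into Tez; the constants restate the figures
 * quoted in the description and are assumptions for illustration.
 */
final class ShuffleBufferMath {
    static final int IFILE_HEADER_BYTES = 4;      // bytes readToMemory does not read
    static final int CHECKSUM_BYTES = 4;          // written by IFileOutputStream at close
    static final int EOF_MARKERS_PER_BUFFER = 2;  // two markers end every buffer
    static final int EOF_MARKER_BYTES = 1;        // assumed: one byte per marker

    private ShuffleBufferMath() {}

    /** Bytes saved across all fetches if the header is not allocated. */
    static long headerSavingsBytes(long fetches) {
        return fetches * IFILE_HEADER_BYTES;
    }

    /** Merged size as described today: a plain sum of the input sizes. */
    static long naiveMergedSize(long[] inputSizes) {
        long total = 0;
        for (long size : inputSizes) {
            total += size;
        }
        return total;
    }

    /**
     * Merged size if each input's trailing markers are dropped and a
     * single pair is kept at the end. Assumes every input size already
     * includes its own two markers.
     */
    static long trimmedMergedSize(long[] inputSizes) {
        long eofBytes = (long) EOF_MARKERS_PER_BUFFER * EOF_MARKER_BYTES;
        long total = 0;
        for (long size : inputSizes) {
            total += size - eofBytes;  // payload without this input's markers
        }
        return total + eofBytes;       // one pair for the merged buffer
    }

    /** Raw length once the two markers written at close() are counted. */
    static long rawLengthAfterClose(long recordBytes) {
        return recordBytes + (long) EOF_MARKERS_PER_BUFFER * EOF_MARKER_BYTES;
    }

    public static void main(String[] args) {
        long saved = headerSavingsBytes(10_000_000L);
        System.out.println("header bytes saved: " + saved);
        System.out.println("in MiB: " + saved / (1024.0 * 1024.0));

        long[] inputs = {100, 200, 300};
        System.out.println("naive merged size: " + naiveMergedSize(inputs));
        System.out.println("trimmed merged size: " + trimmedMergedSize(inputs));
    }
}
```

Under these assumptions, 10,000,000 fetches at 4 bytes each come to 40,000,000 bytes, which is the ~38 MB quoted above. For three inputs of 100, 200, and 300 bytes, the plain sum allocates 600 bytes while 596 would be enough, so a merge of N inputs over-allocates by 2 × (N − 1) bytes.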

      Attachments

        1. TEZ-3809.002.patch
          34 kB
          Muhammad Samir Khan
        2. TEZ-3809.001.patch
          34 kB
          Muhammad Samir Khan

          People

            samkhan Muhammad Samir Khan
            samkhan Muhammad Samir Khan
            Votes: 0
            Watchers: 3
