Apache Tez / TEZ-2256

Avoid use of BufferTooSmallException to signal end of buffer in UnorderedPartitionedKVWriter

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.6.0, 0.7.0
    • Fix Version/s: 0.6.1
    • Component/s: None
    • Hadoop Flags: Reviewed

      Description

      UnorderedPartitionedKVWriter delegates serialization to the application, passing it a private ByteArrayOutputStream. When the buffer is exhausted, that ByteArrayOutputStream signals the condition by throwing a private BufferTooSmallException, which the application can see but cannot handle. As Chris Wensel pointed out, when the application is in fact a complex framework, it has no way to distinguish this exception from a real failure, which forces it to log the full stack trace even for routine events such as "buffer complete".
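
      To make the problem concrete, here is a minimal sketch of exception-based end-of-buffer signalling. Every name in it (BtseProblemSketch, BoundedBuffer, frameworkSerialize, and the nested BufferTooSmallException) is a hypothetical stand-in written for illustration, not the actual Tez code.

```java
import java.io.IOException;
import java.io.OutputStream;

final class BtseProblemSketch {
    // Hypothetical stand-in for the writer's private exception type.
    private static final class BufferTooSmallException extends IOException {
        BufferTooSmallException(String msg) { super(msg); }
    }

    // Hypothetical fixed-capacity stream that throws once it is full.
    static final class BoundedBuffer extends OutputStream {
        private final byte[] buf;
        private int count;

        BoundedBuffer(int capacity) { buf = new byte[capacity]; }

        @Override
        public void write(int b) throws IOException {
            if (count >= buf.length) {
                throw new BufferTooSmallException("buffer full");
            }
            buf[count++] = (byte) b;
        }
    }

    // Application-side serializer. It cannot name the private exception type,
    // so "buffer full" lands in the same catch block as a genuine I/O failure.
    static String frameworkSerialize(OutputStream out, byte[] record) {
        try {
            out.write(record);
            return "ok";
        } catch (IOException e) {
            return "failure: " + e.getClass().getSimpleName();
        }
    }
}
```

      In this sketch the framework code only ever sees an IOException, so a routine "buffer full" event is reported the same way as a real error.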

      Suggested approach: set a "complete" flag in ByteArrayOutputStream that disables any further output, and replace the BufferTooSmallException (BTSE) handling with a check of that flag.
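
      A rough sketch of the flag-based approach follows. It is not taken from the attached patches: FlaggedBuffer, tryWrite, truncate and reset are hypothetical names, and rolling back a partially written record is an assumption about how the writer would use the flag.

```java
import java.io.OutputStream;

final class FlaggedBufferSketch {
    // Hypothetical fixed-capacity stream: when a write does not fit, it sets
    // a "complete" flag and ignores all further output instead of throwing.
    static final class FlaggedBuffer extends OutputStream {
        private final byte[] buf;
        private int count;
        private boolean complete;

        FlaggedBuffer(int capacity) { buf = new byte[capacity]; }

        @Override
        public void write(int b) {
            if (complete) return;
            if (count >= buf.length) { complete = true; return; }
            buf[count++] = (byte) b;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            if (complete) return;
            if (len > buf.length - count) { complete = true; return; }
            System.arraycopy(b, off, buf, count, len);
            count += len;
        }

        boolean isComplete() { return complete; }
        int size() { return count; }

        // Drop bytes written after the given position (a partial record).
        void truncate(int position) { count = position; }

        // Make the buffer writable again, e.g. after its contents were spilled.
        void reset() { count = 0; complete = false; }
    }

    // Writer-side logic: remember the record start, let the application
    // serialize, then check the flag instead of catching an exception.
    static boolean tryWrite(FlaggedBuffer out, byte[] record) {
        int mark = out.size();
        out.write(record, 0, record.length);
        if (out.isComplete()) {
            out.truncate(mark); // discard any partially written record
            return false;       // caller would spill, reset() and retry
        }
        return true;
    }
}
```

      With this shape the application never sees an exception for a full buffer; only the writer inspects the flag after each record.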

      Siddharth Seth suggested checking out SortedOutput as well, as the mechanisms there should be similar.

      I'll give this a go this week.

        Attachments

        1. remove-btse-1-rfc.patch (4 kB, Cyrille Chépélov)
        2. remove-btse-1-MASTER.patch (4 kB, Cyrille Chépélov)

            People

            • Assignee: Cyrille Chépélov (cchepelov)
            • Reporter: Cyrille Chépélov (cchepelov)

                Time Tracking

                • Estimated: 6h
                • Remaining: 6h
                • Logged: Not Specified