Apache Tez / TEZ-4075

Tez: Reimplement tez.runtime.transfer.data-via-events.enabled


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels: None

    Description

      This was factored out by TEZ-2196, which skips buffering for 1-partition data exchanges (so that output goes directly to disk). On the consumer side, the branch that read data carried inline in the event payload

          if (shufflePayload.hasData()) {
            DataProto dataProto = shufflePayload.getData();
            FetchedInput fetchedInput = inputAllocator.allocate(dataProto.getRawLength(),
                dataProto.getCompressedLength(), srcAttemptIdentifier);
            moveDataToFetchedInput(dataProto, fetchedInput, hostIdentifier);
            shuffleManager.addCompletedInputWithData(srcAttemptIdentifier, fetchedInput);
          } else {
            shuffleManager.addKnownInput(shufflePayload.getHost(),
                shufflePayload.getPort(), srcAttemptIdentifier, srcIndex);
          }

      was reduced to the unconditional call

          shuffleManager.addKnownInput(shufflePayload.getHost(),
              shufflePayload.getPort(), srcAttemptIdentifier, srcIndex);

      in

      https://github.com/apache/tez/commit/1ba1f927c16a1d7c273b6cd1a8553e5269d1541a
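      A minimal, self-contained Java sketch of the two consumer paths in the removed branch: if the event carries the data inline, the bytes become a completed input immediately; otherwise the source host and port are recorded for a later HTTP fetch. InlineEventConsumer and its Payload type are hypothetical stand-ins, not the Tez classes (ShuffleManager, FetchedInput, DataProto) used in the original code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Tez code): a data-movement event either embeds the
 * (small) output directly, or only tells the consumer where to fetch it.
 */
final class InlineEventConsumer {

    /** Hypothetical stand-in for the event payload; not a Tez protobuf type. */
    static final class Payload {
        final String host;
        final int port;
        final byte[] inlineData; // null when the producer did not embed the data

        Payload(String host, int port, byte[] inlineData) {
            this.host = host;
            this.port = port;
            this.inlineData = inlineData;
        }

        boolean hasData() {
            return inlineData != null;
        }
    }

    /** Inputs whose bytes arrived inside the event: usable at once, no fetch needed. */
    final List<byte[]> completedInputs = new ArrayList<>();
    /** Inputs that still need an HTTP fetch, recorded as "host:port/srcIndex". */
    final List<String> pendingFetches = new ArrayList<>();

    void onEvent(Payload payload, int srcIndex) {
        if (payload.hasData()) {
            // Inline path: copy the bytes into a completed input; no disk read, no HTTP call.
            completedInputs.add(payload.inlineData.clone());
        } else {
            // Regular shuffle path: remember where the data must be fetched from.
            pendingFetches.add(payload.host + ":" + payload.port + "/" + srcIndex);
        }
    }
}
```

      For example, an event carrying a 93-byte payload lands in completedInputs without any fetch, while an event without data adds an entry such as node2:8080/1 to pendingFetches (the host names and port here are made up).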

      It would be better to buffer output up to the 512-byte event-size limit before writing to disk, since creating a new file always incurs disk traffic, even if the file is eventually served out of the buffer cache.
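      A sketch of that buffering, assuming a single-output writer and using the 512-byte limit from this description: output stays in memory, and a file is created only when the limit is exceeded. SpillOnOverflowOutput and its methods are hypothetical illustrations, not Tez's writer API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch (not Tez's writer): keep output in memory until it exceeds
 * the event-size limit, and only then create a file on disk.
 */
final class SpillOnOverflowOutput extends OutputStream {
    static final int MAX_EVENT_BYTES = 512; // limit quoted in the issue description

    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private OutputStream fileOut; // null until the limit is exceeded
    private Path spillFile;

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (fileOut == null && buffer.size() + len > MAX_EVENT_BYTES) {
            // First overflow: only now pay for creating a file, then move the buffered prefix into it.
            spillFile = Files.createTempFile("spill", ".out");
            fileOut = Files.newOutputStream(spillFile);
            buffer.writeTo(fileOut);
            buffer.reset();
        }
        if (fileOut != null) {
            fileOut.write(b, off, len);
        } else {
            buffer.write(b, off, len);
        }
    }

    @Override
    public void close() throws IOException {
        if (fileOut != null) {
            fileOut.close();
        }
    }

    /** Bytes small enough to ship inside the event, or null if the output went to disk. */
    byte[] inlinePayload() {
        return fileOut == null ? buffer.toByteArray() : null;
    }

    /** Path of the spill file, or null if no file was ever created. */
    Path spillFile() {
        return spillFile;
    }
}
```

      Output of up to 512 bytes never touches the file system and can be embedded in the event via inlinePayload(); larger output pays the file-creation cost once, on the first write that crosses the limit.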

      Receiving an event and then firing an HTTP call to fetch the data adds roughly 100-150 ms to a query. Transferring the data inside the event skips the disk entirely for this case and also removes the extra IOPS.

      This channel is not suitable for bulk data transport, but the workload here specifically deals with 1-row control tables, where the HTTP headers and hostnames consume more bandwidth than the 93-byte payload itself.

      Attachments

        1. TEZ-4075.10.patch (34 kB, Rajesh Balamohan)
        2. TEZ-4075.15.patch (50 kB, Rajesh Balamohan)
        3. TEZ-4075.16.patch (50 kB, Rajesh Balamohan)
        4. Tez-4075.5.patch (64 kB, Richard Zhang)
        5. Tez-4075.8.patch (64 kB, Richard Zhang)
        6. TEZ-4075.enable-dme.16.patch (34 kB, Rajesh Balamohan)


            People

              Assignee: Richard Zhang (rzhappy)
              Reporter: Gopal Vijayaraghavan (gopalv)
              Votes: 0
              Watchers: 7

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h