Tajo / TAJO-1271

Improve memory usage of Hash-shuffle

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.12.0, 0.11.1
    • Component/s: Data Shuffle

      Description

      Currently, Hash-shuffle keeps an intermediate file appender and a tuple list in memory for each partition, so the required memory grows in proportion to the input size.
      If the input size is 10TB, the hash-join key partition count will be 78125 (10TB / 128MB) and the required memory is 10GB (78125 * 128KB).
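      The estimate above can be reproduced with a few lines of arithmetic. The following is only a sketch: the class and method names are hypothetical, decimal units (1MB = 1,000,000 bytes) are assumed because they reproduce the figure 78125 exactly, and the 128KB per-partition buffer is taken from the numbers in this description.

```java
// Hypothetical helper for illustration only; not part of Tajo.
class ShuffleMemoryEstimate {
    // Number of partitions needed to cover the input, rounded up.
    static long partitionCount(long inputBytes, long partitionBytes) {
        return (inputBytes + partitionBytes - 1) / partitionBytes;
    }

    // Memory held if every partition keeps its own in-memory buffer.
    static long requiredMemory(long partitions, long perPartitionBufferBytes) {
        return partitions * perPartitionBufferBytes;
    }

    public static void main(String[] args) {
        // Decimal units are an assumption; see the note above.
        long KB = 1_000L, MB = 1_000_000L, TB = 1_000_000_000_000L;
        long partitions = partitionCount(10 * TB, 128 * MB);
        long memory = requiredMemory(partitions, 128 * KB);
        System.out.println(partitions + " partitions, " + memory + " bytes");
    }
}
```

      Because the memory is the product of the partition count and a fixed per-partition buffer, it scales linearly with the input size, which is the problem this issue targets.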

      We should improve the hash-shuffle file writer as follows:

      • Separate the buffer from the file writer
      • Keep the tuples in an off-heap buffer and reuse the buffer
      • Flush the buffers when the total required buffer capacity exceeds maxBufferSize
      • Write the partition files asynchronously
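      The four points above can be sketched roughly as follows. This is a minimal illustration and not the actual Tajo implementation: the class name, the PartitionSink interface, and the fixed per-partition buffer size are all assumptions introduced here. Only the name maxBufferSize comes from the list above. Error handling for a failing sink is omitted.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names are not taken from the Tajo code base.
class HashShuffleBufferSketch implements AutoCloseable {
    // The buffer is separate from the writer: the sink only sees filled buffers.
    interface PartitionSink {
        void append(int partitionId, ByteBuffer data);
    }

    private final int bufferSize;
    private final long maxBufferSize;
    private final PartitionSink sink;
    private final Map<Integer, ByteBuffer> active = new HashMap<>();
    private final BlockingQueue<ByteBuffer> free = new LinkedBlockingQueue<>();
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    private int allocated = 0;

    HashShuffleBufferSketch(int bufferSize, long maxBufferSize, PartitionSink sink) {
        if (bufferSize <= 0 || maxBufferSize < bufferSize) {
            throw new IllegalArgumentException("maxBufferSize must hold at least one buffer");
        }
        this.bufferSize = bufferSize;
        this.maxBufferSize = maxBufferSize;
        this.sink = sink;
    }

    // Called from a single producer thread.
    void add(int partitionId, byte[] tuple) {
        if (tuple.length > bufferSize) throw new IllegalArgumentException("tuple larger than buffer");
        ByteBuffer buf = active.get(partitionId);
        if (buf != null && buf.remaining() < tuple.length) {
            flush(partitionId); // buffer is full: hand it to the writer
            buf = null;
        }
        if (buf == null) {
            buf = acquire();
            active.put(partitionId, buf);
        }
        buf.put(tuple);
    }

    // Reuse a recycled buffer, allocate off-heap while under the cap,
    // otherwise flush everything and wait for the writer to recycle one.
    private ByteBuffer acquire() {
        ByteBuffer buf = free.poll();
        if (buf != null) return buf;
        if ((long) (allocated + 1) * bufferSize <= maxBufferSize) {
            allocated++;
            return ByteBuffer.allocateDirect(bufferSize);
        }
        flushAll();
        try {
            return free.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a buffer", e);
        }
    }

    // Asynchronous write: the single writer thread keeps per-partition order.
    private void flush(int partitionId) {
        ByteBuffer buf = active.remove(partitionId);
        if (buf == null) return;
        buf.flip();
        writer.execute(() -> {
            try {
                sink.append(partitionId, buf);
            } finally {
                buf.clear();
                free.add(buf); // recycle even if the sink failed
            }
        });
    }

    private void flushAll() {
        for (Integer id : new ArrayList<>(active.keySet())) flush(id);
    }

    int allocatedBuffers() {
        return allocated;
    }

    @Override
    public void close() {
        flushAll();
        writer.shutdown();
        try {
            writer.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

      In this sketch the memory held is bounded by maxBufferSize regardless of how many partitions exist, at the cost of writing smaller chunks when many partitions compete for few buffers. A real writer would append each flushed buffer to the partition's intermediate file instead of calling a generic sink.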

            People

            • Assignee: Jinho Kim (jhkim)
            • Reporter: Jinho Kim (jhkim)