  Project: Tajo (Retired) / Issue: TAJO-374

Investigate more efficient intermediate shuffle methods


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Data Shuffle
    • Labels: None

    Description

      Motivation

      Currently, Tajo materializes intermediate data on local disks, storing one file for each partition. This becomes inefficient and does not scale as the data volume increases. MapReduce (MR) addresses this problem by sorting intermediate key-value pairs, grouping records with the same key, and indexing them by key. However, that approach requires an otherwise unnecessary sort and additional disk I/O, so it is not feasible for Tajo.
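To make the two layouts above concrete, here is a toy in-memory sketch. It is not Tajo or Hadoop code: every function name is hypothetical, "files" are modeled as Python lists, and the crc32-based partitioner is an assumption chosen only because it is deterministic. `per_partition_files` mirrors the one-file-per-partition layout, while `sorted_single_file` mirrors the MR-style alternative of sorting by partition and key, writing a single file, and keeping an offset index per partition.

```python
import zlib
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]


def partition_of(key: str, num_partitions: int) -> int:
    # Hypothetical partitioner: crc32 is deterministic across runs,
    # unlike Python's salted hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def per_partition_files(records: Iterable[Record], num_partitions: int) -> Dict[int, List[Record]]:
    """One output 'file' (a list here) per non-empty partition, unsorted."""
    files: Dict[int, List[Record]] = {}
    for key, value in records:
        files.setdefault(partition_of(key, num_partitions), []).append((key, value))
    return files


def sorted_single_file(records: Iterable[Record], num_partitions: int):
    """MR-style layout: sort by (partition, key), append everything to one
    'file', and record a [start, end) offset range for each partition."""
    ordered = sorted(records, key=lambda kv: (partition_of(kv[0], num_partitions), kv[0]))
    data: List[Record] = []
    index: Dict[int, Tuple[int, int]] = {}
    for key, value in ordered:
        p = partition_of(key, num_partitions)
        # Records of one partition are contiguous after the sort, so the
        # start offset is fixed by the first record and only the end grows.
        start, _ = index.get(p, (len(data), len(data)))
        data.append((key, value))
        index[p] = (start, len(data))
    return data, index


def read_partition(data: List[Record], index: Dict[int, Tuple[int, int]], p: int) -> List[Record]:
    """Fetch one partition from the single file via its index entry."""
    start, end = index.get(p, (0, 0))
    return data[start:end]


if __name__ == "__main__":
    records = [("a", "1"), ("b", "2"), ("a", "3"), ("c", "4")]
    files = per_partition_files(records, 4)
    data, index = sorted_single_file(records, 4)
    for p in range(4):
        print(p, files.get(p, []), read_partition(data, index, p))
```

Both layouts hand a consumer the same records for each partition; the difference is that the second one pays for a sort over all intermediate data in exchange for a single file plus a small index, which is the trade-off the description says is not feasible for Tajo.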

      References


            People

              Assignee: Unassigned
              Reporter: Hyunsik Choi (hyunsik)
              Votes: 0
              Watchers: 2
