Tajo / TAJO-374

Investigate more efficient intermediate shuffle methods

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Data Shuffle
    • Labels: None

      Description

      Motivation

      Currently, Tajo materializes intermediate data on local disks, storing one file for each partition. This becomes inefficient and does not scale as data volume increases. In MapReduce (MR), this challenge was resolved by sorting intermediate key-values, grouping records with the same key, and indexing on keys. However, that approach requires an unnecessary sort and additional disk I/O, so it is not feasible in Tajo.
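
To make the scaling concern concrete, here is a minimal sketch that is not taken from Tajo's code base. It assumes a simple hash partitioner and compares the worst-case number of intermediate files under a one-file-per-partition layout with a layout in which each task writes one sorted data file plus one index file. All function names and the task and partition counts are hypothetical.

```python
from collections import defaultdict


def hash_partition(rows, num_partitions, key=lambda row: row[0]):
    """Group rows into buckets by key hash, one bucket per partition."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(key(row)) % num_partitions].append(row)
    return dict(buckets)


def per_partition_file_count(num_tasks, num_partitions):
    """Worst case when every task writes a separate file for every partition."""
    return num_tasks * num_partitions


def sorted_indexed_file_count(num_tasks):
    """Assumed alternative: one sorted data file plus one index file per task."""
    return 2 * num_tasks


# Integer keys keep the bucket assignment deterministic across runs.
rows = [(1, "a"), (2, "b"), (5, "c")]
buckets = hash_partition(rows, num_partitions=4)

# Hypothetical cluster: 1000 tasks shuffling into 512 partitions.
many_small_files = per_partition_file_count(1000, 512)
few_indexed_files = sorted_indexed_file_count(1000)
```

Under these assumptions the per-partition layout grows with the product of tasks and partitions, while the sorted and indexed layout grows only with the number of tasks, at the cost of the extra sort the description objects to.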


          Activity

          hyunsik Hyunsik Choi added a comment -

          This issue was resolved by two sub-issues, so I am closing it.

          jihoonson Jihoon Son added a comment -

          +1 for this issue.
          This work is mandatory.


            People

            • Assignee: Unassigned
            • Reporter: hyunsik Hyunsik Choi
            • Votes: 0
            • Watchers: 2
