Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 0.9.0
    • Data Shuffle
    • None

    Description

      Currently Tajo creates too many intermediate files in the case of hash shuffle. A execution block(SubQuery) on a TajoWorker creates intermediate files as following rule:

      1. intermediate files in a worker = # tasks / # workers * # partitions

      This may cause 'too many file opens' error and makes it difficult to scale out. To solve this problem, We should reduce number of hash shuffle output file.

      Attachments

        Activity

          People

            hjkim Hyoungjun Kim
            hjkim Hyoungjun Kim
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: