Details
-
Sub-task
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
Description
Currently Tajo creates too many intermediate files in the case of hash shuffle. A execution block(SubQuery) on a TajoWorker creates intermediate files as following rule:
- intermediate files in a worker = # tasks / # workers * # partitions
This may cause 'too many file opens' error and makes it difficult to scale out. To solve this problem, We should reduce number of hash shuffle output file.