Currently, Tajo creates too many intermediate files for hash shuffle. An execution block (SubQuery) running on a TajoWorker creates intermediate files according to the following rule:
- # intermediate files per worker = (# tasks / # workers) * # partitions
This can cause a 'Too many open files' error and makes it difficult to scale out. To solve this problem, we should reduce the number of hash shuffle output files.
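To make the growth concrete, here is a small sketch that applies the rule above to a hypothetical workload. The class and method names (`HashShuffleFileCount`, `filesPerWorker`), the workload numbers, and the open-file limit of 1024 are all illustrative assumptions, not Tajo code or Tajo defaults. The sketch also assumes tasks are spread evenly and rounds tasks per worker up, so the result reflects the busiest worker.

```java
public class HashShuffleFileCount {

    // Hypothetical helper: (# tasks / # workers) * # partitions,
    // with tasks per worker rounded up to model the busiest worker.
    static long filesPerWorker(long tasks, long workers, long partitions) {
        if (workers <= 0) {
            throw new IllegalArgumentException("workers must be positive");
        }
        long tasksPerWorker = (tasks + workers - 1) / workers; // ceiling division
        return tasksPerWorker * partitions;
    }

    public static void main(String[] args) {
        // Hypothetical workload, chosen only for illustration.
        long tasks = 1000;
        long workers = 10;
        long partitions = 128;
        // Hypothetical per-process open-file limit (e.g. a `ulimit -n` value).
        long openFileLimit = 1024;

        long files = filesPerWorker(tasks, workers, partitions);
        System.out.println(files + " files per worker, exceeds limit: " + (files > openFileLimit));
    }
}
```

With these numbers each worker produces 100 * 128 = 12,800 intermediate files. Adding workers only divides the task count, while the partition count multiplies it, so the per-worker file count stays large as the cluster scales out.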