Tajo (Retired) / TAJO-931

Output file can be punctuated depending on the file size.

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: Physical Operator
    • Labels: None

    Description

      Some file formats (e.g., Parquet) are not splittable. When such a file is very large, it spans multiple HDFS blocks but must still be read by a single task. This causes remote HDFS reads and limits the degree of parallelism, resulting in significant performance degradation.

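      We can solve this problem if StoreTableExec or {Col|SortBased}PartitionStoreExec can punctuate the final output file according to the written size, i.e., close the current file and continue writing to a new one once the written size reaches a threshold.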

      In addition, we need a session variable that determines the size of each final output file. This issue is therefore blocked by TAJO-928.
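      A minimal Java sketch of size-based punctuation, assuming rows are written as text lines and in-memory byte buffers stand in for HDFS output files. RollingOutputWriter, write, and close are hypothetical names for illustration only, not the API of StoreTableExec or any other Tajo operator. The per-file size that the description says would come from a session variable (TAJO-928) is simply a constructor argument here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Tajo code): rows are appended to the current
 * output file until its written size reaches maxFileSize; the file is then
 * closed and writing continues in a new file. In-memory buffers stand in
 * for HDFS files so the example is self-contained.
 */
class RollingOutputWriter {
  // Per-file size threshold in bytes; assumed to come from a session variable.
  private final long maxFileSize;
  private final List<byte[]> closedFiles = new ArrayList<>();
  private ByteArrayOutputStream current = new ByteArrayOutputStream();

  RollingOutputWriter(long maxFileSize) {
    if (maxFileSize <= 0) {
      throw new IllegalArgumentException("maxFileSize must be positive");
    }
    this.maxFileSize = maxFileSize;
  }

  /** Writes one row, first starting a new file if the current one is full. */
  void write(String row) {
    if (current.size() >= maxFileSize) {
      rollover();
    }
    byte[] bytes = (row + "\n").getBytes(StandardCharsets.UTF_8);
    current.write(bytes, 0, bytes.length);
  }

  /** Closes the current file and starts an empty one. */
  private void rollover() {
    closedFiles.add(current.toByteArray());
    current = new ByteArrayOutputStream();
  }

  /** Closes the last non-empty file and returns all files in write order. */
  List<byte[]> close() {
    if (current.size() > 0) {
      rollover();
    }
    return new ArrayList<>(closedFiles);
  }

  public static void main(String[] args) {
    RollingOutputWriter writer = new RollingOutputWriter(10);
    for (int i = 0; i < 6; i++) {
      writer.write("row-" + i); // each row is 6 bytes including '\n'
    }
    System.out.println(writer.close().size() + " files");
  }
}
```

      Because the size check runs before each row is written and a row is never split across files, a file can exceed the threshold by at most one row. With a 10-byte threshold and six 6-byte rows, the sketch produces three files of 12 bytes each.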


            People

              Assignee: Hyunsik Choi (hyunsik)
              Reporter: Hyunsik Choi (hyunsik)
              Votes: 0
              Watchers: 2