Tajo / TAJO-9

Change the default intermediate data file format for hash repartitioning

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: Data Shuffle
    • Labels: None

      Description

      For easy debugging, hash repartitioning has used CSV as the default intermediate data format. The CSV format incurs parsing overhead, and it can produce relatively large intermediate data that must be transmitted over the network. We need to change the default to RawFile or another efficient file format.

      Digging into the PartitionedStoredExec class is a good starting point for this issue.
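
      The size and parsing concerns above can be illustrated with a small, self-contained sketch. It does not use Tajo's CSV or RawFile code; the class name, the sample rows, and the fixed-width layout are all assumptions made for illustration, and RawFile's real on-disk layout may differ.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: not Tajo code. Compares a text (CSV-like) encoding
// with a fixed-width binary encoding of the same rows.
public class IntermediateFormatSketch {

    // Assumed sample data: n rows of two 13-digit long columns.
    static long[][] sampleRows(int n) {
        long[][] rows = new long[n][2];
        for (int i = 0; i < n; i++) {
            rows[i][0] = 1_000_000_000_000L + i;
            rows[i][1] = 9_000_000_000_000L - i;
        }
        return rows;
    }

    // Text encoding: one comma-separated line per row.
    static byte[] encodeCsv(long[][] rows) {
        StringBuilder sb = new StringBuilder();
        for (long[] row : rows) {
            sb.append(row[0]).append(',').append(row[1]).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Binary encoding: every long occupies exactly 8 bytes.
    static byte[] encodeRaw(long[][] rows) {
        ByteBuffer buf = ByteBuffer.allocate(rows.length * 2 * Long.BYTES);
        for (long[] row : rows) {
            buf.putLong(row[0]);
            buf.putLong(row[1]);
        }
        return buf.array();
    }

    // Reading the text form back means splitting lines and parsing digits.
    static long sumCsv(byte[] data) {
        long sum = 0;
        for (String line : new String(data, StandardCharsets.UTF_8).split("\n")) {
            String[] fields = line.split(",");
            sum += Long.parseLong(fields[0]) + Long.parseLong(fields[1]);
        }
        return sum;
    }

    // Reading the binary form back is a sequence of fixed-width reads.
    static long sumRaw(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        long sum = 0;
        while (buf.hasRemaining()) {
            sum += buf.getLong();
        }
        return sum;
    }

    public static void main(String[] args) {
        long[][] rows = sampleRows(1000);
        byte[] csv = encodeCsv(rows);
        byte[] raw = encodeRaw(rows);
        System.out.println("CSV bytes: " + csv.length);
        System.out.println("Raw bytes: " + raw.length);
        System.out.println("Same contents: " + (sumCsv(csv) == sumRaw(raw)));
    }
}
```

      With these sample rows, each row takes 28 bytes as text and 16 bytes in binary. The gap depends on the data: small integers can be shorter as text, so this is only an indication of why a binary default may help, not a measurement of Tajo itself.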

        Issue Links

          Activity

          hyunsik Hyunsik Choi added a comment -

          This issue is a duplicate of TAJO-435.


            People

            • Assignee: Hyunsik Choi (hyunsik)
            • Reporter: Hyunsik Choi (hyunsik)
            • Votes: 0
            • Watchers: 3

              Dates

              • Created:
                Updated:
                Resolved:
