IMPALA-257

Parquet writer uses excessive memory with partitions

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 0.7
    • Fix Version/s: Impala 1.1
    • Component/s: None
    • Labels: None

    Description

      The Parquet table writer uses a lot of memory, and its usage grows linearly with the number of output partitions. We'd like to write large files (~512MB-1GB), and each of these needs to be buffered per partition. With 100 output partitions, that requires 50GB+ of RAM.
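      The estimate above can be checked with a short calculation. This is only a sketch of the arithmetic, assuming one fully buffered file per open partition; the function name `writer_buffer_bytes` is made up for illustration and is not part of Impala.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def writer_buffer_bytes(num_partitions, file_size_bytes):
    """Memory needed if every open partition buffers one whole file.

    Assumption (from the description): one buffer per partition, each
    as large as the target file size, so usage is linear in partitions.
    """
    return num_partitions * file_size_bytes


# 100 partitions at the low and high end of the target file size.
low = writer_buffer_bytes(100, 512 * MIB)
high = writer_buffer_bytes(100, 1 * GIB)

print(f"{low / GIB:.0f} GiB to {high / GIB:.0f} GiB")
```

      At the 512MB end this gives the 50GB figure quoted above; at the 1GB end it doubles.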

      The buffering problem goes away if we can write the columns to separate HDFS files, in which case we don't need to buffer at all.

      An alternative solution is to write the columns to local disk and stitch the files together at the end.
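      The local-disk alternative might look roughly like the sketch below. It is not Impala's writer and does not produce a real Parquet file: it only illustrates spilling each column to its own local temporary file and concatenating them at the end, so memory use no longer depends on the output file size. The function name `write_with_column_spill` and the assumption that values arrive as already-encoded bytes are hypothetical.

```python
import shutil
import tempfile


def write_with_column_spill(rows, num_columns, out_path):
    """Spill each column to a local temp file, then stitch them together.

    Hypothetical illustration only: values are assumed to be encoded
    bytes, and the output is a plain concatenation of the column data
    with no Parquet headers, footers, or metadata.
    """
    spills = [tempfile.TemporaryFile() for _ in range(num_columns)]
    try:
        # Append every value to the spill file of its column.
        for row in rows:
            for spill, value in zip(spills, row):
                spill.write(value)

        # Stitch: copy each column's spill file into the output in order.
        with open(out_path, "wb") as out:
            for spill in spills:
                spill.seek(0)
                shutil.copyfileobj(spill, out)
    finally:
        for spill in spills:
            spill.close()
```

      With this layout only the copy buffer and the operating system's file buffers are held in memory, at the cost of writing the data twice.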

          People

            Assignee: Nong Li (nong_impala_60e1)
            Reporter: Nong Li (nong_impala_60e1)