IMPALA-1803

Avoid hitting OOM in HdfsTableSink when inserting to Parquet

Details

    Description

      Impala's memory consumption is very high when it writes to a Parquet table with a large number of partitions, primarily because we buffer data separately for each partition. This can lead to OOM; see the attached profile (hdfstablesink-oom.txt). Instead, we could either spill the buffered data to disk or write it out to the Parquet files.
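
      A minimal sketch of the "write it out instead of holding it" idea, in Python for brevity. This is not Impala's HdfsTableSink code; the class name `BoundedPartitionBuffer`, the `max_buffered_bytes` budget, and the `flush_fn` callback are all hypothetical names introduced for illustration. It assumes rows arrive as `bytes` and that flushing the largest per-partition buffer is an acceptable eviction policy.

```python
from collections import defaultdict


class BoundedPartitionBuffer:
    """Buffers rows per partition and flushes the largest buffer when over budget.

    Hypothetical illustration only; not Impala's HdfsTableSink.
    """

    def __init__(self, max_buffered_bytes, flush_fn):
        self.max_buffered_bytes = max_buffered_bytes
        self.flush_fn = flush_fn            # called as flush_fn(partition_key, rows)
        self.buffers = defaultdict(list)    # partition key -> buffered rows
        self.sizes = defaultdict(int)       # partition key -> buffered bytes
        self.total_bytes = 0

    def append(self, partition_key, row):
        """Buffer one row (bytes), then flush buffers until back under budget."""
        self.buffers[partition_key].append(row)
        self.sizes[partition_key] += len(row)
        self.total_bytes += len(row)
        while self.total_bytes > self.max_buffered_bytes:
            # Evict the partition currently holding the most buffered bytes.
            self._flush(max(self.sizes, key=self.sizes.get))

    def _flush(self, partition_key):
        """Hand one partition's rows to flush_fn and release their accounting."""
        rows = self.buffers.pop(partition_key)
        self.total_bytes -= self.sizes.pop(partition_key)
        self.flush_fn(partition_key, rows)

    def close(self):
        """Flush whatever is still buffered, one partition at a time."""
        for key in list(self.buffers):
            self._flush(key)
```

      With a fixed budget, total buffered memory no longer grows with the number of partitions. The trade-off is that a partition may be flushed several times, which in a real Parquet writer would likely mean more and smaller files or row groups per partition. Spilling to disk would use the same accounting, with `flush_fn` writing to scratch space instead of the final file.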

      Attachments

        1. hdfstablesink-oom.txt
          29 kB
          Ippokratis Pandis

            People

              Assignee: Unassigned
              Reporter: Ippokratis Pandis
              Votes: 2
              Watchers: 3