IMPALA-5707

HDFS table sinks should operate within a memory constraint

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 2.10.0
    • Fix Version/s: None
    • Component/s: Backend

    Description

      We should modify the Parquet table sink so that it reserves memory up front and allocates the bulk of its memory (column buffers, etc.) from the buffer pool.

      The reservation calculation should be fairly straightforward for clustered inserts or single-partition inserts, but the memory consumption of dynamically partitioned non-clustered inserts is currently unbounded. In that case we would need to flush partitions to disk to free memory.
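
A rough sketch of the bounded-memory idea described above, not Impala's actual implementation. It assumes each open partition writer holds one fixed-size buffer per column, and that the sink flushes the largest open partition whenever opening another one would exceed its reservation. The names `clustered_reservation`, `PartitionedSink`, and `flush_largest` are hypothetical, and the flush policy is an arbitrary choice for illustration.

```python
BUFFER_SIZE = 2 * 1024 * 1024  # assumed regular buffer size (2MB, per the description)


def clustered_reservation(num_columns, buffer_size=BUFFER_SIZE):
    """Clustered or single-partition insert: only one partition is open at a
    time, so in this model one buffer per column is enough."""
    return num_columns * buffer_size


class PartitionedSink:
    """Toy model of a non-clustered dynamic-partition sink with a memory limit."""

    def __init__(self, num_columns, reservation, buffer_size=BUFFER_SIZE):
        self.per_partition = num_columns * buffer_size
        if reservation < self.per_partition:
            raise ValueError("reservation must cover at least one partition")
        self.reservation = reservation
        self.open_partitions = {}  # partition key -> rows buffered so far
        self.flushed = []          # (partition key, rows) in flush order

    def memory_used(self):
        return len(self.open_partitions) * self.per_partition

    def write_row(self, partition_key):
        if partition_key not in self.open_partitions:
            # Free memory first if opening another partition would exceed the limit.
            while self.memory_used() + self.per_partition > self.reservation:
                self.flush_largest()
            self.open_partitions[partition_key] = 0
        self.open_partitions[partition_key] += 1

    def flush_largest(self):
        # Hypothetical policy: flush the partition with the most buffered rows.
        key = max(self.open_partitions, key=self.open_partitions.get)
        self.flushed.append((key, self.open_partitions.pop(key)))
```

In this model the sink never holds more than `reservation // per_partition` partitions in memory, at the cost of writing more, smaller files when rows for a flushed partition arrive later.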

      One possibly tricky edge case is inserting large string values, since the values could be larger than the regular 2MB buffer size.
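
One possible way to size a buffer for an oversized value, sketched under the assumption that the buffer pool can hand out buffers in power-of-two multiples of the regular size. That assumption and the helper name `buffer_size_for_value` are illustrative, not taken from the issue.

```python
REGULAR_BUFFER_SIZE = 2 * 1024 * 1024  # the regular 2MB buffer size from the description


def buffer_size_for_value(value_len, regular=REGULAR_BUFFER_SIZE):
    """Return the smallest power-of-two multiple of `regular` that can hold
    a single value of `value_len` bytes."""
    size = regular
    while size < value_len:
        size *= 2
    return size
```

A sink using this approach would also have to include the largest expected buffer in its up-front reservation, which is part of what makes the edge case tricky.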

            People

              Assignee: Unassigned
              Reporter: Tim Armstrong (tarmstrong)