Apache Gobblin / GOBBLIN-1918

Optimize smart resizing for ORC Writer converter buffer


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: gobblin-core
    • Labels: None

    Description

      The GobblinOrcWriter contains a converter and a buffer row batch. The buffer holds records that have been converted from Avro to ORC before they are added to the native ORC writer.
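      A minimal sketch of this convert-then-buffer flow, in Java. It assumes a hypothetical `BufferedConvertingWriter` with a generic converter function and a callback standing in for the native ORC writer; none of these names are Gobblin or ORC APIs. For brevity the buffer is modeled as a list of rows, whereas an ORC row batch is columnar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/**
 * Hypothetical sketch of the convert -> buffer -> flush flow. The converter,
 * batch size and downstream "native writer" callback are illustrative
 * stand-ins, not Gobblin or ORC APIs.
 */
final class BufferedConvertingWriter<I, O> implements AutoCloseable {
    private final Function<I, O> converter;       // e.g. Avro record -> ORC row
    private final Consumer<List<O>> nativeWriter; // receives one batch at a time
    private final int batchSize;                  // bound on rows held before a flush
    private final List<O> buffer;

    BufferedConvertingWriter(Function<I, O> converter, Consumer<List<O>> nativeWriter, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.converter = converter;
        this.nativeWriter = nativeWriter;
        this.batchSize = batchSize;
        this.buffer = new ArrayList<>(batchSize);
    }

    void write(I record) {
        buffer.add(converter.apply(record)); // convert, then hold in the buffer
        if (buffer.size() == batchSize) {
            flush();                         // buffer is full: hand it downstream
        }
    }

    void flush() {
        if (!buffer.isEmpty()) {
            nativeWriter.accept(new ArrayList<>(buffer)); // pass a copy, then reuse the buffer
            buffer.clear();
        }
    }

    @Override
    public void close() {
        flush(); // write any partially filled batch
    }

    public static void main(String[] args) {
        try (BufferedConvertingWriter<Integer, String> writer =
                 new BufferedConvertingWriter<>(i -> "row" + i, System.out::println, 2)) {
            for (int i = 0; i < 5; i++) {
                writer.write(i);
            }
        }
    }
}
```

      Running `main` prints three batches: two full batches of two rows each, then a final one-row batch flushed by `close()`.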

      Because the buffer holds multiple records, the converter repeatedly has to resize the row batch's columns as records accumulate. The enlarge factor used for each resize trades performance against memory: if it is too low, resizing happens too often and each resize copies the data already buffered; if it is too high, resizing happens rarely, but the over-allocated buffer can dominate the container's memory.
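      To illustrate the trade-off, the following sketch grows a hypothetical `FixedFactorColumn` (a stand-in for one column of the row batch, not a Gobblin class) by a fixed enlarge factor and reports how many resizes occurred and how much capacity was allocated. With a low factor it resizes many times; with a high factor it resizes only a few times but can end with noticeably more capacity than the 10,000 values it holds.

```java
import java.util.Arrays;

/**
 * Hypothetical growable column of longs that is resized by a fixed enlarge
 * factor whenever it runs out of room. Illustrative only; not a Gobblin class.
 */
final class FixedFactorColumn {
    private long[] values;
    private int size = 0;
    private int resizeCount = 0;
    private final double enlargeFactor;

    FixedFactorColumn(int initialCapacity, double enlargeFactor) {
        if (initialCapacity < 1 || enlargeFactor < 1.0) {
            throw new IllegalArgumentException("need initialCapacity >= 1 and enlargeFactor >= 1.0");
        }
        this.values = new long[initialCapacity];
        this.enlargeFactor = enlargeFactor;
    }

    void add(long value) {
        if (size == values.length) {
            // Grow by the enlarge factor, but always by at least one slot.
            int newCapacity = Math.max(size + 1, (int) Math.ceil(size * enlargeFactor));
            values = Arrays.copyOf(values, newCapacity); // copies all existing values
            resizeCount++;
        }
        values[size++] = value;
    }

    int capacity() {
        return values.length;
    }

    int resizeCount() {
        return resizeCount;
    }

    public static void main(String[] args) {
        for (double factor : new double[] {1.1, 4.0}) {
            FixedFactorColumn column = new FixedFactorColumn(16, factor);
            for (int i = 0; i < 10_000; i++) {
                column.add(i);
            }
            System.out.printf("factor=%.1f resizes=%d capacity=%d%n",
                factor, column.resizeCount(), column.capacity());
        }
    }
}
```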

      Because only a bounded number of records can accumulate in the buffer before it is flushed, the resizing algorithm should become less aggressive as more records are processed.
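      One way to express a less aggressive schedule is to let the enlarge factor decay as the buffer fills. The sketch below assumes, hypothetically, that "processed" means rows added since the last flush and that the factor decays linearly from `maxFactor` (empty buffer) to `minFactor` (full batch); `DecayingEnlargePolicy` and its parameters are illustrative names, not Gobblin classes or configuration keys.

```java
/**
 * Hypothetical resize policy: the enlarge factor decays linearly from
 * maxFactor (empty buffer) to minFactor (buffer holds batchSize rows).
 * The names and the linear schedule are illustrative assumptions.
 */
final class DecayingEnlargePolicy {
    private final double maxFactor;
    private final double minFactor;
    private final int batchSize;

    DecayingEnlargePolicy(double maxFactor, double minFactor, int batchSize) {
        if (minFactor < 1.0 || maxFactor < minFactor || batchSize < 1) {
            throw new IllegalArgumentException(
                "require 1.0 <= minFactor <= maxFactor and batchSize >= 1");
        }
        this.maxFactor = maxFactor;
        this.minFactor = minFactor;
        this.batchSize = batchSize;
    }

    /** Enlarge factor to use once rowsInBuffer rows have been buffered since the last flush. */
    double enlargeFactor(int rowsInBuffer) {
        // Fraction of the bounded batch already filled, clamped to [0, 1].
        double progress = Math.min(1.0, Math.max(0.0, (double) rowsInBuffer / batchSize));
        return maxFactor - (maxFactor - minFactor) * progress;
    }

    /** New capacity for a column that must hold at least `required` values. */
    int newCapacity(int required, int rowsInBuffer) {
        long proposed = (long) Math.ceil(required * enlargeFactor(rowsInBuffer));
        // Never return less than required; cap at Integer.MAX_VALUE to avoid overflow.
        return (int) Math.min(Integer.MAX_VALUE, Math.max(required, proposed));
    }

    public static void main(String[] args) {
        DecayingEnlargePolicy policy = new DecayingEnlargePolicy(3.0, 1.25, 1024);
        for (int rows : new int[] {0, 256, 512, 1024}) {
            System.out.printf("rows=%d factor=%.4f newCapacity(1000)=%d%n",
                rows, policy.enlargeFactor(rows), policy.newCapacity(1000, rows));
        }
    }
}
```

      A linear schedule is only one choice; an exponential or step-wise decay would serve the same goal. Whatever the schedule, `newCapacity` never returns less than the required size, so a resize always makes room for the pending values.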



          People

            Abhishek Tiwari (abti)
            William Lo (wlo)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1h