Apache Hop (Retired) / HOP-4226

Beam rows serialize too much data


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.1.0
    • Component/s: API, Beam
    • Labels: None

    Description

      Hop rows internally over-allocate room in the Object[] so that the array doesn't have to be re-created when, for example, the next transform adds a field.
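      A minimal sketch of the over-allocation idea, assuming a fixed padding of 10 slots. The class and the addValue() helper are hypothetical and only mimic the intent of RowDataUtil.allocateRowData(); this is not Hop's implementation. It shows why padding helps in a single JVM: appending a field reuses the same array instead of copying it.

```java
import java.util.Arrays;

// Hypothetical sketch of row over-allocation; NOT Hop's RowDataUtil code.
public class OverAllocatedRows {
    // Assumed padding, matching "the default over-allocation is 10 fields".
    static final int OVER_ALLOCATE_SIZE = 10;

    // Room for the used fields plus spare null slots at the end.
    static Object[] allocateRowData(int fieldCount) {
        return new Object[fieldCount + OVER_ALLOCATE_SIZE];
    }

    // Store a value at index usedFields; only re-create the array
    // when the spare slots have run out.
    static Object[] addValue(Object[] row, int usedFields, Object value) {
        if (usedFields >= row.length) {
            row = Arrays.copyOf(row, usedFields + 1 + OVER_ALLOCATE_SIZE);
        }
        row[usedFields] = value;
        return row;
    }

    public static void main(String[] args) {
        Object[] row = allocateRowData(3);
        row[0] = "a";
        row[1] = "b";
        row[2] = "c";
        // The "next transform" appends a field: same array instance, no copy.
        Object[] extended = addValue(row, 3, "d");
        System.out.println(extended == row);  // true
        System.out.println(extended.length);  // 13
    }
}
```

      Under Beam this benefit disappears, because the array is rebuilt on deserialization anyway.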

      However, in the context of Beam every transform serializes and de-serializes data, so we take the re-creation hit regardless. On top of that, more data is being serialized than strictly required: the unused null values in this case.

      The default over-allocation is 10 fields. If we can avoid serializing those 10 null slots for every row, it could make a difference.
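      To make the cost visible, the sketch below serializes the same three values with and without 10 trailing null slots. It uses plain Java serialization as a stand-in; the coder that Hop's Beam integration actually uses is not shown here, so the absolute byte counts are illustrative only. In this encoding each null still costs a marker byte, so the padded row is always larger.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Stand-in measurement using java.io serialization, not Hop's Beam coder.
public class PaddingCost {
    static int serializedSize(Object[] row) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(row);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        Object[] used = {"id-1", 42L, "Brussels"};
        // Same three values plus 10 trailing nulls from over-allocation.
        Object[] padded = Arrays.copyOf(used, used.length + 10);
        System.out.println("trimmed: " + serializedSize(used) + " bytes");
        System.out.println("padded:  " + serializedSize(padded) + " bytes");
    }
}
```

      The overhead per row is small, but it is paid on every row at every transform boundary.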

      One option is to stop using RowDataUtil.allocateRowData() in the Beam transforms and functions.
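      A sketch of what that could look like, assuming the field count comes from the row metadata (e.g. its size()). Both helpers are hypothetical: newExactRow() allocates without padding, and toExactRow() trims an already padded row before it is handed to Beam for serialization.

```java
import java.util.Arrays;

// Hypothetical helpers; not part of the Hop API.
public class ExactRows {
    // Trim a (possibly over-allocated) row to the used field count.
    static Object[] toExactRow(Object[] row, int fieldCount) {
        if (row.length == fieldCount) {
            return row; // already exact, avoid a copy
        }
        return Arrays.copyOf(row, fieldCount);
    }

    // Allocate exactly the fields needed instead of calling allocateRowData().
    static Object[] newExactRow(int fieldCount) {
        return new Object[fieldCount];
    }

    public static void main(String[] args) {
        Object[] padded = new Object[3 + 10];
        padded[0] = "x";
        padded[1] = "y";
        padded[2] = "z";
        Object[] exact = toExactRow(padded, 3);
        System.out.println(exact.length);           // 3
        System.out.println(Arrays.toString(exact)); // [x, y, z]
    }
}
```

      Trimming costs one array copy per row, which is cheap compared to serializing the unused slots at every transform.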

      Another is 


          People

            Assignee: Matt Casters
            Reporter: Matt Casters
            Votes: 0
            Watchers: 1
