Uploaded image for project: 'Apache Nemo'
  1. Apache Nemo
  2. NEMO-209

Enable Local Combiner for BeamSQL

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Duplicate
    • None
    • None

    Description

      Combiner: KV<KeyType, Iterable<InputType>> => KV<KeyType, OutputType>

      We cannot insert a this combiner prior to data shuffle, if InputType != OutputType.

      Unfortunately, BeamSQL uses the 'Row' type for both InputType and OutputType, which does not reveal the row-internal types (e.g., what columns does it have).

      This creates a situation where from our perspective InputType == OutputType (both are 'Row'), but the row-internal types of the input and the output may differ. In this situation, inserting a combiner prior to data shuffle results in an error.

      For example, if the combiner removes a column, then when trying to apply a second round of Combiner on the 'Row' OutputType generated by the first round of Combiner will lead to an out of index error.

      As the 'Row' type is an experimental feature in Beam, we'll see how the API develops and find opportunity to enable local combiner for BeamSQL.

      Attachments

        Activity

          People

            Unassigned Unassigned
            johnyangk John Yang
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: