Uploaded image for project: 'Apache Nemo'
  1. Apache Nemo
  2. NEMO-209

Enable Local Combiner for BeamSQL

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:

      Description

      Combiner: KV<KeyType, Iterable<InputType>> => KV<KeyType, OutputType>

      We cannot insert a this combiner prior to data shuffle, if InputType != OutputType.

      Unfortunately, BeamSQL uses the 'Row' type for both InputType and OutputType, which does not reveal the row-internal types (e.g., what columns does it have).

      This creates a situation where from our perspective InputType == OutputType (both are 'Row'), but the row-internal types of the input and the output may differ. In this situation, inserting a combiner prior to data shuffle results in an error.

      For example, if the combiner removes a column, then when trying to apply a second round of Combiner on the 'Row' OutputType generated by the first round of Combiner will lead to an out of index error.

      As the 'Row' type is an experimental feature in Beam, we'll see how the API develops and find opportunity to enable local combiner for BeamSQL.

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              johnyangk John Yang
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: