FLINK-1443

Add replicated data source


Details

    Description

      This issue proposes adding support for data sources that read the same data in all parallel instances. This is useful when the data is replicated to all machines in a cluster and can be read locally.
      For example, a replicated input format can be used for a broadcast join without sending any data over the network.

      The following changes are necessary to achieve this:
      1) Add a replicating InputSplitAssigner that assigns all splits to all parallel instances. This also requires extending the InputSplitAssigner interface so that it can identify the exact parallel instance requesting an InputSplit (currently only the hostname is provided).
      2) Make sure that the degree of parallelism (DOP) of the replicated data source is identical to the DOP of its successor.
      3) Let the optimizer know that the data is replicated and ensure that plan enumeration works correctly.
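
      To illustrate change 1), here is a minimal, self-contained sketch of what a replicating assigner could look like. It does not use Flink's actual classes: SimpleSplit, ReplicatingSplitAssigner, and the extra taskIndex parameter are hypothetical names chosen for this example, and the final interface may well look different. The idea is only that the assigner tracks one cursor per parallel instance, so every instance eventually receives every split.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an input split; it only carries its number. */
final class SimpleSplit {
    final int splitNumber;

    SimpleSplit(int splitNumber) {
        this.splitNumber = splitNumber;
    }
}

/**
 * Sketch of a replicating assigner (assumed design, not Flink's API):
 * each parallel instance is handed every split exactly once.
 */
final class ReplicatingSplitAssigner {
    private final List<SimpleSplit> splits;
    // One cursor per parallel instance: index of the next split to hand out.
    private final int[] nextSplitPerTask;

    ReplicatingSplitAssigner(List<SimpleSplit> splits, int parallelism) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be >= 1");
        }
        this.splits = new ArrayList<>(splits);
        this.nextSplitPerTask = new int[parallelism];
    }

    /**
     * Returns the next split for the given parallel instance, or null once
     * that instance has received all splits. The host is ignored because
     * the data is assumed to be locally available everywhere.
     */
    synchronized SimpleSplit getNextInputSplit(String host, int taskIndex) {
        if (taskIndex < 0 || taskIndex >= nextSplitPerTask.length) {
            throw new IllegalArgumentException("unknown parallel instance: " + taskIndex);
        }
        int next = nextSplitPerTask[taskIndex];
        if (next >= splits.size()) {
            return null;
        }
        nextSplitPerTask[taskIndex] = next + 1;
        return splits.get(next);
    }
}
```

      With two splits and a parallelism of 2, both instance 0 and instance 1 would receive split 0 and then split 1, followed by null. The hostname alone cannot drive this bookkeeping, since several parallel instances may run on the same host, which is why the sketch passes an explicit instance index.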

          People

            fhueske Fabian Hueske