  1. Spark
  2. SPARK-18150 Spark 2.* failes to create partitions for avro files
  3. SPARK-18154

CLONE - Change Source API so that sources do not need to keep unbounded state

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.0.0, 2.0.1
    • Fix Version/s: 2.0.2, 2.1.0
    • Component/s: Structured Streaming
    • Labels:
      None

      Description

      The version of the Source API in Spark 2.0.0 defines a single getBatch() method for fetching records from the source, with the following Scaladoc comments defining the semantics:

      /**
       * Returns the data that is between the offsets (`start`, `end`]. When `start` is `None` then
       * the batch should begin with the first available record. This method must always return the
       * same data for a particular `start` and `end` pair.
       */
      def getBatch(start: Option[Offset], end: Offset): DataFrame
      

      Because getBatch() must always return the same data for a given `start` and `end` pair, no matter how old those offsets are, a Source must retain all past history for the stream that it backs. Further, a Source must retain this data across restarts of the process where the Source is instantiated, even when the Source is restarted on a different machine.
      These restrictions make it difficult to implement the Source API, as any implementation requires potentially unbounded amounts of distributed storage.
      See the mailing list thread at http://apache-spark-developers-list.1001551.n3.nabble.com/Source-API-requires-unbounded-distributed-storage-td18551.html for more information.
      This JIRA covers augmenting the Source API with an additional callback that allows the Structured Streaming scheduler to notify the source when it is safe to discard buffered data.
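
A minimal sketch of what such a callback could look like, written against simplified stand-in types rather than Spark's real classes. The names `BufferingSource`, `InMemorySource`, `commit`, `append`, and `bufferedCount`, and the `Long`-valued `Offset`, are assumptions made for illustration; the issue text does not specify the callback's name or signature. The idea shown is only that once the scheduler reports an offset as durably processed, the source may drop everything at or before it.

```scala
import scala.collection.immutable.TreeMap

// Simplified stand-in for an offset; NOT Spark's Offset class.
final case class Offset(value: Long)

// Hypothetical trait for this sketch. getBatch follows the (start, end]
// semantics quoted in the description; commit is the assumed new callback.
trait BufferingSource[T] {
  def getBatch(start: Option[Offset], end: Offset): Seq[T]

  // Assumed contract: the scheduler will never again request data at or
  // before `end`, so the source may discard it.
  def commit(end: Offset): Unit
}

// Toy in-memory source that buffers records until they are committed.
final class InMemorySource[T] extends BufferingSource[T] {
  // Buffered records keyed by offset, in ascending offset order.
  private var buffer = TreeMap.empty[Long, T]
  private var nextOffset = 0L

  // Adds a record and returns the offset assigned to it.
  def append(record: T): Offset = {
    val off = nextOffset
    buffer = buffer.updated(off, record)
    nextOffset += 1
    Offset(off)
  }

  // Returns records with offsets in (start, end]; when start is None the
  // batch begins with the first record still buffered.
  override def getBatch(start: Option[Offset], end: Offset): Seq[T] = {
    val lowerExclusive = start.map(_.value).getOrElse(Long.MinValue)
    buffer.iterator
      .filter { case (off, _) => off > lowerExclusive && off <= end.value }
      .map { case (_, record) => record }
      .toVector
  }

  // Drops every buffered record at or before the committed offset.
  override def commit(end: Offset): Unit = {
    buffer = buffer.filter { case (off, _) => off > end.value }
  }

  // Number of records currently retained.
  def bufferedCount: Int = buffer.size
}
```

With a callback of this kind, the amount of state a source keeps is bounded by how far the scheduler lags behind, rather than by the full history of the stream. A scheduler would call `commit` only after the results of a batch have been durably recorded, so that a restart never needs the discarded range.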

            People

            • Assignee: Frederick Reiss (freiss)
            • Reporter: Sunil Kumar (sunilsbjoshi)
            • Shepherd: Michael Armbrust