Spark / SPARK-21926

Compatibility between ML Transformers and Structured Streaming


Details

    • Type: Umbrella
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: ML, Structured Streaming
    • Labels: None

    Description

      We've run into a few cases where ML components don't work with streaming DataFrames at prediction time. This ticket aggregates these known cases in one place and provides a place to discuss possible fixes.

      Failing cases:
      1) VectorAssembler where one of the inputs is a VectorUDT column with no metadata.
      Possible fixes: see SPARK-22346 for details.
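Case 1 comes down to how the assembled vector's length is determined. The sketch below is a hypothetical, pure-Python model, not Spark's API: `ColumnInfo`, `assembled_size`, and `size_hints` are illustrative names. Each input column may carry its size in metadata; as far as we can tell, Spark 2.2's VectorAssembler falls back to reading the first row when a vector column's size is missing, which works on a batch DataFrame but cannot run on a streaming one. A user-supplied size hint removes the need to look at the data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ColumnInfo:
    """Hypothetical stand-in for a column and its ML metadata."""
    name: str
    is_vector: bool
    size: Optional[int] = None  # None means "no size in metadata"


def assembled_size(
    columns: List[ColumnInfo],
    is_streaming: bool,
    first_row: Optional[Dict[str, list]] = None,
    size_hints: Optional[Dict[str, int]] = None,
) -> int:
    """Return the length of the assembled feature vector.

    Scalars contribute 1; vectors contribute their metadata size, a
    user-supplied hint, or (batch only) the length seen in the first row.
    """
    size_hints = size_hints or {}
    total = 0
    for col in columns:
        if not col.is_vector:
            total += 1
        elif col.size is not None:
            total += col.size
        elif col.name in size_hints:
            total += size_hints[col.name]
        elif not is_streaming and first_row is not None:
            # Batch fallback: peek at the data to learn the vector size.
            total += len(first_row[col.name])
        else:
            # Streaming: there is no row to peek at when the query is planned.
            raise ValueError(f"size of vector column {col.name!r} is unknown")
    return total


# One scalar column plus one vector column that has no size metadata.
cols = [ColumnInfo("age", False), ColumnInfo("embedding", True)]
print(assembled_size(cols, is_streaming=False, first_row={"embedding": [0.1, 0.2, 0.3]}))  # 4
print(assembled_size(cols, is_streaming=True, size_hints={"embedding": 3}))  # 4
```

Calling `assembled_size(cols, is_streaming=True)` without a hint raises, which is the streaming failure in miniature. The size-hint route is, to our understanding, what the `VectorSizeHint` transformer added for SPARK-22346 in Spark 2.3.0 provides: it attaches size metadata to a vector column ahead of VectorAssembler.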

      2) OneHotEncoder where the input is a column with no metadata.
      Possible fixes:
      a) Make OneHotEncoder an estimator (SPARK-13030).
      b) Allow the user to set the cardinality of OneHotEncoder.
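Case 2 has the same root cause: without metadata, OneHotEncoder needs the number of categories before it can size its output, and (as we understand the 2.2 implementation) it scans the data to find it, which a streaming DataFrame does not allow. The hypothetical pure-Python sketch below (`one_hot`, `cardinality_from_data`, and `encode_batch` are illustrative names, and Spark's `dropLast` option is ignored) also shows why inferring the size per micro-batch would not be a workaround, and how fixing the cardinality up front, via either option a) or b), keeps the output width stable.

```python
from typing import List, Sequence


def one_hot(index: int, num_categories: int) -> List[float]:
    """Encode a category index as a dense one-hot list of length num_categories."""
    if not 0 <= index < num_categories:
        raise ValueError(f"index {index} outside [0, {num_categories})")
    vec = [0.0] * num_categories
    vec[index] = 1.0
    return vec


def cardinality_from_data(values: Sequence[int]) -> int:
    """Infer the number of categories as max index + 1 (requires seeing the data)."""
    return max(values) + 1


def encode_batch(values: Sequence[int], num_categories: int) -> List[List[float]]:
    """One-hot encode every value in a batch with a fixed vector width."""
    return [one_hot(v, num_categories) for v in values]


# Two micro-batches of a stream; the input column has no metadata.
batch_1 = [0, 1, 1]
batch_2 = [0, 3]

# Inferring the size per batch yields vectors of different widths (2 vs 4),
# so a downstream model expecting a fixed feature size cannot consume both.
inferred = [len(encode_batch(b, cardinality_from_data(b))[0]) for b in (batch_1, batch_2)]
print(inferred)  # [2, 4]

# Fixing the cardinality up front keeps every batch the same width.
# Fix a): learn it once from static training data, then reuse it on the stream.
# Fix b) would simply set num_categories = 4 directly.
training_data = [0, 1, 2, 3]
num_categories = cardinality_from_data(training_data)
print([len(encode_batch(b, num_categories)[0]) for b in (batch_1, batch_2)])  # [4, 4]
```

Option a) amounts to fitting once on static data and reusing the fitted model on the stream; SPARK-13030 delivered this as `OneHotEncoderEstimator` in Spark 2.3.0.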

          People

            Assignee: Unassigned
            Reporter: Bago Amirbekian (bago.amirbekian)
            Votes: 3
            Watchers: 17
