SPARK-21926: Compatibility between ML Transformers and Structured Streaming

Details

    • Type: Umbrella
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: ML, Structured Streaming
    • Labels: None

    Description

      We've run into a few cases where ML components fail on streaming DataFrames (for prediction). This ticket collects the known cases in one place and provides a place to discuss possible fixes.

      Failing cases:
      1) VectorAssembler where one of the inputs is a VectorUDT column with no metadata.
      Possible fixes: see SPARK-22346 for details.

      2) OneHotEncoder where the input is a column with no metadata.
      Possible fixes:
      a) Make OneHotEncoder an estimator (SPARK-13030).
      b) Allow user to set the cardinality of OneHotEncoder.
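
      Both failing cases share a cause: the output width of the transformer depends on column metadata, and a streaming DataFrame cannot be scanned up front to recover it. The following is a minimal plain-Python sketch of that dependency, not Spark code. The names `assembled_size` and `one_hot` are hypothetical, `None` stands in for "no metadata", and the toy encoder keeps every category rather than mirroring any Spark default.

```python
from typing import List, Optional, Sequence


def assembled_size(input_sizes: Sequence[Optional[int]]) -> int:
    """Width of the assembled vector; None models a column without size metadata."""
    if any(size is None for size in input_sizes):
        # On a batch DataFrame the size could be read from the first row;
        # on a stream the schema must be fixed before any data arrives.
        raise ValueError("vector size unknown: output schema cannot be fixed")
    return sum(input_sizes)


def one_hot(index: int, cardinality: Optional[int]) -> List[float]:
    """Encode a category index; cardinality plays the role of the missing metadata."""
    if cardinality is None:
        raise ValueError("cardinality unknown: output width cannot be fixed")
    if not 0 <= index < cardinality:
        raise ValueError(f"index {index} outside [0, {cardinality})")
    vec = [0.0] * cardinality
    vec[index] = 1.0
    return vec


# Two micro-batches of category indices, as a stream might deliver them.
batches = [[0, 1], [2]]

# Inferring the width per batch gives inconsistent results across batches.
inferred_widths = [max(batch) + 1 for batch in batches]

# Fix (b) in miniature: the user declares the cardinality once, so every
# batch is encoded to the same width.
fixed = [[one_hot(i, 3) for i in batch] for batch in batches]
```

      In this toy model the per-batch widths disagree (2 for the first batch, 3 for the second), while the encoding with a declared cardinality of 3 is stable across batches. That is the motivation for letting the user set the cardinality, or for learning it once in an estimator's fit step.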

            People

              Assignee: Unassigned
              Reporter: Bago Amirbekian (bago.amirbekian)
              Votes: 3
              Watchers: 16
