[SPARK-14891] ALS in ML never validates input schema

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: ML
    • Labels: None

    Description

      Currently, ALS.fit never validates the input schema. There is a transformSchema implementation that calls validateAndTransformSchema, but neither ALS.fit nor ALSModel.transform ever calls it.
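
The gap can be pictured with a toy estimator in plain Python. This is only a sketch of the pattern, not Spark code: `ToyEstimator`, `validate_schema`, `fit_unvalidated`, the column names, and the string type labels are all hypothetical stand-ins for illustration.

```python
# Hypothetical required columns and type labels, standing in for the
# Int-typed user and item id columns described in this issue.
REQUIRED_TYPES = {"user": "int", "item": "int"}


def validate_schema(schema):
    """Raise if a required column is missing or mistyped; otherwise return the schema."""
    for column, expected in REQUIRED_TYPES.items():
        actual = schema.get(column)
        if actual is None:
            raise ValueError(f"missing column {column!r}")
        if actual != expected:
            raise TypeError(f"column {column!r} must be {expected}, got {actual}")
    return schema


class ToyEstimator:
    def transform_schema(self, schema):
        # The validation hook exists...
        return validate_schema(schema)

    def fit_unvalidated(self, schema, rows):
        # ...but, as in the reported behaviour, nothing calls it here,
        # so a mistyped column goes unnoticed.
        return len(rows)

    def fit(self, schema, rows):
        # Calling the hook first makes a bad schema fail before any work is done.
        self.transform_schema(schema)
        return len(rows)
```

With a schema of `{"user": "long", "item": "int"}`, `fit_unvalidated` succeeds silently while `fit` raises a `TypeError`.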

      This was highlighted in SPARK-13857 (and by failing PySpark tests) when a call to transformSchema that actually validates the input schema was added to ALSModel.transform. The PySpark docstring tests produce Long inputs by default, which fail validation because Int is required.

      Currently, the user and item id inputs are cast to Int with no input type validation and no warning message, so users can pass in Long, Float, Double, etc. The docs also do not state anywhere that only Int types are supported for user and item ids.
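
To see why an unvalidated cast is risky, the following plain-Python sketch contrasts a silent narrowing cast with a checked one. It assumes the cast behaves like a Java-style narrowing conversion for integer values (keep the low 32 bits) and drops the fractional part of floats; both function names are hypothetical, and the checked variant is just one possible validation policy, not necessarily what Spark does.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def unchecked_cast_to_int32(value):
    """Simplified model of a silent cast: drop any fraction, keep the low 32 bits."""
    n = int(value) & 0xFFFFFFFF
    # Reinterpret the low 32 bits as a signed 32-bit integer.
    return n - 2**32 if n > INT32_MAX else n


def checked_cast_to_int32(value):
    """Hypothetical validating cast: accept only whole numbers in the 32-bit range."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"unsupported id type: {type(value).__name__}")
    if isinstance(value, float) and not value.is_integer():
        raise ValueError(f"id {value!r} is not a whole number")
    n = int(value)
    if not INT32_MIN <= n <= INT32_MAX:
        raise ValueError(f"id {n} is outside the 32-bit integer range")
    return n
```

Under this model, the distinct ids `5` and `2**32 + 5` both become `5` after the unchecked cast, so two users would be merged without any warning, whereas the checked cast rejects the out-of-range id.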

      Enforcing validation seems the best option, but it might break user code that previously "just worked", especially in PySpark.

            People

              Assignee: mlnick Nicholas Pentreath
              Reporter: mlnick Nicholas Pentreath
