  1. Spark
  2. SPARK-23562

RFormula handleInvalid should handle invalid values in non-string columns.

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.4.0
    • Component/s: ML
    • Labels:
      None
    • Target Version/s:

      Description

      Currently, setting handleInvalid to 'keep' or 'skip' only affects String columns. A null in a numeric column either causes the transformer to fail or ends up as a null in the resulting label column.

      I'm not sure what the semantics of 'keep' should be for numeric columns with null values, but we should at least be able to support 'skip' for these types.
      --> Discussed offline: for 'keep', null values can be converted to NaN values.
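
      The semantics discussed above can be sketched in plain Python. This is only an illustration, not Spark's implementation: the helper name handle_invalid_numeric and the dict-based rows are hypothetical, and the 'error' branch is assumed to mirror RFormula's existing 'error' option by failing on the first null.

```python
import math
from typing import Dict, List, Optional

# Hypothetical row type: column name -> numeric value, or None for a null.
Row = Dict[str, Optional[float]]


def handle_invalid_numeric(rows: List[Row], column: str, mode: str) -> List[Row]:
    """Apply 'skip', 'keep', or 'error' semantics to nulls in one numeric column."""
    if mode not in ("skip", "keep", "error"):
        raise ValueError(f"unsupported mode: {mode!r}")
    result: List[Row] = []
    for row in rows:
        if row[column] is not None:
            result.append(dict(row))  # valid value: pass the row through unchanged
        elif mode == "skip":
            continue  # 'skip': drop the row that contains the null
        elif mode == "keep":
            # 'keep': retain the row, replacing the null with NaN
            result.append({**row, column: float("nan")})
        else:
            # 'error': fail on the first null encountered
            raise ValueError(f"null value in numeric column {column!r}")
    return result


rows = [{"x": 1.0, "label": 0.0}, {"x": None, "label": 1.0}]
skipped = handle_invalid_numeric(rows, "x", "skip")  # one row remains
kept = handle_invalid_numeric(rows, "x", "keep")     # two rows, second has x = NaN
```

      With 'skip' the row containing the null is dropped, while 'keep' preserves the row count, so downstream stages have to tolerate NaN.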

              People

              • Assignee: Unassigned
              • Reporter: Bago Amirbekian (bago.amirbekian)
              • Shepherd: Joseph K. Bradley
