Spark / SPARK-19734

OneHotEncoder __init__ uses dropLast but doc strings all say includeFirst


Details

    • Type: Documentation
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.5.2, 1.6.3, 2.0.2, 2.1.0
    • Fix Version/s: 2.2.0
    • Component/s: PySpark

    Description

      The OneHotEncoder.__init__ docstring in PySpark lists a keyword argument named includeFirst, whereas the code actually uses dropLast.

      This is especially confusing because __init__ accepts only keyword arguments, so following either the documentation on the web (https://spark.apache.org/docs/2.0.1/api/python/pyspark.ml.html#pyspark.ml.feature.OneHotEncoder) or the output of help() in Python results in the error:

      TypeError: __init__() got an unexpected keyword argument 'includeFirst'
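To illustrate why the documented keyword fails, here is a minimal sketch. It assumes a simplified stand-in for PySpark's @keyword_only decorator and a hypothetical OneHotEncoderStub class that copies the signature quoted in this issue; neither is the actual PySpark source. Because the wrapper forwards **kwargs to the real __init__, Python validates each keyword name against the actual parameters, so dropLast is accepted while includeFirst raises a TypeError. The exact message prefix varies by Python version (3.10+ includes the class name).

```python
import functools


def keyword_only(func):
    """Simplified stand-in for PySpark's @keyword_only (not the real source).

    Rejects positional arguments and records the keyword arguments passed.
    """
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if args:
            raise TypeError("Method %s forces keyword arguments." % func.__name__)
        self._input_kwargs = kwargs
        # Python checks every key in kwargs against func's real parameters here.
        return func(self, **kwargs)
    return wrapper


class OneHotEncoderStub:
    """Hypothetical class copying the signature quoted in this issue."""

    @keyword_only
    def __init__(self, dropLast=True, inputCol=None, outputCol=None):
        self.dropLast = dropLast
        self.inputCol = inputCol
        self.outputCol = outputCol


# The parameter name the code actually uses is accepted.
enc = OneHotEncoderStub(dropLast=False, inputCol="category", outputCol="categoryVec")

# The name given in the docstring is rejected with a TypeError.
try:
    OneHotEncoderStub(includeFirst=True, inputCol="category", outputCol="categoryVec")
except TypeError as exc:
    message = str(exc)
    print(message)
```

Since the decorator also rejects positional arguments, the help() text is effectively the only guide to valid parameter names, which is why the docstring mismatch matters.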

      The mismatch is immediately visible in the source code:

          @keyword_only
          def __init__(self, dropLast=True, inputCol=None, outputCol=None):
              """
              __init__(self, includeFirst=True, inputCol=None, outputCol=None)
              """
      


People

    • Assignee: Mark Grover (mgrover)
    • Reporter: Corey (correedsh)
    • Votes: 0
    • Watchers: 3
