  1. Spark
  2. SPARK-23291

SparkR: substr: In a SparkR DataFrame, the starting and ending position arguments of "substr" give the wrong result when the starting position is greater than 1


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.2, 2.2.0, 2.2.1, 2.3.0
    • Fix Version/s: 2.3.1, 2.4.0
    • Component/s: SparkR
    • Labels: None

    Description

      Defect Description:

      -----------------------------

      For example, an input string "2017-12-01" is read into a SparkR DataFrame "df" with the column name "col1".
      The goal is to create a new column named "col2" with the value "12", which is contained in the string. "12" can be extracted with a starting position of 6 and an ending position of 7
      (the position of the first character is taken to be 1).

      However, the code that currently has to be written is:

      df <- withColumn(df, "col2", substr(df$col1, 7, 8))

      Observe that the first position argument of the "substr" API, which indicates the starting position, is given as 7.
      Likewise, the second position argument, which indicates the ending position, is given as 8.

      That is, each position has to be specified as the actual position + 1.

      Expected behavior:

      ----------------------------

      The code that should produce this result is:

      df <- withColumn(df, "col2", substr(df$col1, 6, 7))
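
For comparison, base R's own substr(x, start, stop) uses 1-based, inclusive positions, which matches the expected behavior described here. A minimal check in plain R follows; it involves no Spark session, and the variable names are illustrative only.

```r
# Base R only; this does not call SparkR.
x <- "2017-12-01"

# Positions are 1-based and inclusive: characters 6 and 7 are "1" and "2".
month <- substr(x, 6, 7)
print(month)    # "12"

# Shifting both positions up by one selects characters 7 and 8 instead.
shifted <- substr(x, 7, 8)
print(shifted)  # "2-"
```

With the reported defect, the SparkR column version needs the shifted positions (7, 8) to return "12", whereas base R returns "12" for (6, 7).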

      Note:

      -----------
      This defect is observed only when the starting position is greater than 1.
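
One way such a pattern could arise is sketched below in plain R. This is a hypothetical model, not SparkR's actual source: it assumes a wrapper that subtracts 1 from the starting position before delegating to a 1-based backend that treats position 0 the same as position 1. The names backend_substring and buggy_substr are invented for illustration.

```r
# Hypothetical 1-based backend: a start of 0 is treated like a start of 1 (assumption).
backend_substring <- function(x, pos, len) {
  start <- max(pos, 1)
  substr(x, start, start + len - 1)
}

# Hypothetical wrapper with an off-by-one: it shifts `start` down by 1
# before passing a (position, length) pair to the backend.
buggy_substr <- function(x, start, stop) {
  backend_substring(x, start - 1, stop - start + 1)
}

x <- "2017-12-01"
print(buggy_substr(x, 1, 4))  # "2017": start = 1 is unaffected
print(buggy_substr(x, 6, 7))  # "-1": window shifted left by one
print(buggy_substr(x, 7, 8))  # "12": the "+1" workaround from the report
```

Under these assumptions, a start of 1 becomes 0 and is absorbed by the backend, so only starting positions greater than 1 show the shift.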

          People

            Assignee: L. C. Hsieh (viirya)
            Reporter: Narendra (narendra_k)
            Shepherd: Felix Cheung