  1. Spark
  2. SPARK-29316

CLONE - schemaInference option not to convert strings with leading zeros to int/long


    Details

      Description

      It would be great to have an option in Spark's schema inference that prevents a column whose values have leading zeros from being converted to the int/long datatype. Think zip codes, for example.

      df = (sqlc.read.format('csv')
                    .option('inferSchema', True)
                    .option('header', True)
                    .option('delimiter', '|')
                    .option('leadingZeros', 'KEEP')       # this is the new proposed option
                    .option('mode', 'FAILFAST')
                    .load('csvfile_withzipcodes_to_ingest.csv')
                  )
      
      Data with leading zeros is generally used for identifiers. Converting such values to int/long strips the zeros and defeats the purpose of inferSchema. Whether such data is converted to int/long should be controlled by a flag.
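
      To make the requested behavior concrete, here is a minimal plain-Python sketch of how such a rule might classify a column. It is not Spark code: the helper name `infer_column_type` and the `'KEEP'` / `'CONVERT'` values are assumptions for illustration, mirroring the proposed `leadingZeros` option above.

```python
import re

# Hypothetical helper, NOT part of Spark. It mimics in plain Python how a
# proposed leadingZeros='KEEP' inference rule might classify a column.
_INT_RE = re.compile(r"[+-]?\d+")
_LEADING_ZERO_RE = re.compile(r"[+-]?0\d+")  # matches "02134", but not "0"


def infer_column_type(values, leading_zeros="CONVERT"):
    """Return 'long' or 'string' for a column of raw CSV strings.

    With leading_zeros='KEEP' (assumed semantics), the column stays
    'string' if any non-empty value is an integer with a leading zero.
    """
    non_empty = [v.strip() for v in values if v is not None and v.strip()]
    if not non_empty:
        # Simplification: treat an empty column as string.
        return "string"
    if not all(_INT_RE.fullmatch(v) for v in non_empty):
        return "string"
    if leading_zeros == "KEEP" and any(
        _LEADING_ZERO_RE.fullmatch(v) for v in non_empty
    ):
        return "string"
    return "long"


zip_codes = ["02134", "10001", "00501"]
counts = ["7", "0", "120"]
```

      Under these assumptions, `infer_column_type(zip_codes)` yields `'long'` (the behavior the report objects to), while `infer_column_type(zip_codes, leading_zeros="KEEP")` yields `'string'`. The `counts` column is inferred as `'long'` either way, since a bare `0` is not a leading zero.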

              People

              • Assignee: Unassigned
              • Reporter: Ambar Raghuvanshi (ambar.raghuvanshi)
