Spark / SPARK-21978

schemaInference option not to convert strings with leading zeros to int/long

      Description

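      It would be great to have an option in Spark's schema inference that keeps a column as a string, instead of converting it to an int/long datatype, when its values have leading zeros. Think zip codes, for example: once such a column is read as an integer, the leading zeros are lost.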

      # sqlc is assumed to be an existing SQLContext
      df = (sqlc.read.format('csv')
            .option('inferSchema', True)
            .option('header', True)
            .option('delimiter', '|')
            .option('leadingZeros', 'KEEP')  # this is the new proposed option
            .option('mode', 'FAILFAST')
            .load('csvfile_withzipcodes_to_ingest.csv'))
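
To make the requested behaviour concrete, here is a minimal pure-Python sketch of the per-column decision such an option might make. It is not Spark code and does not reflect how Spark's inference is implemented; the function name `infer_column_type` and the `keep_leading_zeros` flag are hypothetical, chosen only to illustrate the rule "all-digit values normally infer as long, unless any value has a leading zero".

```python
import re

# Hypothetical illustration only: not part of Spark or any Spark API.
_INT_RE = re.compile(r'^[+-]?\d+$')            # an optionally signed integer
_LEADING_ZERO_RE = re.compile(r'^[+-]?0\d+$')  # e.g. '02134', but not '0'


def infer_column_type(values, keep_leading_zeros=True):
    """Return 'long' or 'string' for a list of raw CSV field strings."""
    non_empty = [v for v in values if v != '']
    if not non_empty:
        return 'string'
    # Any non-integer value forces the column to string.
    if not all(_INT_RE.match(v) for v in non_empty):
        return 'string'
    # Proposed rule: a leading zero anywhere keeps the column as string.
    if keep_leading_zeros and any(_LEADING_ZERO_RE.match(v) for v in non_empty):
        return 'string'
    return 'long'
```

Under this sketch, a zip-code column such as `['02134', '10001']` stays a string, while `['2134', '10001']` is still inferred as a long. A bare `'0'` is not treated as having a leading zero.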
      

              People

              • Assignee: Unassigned
              • Reporter: Tagar Ruslan Dautkhanov