Spark / SPARK-21978

schemaInference option not to convert strings with leading zeros to int/long


Details

    Description

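      It would be great to have an option in Spark's schema inference that stops it from converting a column whose values have leading zeros to an int/long data type. Think zip codes, for example: with inferSchema enabled, "02134" is currently read as the integer 2134 and the leading zero is lost.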

      # sqlc is an existing SQLContext (a SparkSession's .read works the same way)
      df = (sqlc.read.format('csv')
                    .option('inferSchema', True)
                    .option('header', True)
                    .option('delimiter', '|')
                    .option('leadingZeros', 'KEEP')       # the new proposed option
                    .option('mode', 'FAILFAST')
                    .load('csvfile_withzipcodes_to_ingest.csv')
                  )
      
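      To make the requested behaviour concrete, here is a minimal pure-Python sketch of how an integer inference pass over one CSV column could honour such an option. It is an illustration under stated assumptions, not Spark's implementation: infer_column_type and has_leading_zero are hypothetical names, leading_zeros='KEEP' mirrors the option name proposed above, and only the int, then long, then string widening steps are modelled (Spark's real CSV inference also tries other types such as decimal, double and timestamp).

```python
import re

# Hypothetical sketch of the proposed rule; NOT Spark's schema-inference code.
INT_MIN, INT_MAX = -(2**31), 2**31 - 1
LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1
INTEGER_TOKEN = re.compile(r"[+-]?[0-9]+")


def has_leading_zero(token):
    """True for integer tokens like '02134' or '-007' that start with a redundant 0."""
    digits = token.lstrip("+-")
    return len(digits) > 1 and digits.startswith("0")


def infer_column_type(values, leading_zeros=None):
    """Return 'int', 'long' or 'string' for a list of raw CSV field values.

    leading_zeros=None models today's behaviour (leading zeros are ignored);
    leading_zeros='KEEP' models the proposed option: any value with a leading
    zero forces the whole column to 'string'.
    """
    inferred = None  # no non-empty value seen yet
    for raw in values:
        token = raw.strip()
        if not token:
            continue  # empty fields are treated as nulls and do not affect the type
        if not INTEGER_TOKEN.fullmatch(token):
            return "string"  # simplification: anything non-integer becomes string
        if leading_zeros == "KEEP" and has_leading_zero(token):
            return "string"
        number = int(token)
        if INT_MIN <= number <= INT_MAX:
            inferred = inferred or "int"  # keep 'long' if already widened
        elif LONG_MIN <= number <= LONG_MAX:
            inferred = "long"
        else:
            return "string"
    return inferred or "string"  # an all-empty column falls back to string


zips = ["02134", "10001", "94105"]
print(infer_column_type(zips))                       # int
print(infer_column_type(zips, leading_zeros="KEEP"))  # string
```

      Forcing the whole column to string on the first leading-zero value matches the zip-code use case: a column is only useful as an identifier if every value keeps its original text.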


            People

              Assignee: Unassigned
              Reporter: Ruslan Dautkhanov (Tagar)
              Votes: 0
              Watchers: 2
