[SPARK-21978] schemaInference option not to convert strings with leading zeros to int/long - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: 2.1.0, 2.1.1, 2.2.0, 2.3.0
Fix Version/s: None
Component/s: Spark Core
Labels:
- csv
- csvparser
- easy-fix
- inference
- ramp-up
- schema

Description

It would be great to have an option in Spark's schema inference that keeps a column as a string, instead of converting it to int/long, when its values have leading zeros. Think zip codes, for example: inferring an integer type turns 02134 into 2134.

df = (sqlc.read.format('csv')
              .option('inferSchema', True)
              .option('header', True)
              .option('delimiter', '|')
              .option('leadingZeros', 'KEEP')       # this is the new proposed option
              .option('mode', 'FAILFAST')
              .load('csvfile_withzipcodes_to_ingest.csv')
            )
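
To make the requested behavior concrete, here is a minimal pure-Python sketch of the rule such an option might apply to a single column during inference. It is not Spark code: `infer_column_type`, the `keep_leading_zeros` flag, and the `"integral"` / `"string"` labels are hypothetical names chosen for illustration, and the real inference logic distinguishes more types than this.

```python
import re

# Hypothetical illustration only; none of these names exist in Spark.
_INT_RE = re.compile(r"^[+-]?\d+$")            # optional sign, then digits only
_LEADING_ZERO_RE = re.compile(r"^[+-]?0\d+$")  # a zero followed by more digits


def infer_column_type(values, keep_leading_zeros=True):
    """Return "integral" if every non-empty value looks like an integer,
    otherwise "string". With keep_leading_zeros=True, any value such as
    "02134" forces the whole column to "string"."""
    non_empty = [v.strip() for v in values if v is not None and v.strip()]
    if not non_empty:
        # Assumption: a column with no usable values falls back to string.
        return "string"
    for v in non_empty:
        if not _INT_RE.match(v):
            return "string"
        if keep_leading_zeros and _LEADING_ZERO_RE.match(v):
            return "string"
    return "integral"


zip_codes = ["02134", "10001", "94105"]
print(infer_column_type(zip_codes))                            # proposed behavior
print(infer_column_type(zip_codes, keep_leading_zeros=False))  # zeros ignored
```

Note that a bare "0" is not treated as having a leading zero in this sketch, so a column of ordinary counts that happens to contain zero would still be inferred as integral.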

Issue Links

is cloned by

SPARK-29316 CLONE - schemaInference option not to convert strings with leading zeros to int/long

Resolved

People

Assignee: Unassigned

Reporter: Ruslan Dautkhanov

Votes: 0

Watchers: 1

Dates

Created: 12/Sep/17 05:41

Updated: 12/Dec/22 18:11

Resolved: 11/Oct/17 14:47