[SPARK-29316] CLONE - schemaInference option not to convert strings with leading zeros to int/long - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Won't Fix
Affects Version/s: 2.1.0, 2.1.1, 2.2.0, 2.3.0
Fix Version/s: None
Component/s: Spark Core
Labels:
- csv
- csvparser
- easy-fix
- inference
- ramp-up
- schema

Description

It would be great to have an option in Spark's schema inference that does not convert a column with leading zeros to an int/long data type. Think of zip codes, for example.

df = (sqlc.read.format('csv')
              .option('inferSchema', True)
              .option('header', True)
              .option('delimiter', '|')
              .option('leadingZeros', 'KEEP')       # this is the new proposed option
              .option('mode', 'FAILFAST')
              .load('csvfile_withzipcodes_to_ingest.csv')
            )
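To illustrate the problem without Spark, the plain-Python sketch below reads a small, made-up pipe-delimited sample with the standard `csv` module. It then converts the zip column to `int`, which is roughly what numeric inference does, and formats the values back as text. The leading zeros are lost and cannot be recovered. The sample data and the helper name `zips_after_int_roundtrip` are hypothetical.

```python
import csv
import io

# Hypothetical pipe-delimited sample with zip codes (made-up data).
SAMPLE = "name|zip\nAlice|02134\nBob|90210\nCarol|00501\n"


def zips_after_int_roundtrip(text):
    """Parse the zip column as int (as numeric inference would), then format it back as a string."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    return [str(int(row["zip"])) for row in reader]


# The zip column as it appears in the file, kept as raw strings.
original = [row["zip"] for row in csv.DictReader(io.StringIO(SAMPLE), delimiter="|")]
roundtripped = zips_after_int_roundtrip(SAMPLE)
print(original)      # ['02134', '90210', '00501']
print(roundtripped)  # ['2134', '90210', '501']
```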

Data with leading zeros is generally used for identifiers. Converting such columns to int/long defeats the purpose of inferSchema. Whether the data is converted to int/long should therefore be controlled by a flag.
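A minimal sketch of the flag-based rule requested here, written in plain Python rather than as Spark's own inference code. The function names and the `keep_leading_zeros` parameter are hypothetical and only model the proposed `leadingZeros='KEEP'` option. Only integer tokens are considered. Double, decimal and timestamp inference are deliberately left out, so anything else is reported as `string`.

```python
import re
from typing import Iterable

# Integer tokens only: optional sign followed by ASCII digits.
_INT_TOKEN = re.compile(r"[+-]?[0-9]+")

INT_MIN, INT_MAX = -2**31, 2**31 - 1
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1


def has_leading_zero(token: str) -> bool:
    """True for tokens like '007' or '-012' whose digits start with a redundant zero."""
    digits = token.lstrip("+-")
    return len(digits) > 1 and digits.startswith("0")


def infer_column_type(values: Iterable[str], keep_leading_zeros: bool = False) -> str:
    """Return 'int', 'long' or 'string' for a column of raw CSV tokens.

    keep_leading_zeros is a hypothetical flag modelled on the proposed
    leadingZeros='KEEP' option; it is not an existing Spark setting.
    """
    inferred = "int"
    for token in values:
        if token == "":
            continue  # treat empty tokens as nulls, which do not affect the type
        if not _INT_TOKEN.fullmatch(token):
            return "string"  # simplification: no double/decimal/timestamp handling
        if keep_leading_zeros and has_leading_zero(token):
            return "string"  # keep identifiers such as zip codes intact
        number = int(token)
        if not LONG_MIN <= number <= LONG_MAX:
            return "string"  # simplification: wider numeric types are not modelled
        if not INT_MIN <= number <= INT_MAX:
            inferred = "long"
    return inferred


zips = ["02134", "90210", "00501"]
print(infer_column_type(zips))                           # int
print(infer_column_type(zips, keep_leading_zeros=True))  # string
print(infer_column_type(["0", "42", "3000000000"], keep_leading_zeros=True))  # long
```

Because the check runs per token, a single value such as `02134` is enough to keep the whole column as a string when the flag is on. A lone `0` is not treated as having a leading zero, so ordinary numeric columns are still inferred as int/long.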

Issue Links

is a clone of

SPARK-21978 schemaInference option not to convert strings with leading zeros to int/long

Resolved

People

Assignee: Unassigned

Reporter: Ambar Raghuvanshi

Votes: 0

Watchers: 1

Dates

Created: 01/Oct/19 13:11

Updated: 04/Oct/19 08:26

Resolved: 04/Oct/19 08:26