Spark / SPARK-26645

CSV infer schema bug infers decimal(9,-1)


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3.0, 2.4.7
    • Fix Version/s: 2.4.8, 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      We have a file /tmp/t1/file.txt that contains only one line: "1.18927098E9".
      Running:

      df = spark.read.csv('/tmp/t1', header=False, inferSchema=True, sep='\t')
      print(df.dtypes)
      

      causes:

      ValueError: Could not parse datatype: decimal(9,-1)
      

      I'm not sure where the bug is: in inferSchema or in dtypes?
      I saw that a decimal with negative scale is considered legal in the code (CSVInferSchema.scala):

      if (bigDecimal.scale <= 0) {
        // `DecimalType` conversion can fail when
        //   1. The precision is bigger than 38.
        //   2. scale is bigger than precision.
        DecimalType(bigDecimal.precision, bigDecimal.scale)
      }
      

      But what does a negative scale mean here?
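      For what it's worth, the inferred decimal(9,-1) can be reproduced outside Spark. The sketch below uses Python's decimal module as a stand-in for java.math.BigDecimal (which is what CSVInferSchema uses): "1.18927098E9" is the coefficient 118927098 times 10^1, i.e. 9 significant digits with a scale of -1. A negative scale just means the significant digits end that many places to the left of the decimal point; the PySpark side cannot parse such a type string.

      from decimal import Decimal

      # The single CSV value from the report, seen roughly the way
      # java.math.BigDecimal sees it: 9 significant digits, exponent +1.
      d = Decimal("1.18927098E9")
      sign, digits, exponent = d.as_tuple()

      precision = len(digits)  # number of significant digits: 9
      scale = -exponent        # BigDecimal-style scale: -1

      print(precision, scale)  # 9 -1

      So the Scala side legitimately builds DecimalType(9, -1), and the failure surfaces later when the type string decimal(9,-1) reaches the Python datatype parser.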


            People

              Assignee: mgaido (Marco Gaido)
              Reporter: uzadude (Ohad Raviv)
              Votes: 0
              Watchers: 5
