Spark / SPARK-26645

CSV infer schema bug infers decimal(9,-1)


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3.0, 2.4.7
    • Fix Version/s: 2.4.8, 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      We have a file /tmp/t1/file.txt that contains only one line: "1.18927098E9".
      Running:

      df = spark.read.csv('/tmp/t1', header=False, inferSchema=True, sep='\t')
      print(df.dtypes)
      

      causes:

      ValueError: Could not parse datatype: decimal(9,-1)
      

      I'm not sure where the bug is: in inferSchema or in dtypes?
      I saw that a decimal with negative scale is considered legal in the code (CSVInferSchema.scala):

      if (bigDecimal.scale <= 0) {
        // `DecimalType` conversion can fail when
        //   1. The precision is bigger than 38.
        //   2. scale is bigger than precision.
        DecimalType(bigDecimal.precision, bigDecimal.scale)
      }
      

      But what does a negative scale mean here?
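      For what it's worth, the inferred decimal(9,-1) can be reproduced outside Spark. The sketch below uses Python's decimal module as a stand-in for java.math.BigDecimal (which is what CSVInferSchema uses): "1.18927098E9" is the coefficient 118927098 times 10^1, i.e. 9 significant digits with a scale of -1. A negative scale just means the significant digits end that many places to the left of the decimal point; the PySpark side cannot parse such a type string.

      from decimal import Decimal

      # The single CSV value from the report, seen roughly the way
      # java.math.BigDecimal sees it: 9 significant digits, exponent +1.
      d = Decimal("1.18927098E9")
      sign, digits, exponent = d.as_tuple()

      precision = len(digits)  # number of significant digits: 9
      scale = -exponent        # BigDecimal-style scale: -1

      print(precision, scale)  # 9 -1

      So the Scala side legitimately builds DecimalType(9, -1), and the failure surfaces later when the type string decimal(9,-1) reaches the Python datatype parser.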


            People

              Assignee: mgaido (Marco Gaido)
              Reporter: uzadude (Ohad Raviv)
              Votes: 0
              Watchers: 5
