[ZEPPELIN-4009] Large Numbers Truncated - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version/s: 0.8.0
Fix Version/s: None
Component/s: build
Labels: None

External issue URL:
https://issues.apache.org/jira/browse/SPARK-26693
External issue ID:
~~SPARK-26693~~

Description

(Copied from Apache Spark issue SPARK-26693, as this appears to be a Zeppelin issue rather than a Spark issue.)

We have a process that takes a file dumped from an external API and formats it for use in other processes. These API dumps are read into Spark with all fields as strings. One of the fields is a 19-digit visitor ID. Since adopting Spark 2.4 a few weeks ago, we have noticed that DataFrames read the 19 digits correctly, but any SQL query appears to truncate the last two digits and replace them with "00".

Our process converts these numbers to bigint, which worked before Spark 2.4. We looked into data types, including switching to a "long" type, with no luck. We then tried passing the string value through as is, with the same result. The code below should reproduce the issue with a few 19-digit test cases and shows the type conversions I tried.


{code:python}
%pyspark

from pyspark.sql.types import StructField, StructType, StringType

# Schema: a single nullable string column holding the 19-digit IDs.
sfTestValue = StructField("testValue", StringType(), True)
schemaTest = StructType([sfTestValue])

listTestValues = []
listTestValues.append(("4065453307562594031",))
listTestValues.append(("7659957277770523059",))
listTestValues.append(("1614560078712787995",))

dfTest = spark.createDataFrame(listTestValues, schemaTest)

# Keep the original string and add bigint and long casts of the same value.
dfTestExpanded = dfTest.selectExpr(
    "testValue as idAsString",
    "cast(testValue as bigint) as idAsBigint",
    "cast(testValue as long) as idAsLong")

dfTestExpanded.show()  # This will show three columns of data correctly.

# When this table is viewed in a %sql paragraph, the truncated values are shown.
dfTestExpanded.createOrReplaceGlobalTempView('testTable')

sqlContext.sql('select * from global_temp.testTable').show(3)  # shows correct values
{code}

{code:sql}
%sql
select * from global_temp.testTable    --shows incorrectly truncated values
{code}
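One possible explanation, which this report does not confirm, is that some display layer handles the IDs as 64-bit floating-point numbers rather than as 64-bit integers. The plain-Python sketch below checks only the arithmetic behind that guess: whether the three test IDs fit in a signed 64-bit integer, and whether they survive a round trip through a double. The names `TEST_IDS`, `INT64_MAX`, `DOUBLE_EXACT_MAX`, and `precision_report` are hypothetical and exist only for this illustration. The sketch does not reproduce Zeppelin's rendering, so the exact digits it prints need not match what the %sql paragraph displays.

```python
# Test IDs copied from the reproduction in this report.
TEST_IDS = ["4065453307562594031", "7659957277770523059", "1614560078712787995"]

INT64_MAX = 2**63 - 1     # largest value a signed 64-bit integer (bigint/long) can hold
DOUBLE_EXACT_MAX = 2**53  # every integer up to this magnitude is exact in a 64-bit float


def precision_report(id_string):
    """Describe how one ID behaves as a 64-bit integer and as a 64-bit float."""
    value = int(id_string)
    as_double = int(float(value))  # round trip through a 64-bit float
    return {
        "id": id_string,
        "fits_in_bigint": value <= INT64_MAX,
        "above_double_exact_range": value > DOUBLE_EXACT_MAX,
        "exact_as_double": as_double == value,
        "round_trip_error": as_double - value,
    }


for test_id in TEST_IDS:
    print(precision_report(test_id))
```

If the guess is right, each ID should report that it fits in a bigint but is not exact as a double, which would be consistent with the casts being correct in Spark and the loss happening only at display time.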

People

Assignee: Unassigned

Reporter: Jason Ferrell

Votes: 0

Watchers: 1

Dates

Created: 15/Feb/19 23:36

Updated: 15/Feb/19 23:43