Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 5.0.0
- Fix Version/s: None
- Component/s: None
Description
When using the Spark connector to read a Phoenix table that has at least one column defined as an array of shorts (SMALLINT ARRAY), the resulting Dataset infers the schema as an array of integers.
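For illustration, a minimal sketch of the behavior (the table name MY_TABLE, column name VALS, and ZooKeeper URL are hypothetical placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Read a Phoenix table whose VALS column is declared as SMALLINT ARRAY.
val df = spark.read
  .format("org.apache.phoenix.spark")
  .option("table", "MY_TABLE")
  .option("zkUrl", "localhost:2181")
  .load()

// Prints VALS as array<int> instead of the expected array<smallint>.
df.printSchema()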
I believe this is due to the following code:
phoenix/phoenix-spark/src/main/scala/org/apache/phoenix/spark/PhoenixRDD.scala:182
case t if t.isInstanceOf[PSmallintArray] || t.isInstanceOf[PUnsignedSmallintArray] => ArrayType(IntegerType, containsNull = true)
phoenix-connectors/phoenix-spark-base/src/main/scala/org/apache/phoenix/spark/SparkSchemaUtil.scala:82
case t if t.isInstanceOf[PSmallintArray] || t.isInstanceOf[PUnsignedSmallintArray] => ArrayType(IntegerType, containsNull = true)
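For comparison, a sketch of how that case line might look if these types mapped to Spark's ShortType instead (an assumption about the intended behavior, not an actual patch from the project):

// Hypothetical fix: map SMALLINT arrays to ShortType rather than IntegerType.
case t if t.isInstanceOf[PSmallintArray] || t.isInstanceOf[PUnsignedSmallintArray] => ArrayType(ShortType, containsNull = true)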
Subsequent attempts to programmatically cast the column to shorts fail with a ClassCastException, because the values materialized by the connector do not match the inferred schema. It is also impossible to supply the original schema through a DataFrameReader, as that fails with: "org.apache.spark.sql.AnalysisException: org.apache.phoenix.spark does not allow user-specified schemas." As far as I know, this makes it impossible to work with tables containing these column types; both failing approaches are sketched below.
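For reference, a sketch of both attempts, reusing the spark session and df from the earlier example (the ID and VALS column names are hypothetical):

import org.apache.spark.sql.types._

// Attempt 1: cast the mis-typed column back to array<smallint>.
// This fails at execution time with a ClassCastException, since the
// values produced by the connector do not match the inferred schema.
df.withColumn("VALS", df("VALS").cast(ArrayType(ShortType, containsNull = true)))
  .collect()

// Attempt 2: supply the correct schema up front. The source rejects it:
// org.apache.spark.sql.AnalysisException: org.apache.phoenix.spark
// does not allow user-specified schemas.
val schema = StructType(Seq(
  StructField("ID", LongType, nullable = false),
  StructField("VALS", ArrayType(ShortType, containsNull = true))))
spark.read
  .format("org.apache.phoenix.spark")
  .schema(schema)
  .option("table", "MY_TABLE")
  .option("zkUrl", "localhost:2181")
  .load()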
Is there any reason for this code to interpret SMALLINTs/shorts as integers?
Thanks
Issue Links
- duplicates PHOENIX-6559: spark connector access to SmallintArray / UnsignedSmallintArray columns (Resolved)