Phoenix / PHOENIX-6321

Array of Shorts/Smallint returned as Array of Integers


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 5.0.0
    • Fix Version/s: None
    • Component/s: spark-connector
    • Labels: None

    Description

      When using the Spark connector to read a Phoenix table that has at least one column defined as an Array of Shorts, the resulting Dataset infers the schema as an Array of Integers.

      I believe this is due to the following code:

      phoenix/phoenix-spark/src/main/scala/org/apache/phoenix/spark/PhoenixRDD.scala:182

      case t if t.isInstanceOf[PSmallintArray] || t.isInstanceOf[PUnsignedSmallintArray] => ArrayType(IntegerType, containsNull = true)

       

      phoenix-connectors/phoenix-spark-base/src/main/scala/org/apache/phoenix/spark/SparkSchemaUtil.scala:82

      case t if t.isInstanceOf[PSmallintArray] || t.isInstanceOf[PUnsignedSmallintArray] => ArrayType(IntegerType, containsNull = true)

       
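      For illustration, here is a minimal, self-contained sketch of the mapping pattern above, using stand-in types rather than the real Phoenix/Spark classes; it shows how the SMALLINT array cases could map to ShortType instead of being widened to IntegerType:

```scala
// Stand-in types for illustration only; NOT the actual Phoenix or Spark classes.
sealed trait PDataType
class PSmallintArray extends PDataType
class PUnsignedSmallintArray extends PDataType
class PIntegerArray extends PDataType

sealed trait SparkType
case object ShortType extends SparkType
case object IntegerType extends SparkType
case class ArrayType(elementType: SparkType, containsNull: Boolean) extends SparkType

// Hypothetical mapping that preserves SMALLINT precision as ShortType.
def toSparkType(t: PDataType): SparkType = t match {
  case _: PSmallintArray | _: PUnsignedSmallintArray => ArrayType(ShortType, containsNull = true)
  case _: PIntegerArray                              => ArrayType(IntegerType, containsNull = true)
}

println(toSparkType(new PSmallintArray)) // prints ArrayType(ShortType,true)
```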

      Subsequent attempts to programmatically cast the values to Shorts fail with a ClassCastException.
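      The cast failure can be reproduced in plain Scala, assuming the Dataset hands the elements back as boxed java.lang.Integer values (which is what an IntegerType schema implies):

```scala
// A boxed java.lang.Integer cannot be unboxed as a Short:
val boxed: Any = Integer.valueOf(42)
val failed =
  try { boxed.asInstanceOf[Short]; false }
  catch { case _: ClassCastException => true }
println(failed) // prints true
```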

      It is also impossible to define the original schema via a DataFrameReader, as that fails with: "org.apache.spark.sql.AnalysisException: org.apache.phoenix.spark does not allow user-specified schemas.;"

      As far as I know, this makes it impossible to work with tables containing these data types.

      Is there any reason for this code to interpret SmallInts/Shorts as Integers?

      Thanks

       


              People

                Assignee: Unassigned
                Reporter: Alvaro Fernandez (alferca)
