Description
Given the following table definition:
CREATE TABLE "mytable" ( "id" VARCHAR NOT NULL CONSTRAINT pk PRIMARY KEY ("id") ) SALT_BUCKETS=16
And the following code setting up a PhoenixPigConfiguration:
val phoenixConf = new PhoenixPigConfiguration(new Configuration())
phoenixConf.setSelectStatement("SELECT \"id\" FROM \"mytable\"")
phoenixConf.setSelectColumns("id")
phoenixConf.setSchemaType(SchemaType.QUERY)
phoenixConf.configure("127.0.0.1", "\"mytable\"", 100)
val phoenixRDD = sc.newAPIHadoopRDD(
  phoenixConf.getConfiguration,
  classOf[PhoenixInputFormat],
  classOf[NullWritable],
  classOf[PhoenixRecord])
The above seems to work, but when I later call phoenixConf.getSelectColumnMetadataList, I get the following error:
java.sql.SQLException: Unable to resolve these column names: id
Available columns with column families: _SALT,id
    at org.apache.phoenix.util.PhoenixRuntime.generateColumnInfo(PhoenixRuntime.java:354)
    at org.apache.phoenix.pig.PhoenixPigConfiguration$PhoenixPigConfigurationUtil.getSelectColumnMetadataList(PhoenixPigConfiguration.java:269)
    at org.apache.phoenix.pig.PhoenixPigConfiguration.getSelectColumnMetadataList(PhoenixPigConfiguration.java:157)
    at com.simplymeasured.spark.PhoenixRDD.toSchemaRDD(PhoenixRDD.scala:52)
    at com.simplymeasured.spark.PhoenixRDDTest$$anonfun$3.apply$mcV$sp(PhoenixRDDTest.scala:35)
    at com.simplymeasured.spark.PhoenixRDDTest$$anonfun$3.apply(PhoenixRDDTest.scala:31)
    at com.simplymeasured.spark.PhoenixRDDTest$$anonfun$3.apply(PhoenixRDDTest.scala:31)
    at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
    at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
    at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
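Looking at PhoenixRuntime, getColumnInfo() performs a trim().toUpperCase() on the column name, which doesn't seem valid for case-sensitive (quoted) columns: the name id becomes ID, which can never match the lowercase column id listed as available in the error above. See https://github.com/apache/phoenix/blob/3.0/phoenix-core/src/main/java/org/apache/phoenix/util/PhoenixRuntime.java#L374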
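To illustrate the suspected cause, here is a minimal standalone sketch. It does not call Phoenix; ColumnLookupSketch, resolveUpperCased and resolveSqlStyle are hypothetical names made up for this example, and the column set is copied from the error message. The second function is only one possible alternative (keep the case of quoted names, upper-case unquoted ones), not a claim about how Phoenix should or does fix it.

```scala
import java.util.Locale

// Standalone illustration only; none of these names exist in Phoenix.
object ColumnLookupSketch {
  // Columns reported as available in the error message.
  val availableColumns: Set[String] = Set("_SALT", "id")

  // Mimics the behavior described above: the requested name is
  // trimmed and upper-cased before being compared.
  def resolveUpperCased(name: String): Option[String] = {
    val normalized = name.trim.toUpperCase(Locale.ROOT)
    availableColumns.find(_ == normalized)
  }

  // Hypothetical alternative: a double-quoted name keeps its case,
  // an unquoted name is upper-cased (the usual SQL identifier rule).
  def resolveSqlStyle(name: String): Option[String] = {
    val trimmed = name.trim
    val quoted =
      trimmed.length >= 2 && trimmed.startsWith("\"") && trimmed.endsWith("\"")
    val normalized =
      if (quoted) trimmed.substring(1, trimmed.length - 1)
      else trimmed.toUpperCase(Locale.ROOT)
    availableColumns.find(_ == normalized)
  }
}
```

With this sketch, resolveUpperCased("id") finds nothing because it looks for ID, while resolveSqlStyle("\"id\"") finds the lowercase column.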
I'm attempting to use this from within Spark, and I would like to rely on getSelectColumnMetadataList to build a Schema RDD.