Phoenix / PHOENIX-2328

"Unsupported filter" error for "like" when using Spark DataFrame API

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.5.3
    • Fix Version/s: 4.6.0
    • Component/s: None
    • Labels: None

      Description

      Hi, I'm using the Spark DataFrame API to connect to HBase 0.98 through Phoenix 4.5.3 and get an "Unsupported filter" error when the filter condition uses 'like'. The exception trace and the relevant source lines are given below.
      I also have a related question. Since Phoenix can be accessed through the standard Java JDBC API, a Spark DataFrame can also be constructed using the "jdbc" format string (e.g. df = sqlContext.read().format("jdbc").options(params).load(); where params is a Map holding the Phoenix JDBC connection URL and other relevant parameters). So there are two ways to use Phoenix with Spark: 1. as a Spark data source plugin, and 2. as just another RDBMS source. Which one is recommended, and why?
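To make the two access paths in the question concrete, here is a minimal sketch of the option maps each one might take. The phoenix-spark keys ("table", "zkUrl") and the Phoenix JDBC driver class are given as I understand them from the respective documentation; the table name EVENTS and the quorum zk-host:2181 are hypothetical placeholders, and the class and method names are made up for illustration. The Spark calls themselves appear only in comments so the sketch runs without a cluster.

```java
import java.util.HashMap;
import java.util.Map;

public class PhoenixSparkOptions {

    // Options for the phoenix-spark data source ("org.apache.phoenix.spark").
    // Key names are an assumption based on the phoenix-spark documentation.
    static Map<String, String> pluginOptions(String table, String zkQuorum) {
        Map<String, String> params = new HashMap<>();
        params.put("table", table);
        params.put("zkUrl", zkQuorum);
        return params;
    }

    // Options for Spark's generic "jdbc" source pointed at the Phoenix JDBC driver.
    static Map<String, String> jdbcOptions(String table, String zkQuorum) {
        Map<String, String> params = new HashMap<>();
        params.put("url", "jdbc:phoenix:" + zkQuorum);
        params.put("dbtable", table);
        params.put("driver", "org.apache.phoenix.jdbc.PhoenixDriver");
        return params;
    }

    public static void main(String[] args) {
        // Hypothetical table name and ZooKeeper quorum.
        Map<String, String> plugin = pluginOptions("EVENTS", "zk-host:2181");
        Map<String, String> jdbc = jdbcOptions("EVENTS", "zk-host:2181");
        // Either map would then be handed to the reader, for example:
        //   sqlContext.read().format("org.apache.phoenix.spark").options(plugin).load();
        //   sqlContext.read().format("jdbc").options(jdbc).load();
        System.out.println("plugin: " + plugin);
        System.out.println("jdbc:   " + jdbc);
    }
}
```

The sketch only shows how the two configurations differ; it does not by itself say which path is preferable.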

      Exception:
      -------------
      2015-10-16 17:25:42,944 DEBUG [main] com.dataken.utilities.DFHelper
      Filtering using expr: ID like 'RrcLog%'

      Exception in thread "main" java.lang.Exception: Unsupported filter
      at org.apache.phoenix.spark.PhoenixRelation$$anonfun$buildFilter$1.apply(PhoenixRelation.scala:83)
      at org.apache.phoenix.spark.PhoenixRelation$$anonfun$buildFilter$1.apply(PhoenixRelation.scala:70)
      at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
      at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
      at org.apache.phoenix.spark.PhoenixRelation.buildFilter(PhoenixRelation.scala:70)
      at org.apache.phoenix.spark.PhoenixRelation.buildScan(PhoenixRelation.scala:42)
      at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$3.apply(DataSourceStrategy.scala:53)
      at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$3.apply(DataSourceStrategy.scala:53)
      at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:279)
      at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:278)
      at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:310)
      at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProject(DataSourceStrategy.scala:274)
      at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:49)
      at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
      at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
      at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
      at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
      at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
      at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:374)
      at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
      at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
      at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
      at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
      at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:920)
      at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:918)
      at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:924)
      at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:924)
      at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:53)
      at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904)
      at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385)
      at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1315)
      at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1378)
      at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:178)
      at org.apache.spark.sql.DataFrame.show(DataFrame.scala:402)
      at org.apache.spark.sql.DataFrame.show(DataFrame.scala:363)
      at org.apache.spark.sql.DataFrame.show(DataFrame.scala:371)
      at com.dataken.designer.analytical.pojo.EvaluableExpressionTest.main(EvaluableExpressionTest.java:177)

      SOURCE CODE
      -----------------------
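      DataFrame df = sqlContext.read().format("org.apache.phoenix.spark").options(params).load();
      // filter() returns a new DataFrame; per the trace, the exception surfaces when show() triggers planning
      df.filter("ID like 'RrcLog%'").show();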
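For context on why this expression ends up in buildFilter: as far as I understand, Spark hands a prefix pattern such as 'RrcLog%' to a data source as a "string starts with" filter object rather than as raw SQL, and the data source must translate each filter type it receives into its own WHERE clause. The sketch below models that translation in plain Java. The Filter, EqualTo and StringStartsWith types here are hypothetical stand-ins written for this example, not Spark's or Phoenix's classes, and the code is not the attached patch. It does not escape % or _ inside the prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilterPushdownSketch {

    // Hypothetical stand-ins for the filter objects a data source receives.
    interface Filter {}

    static final class EqualTo implements Filter {
        final String column;
        final Object value;
        EqualTo(String column, Object value) { this.column = column; this.value = value; }
    }

    static final class StringStartsWith implements Filter {
        final String column;
        final String prefix;
        StringStartsWith(String column, String prefix) { this.column = column; this.prefix = prefix; }
    }

    // Render a value as a SQL literal, doubling single quotes inside strings.
    static String literal(Object value) {
        if (value instanceof String) {
            return "'" + ((String) value).replace("'", "''") + "'";
        }
        return String.valueOf(value);
    }

    // Translate one filter; filter types without a case fall through to the error.
    static String compile(Filter f) {
        if (f instanceof EqualTo) {
            EqualTo e = (EqualTo) f;
            return e.column + " = " + literal(e.value);
        }
        if (f instanceof StringStartsWith) {
            StringStartsWith s = (StringStartsWith) f;
            return s.column + " LIKE " + literal(s.prefix + "%");
        }
        throw new UnsupportedOperationException("Unsupported filter");
    }

    // Join all translated filters into a single WHERE-clause body.
    static String buildWhere(List<Filter> filters) {
        List<String> parts = new ArrayList<>();
        for (Filter f : filters) {
            parts.add(compile(f));
        }
        return String.join(" AND ", parts);
    }

    public static void main(String[] args) {
        List<Filter> filters = Arrays.asList(
                new StringStartsWith("ID", "RrcLog"),
                new EqualTo("TYPE", 3)); // TYPE is a hypothetical column
        System.out.println(buildWhere(filters)); // ID LIKE 'RrcLog%' AND TYPE = 3
    }
}
```

In this model, removing the StringStartsWith branch reproduces the reported symptom: the prefix filter reaches the final throw and planning fails with "Unsupported filter".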

      Thanks,
      Suhas

        Attachments

        1. PHOENIX-2328.patch
          3 kB
          Josh Mahonin

            People

            • Assignee:
              jmahonin Josh Mahonin
              Reporter:
              snalapure@dataken.net Suhas Nalapure