Phoenix / PHOENIX-3664

PySpark: pushing down a date filter to Apache Phoenix

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 4.7.0
    • Fix Version/s: 4.10.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      Azure HDInsight (HDI 3.5), PySpark using the Phoenix client (Spark 1.6.3, HBase 1.1.2 under HDP 2.5)

      Description

      I am trying to filter by date in Apache Phoenix from PySpark. The column in Phoenix is created as DATE and the filter value is a Python datetime. When I call explain(), I see that Spark does not push the filter down to Phoenix. I have tried many combinations without luck.

      Is there any way to do it?

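      import datetime

      # zookepperServer holds the ZooKeeper host name (defined elsewhere)
      df = sqlContext.read \
          .format("org.apache.phoenix.spark") \
          .option("table", "TABLENAME") \
          .option("zkUrl", zookepperServer + ":2181:/hbase-unsecure") \
          .load()
      # printSchema() and explain() print directly and return None,
      # which is why "None" appears in the results
      print(df.printSchema())

      startValidation = datetime.datetime.now()

      print(df.filter(df['FH'] > startValidation).explain(True))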

      Results:
      root
       |-- METER_ID: string (nullable = true)
       |-- FH: date (nullable = true)

      None
      == Parsed Logical Plan ==
      'Filter (FH#53 > 1486726683446150)
      +- Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64] PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)

      == Analyzed Logical Plan ==
      METER_ID: string, FH: date, SUMMERTIME: string, MAGNITUDE: int, SOURCE: int, ENTRY_DATETIME: date, BC: string, T_VAL_AE: int, T_VAL_AI: int, T_VAL_R1: int, T_VAL_R2: int, T_VAL_R3: int, T_VAL_R4: int
      Filter (cast(FH#53 as string) > cast(1486726683446150 as string))
      +- Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64] PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)

      == Optimized Logical Plan ==
      Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615)
      +- Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64] PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)

      == Physical Plan ==
      Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615)
      +- Scan PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
      None
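
      For anyone reading the plans: the integer literal in the parsed plan (1486726683446150) appears to be the filter's datetime expressed as microseconds since the Unix epoch, which is how Spark SQL represents timestamps internally. A minimal sketch of the conversion, assuming the session ran in UTC; the helper name micros_to_datetime is illustrative and is not part of PySpark or Phoenix:

```python
import datetime


def micros_to_datetime(micros):
    """Convert microseconds since the Unix epoch to a naive UTC datetime.

    Illustrative helper only; not a PySpark or Phoenix API.
    """
    epoch = datetime.datetime(1970, 1, 1)
    return epoch + datetime.timedelta(microseconds=micros)


# Literal copied from the parsed logical plan in this report
literal = 1486726683446150
ts = micros_to_datetime(literal)

print(ts)         # 2017-02-10 11:38:03.446150
print(ts.date())  # 2017-02-10
```

      The result matches the timestamp shown in the optimized plan, which drops the trailing zero. Since FH is a date and the literal is a timestamp, the analyzed plan casts both sides to string, which is consistent with the Filter staying above the Scan in the physical plan instead of being handed to Phoenix.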

      If I set the FH column to TIMESTAMP, Spark pushes the filter down, but Phoenix throws an exception:

      Caused by: org.apache.phoenix.exception.PhoenixParserException: ERROR 604 (42P00): Syntax error. Mismatched input. Expecting "RPAREN", got "12" at line 1, column 219.
      at org.apache.phoenix.exception.PhoenixParserException.newException(PhoenixParserException.java:33)
      at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:111)
      at org.apache.phoenix.jdbc.PhoenixStatement$PhoenixStatementParser.parseStatement(PhoenixStatement.java:1280)
      at org.apache.phoenix.jdbc.PhoenixStatement.parseStatement(PhoenixStatement.java:1363)
      at org.apache.phoenix.jdbc.PhoenixStatement.compileQuery(PhoenixStatement.java:1373)
      at org.apache.phoenix.jdbc.PhoenixStatement.optimizeQuery(PhoenixStatement.java:1368)
      at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:122)
      ... 102 more
      Caused by: MismatchedTokenException(106!=129)
      at org.apache.phoenix.parse.PhoenixSQLParser.recoverFromMismatchedToken(PhoenixSQLParser.java:360)
      at org.apache.phoenix.shaded.org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
      at org.apache.phoenix.parse.PhoenixSQLParser.not_expression(PhoenixSQLParser.java:6862)
      at org.apache.phoenix.parse.PhoenixSQLParser.and_expression(PhoenixSQLParser.java:6677)
      at org.apache.phoenix.parse.PhoenixSQLParser.or_expression(PhoenixSQLParser.java:6614)
      at org.apache.phoenix.parse.PhoenixSQLParser.expression(PhoenixSQLParser.java:6579)
      at org.apache.phoenix.parse.PhoenixSQLParser.single_select(PhoenixSQLParser.java:4615)
      at org.apache.phoenix.parse.PhoenixSQLParser.unioned_selects(PhoenixSQLParser.java:4697)
      at org.apache.phoenix.parse.PhoenixSQLParser.select_node(PhoenixSQLParser.java:4763)
      at org.apache.phoenix.parse.PhoenixSQLParser.oneStatement(PhoenixSQLParser.java:789)
      at org.apache.phoenix.parse.PhoenixSQLParser.statement(PhoenixSQLParser.java:508)
      at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:108)
      ... 107 more

    People

    • Assignee: Unassigned
    • Reporter: Pablo Castilla (pablocastilla)