  1. Spark
  2. SPARK-12231

Failed to generate predicate Error when using dropna

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.2, 1.6.0
    • Fix Version/s: 2.0.0
    • Component/s: PySpark, SQL
    • Labels:
      None
    • Environment:

      python version: 2.7.9
      os: ubuntu 14.04

      Description

      Code to reproduce the error:

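      write.py:
        import pyspark

        sc = pyspark.SparkContext()
        sqlc = pyspark.SQLContext(sc)
        # 10 rows with a single column `id` (0..9)
        df = sqlc.range(10)
        # add a second column `a` = id * 2
        df1 = df.withColumn('a', df['id'] * 2)
        # write as Parquet, partitioned by `id`
        df1.write.partitionBy('id').parquet('./data')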
        
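      read.py:
        import pyspark

        sc = pyspark.SparkContext()
        sqlc = pyspark.SQLContext(sc)
        # read back the partitioned Parquet data written by write.py
        df2 = sqlc.read.parquet('./data')
        # dropping rows with nulls and counting triggers the error
        df2.dropna().count()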
        

      $ spark-submit write.py
      $ spark-submit read.py
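
For context, after `spark-submit write.py` the `./data` directory is expected to contain one subdirectory per distinct `id` value, named `id=<value>`. The reader recovers the `id` column from those directory names rather than from the Parquet files themselves. The pure-Python sketch below only illustrates that naming convention. The helper name `parse_partition_values` and the file name `part-00000.parquet` are hypothetical, and the sketch ignores escaping and type inference, which Spark's own partition discovery handles.

```python
def parse_partition_values(path):
    """Return {column: value} for every `column=value` directory in `path`.

    Hypothetical helper for illustration; not part of Spark's API.
    """
    values = {}
    # Skip the final component: it is the data file, not a partition directory.
    for part in path.replace('\\', '/').split('/')[:-1]:
        if '=' in part:
            column, _, value = part.partition('=')
            values[column] = value  # values stay strings in this sketch
    return values


# Hypothetical file path under the directory written by write.py
print(parse_partition_values('./data/id=3/part-00000.parquet'))
```

This is not a statement about the root cause of the bug, only a reminder of how a partitioned dataset differs on disk from a non-partitioned one.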

      Error message:
        15/12/08 17:20:34 ERROR Filter: Failed to generate predicate, fallback to interpreted
        org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: a#0L
        ...
        

      If the data is written without partitionBy, the error does not occur.
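
The two cases can be compared side by side with a small helper, sketched below under a few assumptions. It reuses the calls from write.py and read.py and only toggles `partitionBy`. The name `write_then_count` and its parameters are illustrative and do not appear in the original report. Each call needs a fresh output path, because the Parquet writer refuses to write to an existing path by default.

```python
def write_then_count(sqlc, path, partitioned):
    """Write the report's 10-row frame to `path`, read it back,
    and return the row count after dropna().

    `sqlc` is expected to be a pyspark SQLContext. This helper is
    an illustrative sketch, not code from the original report.
    """
    df = sqlc.range(10)
    df1 = df.withColumn('a', df['id'] * 2)
    writer = df1.write
    if partitioned:
        # Layout used by write.py: reported to log the error on read.
        writer = writer.partitionBy('id')
    writer.parquet(path)
    # Same read path as read.py.
    return sqlc.read.parquet(path).dropna().count()
```

With a live context, for example `sqlc = pyspark.SQLContext(pyspark.SparkContext())`, calling `write_then_count(sqlc, './data_partitioned', True)` corresponds to the failing setup and `write_then_count(sqlc, './data_flat', False)` to the variant the reporter says runs cleanly. Both output paths are hypothetical.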


            People

            • Assignee: kevin yu (kevinyu98)
            • Reporter: yahsuan, chang (yahsuan)
            • Votes: 0
            • Watchers: 8
