Spark / SPARK-36696

spark.read.parquet loads empty dataset

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.2.0
    • Component/s: SQL
    • Labels: None

    Description

      The attached Parquet file cannot be read properly by Spark 3.2/master.

      The file was written by pandas and should contain 3650 rows, but Spark 3.2/master returns an empty dataset.

      >>> import pandas as pd
      >>> len(pd.read_parquet('/path/to/example.parquet'))
      3650
      
      >>> spark.read.parquet('/path/to/example.parquet').count()
      0
      

      I suspect this is caused by Parquet 1.12.0.

      After I reverted the two commits related to Parquet 1.12.0 from branch-3.2, Spark reads the data successfully.

      We need to either add a workaround or revert those commits.
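
The pandas-versus-Spark comparison in the description could be wrapped into a small check that fails loudly when two readers disagree on the row count. This is only a sketch: `compare_row_counts` and its `counters` argument are hypothetical names introduced here, not part of Spark or pandas, and the wiring to the real readers is left as a comment because it assumes pandas, a Parquet engine, and a live `spark` session.

```python
from typing import Callable, Dict


def compare_row_counts(path: str,
                       counters: Dict[str, Callable[[str], int]]) -> Dict[str, int]:
    """Run every counter on the same file and raise if the counts disagree.

    `counters` maps a reader name to a callable that takes the file path
    and returns the number of rows that reader sees.
    """
    counts = {name: count(path) for name, count in counters.items()}
    if len(set(counts.values())) > 1:
        raise AssertionError(f"row counts disagree for {path}: {counts}")
    return counts


# Hypothetical wiring for this report (assumes pandas with a Parquet engine
# and an existing SparkSession named `spark`); kept as a comment so the
# helper above runs on its own:
#
#   import pandas as pd
#   compare_row_counts('/path/to/example.parquet', {
#       'pandas': lambda p: len(pd.read_parquet(p)),
#       'spark': lambda p: spark.read.parquet(p).count(),
#   })
```

With the counts reported above (3650 from pandas, 0 from Spark), such a check would raise instead of silently returning an empty dataset.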

      Attachments

        1. example.parquet
          37 kB
          Takuya Ueshin

            People

              Assignee: Unassigned
              Reporter: Takuya Ueshin (ueshin)
              Votes: 0
              Watchers: 8
