SPARK-6581

Metadata is missing when saving parquet file using hadoop 1.0.4

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Won't Fix
    • Affects Version/s: 1.3.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None
    • Environment: hadoop 1.0.4

    Description

      When saving a DataFrame as a Parquet file with

      df.save("foo", "parquet")

      only _common_metadata is generated; _metadata is missing:

      -rwxrwxrwx  1 peilunlee  staff    0 Mar 27 11:29 _SUCCESS*
      -rwxrwxrwx  1 peilunlee  staff  250 Mar 27 11:29 _common_metadata*
      -rwxrwxrwx  1 peilunlee  staff  272 Mar 27 11:29 part-r-00001.parquet*
      -rwxrwxrwx  1 peilunlee  staff  272 Mar 27 11:29 part-r-00002.parquet*
      -rwxrwxrwx  1 peilunlee  staff  272 Mar 27 11:29 part-r-00003.parquet*
      -rwxrwxrwx  1 peilunlee  staff  488 Mar 27 11:29 part-r-00004.parquet*
      

      If saving with

      df.save("foo", "parquet", SaveMode.Overwrite)

      Both _metadata and _common_metadata are missing:

      -rwxrwxrwx  1 peilunlee  staff    0 Mar 27 11:29 _SUCCESS*
      -rwxrwxrwx  1 peilunlee  staff  272 Mar 27 11:29 part-r-00001.parquet*
      -rwxrwxrwx  1 peilunlee  staff  272 Mar 27 11:29 part-r-00002.parquet*
      -rwxrwxrwx  1 peilunlee  staff  272 Mar 27 11:29 part-r-00003.parquet*
      -rwxrwxrwx  1 peilunlee  staff  488 Mar 27 11:29 part-r-00004.parquet*
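
      The two listings above can also be checked programmatically. The following is a minimal sketch, not part of Spark: `ParquetSummaryCheck` and `missingSummaryFiles` are hypothetical names, and it assumes the output directory is on the local filesystem, as in the listings.

      ```scala
      import java.io.File

      // Hypothetical helper (not a Spark API): reports which Parquet summary
      // files are absent from an output directory on the local filesystem.
      object ParquetSummaryCheck {
        // Summary file names as they appear in the directory listings.
        val SummaryFiles: Seq[String] = Seq("_metadata", "_common_metadata")

        def missingSummaryFiles(dir: File): Seq[String] = {
          // listFiles returns null if dir does not exist or is not a directory,
          // so wrap it in Option and fall back to an empty sequence.
          val present: Set[String] =
            Option(dir.listFiles).map(_.toSeq).getOrElse(Seq.empty).map(_.getName).toSet
          SummaryFiles.filterNot(present.contains)
        }
      }
      ```

      For a directory matching the first listing, `ParquetSummaryCheck.missingSummaryFiles(new File("foo"))` would report only `_metadata`; for the second listing it would report both summary files.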
      

            People

              Assignee: Unassigned
              Reporter: Pei-Lun Lee (pllee)
              Votes: 0
              Watchers: 3
