SPARK-3037

Add ArrayType containing null value support to Parquet.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: SQL
    • Labels: None

    Description

      Parquet support should handle ArrayType when containsNull is true.

      When containsNull is true, the schema should be as follows:

      message root {
        optional group a (LIST) {
          repeated group bag {
            optional int32 array_element;
          }
        }
      }
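
A minimal Python sketch of why the extra `bag` level is needed, assuming (as in Parquet) that a repeated field cannot mark an individual entry as null. Each list element is wrapped in one repeated `bag` record, and a null element is represented by a record whose optional `array_element` field is absent. The helper names `to_bag` and `from_bag` are hypothetical, and this models only the logical structure, not Parquet's actual definition/repetition-level encoding or any Spark or Hive code.

```python
from typing import Dict, List, Optional

# Field name taken from the schema above; the helpers below are hypothetical.
ELEMENT = "array_element"


def to_bag(values: List[Optional[int]]) -> List[Dict[str, int]]:
    """Wrap each element in one 'bag' record; a null element becomes an empty record."""
    return [{} if v is None else {ELEMENT: v} for v in values]


def from_bag(bag: List[Dict[str, int]]) -> List[Optional[int]]:
    """Rebuild the list; a record without the optional field is read back as null."""
    return [record.get(ELEMENT) for record in bag]


original = [1, None, 3]
encoded = to_bag(original)
decoded = from_bag(encoded)
print(encoded)  # [{'array_element': 1}, {}, {'array_element': 3}]
print(decoded)  # [1, None, 3]
```

Because every element still occupies one `bag` record, the position of each null is preserved on the round trip, which a bare repeated primitive field could not express.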
      

      FYI:
      Hive's Parquet writer always uses this schema, and its reader can read only this schema, so the current Parquet support in Spark SQL is not compatible with Hive.

      NOTICE:
      If Hive compatibility is the top priority, we also have to use this schema regardless of containsNull, which will break backward compatibility.
      However, using this schema could affect performance.

          People

            Assignee: Takuya Ueshin (ueshin)
            Reporter: Takuya Ueshin (ueshin)