Spark / SPARK-23772

Provide an option to ignore column of all null values or empty map/array during JSON schema inference


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.4.0
    • Component/s: SQL
    • Labels: None

    Description

      It is common to convert data from a JSON source to a structured format periodically. In the initial batch of JSON data, if a field's values are always null, Spark infers the field as StringType. However, in the second batch, a non-null value appears in this field and its type turns out not to be StringType. Schema merging then fails because of the schema inconsistency.

      The same problem applies to empty arrays and empty objects. My proposal is to provide an option in the Spark JSON source that omits such fields until a non-null value is seen.
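      The failure mode and the proposed option can be modelled with a small Python sketch. It is a simplified, hypothetical model, not Spark's implementation: `infer_type`, `infer_schema`, `merge_schemas` and the type names are made up for illustration; only top-level fields are considered; arrays and objects are treated as opaque `array`/`struct` types (Spark's exact rules for nested and empty values differ in detail); and `merge_schemas` stands in for a strict downstream schema merge. The `drop_field_if_all_null` parameter mirrors the JSON data source option `dropFieldIfAllNull` (default `false`) that shipped with the fix in 2.4.0.

```python
import json


def infer_type(value):
    """Return a simplified type name for a JSON value, or None when the value
    carries no type information (null, empty array, empty object)."""
    if value is None:
        return None
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Simplification: element types are ignored; only emptiness matters.
        return "array" if value else None
    if isinstance(value, dict):
        # Simplification: nested fields are ignored; only emptiness matters.
        return "struct" if value else None
    raise TypeError(f"unsupported JSON value: {value!r}")


def infer_schema(lines, drop_field_if_all_null=False):
    """Infer a flat {field: type} schema from newline-delimited JSON records."""
    schema = {}
    for line in lines:
        for field, value in json.loads(line).items():
            t = infer_type(value)
            prev = schema.get(field)
            if t is None:
                schema.setdefault(field, None)  # field seen, no type info yet
            elif prev is None or prev == t:
                schema[field] = t
            else:
                schema[field] = "string"  # conflicting types in one batch
    if drop_field_if_all_null:
        # Proposed behaviour: omit fields that never had an informative value.
        return {f: t for f, t in schema.items() if t is not None}
    # Default behaviour in this model: uninformative fields become string.
    return {f: (t if t is not None else "string") for f, t in schema.items()}


def merge_schemas(old, new):
    """Union two schemas, failing if a field present in both changed type."""
    merged = dict(old)
    for field, t in new.items():
        if field in merged and merged[field] != t:
            raise ValueError(f"cannot merge field {field!r}: {merged[field]} vs {t}")
        merged[field] = t
    return merged


if __name__ == "__main__":
    batch1 = ['{"id": 1, "score": null, "tags": []}',
              '{"id": 2, "score": null, "tags": []}']
    batch2 = ['{"id": 3, "score": 9.5, "tags": ["a"]}']
    try:
        merge_schemas(infer_schema(batch1), infer_schema(batch2))
    except ValueError as err:
        print("default inference:", err)
    merged = merge_schemas(infer_schema(batch1, True), infer_schema(batch2, True))
    print("with drop option:", merged)
```

With the default behaviour, `score` is inferred as `string` from the all-null first batch and then clashes with `double` in the second; with the option enabled, `score` and `tags` are simply absent from the first schema, so the merge succeeds.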

      This is similar to SPARK-12436 but the proposed solution is different.

      cc: rxin smilegator

            People

              Assignee: Takeshi Yamamuro (maropu)
              Reporter: Xiangrui Meng (mengxr)
              Votes: 0
              Watchers: 5