Spark / SPARK-38599

Support loading JSON files in a case-insensitive way


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.1
    • Fix Version/s: None
    • Component/s: Input/Output, SQL
    • Labels: None

    Description

      The task is to load JSON files into a DataFrame.

       

      Currently we use this method:

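      // textfile is an RDD[String] read from the JSON files

      // Reuse the schema of the existing Hive table
      val table = spark.table(hiveTableName)
      val hiveSchema = table.schema
      // Parse the JSON lines with that schema, dropping malformed records
      val df = spark.read.option("mode", "DROPMALFORMED").schema(hiveSchema).json(textfile)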

       

      The problem is that the field names in hiveSchema are all lower-case, whereas the keys in the JSON strings contain upper-case letters.

      For example:

      Hive schema:

      (id  bigint,  name string)

       

      JSON string:

      {"Id":123, "Name":"Tom"}

       

      In this case, the JSON string is not loaded into the DataFrame, because its keys do not match the schema's field names exactly.

      I have to use the schema of the Hive table due to a business requirement; that is a precondition.

      Currently I have to transform the keys in each JSON string to lower case, e.g. {"id":123, "name":"Tom"}.
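      The key-lowering workaround could look roughly like the sketch below. It is not the reporter's actual code: the object name KeyLowercaser and its methods are hypothetical, and it assumes the Jackson databind library that ships with Spark is on the classpath. Input that is not a JSON object or array is returned unchanged, so Spark's parse mode still decides what to do with it.

```scala
import java.util.Locale

import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}
import com.fasterxml.jackson.databind.node.{ArrayNode, ObjectNode}

import scala.collection.JavaConverters._ // Scala 2.12; use scala.jdk.CollectionConverters on 2.13
import scala.util.{Success, Try}

// Hypothetical helper (not part of Spark): lower-cases every object key in a JSON document.
object KeyLowercaser {
  private val mapper = new ObjectMapper()

  // Recursively rebuild the tree with lower-cased keys; scalar values are kept as they are.
  private def lowerNode(node: JsonNode): JsonNode = node match {
    case obj: ObjectNode =>
      val out = mapper.createObjectNode()
      obj.fields().asScala.foreach { entry =>
        // If two keys differ only by case, the later one overwrites the earlier one.
        out.replace(entry.getKey.toLowerCase(Locale.ROOT), lowerNode(entry.getValue))
      }
      out
    case arr: ArrayNode =>
      val out = mapper.createArrayNode()
      arr.elements().asScala.foreach(element => out.add(lowerNode(element)))
      out
    case other => other
  }

  // Returns the rewritten JSON text, or the input unchanged if it cannot be parsed
  // as a JSON object or array.
  def lowerKeysJson(json: String): String =
    Try(mapper.readTree(json)) match {
      case Success(tree) if tree != null && tree.isContainerNode =>
        mapper.writeValueAsString(lowerNode(tree))
      case _ => json
    }
}
```

      With the RDD from the snippet above, this would be applied as textfile.map(KeyLowercaser.lowerKeysJson) before the result is passed to spark.read ... .json(...). The cost is one extra parse and serialization per record.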

       

      Is there a better solution for this issue?
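      One possible alternative, offered only as a sketch and not as a confirmed Spark feature: let Spark infer the schema from the JSON itself, then select each Hive field by name and cast it to the Hive type. This relies on Spark SQL's default case-insensitive column resolution (spark.sql.caseSensitive=false). The object CaseInsensitiveJson and the method readWithSchema are hypothetical names. The sketch assumes a flat schema, field names without dots, no two JSON keys that differ only by case, and that every Hive field occurs in at least one record; otherwise the select fails with an analysis error.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType

// Hypothetical helper (not part of Spark): aligns inferred JSON columns with a target schema.
object CaseInsensitiveJson {
  def readWithSchema(spark: SparkSession, lines: Dataset[String], target: StructType): DataFrame = {
    // Infer the schema from the data, so the columns keep the JSON's own casing (Id, Name).
    val raw = spark.read.option("mode", "DROPMALFORMED").json(lines)

    // With the default spark.sql.caseSensitive=false, col("id") resolves to the column "Id".
    // Cast to the target type and rename to the target (lower-case) field name.
    val aligned = target.fields.map(f => col(f.name).cast(f.dataType).as(f.name))
    raw.select(aligned: _*)
  }
}
```

      An RDD[String] such as textfile can be converted first with spark.createDataset(textfile)(Encoders.STRING). Schema inference adds an extra pass over the data, so this trades the per-record key rewriting for a second scan.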


          People

            Assignee: Unassigned
            Reporter: TANG ZHAO (makeboluo)
            Votes: 0
            Watchers: 1
