SPARK-31139: File format data sources (ORC, JSON) case sensitivity regressions


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Invalid
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      This is a follow-up to https://issues.apache.org/jira/browse/SPARK-31116

      The case sensitivity issues are not limited to Parquet: JSON and ORC are affected as well.

      The following test failures come from test cases based on those of SPARK-31116. (The diff of FileBasedDataSourceSuite is in the attachment.)


       

      [info] - SPARK-31116: Select simple columns correctly in case insensitive manner *** FAILED *** (4 seconds, 277 milliseconds)
      [info] Results do not match for query:
      [info] Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
      [info] Timezone Env:
      [info]
      [info] == Parsed Logical Plan ==
      [info] Relation[camelcase#56] json
      [info]
      [info] == Analyzed Logical Plan ==
      [info] camelcase: string
      [info] Relation[camelcase#56] json
      [info]
      [info] == Optimized Logical Plan ==
      [info] Relation[camelcase#56] json
      [info]
      [info] == Physical Plan ==
      [info] FileScan json [camelcase#56] Batched: false, DataFilters: [], Format: JSON, Location: InMemoryFileIndex[file:/Users/kimtkyeom/Dev/spark_devel/target/tmp/spark-95f1357a-85c9-444f-bdcc-..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<camelcase:string>
      [info]
      [info] == Results ==
      [info]
      [info] == Results ==
      [info] !== Correct Answer - 1 ==   == Spark Answer - 1 ==
      [info] !struct<>                   struct<camelcase:string>
      [info] ![A]                        [null] (QueryTest.scala:248)
      
      
      
      [info] - SPARK-31116: Select nested columns correctly in case insensitive manner *** FAILED *** (2 seconds, 117 milliseconds)
      [info] Results do not match for query:
      [info] Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
      [info] Timezone Env:
      [info]
      [info] == Parsed Logical Plan ==
      [info] Relation[StructColumn#147] json
      [info]
      [info] == Analyzed Logical Plan ==
      [info] StructColumn: struct<LowerCase:bigint,camelcase:bigint>
      [info] Relation[StructColumn#147] json
      [info]
      [info] == Optimized Logical Plan ==
      [info] Relation[StructColumn#147] json
      [info]
      [info] == Physical Plan ==
      [info] FileScan json [StructColumn#147] Batched: false, DataFilters: [], Format: JSON, Location: InMemoryFileIndex[file:/Users/kimtkyeom/Dev/spark_devel/target/tmp/spark-f9ecd1a4-e5aa-4dd7-bdfd-..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<StructColumn:struct<LowerCase:bigint,camelcase:bigint>>
      [info]
      [info] == Results ==
      [info]
      [info] == Results ==
      [info] !== Correct Answer - 1 ==   == Spark Answer - 1 ==
      [info] !struct<>                   struct<StructColumn:struct<LowerCase:bigint,camelcase:bigint>>
      [info] ![[0,1]]                    [[null,null]] (QueryTest.scala:248)
      
      
      
      [info] - SPARK-31116: Select nested columns correctly in case sensitive manner *** FAILED *** (871 milliseconds)
      [info] Results do not match for query:
      [info] Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
      [info] Timezone Env:
      [info]
      [info] == Parsed Logical Plan ==
      [info] Relation[StructColumn#329] json
      [info]
      [info] == Analyzed Logical Plan ==
      [info] StructColumn: struct<LowerCase:bigint,camelcase:bigint>
      [info] Relation[StructColumn#329] json
      [info]
      [info] == Optimized Logical Plan ==
      [info] Relation[StructColumn#329] json
      [info]
      [info] == Physical Plan ==
      [info] FileScan json [StructColumn#329] Batched: false, DataFilters: [], Format: JSON, Location: InMemoryFileIndex[file:/Users/kimtkyeom/Dev/spark_devel/target/tmp/spark-612baf76-a9d0-41e5-89f4-..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<StructColumn:struct<LowerCase:bigint,camelcase:bigint>>
      [info]
      [info] == Results ==
      [info]
      [info] == Results ==
      [info] !== Correct Answer - 1 ==   == Spark Answer - 1 ==
      [info] !struct<>                   struct<StructColumn:struct<LowerCase:bigint,camelcase:bigint>>
      [info] ![null]                     [[null,null]] (QueryTest.scala:248)
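
      For readers without the attached diff, the first failure (expected [A], got [null] with ReadSchema struct<camelcase:string>) can be illustrated with a minimal pure-Python sketch. This is not Spark's JSON reader and makes no claim about how Spark resolves fields. It only shows how exact-name lookup and case-insensitive lookup differ for a single record. The on-disk field name `camelCase` is an assumption (the written name does not appear in the log), and `read_exact` and `read_case_insensitive` are hypothetical helpers.

```python
import json


def read_exact(record, schema):
    """Look up each requested field by exact name; missing fields become None."""
    return [record.get(name) for name in schema]


def read_case_insensitive(record, schema):
    """Look up each requested field ignoring case; ambiguous matches raise."""
    lowered = {}
    for key, value in record.items():
        lowered.setdefault(key.lower(), []).append(value)
    row = []
    for name in schema:
        matches = lowered.get(name.lower(), [])
        if len(matches) > 1:
            raise ValueError(f"ambiguous field: {name}")
        row.append(matches[0] if matches else None)
    return row


# Assumption: the file was written with the field name "camelCase".
record = json.loads('{"camelCase": "A"}')
# Field name taken from the ReadSchema shown in the log.
schema = ["camelcase"]

exact_row = read_exact(record, schema)
insensitive_row = read_case_insensitive(record, schema)
print(exact_row)        # [None]
print(insensitive_row)  # ['A']
```

      Under these assumptions, the exact-name lookup gives the null seen in the "Spark Answer" column, while the case-insensitive lookup gives the value the test expects.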
      

       

       


            People

              Assignee: Unassigned
              Reporter: Tae-kyeom, Kim (kimtkyeom)