HIVE-19580

Hive 2.3.2 with ORC files stored on S3 is case sensitive on EMR


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.3.2
    • Fix Version/s: 2.3.2
    • Component/s: None
    • Labels: None
    • Environment:

      EMR s3:// connector

      Spark 2.3 (also reproduces with lower versions)

      Hive 2.3.2


      Description

      The original file is CSV:

      COL1,COL2
      1,2

      ORC files are created with Spark 2.3:

      scala> val df = spark.read.option("header","true").csv("/user/hadoop/file")

      scala> df.printSchema
      root
       |-- COL1: string (nullable = true)
       |-- COL2: string (nullable = true)

      scala> df.write.orc("s3://bucket/prefix")
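
      A write-side workaround (a minimal sketch, reusing the placeholder paths above) is to lower-case the column names before writing, so the ORC field names match the lower-cased column names Hive records in its metastore:

      scala> // Rename every column to lower case before writing,
      scala> // so the ORC footer stores col1/col2 instead of COL1/COL2.
      scala> val dfLower = df.toDF(df.columns.map(_.toLowerCase): _*)

      scala> dfLower.write.orc("s3://bucket/prefix")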

      In Hive:

      hive> CREATE EXTERNAL TABLE test_orc(COL1 STRING, COL2 STRING) STORED AS ORC LOCATION 's3://bucket/prefix';

      hive> SELECT * FROM test_orc;
      OK
      NULL NULL

      Every field is NULL. However, if the fields are generated using lower case in the Spark schema, everything works.
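
      To confirm the mismatch lives in the ORC footer rather than in Hive, reading the files back with Spark (same placeholder path) shows that the upper-case field names were persisted:

      scala> spark.read.orc("s3://bucket/prefix").printSchema
      root
       |-- COL1: string (nullable = true)
       |-- COL2: string (nullable = true)

      Hive lower-cases table column names in the metastore, so the table's col1/col2 never match the file's COL1/COL2 during name-based schema resolution, and every column reads as NULL.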

      The reason I'm raising this bug is that we have customers using Hive 2.3.2 to read files we generate through Spark, and our entire code base addresses fields using upper case, which is incompatible with their Hive instance.
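
      On the read side, a commonly cited workaround (worth verifying against the exact Hive/ORC build in use) is to force positional rather than name-based schema evolution, so columns are matched by ordinal and the case of the ORC field names no longer matters:

      hive> -- Match ORC columns by position instead of by (case-sensitive) name.
      hive> SET orc.force.positional.evolution=true;
      hive> SELECT * FROM test_orc;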

            People

            • Assignee: Unassigned
            • Reporter: artb Arthur Baudry