  1. ORC
  2. ORC-502

Hive ORC read INT, BIGINT as NULL for Data created by Spark

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate

    Description

      Preconditions

      Create the file ratings.csv and upload it to HDFS at /user/test/rating/ratings.csv.

      userId,movieId,rating,timestamp
      1,2,4.5,1784325658
      

      See the corresponding data.orc file in the attachments.

      STR:

      1. Using Spark (tested on versions 2.2.1 and 2.3.1), create a DataFrame (df) from the CSV file with inferSchema enabled.

      val df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("/user/test/rating/ratings.csv")
      

      2. Save df as an ORC file.

      df.write.format("orc").save("/user/test/spark_rating_orc_typesafe")
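
Before moving on to Hive, it may help to confirm what Spark actually wrote. The following is a minimal sketch for spark-shell, assuming the same `spark` session used in the steps above and the output path from step 2; the name `written` is only illustrative. It reads the ORC output back and prints the column names and types stored in the files, so they can be compared with the Hive table definition in step 3.

```scala
// Assumes spark-shell, where `spark` (a SparkSession) is already defined.
// The path is the one used in step 2 of this report.
val written = spark.read.format("orc").load("/user/test/spark_rating_orc_typesafe")

// Print the column names (with their exact case) and the types chosen by inferSchema.
written.printSchema()

// Show the rows as Spark reads them back, for comparison with the Hive query result.
written.show()
```

If Spark reads every column back correctly, the data in the ORC files is intact and the NULLs are introduced on the Hive read side.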
      

      3. Using Hive 2.3, create the corresponding Hive external table.

      create external table rating_orc_hive_type_1(userId int,movieId int,rating double, `timestamp` int) stored as ORC location "/user/test/spark_orc_rating_typesafe/";
      

      4. Run a query.

      select * from rating_orc_hive_type_1;
      

      Only the rating (double) and timestamp values are printed. userId and movieId come back as NULL when declared as INT, and also when declared as BIGINT.

      OK
      NULL    NULL    4.5     1784325658
      

      Attachments

        1. data.orc
          0.5 kB
          Oleksiy Sayankin

            People

              Assignee: Unassigned
              Reporter: Oleksiy Sayankin (osayankin)