Hive / HIVE-3333

Specified SerDe does not get used when executing a query over JSON data

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem

    Description

      I found a JSON SerDe that I wanted to try out, and I ran into some issues attempting to use it. The script I was executing looks like this:

      ADD JAR /home/natty/hive-test-case/hive-json-serde-0.2.jar;
      CREATE TABLE bar (
      id INT,
      integers ARRAY<INT>,
      datum STRING
      ) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde';

      LOAD DATA LOCAL INPATH '/home/natty/sample_data/json.sample' OVERWRITE INTO TABLE bar;

      SELECT * FROM bar;

      The data I loaded looks like this:

      { "id": 1, "integers": [ 1, 2, 3 ], "datum": "hello" }

      When the "SELECT * FROM bar" query executes, it fails with the following error:

      hive> ADD JAR /home/natty/hive-test-case/hive-json-serde-0.2.jar;
      Added /home/natty/hive-test-case/hive-json-serde-0.2.jar to class path
      Added resource: /home/natty/hive-test-case/hive-json-serde-0.2.jar
      hive> SELECT * FROM bar;
      OK
      Failed with exception java.io.IOException:java.lang.ClassCastException: org.json.JSONArray cannot be cast to [Ljava.lang.Object;
      Time taken: 2.335 seconds
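      The target type in that message, [Ljava.lang.Object;, is the JVM's binary name for Object[], so the error says an org.json.JSONArray was cast to a plain object array. The small Java check below prints the names that Class.getName() reports for a few array types; the ArrayNames class is hypothetical and exists only for illustration.

```java
// Hypothetical helper class, used only to show how the JVM names array types.
class ArrayNames {
    // Returns the binary name that Class.getName() reports for the given type.
    static String nameOf(Class<?> type) {
        return type.getName();
    }

    public static void main(String[] args) {
        System.out.println(nameOf(Object[].class)); // [Ljava.lang.Object;
        System.out.println(nameOf(String[].class)); // [Ljava.lang.String;
        System.out.println(nameOf(int[].class));    // [I
    }
}
```

      A leading "[" marks an array dimension, "L...;" wraps a class name, and single letters such as "I" denote primitive element types.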

      This failure alone doesn't bother me. What bothers me is that the log file shows the following exception:

      2012-08-03 13:12:11,407 ERROR CliDriver (SessionState.java:printError(380)) - Failed with exception java.io.IOException:java.lang.ClassCastException: org.json.JSONArray cannot be cast to [Ljava.lang.Object;
      java.io.IOException: java.lang.ClassCastException: org.json.JSONArray cannot be cast to [Ljava.lang.Object;
      at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:173)
      at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1383)
      at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:266)
      at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212)
      at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
      at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
      at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
      Caused by: java.lang.ClassCastException: org.json.JSONArray cannot be cast to [Ljava.lang.Object;
      at org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:98)
      at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:287)
      at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:213)
      at org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:59)
      at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:365)
      at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:163)
      ... 11 more

      Note that this exception indicates that Hive is executing code in DelimitedJSONSerDe rather than in the SerDe I specified (JsonSerde from the jar file). That seems incorrect.
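      One reading of the trace, consistent with the issue's Not A Problem resolution, is that two SerDes take part in the fetch. The table's SerDe deserializes each row. FetchTask then serializes that row for the console through DelimitedJSONSerDe (the FetchTask.fetch and DelimitedJSONSerDe.serializeField frames), reading the array column via StandardListObjectInspector.getList. The Java sketch below models that split under stated assumptions. FetchPathSketch, deserialize, serializeForDisplay, and OpaqueArray are hypothetical names, OpaqueArray stands in for org.json.JSONArray, and the "List or Object[]" check is a simplified assumption about getList, not Hive's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical two-stage model of the fetch path suggested by the trace.
class FetchPathSketch {

    // Stand-in for an array type that is neither a java.util.List nor an Object[]
    // (assumed here to mirror org.json.JSONArray).
    static final class OpaqueArray {
        final int[] values;

        OpaqueArray(int... values) {
            this.values = values;
        }
    }

    // Stage 1 (stand-in for the table's SerDe): decides which Java object
    // represents the ARRAY<INT> column of the row (id, integers, datum).
    static List<Object> deserialize(int id, int[] integers, String datum, boolean returnJavaList) {
        Object arrayField;
        if (returnJavaList) {
            List<Object> list = new ArrayList<>();
            for (int v : integers) {
                list.add(v);
            }
            arrayField = list;
        } else {
            arrayField = new OpaqueArray(integers);
        }
        return Arrays.asList(id, arrayField, datum);
    }

    // Stage 2 (stand-in for the output serializer): reads the array column through
    // a list-inspector-like check that only understands List or Object[].
    static String serializeForDisplay(List<Object> row) {
        Object field = row.get(1);
        List<?> items;
        if (field instanceof List) {
            items = (List<?>) field;
        } else {
            items = Arrays.asList((Object[]) field); // ClassCastException for OpaqueArray
        }
        return row.get(0) + "\t" + items + "\t" + row.get(2);
    }

    public static void main(String[] args) {
        int[] integers = {1, 2, 3};
        System.out.println(serializeForDisplay(deserialize(1, integers, "hello", true)));
        try {
            serializeForDisplay(deserialize(1, integers, "hello", false));
        } catch (ClassCastException e) {
            // Thrown in stage 2, although the offending object was built in stage 1.
            System.out.println("stage 2 failed: " + e.getClass().getSimpleName());
        }
    }
}
```

      In this model the exception surfaces in the output stage even though the object it chokes on was produced by the table's SerDe, which would match the frames in the log.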

      Attachments

        1. hive-test-case.tar.gz (107 kB, Jonathan Natkins)

          People

            Assignee: Unassigned
            Reporter: Jonathan Natkins (natty)
            Votes: 0
            Watchers: 2