Hive / HIVE-6670

ClassNotFound with Serde

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels: None

    Description

      We are seeing a ClassNotFoundException when we use CSVSerde (https://github.com/ogrodnek/csv-serde) to create a table.
      This happens because MapredLocalTask does not pass the locally added jars to ExecDriver when it launches it.
      ExecDriver's classpath therefore does not include the added jars. When the plan is deserialized, the deserialization code throws a ClassNotFoundException and produces a TableDesc object with a null DeserializerClass.
      That null class then causes an NPE during Fetch.
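The failure mode described above can be illustrated outside Hive with a minimal Java sketch. It is not taken from Hive or from the attached patches: the class AddedJarsSketch, its helper methods, and the "hive-exec.jar" base classpath are hypothetical names chosen for illustration. It assumes the child process resolves the SerDe by class name and continues with a null class when the lookup fails, and it shows one possible way a parent could append the added jars to a child classpath.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: none of these names come from Hive or from the attached patches.
public class AddedJarsSketch {

    // Hypothetical helper: extend a child JVM's classpath with the session's added jars.
    public static String childClasspath(String baseClasspath, List<String> addedJars) {
        StringBuilder cp = new StringBuilder(baseClasspath);
        for (String jar : addedJars) {
            if (cp.length() > 0) {
                cp.append(File.pathSeparator);
            }
            cp.append(jar);
        }
        return cp.toString();
    }

    // Builds a class loader that sees the given jars in addition to this program's own classpath.
    public static ClassLoader loaderFor(List<String> jars) {
        URL[] urls = new URL[jars.size()];
        for (int i = 0; i < jars.size(); i++) {
            try {
                urls[i] = new File(jars.get(i)).toURI().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Bad jar path: " + jars.get(i), e);
            }
        }
        return new URLClassLoader(urls, AddedJarsSketch.class.getClassLoader());
    }

    // Resolves a class by name. On failure it logs the exception and returns null,
    // mirroring the "null DeserializerClass" outcome described in the report.
    public static Class<?> loadOrNull(String className, ClassLoader loader) {
        try {
            return Class.forName(className, false, loader);
        } catch (ClassNotFoundException e) {
            System.err.println(e + " (continuing)");
            return null;
        }
    }

    public static void main(String[] args) {
        String serde = "com.bizo.hive.serde.csv.CSVSerde";
        String addedJar = "/home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar";

        // A child whose classpath was built without the added jars cannot resolve the SerDe.
        ClassLoader childWithoutJars = loaderFor(Collections.<String>emptyList());
        Class<?> deserializerClass = loadOrNull(serde, childWithoutJars);
        System.out.println("Deserializer class without added jars: " + deserializerClass);

        // One possible remedy: hand the added jars to the child when building its classpath.
        // "hive-exec.jar" is a placeholder base classpath, not the real launch command.
        System.out.println("Child classpath with added jars: "
                + childClasspath("hive-exec.jar", Arrays.asList(addedJar)));
    }
}
```

Any code that later dereferences the null class without checking it would throw a NullPointerException, which matches the reported symptom during Fetch.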
      Steps to reproduce:
      Download https://drone.io/github.com/ogrodnek/csv-serde/files/target/csv-serde-1.1.2-0.11.0-all.jar to a local path, e.g. /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar.
      Place some sample CSV files in HDFS as follows:
      hdfs dfs -mkdir /user/soam/HiveSerdeIssue/sampleCSV/
      hdfs dfs -put /home/soam/sampleCSV.csv /user/soam/HiveSerdeIssue/sampleCSV/
      hdfs dfs -mkdir /user/soam/HiveSerdeIssue/sampleJoinTarget/
      hdfs dfs -put /home/soam/sampleJoinTarget.csv /user/soam/HiveSerdeIssue/sampleJoinTarget/
      ====
      Create the tables in Hive:
      ADD JAR /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar;
      create external table sampleCSV (md5hash string, filepath string)
      row format serde 'com.bizo.hive.serde.csv.CSVSerde'
      stored as textfile
      location '/user/soam/HiveSerdeIssue/sampleCSV/'
      ;
      create external table sampleJoinTarget (md5hash string, filepath string, datestamp string, nblines string, nberrors string)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      LINES TERMINATED BY '\n'
      STORED AS TEXTFILE
      LOCATION '/user/soam/HiveSerdeIssue/sampleJoinTarget/'
      ;
      ===============
      Now, try the following JOIN:
      ADD JAR /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar;
      SELECT
      sampleCSV.md5hash,
      sampleCSV.filepath
      FROM sampleCSV
      JOIN sampleJoinTarget
      ON (sampleCSV.md5hash = sampleJoinTarget.md5hash)
      ;

      This will fail with the error:
      Execution log at: /tmp/soam/.log
      java.lang.ClassNotFoundException: com/bizo/hive/serde/csv/CSVSerde
      Continuing ...
      2014-03-11 10:35:03 Starting to launch local task to process map join; maximum memory = 238551040
      Execution failed with exit status: 2
      Obtaining error information
      Task failed!
      Task ID:
      Stage-4
      Logs:
      /var/log/hive/soam/hive.log
      FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
      By contrast, the following LEFT JOIN works:
      SELECT
      sampleCSV.md5hash,
      sampleCSV.filepath
      FROM sampleCSV
      LEFT JOIN sampleJoinTarget
      ON (sampleCSV.md5hash = sampleJoinTarget.md5hash)
      ;
      ==

      Attachments

        1. HIVE-6670-branch-0.12.patch
          2 kB
          Abin Shahab
        2. HIVE-6670.patch
          2 kB
          Abin Shahab
        3. HIVE-6670.1.patch
          5 kB
          Ashutosh Chauhan

            People

              Assignee: Abin Shahab (ashahab)
              Reporter: Abin Shahab (ashahab)
              Votes: 0
              Watchers: 7
