Hive / HIVE-2289

NumberFormatException with respect to _offsets when running a query with index

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 0.7.0
    • Fix Version/s: None
    • Component/s: Indexing
    • Labels: None
    • Environment: RedHat 5

    • indexing hive

    Description

      I have a table named foo with the columns origin, destination and information.

      Steps I followed to create an index named foosample on foo and then query with it:

      1) create index foosample on table foo(origin) as 'compact' with deferred rebuild;
      2) alter index foosample on foo rebuild;
      3) insert overwrite directory "/tmp/index_result" select 'bucketname','_offsets' from default__foo_foosample__ where origin='WAW';
      4) set hive.index.compact.file=/tmp/index_result;
      5) set hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
      6) select * from foo where origin='WAW';

      Step 6 fails with the following output:

      Total MapReduce jobs = 1
      Launching Job 1 out of 1
      Number of reduce tasks is set to 0 since there's no reduce operator
      java.lang.NumberFormatException: For input string: "_offsets"
      at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
      at java.lang.Long.parseLong(Long.java:410)
      at java.lang.Long.parseLong(Long.java:468)
      at org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexResult.add(HiveCompactIndexResult.java:158)
      at org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexResult.<init>(HiveCompactIndexResult.java:107)
      at org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat.getSplits(HiveCompactIndexInputFormat.java:89)
      at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
      at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
      at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
      at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:657)
      at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:123)
      at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
      at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
      at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
      at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
      at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
      at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
      at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
      at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
      Job Submission failed with exception 'java.lang.NumberFormatException(For input string: "_offsets")'
      FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask

      Steps 2 and 3 each ran a successful MapReduce job, and the index table default__foo_foosample__ contains data in three columns: origin, _bucketname and _offsets.
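
One reading of the trace, offered as a sketch rather than a confirmed diagnosis: in HiveQL a single-quoted token is a string literal, so the select in step 3 would write the constant text `_offsets` to /tmp/index_result instead of the column's values, and the index reader then fails in Long.parseLong. The Java below is not Hive's HiveCompactIndexResult code. The class name, the method name, the sample path and the Ctrl-A / Ctrl-B delimiters are all assumptions made for illustration.

```java
import java.util.regex.Pattern;

public class OffsetParseSketch {
    // Assumed delimiters: Ctrl-A between fields, Ctrl-B between array items.
    static final char FIELD_SEP = (char) 1;
    static final String ITEM_SEP = String.valueOf((char) 2);

    // Hypothetical helper: parse the offsets field of one index-result line
    // of the form <bucket file><Ctrl-A><offset><Ctrl-B><offset>...
    static long[] parseOffsets(String line) {
        int sep = line.indexOf(FIELD_SEP);
        if (sep < 0) {
            throw new IllegalArgumentException("no field separator in line");
        }
        String[] items = line.substring(sep + 1).split(Pattern.quote(ITEM_SEP));
        long[] offsets = new long[items.length];
        for (int i = 0; i < items.length; i++) {
            // Long.parseLong throws NumberFormatException for non-numeric text
            // such as the literal "_offsets".
            offsets[i] = Long.parseLong(items[i]);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // A line as it might look when real column values were selected.
        String good = "/user/hive/warehouse/foo/part-0" + FIELD_SEP + "0" + ITEM_SEP + "4096";
        System.out.println("offsets parsed: " + parseOffsets(good).length);

        // A line as it might look when string constants were selected.
        String bad = "bucketname" + FIELD_SEP + "_offsets";
        try {
            parseOffsets(bad);
        } catch (NumberFormatException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

If this reading is right, quoting the column names with backticks in step 3, rather than single quotes, would select the index columns instead of constants.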

      Thanks,
      Siddharth

      People

        Assignee: Unassigned
        Reporter: siddharth ramanan