Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 0.11.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Release Note:
      I'm not able to reproduce this behavior. I now suspect the input data set was empty, causing this runtime exception.

Description

      I'd like to use Hive to generate HFiles for HBase. I started off by following the instructions on the wiki, but they only took me so far: TotalOrderPartitioner didn't work. That led me to this post, which points out that Hive partitions on value instead of key. With a patched TotalOrderPartitioner, I get this error:

      2013-05-17 21:00:47,781 WARN org.apache.hadoop.mapred.Child: Error running child
      java.lang.RuntimeException: Hive Runtime Error while closing operators: java.io.IOException: No files found in hdfs://ip-10-191-3-134.ec2.internal:8020/tmp/hive-hrt_qa/hive_2013-05-17_20-58-58_357_6896546413926013201/_task_tmp.-ext-10000/_tmp.000000_0
      	at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:317)
      	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:532)
      	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
      	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:396)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
      	at org.apache.hadoop.mapred.Child.main(Child.java:249)
      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: No files found in hdfs://ip-10-191-3-134.ec2.internal:8020/tmp/hive-hrt_qa/hive_2013-05-17_20-58-58_357_6896546413926013201/_task_tmp.-ext-10000/_tmp.000000_0
      	at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:183)
      	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:865)
      	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
      	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597)
      	at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:309)
      	... 7 more
      Caused by: java.io.IOException: No files found in hdfs://ip-10-191-3-134.ec2.internal:8020/tmp/hive-hrt_qa/hive_2013-05-17_20-58-58_357_6896546413926013201/_task_tmp.-ext-10000/_tmp.000000_0
      	at org.apache.hadoop.hive.hbase.HiveHFileOutputFormat$1.close(HiveHFileOutputFormat.java:142)
      	at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:180)
      	... 11 more
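
      For reference, the wiki recipe amounts to roughly the following HQL. This is a sketch only: the table and column names are illustrative, and the property names (`hfile.family.path`, `total.order.partitioner.path`, `hive.mapred.partitioner`) are taken from the HBase bulk load wiki page rather than from the attached scripts:

      ```sql
      -- Hedged sketch of the wiki's HFile-generation setup; the attached
      -- 02_hfiles.hql may differ. Paths and names here are illustrative.

      -- Partition reducer input by the pre-computed splits file.
      SET mapred.reduce.tasks=1;
      SET hive.mapred.partitioner=org.apache.hadoop.mapred.lib.TotalOrderPartitioner;
      SET total.order.partitioner.path=/tmp/hbase_splits;

      -- A table whose output format writes HFiles for column family 'cf'.
      CREATE TABLE hbase_hfiles(rowkey STRING, pageviews INT, bytes INT)
      STORED AS
        INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
        OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileOutputFormat'
      TBLPROPERTIES ('hfile.family.path' = '/tmp/hfiles/cf');

      -- Rows must reach each reducer sorted by key.
      INSERT OVERWRITE TABLE hbase_hfiles
      SELECT rowkey, pageviews, bytes FROM pgc CLUSTER BY rowkey;
      ```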
      
      1. 00_tables.ddl
        1 kB
        Nick Dimiduk
      2. 01_sample.hql
        0.6 kB
        Nick Dimiduk
      3. 02_hfiles.hql
        0.4 kB
        Nick Dimiduk
      4. hive-partitioner.patch
        0.7 kB
        Nick Dimiduk

        Activity

        Nick Dimiduk added a comment -

        bump. updating wiki link.

        Nick Dimiduk added a comment -

        hadoop-1.2.0

        Navis added a comment -

        Can I ask the hadoop version you've used?

        Nick Dimiduk added a comment -

        This is the patch to TOP to support Hive.

        Nick Dimiduk added a comment -

        These are my steps to reproduce:

        ## load the input data
        $ wget http://dumps.wikimedia.org/other/pagecounts-raw/2008/2008-10/pagecounts-20081001-000000.gz
        $ hadoop fs -mkdir /tmp/wikistats
        $ hadoop fs -put pagecounts-20081001-000000.gz /tmp/wikistats/
        
        ## create the necessary tables.
        $ hcat -f /tmp/00_tables.ddl
        OK
        Time taken: 1.886 seconds
        OK
        Time taken: 0.654 seconds
        OK
        Time taken: 0.047 seconds
        OK
        Time taken: 0.115 seconds
        
        ## verify
        $ hive -e "select * from pagecounts limit 10;"
        ...
        OK
        aa      Main_Page       4       41431
        aa      Special:ListUsers       1       5555
        aa      Special:Listusers       1       1052
        ...
        $ hive -e "select * from pgc limit 10;"
        ...
        OK
        aa/Main_Page/20081001-000000    4       41431
        aa/Special:ListUsers/20081001-000000    1       5555
        aa/Special:Listusers/20081001-000000    1       1052
        ...
        
        ## produce the hfile splits file
        $ hive -f /tmp/01_sample.hql
        ...
        OK
        Time taken: 54.681 seconds
        [hrt_qa] $ hadoop fs -ls /tmp/hbase_splits
        Found 1 items
        -rwx------   3 hrt_qa hdfs        270 2013-05-17 19:05 /tmp/hbase_splits
        
        ## verify
        $ hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-1.2.0.1.3.0.0-104.jar -libjars /usr/lib/hive/lib/hive-exec-0.11.0.1.3.0.0-104.jar -input /tmp/hbase_splits -output /tmp/hbase_splits_txt -inputformat SequenceFileAsTextInputFormat
        ...
        13/05/17 19:08:38 INFO streaming.StreamJob: Output: /tmp/hbase_splits_txt
        $ hadoop fs -cat /tmp/hbase_splits_txt/*
        01 61 66 2e 71 2f 4d 61 69 6e 5f 50 61 67 65 2f 32 30 30 38 31 30 30 31 2d 30 30 30 30 30 30 00 (null)
        01 61 66 2f 31 35 35 30 2f 32 30 30 38 31 30 30 31 2d 30 30 30 30 30 30 00      (null)
        01 61 66 2f 32 38 5f 4d 61 61 72 74 2f 32 30 30 38 31 30 30 31 2d 30 30 30 30 30 30 00  (null)
        01 61 66 2f 42 65 65 6c 64 3a 31 30 30 5f 31 38 33 30 2e 4a 50 47 2f 32 30 30 38 31 30 30 31 2d 30 30 30 30 30 30 00    (null)
        
        ## decoding the first line from utf8 bytes to String yields "af.q/Main_Page/20081001-000000," which is correct
        
        ## generate the hfiles
        $ HADOOP_CLASSPATH=/usr/lib/hbase/hbase-0.94.6.1.3.0.0-104-security.jar hive -f /tmp/02_hfiles.hql
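
        The byte-level check above can be reproduced locally. This bash sketch decodes the first dumped key, assuming (as the dump suggests) that the leading 0x01 and trailing 0x00 bytes are key framing and not part of the row key:

        ```shell
        #!/usr/bin/env bash
        # Key bytes as printed by the streaming dump of /tmp/hbase_splits.
        hex="01 61 66 2e 71 2f 4d 61 69 6e 5f 50 61 67 65 2f 32 30 30 38 31 30 30 31 2d 30 30 30 30 30 30 00"
        decoded=""
        for b in $hex; do
          # Skip the assumed 0x01 prefix and 0x00 terminator framing bytes.
          if [ "$b" = "01" ] || [ "$b" = "00" ]; then continue; fi
          decoded+=$(printf "\\x$b")
        done
        echo "$decoded"   # af.q/Main_Page/20081001-000000
        ```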
        

People

    • Assignee: Unassigned
    • Reporter: Nick Dimiduk
    • Votes: 0
    • Watchers: 5
