Pig / PIG-2085

HBaseStorage fails with multiple STORE statements

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not a Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Scripts with multiple STORE statements using HBaseStorage fail when run against a cluster (they succeed in local mode). Below is an example script:

      raw = LOAD 'hbase_split_load_bug.txt' AS
            (f1: chararray, f2:chararray);
      
      SPLIT raw INTO apples IF (f2 == 'apple'), oranges IF (f2 == 'orange');
      
      STORE apples INTO 'hbase://test_table'
         USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:apple');
      
      STORE oranges INTO 'hbase://test_table'
         USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:orange');
      

      The server throws the following exception after apples is successfully stored:

      Backend error message
      ---------------------
      java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@6273305c closed
              at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:566)
              at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1113)
              at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchOfPuts(HConnectionManager.java:1233)
              at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:819)
              at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.close(TableOutputFormat.java:106)
              at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReducePOStoreImpl.tearDown(MapReducePOStoreImpl.java:96)
              at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.tearDown(POStore.java:122)
              at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.cleanup(PigMapBase.java:128)
              at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
              at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
              at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
              at org.apache.hadoop.mapred.Child.main(Child.java:170)
      
      Attachments

      1. PIG-2085_example_input.txt (0.1 kB, Bill Graham)
      2. PIG-2085_example_script.pig (0.5 kB, Bill Graham)
      3. PIG-2085_schema.hbase (0.1 kB, Bill Graham)

        Activity

        Bill Graham added a comment -

        Attaching scripts to create the HBase table and to reproduce, along with sample input data.

        Dmitriy V. Ryaboy added a comment -

        I bet HBaseOutputFormat gets confused when Pig does its optimizations and tries to do 2 stores in 1 reduce phase.

        Bill Graham added a comment -

        From discussions on the HBase list, I think this could be an issue in TableOutputFormat in 0.90, where closing the connection on one table killed the connections for all tables:

        http://mail-archives.apache.org/mod_mbox/hbase-user/201105.mbox/%3cBANLkTimCXKvtPAqi-HY2uT-h434xub8SNA@mail.gmail.com%3e

        If anyone has an HBase cluster running off the trunk to test this theory on (we're still on 0.90), please do so with the attached scripts and report back. HBASE-3777 is the relevant fix.
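
The suspected failure mode can be sketched in isolation. The example below is a minimal simulation, not HBase or Pig code: `SharedConnection`, `BufferingWriter`, and `runBothStores` are hypothetical names invented for illustration. It assumes two writers share one connection, each writer flushes its buffered values on close, and each close also shuts the shared connection, which is the behavior described in the mailing-list thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class SharedConnectionSketch {

    /** Hypothetical stand-in for a connection shared by every writer in one task. */
    static class SharedConnection {
        private boolean closed = false;
        private int delivered = 0;

        void send(String column, String value) throws IOException {
            if (closed) {
                throw new IOException("connection closed");
            }
            delivered++; // pretend the value reached the table
        }

        void close() {
            closed = true;
        }
    }

    /** Hypothetical stand-in for a record writer that buffers values and flushes on close. */
    static class BufferingWriter {
        private final SharedConnection conn;
        private final String column;
        private final List<String> buffer = new ArrayList<>();

        BufferingWriter(SharedConnection conn, String column) {
            this.conn = conn;
            this.column = column;
        }

        void write(String value) {
            buffer.add(value);
        }

        void close() throws IOException {
            for (String value : buffer) {
                conn.send(column, value); // flush buffered values first
            }
            buffer.clear();
            conn.close(); // assumed problematic step: closes the connection for every sharer
        }
    }

    static String closeAndReport(String name, BufferingWriter writer) {
        try {
            writer.close();
            return name + " stored";
        } catch (IOException e) {
            return name + " failed: " + e.getMessage();
        }
    }

    /** Mirrors the two STORE statements running in the same task. */
    static String runBothStores() {
        SharedConnection conn = new SharedConnection();
        BufferingWriter apples = new BufferingWriter(conn, "info:apple");
        BufferingWriter oranges = new BufferingWriter(conn, "info:orange");
        apples.write("apple");
        oranges.write("orange");
        return closeAndReport("apples", apples) + "; " + closeAndReport("oranges", oranges);
    }

    public static void main(String[] args) {
        System.out.println(runBothStores());
    }
}
```

In this sketch the first writer's close succeeds and the second writer's flush fails on the already closed connection, which matches the order of events in the reported stack trace (the exception is raised from the flush inside the record writer's close).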

        Dmitriy V. Ryaboy added a comment -

        We will likely be upgrading to 0.93 this week; I'll test once we do.

        Royston Sellman added a comment -

        9 months since last comment but in case it's still relevant: we are running HBase off trunk and this test PASSES using Pig 0.9.2.

        Dmitriy V. Ryaboy added a comment -

        Just doing some housecleaning.

        Kevin Lion added a comment -

        Using HBase 0.90.3 and Pig 0.9.2, the bug is still present.

        Dmitriy V. Ryaboy added a comment -

        Kevin,
        This is an HBase bug as described above. The HBase bug was fixed in 0.92 (not 0.90.2).


          People

          • Assignee: Bill Graham
          • Reporter: Bill Graham
          • Votes: 0
          • Watchers: 2
