Pig
  1. Pig
  2. PIG-2193

Using HBaseStorage to scan 2 tables in the same Map job produces bad data

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.1
    • Fix Version/s: 0.9.1, 0.10.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      HBase 0.90.3, Hadoop 0.20-append

    • Release Note:
      Joining two HBase tables sometimes failed because mappers got confused about which of the tables they were supposed to scan.

      Description

      I've some data in HBase 0.90.3 and I run a simple script on them.

      This script badly returns 0 records. From time to time, under yet undefined conditions, the same script on the same data works (it return correct data).

      When data are loaded from HDFS instead of HBase, the script runs perfectly.

      Here is the script loading from HDFS (works):

      start_sessions = LOAD 'start_sessions' AS (sid:chararray, infoid:chararray, imei:chararray, start:long);
      end_sessions = LOAD 'end_sessions' AS (sid:chararray, end:long, locid:chararray);
      infos = LOAD 'infos' AS (infoid:chararray, network_type:chararray, network_subtype:chararray, locale:chararray, version_name:chararray, carrier_country:chararray, carrier_name:chararray, phone_manufacturer:chararray, phone_model:chararray, firmware_version:chararray, firmware_name:chararray);
      sessions = JOIN start_sessions BY sid, end_sessions BY sid;
      sessions = FILTER sessions BY end > start AND end - start < 86400000L;
      sessions = JOIN sessions BY infoid, infos BY infoid;
      sessions = LIMIT sessions 100;
      dump sessions;

      The same script loading from HBase (don't work):

      start_sessions = LOAD 'startSession' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:infoid meta:imei meta:timestamp') AS (sid:chararray, infoid:chararray, imei:chararray, start:long);
      end_sessions = LOAD 'endSession' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:timestamp meta:locid') AS (sid:chararray, end:long, locid:chararray);
      infos = LOAD 'info' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:infoid data:networkType data:networkSubtype data:locale data:applicationVersionName data:carrierCountry data:carrierName data:phoneManufacturer data:phoneModel data:firmwareVersion data:firmwareName') AS (infoid:chararray, network_type:chararray, network_subtype:chararray, locale:chararray, version_name:chararray, carrier_country:chararray, carrier_name:chararray, phone_manufacturer:chararray, phone_model:chararray, firmware_version:chararray, firmware_name:chararray);
      sessions = JOIN start_sessions BY sid, end_sessions BY sid;
      sessions = FILTER sessions BY end > start AND end - start < 86400000L;
      sessions = JOIN sessions BY infoid, infos BY infoid;
      sessions = LIMIT sessions 100;
      dump sessions;

      I guess it definitively means there is a nasty bug in the HBase loader.

      1. PIG-2193.patch
        9 kB
        Raghu Angadi
      2. PIG-2193.patch
        1 kB
        Raghu Angadi

        Activity

        Hide
        Vincent BARAT added a comment -

        Interesting:

        I've reduced my request to:

        start_sessions = LOAD 'start_sessions' AS (sid:chararray, infoid:chararray, imei:chararray, start:long);
        end_sessions = LOAD 'end_sessions' AS (sid:chararray, end:long, locid:chararray);
        sessions = JOIN start_sessions BY sid, end_sessions BY sid;
        sessions = FILTER sessions BY end > start AND end - start < 86400000L;
        dump sessions;

        If I dump 'sessions' before the FILTER statement, I correctly values for the 'infoid' and 'locid' field.

        (00000A2A33254B8FAE1E9AEAB2428EBE,b87ac86bcf1d4cb44202aa826554a7b2,4e77d62e1839a470ec8386d42b85a076,1310649832970,00000A2A33254B8FAE1E9AEAB2428EBE,1310649838390,)
        (00001DCECDC842C0A745C151B9EC295F,4a4bb0fff26e368c8209f1e480fdf70b,db3924d2e4b88bd103fa19aaa30a9af4,1310628836846,00001DCECDC842C0A745C151B9EC295F,1310628839075,)
        (00001F8F2B3148D393963928188C72B6,5d7e58f68366b55d55862815f863a996,79e1ba90aa555e3e1041df4be657a11d,1310681918742,00001F8F2B3148D393963928188C72B6,1310681949182,)
        

        But if I dump 'sessions' after the FILTER statement, I have no values for the 'infoid' and 'locid' fields.

        (00000A2A33254B8FAE1E9AEAB2428EBE,,,1310649832970,00000A2A33254B8FAE1E9AEAB2428EBE,1310649838390,)
        (00001DCECDC842C0A745C151B9EC295F,,,1310628836846,00001DCECDC842C0A745C151B9EC295F,1310628839075,)
        (00001F8F2B3148D393963928188C72B6,,,1310681918742,00001F8F2B3148D393963928188C72B6,1310681949182,)
        

        so the next JOIN return an empty table.

        Show
        Vincent BARAT added a comment - Interesting: I've reduced my request to: start_sessions = LOAD 'start_sessions' AS (sid:chararray, infoid:chararray, imei:chararray, start:long); end_sessions = LOAD 'end_sessions' AS (sid:chararray, end:long, locid:chararray); sessions = JOIN start_sessions BY sid, end_sessions BY sid; sessions = FILTER sessions BY end > start AND end - start < 86400000L; dump sessions; If I dump 'sessions' before the FILTER statement, I correctly values for the 'infoid' and 'locid' field. (00000A2A33254B8FAE1E9AEAB2428EBE,b87ac86bcf1d4cb44202aa826554a7b2,4e77d62e1839a470ec8386d42b85a076,1310649832970,00000A2A33254B8FAE1E9AEAB2428EBE,1310649838390,) (00001DCECDC842C0A745C151B9EC295F,4a4bb0fff26e368c8209f1e480fdf70b,db3924d2e4b88bd103fa19aaa30a9af4,1310628836846,00001DCECDC842C0A745C151B9EC295F,1310628839075,) (00001F8F2B3148D393963928188C72B6,5d7e58f68366b55d55862815f863a996,79e1ba90aa555e3e1041df4be657a11d,1310681918742,00001F8F2B3148D393963928188C72B6,1310681949182,) But if I dump 'sessions' after the FILTER statement, I have no values for the 'infoid' and 'locid' fields. (00000A2A33254B8FAE1E9AEAB2428EBE,,,1310649832970,00000A2A33254B8FAE1E9AEAB2428EBE,1310649838390,) (00001DCECDC842C0A745C151B9EC295F,,,1310628836846,00001DCECDC842C0A745C151B9EC295F,1310628839075,) (00001F8F2B3148D393963928188C72B6,,,1310681918742,00001F8F2B3148D393963928188C72B6,1310681949182,) so the next JOIN return an empty table.
        Hide
        Vincent BARAT added a comment -

        Of course if I do the same test reading data from HDFS instead of HBase, I got the good values for the 'infoid' and 'locid' fields in both case.

        Show
        Vincent BARAT added a comment - Of course if I do the same test reading data from HDFS instead of HBase, I got the good values for the 'infoid' and 'locid' fields in both case.
        Hide
        Bill Graham added a comment -

        @Vincent, can you please try applying the patch in PIG-2174 and report back if it makes a difference?

        Show
        Bill Graham added a comment - @Vincent, can you please try applying the patch in PIG-2174 and report back if it makes a difference?
        Hide
        Raghu Angadi added a comment -

        I am able to reproduced this with PIG-0.8.1. But in my case commenting out FILTER does not make a difference. I think it is because of the how HBase scanner is initialized on the backend. This script results in 2 mappers which are supposed to have two different scanners, but both are initialized from the same config. I will verify.

        Show
        Raghu Angadi added a comment - I am able to reproduced this with PIG-0.8.1. But in my case commenting out FILTER does not make a difference. I think it is because of the how HBase scanner is initialized on the backend. This script results in 2 mappers which are supposed to have two different scanners, but both are initialized from the same config. I will verify.
        Hide
        Raghu Angadi added a comment -

        The patch ignores the scanner conf in jobConf. This seems to fix my script.

        Show
        Raghu Angadi added a comment - The patch ignores the scanner conf in jobConf. This seems to fix my script.
        Hide
        Vincent BARAT added a comment -

        Do you have this patch adapted for PIG 0.8.1, apparently it does not apply directly

        Show
        Vincent BARAT added a comment - Do you have this patch adapted for PIG 0.8.1, apparently it does not apply directly
        Hide
        Vincent BARAT added a comment -

        We have tried PIG trunk patched with PIG-2174 without success: same behavior.
        The patch does not apply on PIG 0.8.1, did you adapt it ?

        Show
        Vincent BARAT added a comment - We have tried PIG trunk patched with PIG-2174 without success: same behavior. The patch does not apply on PIG 0.8.1, did you adapt it ?
        Hide
        Raghu Angadi added a comment -

        PIG-2193.patch I attached is for 0.8 branch. It applies fine. Did you try it on clean 0.8.1? If you want to try on your code, your just need to comment out 3 lines in HBaseStorage::setLocation().

        Show
        Raghu Angadi added a comment - PIG-2193 .patch I attached is for 0.8 branch. It applies fine. Did you try it on clean 0.8.1? If you want to try on your code, your just need to comment out 3 lines in HBaseStorage::setLocation().
        Hide
        Dmitriy V. Ryaboy added a comment -

        Moving the trace from ticket description to a comment (long descriptions make the ticket hard to work with in jira).

        Here is the PIG dump for the HBase version:

        aws09:~# pig
        2011-07-28 08:17:36,329 [main] INFO org.apache.pig.Main - Logging error messages to: /root/pig_1311841056328.log
        2011-07-28 08:17:36,641 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://aws09.preprod.ubithere.com:9000
        2011-07-28 08:17:36,923 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: aws09.preprod.ubithere.com:9001
        grunt> start_sessions = LOAD 'startSession.mde253811.preprod.ubithere.com' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:infoid meta:imei meta:timestamp') AS (sid:chararray, infoid:chararray, imei:chararray, start:long);
        grunt> end_sessions = LOAD 'endSession.mde253811.preprod.ubithere.com' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:timestamp meta:locid') AS (sid:chararray, end:long, locid:chararray);
        grunt> infos = LOAD 'info.mde253811.preprod.ubithere.com' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:infoid data:networkType data:networkSubtype data:locale data:applicationVersionName data:carrierCountry data:carrierName data:phoneManufacturer data:phoneModel data:firmwareVersion data:firmwareName') AS (infoid:chararray, network_type:chararray, network_subtype:chararray, locale:chararray, version_name:chararray, carrier_country:chararray, carrier_name:chararray, phone_manufacturer:chararray, phone_model:chararray, firmware_version:chararray, firmware_name:chararray);
        grunt> sessions = JOIN start_sessions BY sid, end_sessions BY sid;
        grunt> sessions = FILTER sessions BY end > start AND end - start < 86400000L;
        grunt> sessions = JOIN sessions BY infoid, infos BY infoid;
        grunt> sessions = LIMIT sessions 100;
        grunt> dump sessions;
        2011-07-28 08:17:50,275 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: HASH_JOIN,FILTER,LIMIT
        2011-07-28 08:17:50,275 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
        2011-07-28 08:17:51,213 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: sessions: Store(hdfs://aws09.preprod.ubithere.com:9000/tmp/temp-1404953096/tmp819396740:org.apache.pig.impl.io.InterStorage) - scope-93 Operator Key: scope-93)
        2011-07-28 08:17:51,225 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
        2011-07-28 08:17:51,281 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer - Rewrite: POPackage->POForEach to POJoinPackage
        2011-07-28 08:17:51,281 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer - Rewrite: POPackage->POForEach to POJoinPackage
        2011-07-28 08:17:51,350 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
        2011-07-28 08:17:51,350 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
        2011-07-28 08:17:51,402 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
        2011-07-28 08:17:51,411 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
        2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:zookeeper.version=3.3.2-1031432, built on 11/05/2010 05:32 GMT
        2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:host.name=aws09.machine.com
        2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.version=1.6.0_22
        2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.vendor=Sun Microsystems Inc.
        2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.home=/usr/lib/jvm/java-6-sun-1.6.0.22/jre
        2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.class.path=/opt/pig/bin/../conf:/usr/lib/jvm/java-6-sun/jre/lib/tools.jar:/opt/pig/bin/../pig-0.8.1-core.jar:/opt/pig/bin/../build/pig-*-SNAPSHOT.jar:/opt/pig/bin/../lib/commons-el-1.0.jar:/opt/pig/bin/../lib/commons-lang-2.4.jar:/opt/pig/bin/../lib/commons-logging-1.1.1.jar:/opt/pig/bin/../lib/guava-r06.jar:/opt/pig/bin/../lib/hbase-0.90.3.jar:/opt/pig/bin/../lib/hsqldb-1.8.0.10.jar:/opt/pig/bin/../lib/jackson-core-asl-1.0.1.jar:/opt/pig/bin/../lib/jackson-mapper-asl-1.0.1.jar:/opt/pig/bin/../lib/javacc-4.2.jar:/opt/pig/bin/../lib/javacc.jar:/opt/pig/bin/../lib/jetty-util-6.1.14.jar:/opt/pig/bin/../lib/jline-0.9.94.jar:/opt/pig/bin/../lib/joda-time-1.6.jar:/opt/pig/bin/../lib/jsch-0.1.38.jar:/opt/pig/bin/../lib/junit-4.5.jar:/opt/pig/bin/../lib/jython-2.5.0.jar:/opt/pig/bin/../lib/log4j-1.2.14.jar:/opt/pig/bin/../lib/pigudfs.jar:/opt/pig/bin/../lib/slf4j-log4j12-1.4.3.jar:/opt/pig/bin/../lib/zookeeper-3.3.2.jar:/opt/hadoop/conf_computation:/opt/hbase/conf:/opt/pig/lib/hadoop-0.20-append-core.jar
        2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.library.path=/usr/lib/jvm/java-6-sun-1.6.0.22/jre/lib/amd64/server:/usr/lib/jvm/java-6-sun-1.6.0.22/jre/lib/amd64:/usr/lib/jvm/java-6-sun-1.6.0.22/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
        2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.io.tmpdir=/tmp
        2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.compiler=<NA>
        2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.name=Linux
        2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.arch=amd64
        2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.version=2.6.21.7-2.fc8xen-ec2-v1.0
        2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:user.name=root
        2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:user.home=/root
        2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:user.dir=/root
        2011-07-28 08:17:51,471 [main] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=aws09.machine.com:2222 sessionTimeout=60000 watcher=hconnection
        2011-07-28 08:17:51,493 [main-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server aws09.machine.com/10.83.1.244:2222
        2011-07-28 08:17:51,499 [main-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to aws09.machine.com/10.83.1.244:2222, initiating session
        2011-07-28 08:17:51,508 [main-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server aws09.machine.com/10.83.1.244:2222, sessionid = 0x131617dada6054b, negotiated timeout = 60000
        2011-07-28 08:17:51,575 [main] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@ef894ce; hsa=aws03.machine.com:60020
        2011-07-28 08:17:51,687 [main] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
        2011-07-28 08:17:51,696 [main] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=endSession.mde253811.preprod.ubithere.com,,00000000000000 for max=10 rows
        2011-07-28 08:17:51,700 [main] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for endSession.mde253811.preprod.ubithere.com,,1311086199483.706685579 is aws03.machine.com:60020
        2011-07-28 08:17:51,726 [main] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=startSession.mde253811.preprod.ubithere.com,,00000000000000 for max=10 rows
        2011-07-28 08:17:51,729 [main] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for startSession.mde253811.preprod.ubithere.com,,1311086198252.1334391323 is aws03.machine.com:60020
        2011-07-28 08:17:53,328 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
        2011-07-28 08:17:53,335 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=0
        2011-07-28 08:17:53,335 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
        2011-07-28 08:17:53,442 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
        2011-07-28 08:17:53,944 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
        2011-07-28 08:17:53,989 [Thread-13] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=aws09.machine.com:2222 sessionTimeout=60000 watcher=hconnection
        2011-07-28 08:17:53,990 [Thread-13-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server aws09.machine.com/10.83.1.244:2222
        2011-07-28 08:17:53,991 [Thread-13-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to aws09.machine.com/10.83.1.244:2222, initiating session
        2011-07-28 08:17:53,996 [Thread-13-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server aws09.machine.com/10.83.1.244:2222, sessionid = 0x131617dada6054c, negotiated timeout = 60000
        2011-07-28 08:17:54,000 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2d9f90e3; hsa=aws03.machine.com:60020
        2011-07-28 08:17:54,005 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
        2011-07-28 08:17:54,006 [Thread-13] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=endSession.mde253811.preprod.ubithere.com,,00000000000000 for max=10 rows
        2011-07-28 08:17:54,011 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for endSession.mde253811.preprod.ubithere.com,,1311086199483.706685579 is aws03.machine.com:60020
        2011-07-28 08:17:54,017 [Thread-13] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=aws09.machine.com:2222 sessionTimeout=60000 watcher=hconnection
        2011-07-28 08:17:54,017 [Thread-13-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server aws09.machine.com/10.83.1.244:2222
        2011-07-28 08:17:54,018 [Thread-13-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to aws09.machine.com/10.83.1.244:2222, initiating session
        2011-07-28 08:17:54,025 [Thread-13-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server aws09.machine.com/10.83.1.244:2222, sessionid = 0x131617dada6054d, negotiated timeout = 60000
        2011-07-28 08:17:54,029 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@41f6321; hsa=aws03.machine.com:60020
        2011-07-28 08:17:54,032 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
        2011-07-28 08:17:54,033 [Thread-13] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=endSession.mde253811.preprod.ubithere.com,,00000000000000 for max=10 rows
        2011-07-28 08:17:54,037 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for endSession.mde253811.preprod.ubithere.com,,1311086199483.706685579 is aws03.machine.com:60020
        2011-07-28 08:17:54,039 [Thread-13] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=endSession.mde253811.preprod.ubithere.com,,00000000000000 for max=2147483647 rows
        2011-07-28 08:17:54,067 [Thread-13] DEBUG org.apache.hadoop.hbase.mapreduce.TableInputFormatBase - getSplits: split -> 0 -> aws03.machine.com:,
        2011-07-28 08:17:54,068 [Thread-13] INFO org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat - Got 1 splits.
        2011-07-28 08:17:54,068 [Thread-13] INFO org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat - Returning 1 splits.
        2011-07-28 08:17:54,109 [Thread-13] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=aws09.machine.com:2222 sessionTimeout=60000 watcher=hconnection
        2011-07-28 08:17:54,110 [Thread-13-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server aws09.machine.com/10.83.1.244:2222
        2011-07-28 08:17:54,111 [Thread-13-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to aws09.machine.com/10.83.1.244:2222, initiating session
        2011-07-28 08:17:54,119 [Thread-13-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server aws09.machine.com/10.83.1.244:2222, sessionid = 0x131617dada6054e, negotiated timeout = 60000
        2011-07-28 08:17:54,123 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@20c3e967; hsa=aws03.machine.com:60020
        2011-07-28 08:17:54,140 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
        2011-07-28 08:17:54,142 [Thread-13] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=startSession.mde253811.preprod.ubithere.com,,00000000000000 for max=10 rows
        2011-07-28 08:17:54,148 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for startSession.mde253811.preprod.ubithere.com,,1311086198252.1334391323 is aws03.machine.com:60020
        2011-07-28 08:17:54,154 [Thread-13] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=aws09.machine.com:2222 sessionTimeout=60000 watcher=hconnection
        2011-07-28 08:17:54,158 [Thread-13-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server aws09.machine.com/10.83.1.244:2222
        2011-07-28 08:17:54,159 [Thread-13-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to aws09.machine.com/10.83.1.244:2222, initiating session
        2011-07-28 08:17:54,161 [Thread-13-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server aws09.machine.com/10.83.1.244:2222, sessionid = 0x131617dada6054f, negotiated timeout = 60000
        2011-07-28 08:17:54,164 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@5ee771f3; hsa=aws03.machine.com:60020
        2011-07-28 08:17:54,167 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
        2011-07-28 08:17:54,169 [Thread-13] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=startSession.mde253811.preprod.ubithere.com,,00000000000000 for max=10 rows
        2011-07-28 08:17:54,172 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for startSession.mde253811.preprod.ubithere.com,,1311086198252.1334391323 is aws03.machine.com:60020
        2011-07-28 08:17:54,173 [Thread-13] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=startSession.mde253811.preprod.ubithere.com,,00000000000000 for max=2147483647 rows
        2011-07-28 08:17:54,180 [Thread-13] DEBUG org.apache.hadoop.hbase.mapreduce.TableInputFormatBase - getSplits: split -> 0 -> aws03.machine.com:,
        2011-07-28 08:17:54,180 [Thread-13] INFO org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat - Got 1 splits.
        2011-07-28 08:17:54,180 [Thread-13] INFO org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat - Returning 1 splits.
        2011-07-28 08:17:55,037 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201107251336_0314
        2011-07-28 08:17:55,037 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://aws09.preprod.ubithere.com:50030/jobdetails.jsp?jobid=job_201107251336_0314
        2011-07-28 08:19:06,924 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 8% complete
        2011-07-28 08:19:15,971 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 16% complete
        2011-07-28 08:19:18,985 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 19% complete
        2011-07-28 08:19:25,035 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 27% complete
        2011-07-28 08:20:14,810 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
        2011-07-28 08:20:14,812 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
        2011-07-28 08:20:14,830 [main] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=aws09.machine.com:2222 sessionTimeout=60000 watcher=hconnection
        2011-07-28 08:20:14,831 [main-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server aws09.machine.com/10.83.1.244:2222
        2011-07-28 08:20:14,832 [main-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to aws09.machine.com/10.83.1.244:2222, initiating session
        2011-07-28 08:20:14,838 [main-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server aws09.machine.com/10.83.1.244:2222, sessionid = 0x131617dada60556, negotiated timeout = 60000
        2011-07-28 08:20:14,842 [main] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@5d4fa79d; hsa=aws03.machine.com:60020
        2011-07-28 08:20:14,847 [main] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
        2011-07-28 08:20:14,849 [main] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=info.mde253811.preprod.ubithere.com,,00000000000000 for max=10 rows
        2011-07-28 08:20:14,852 [main] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for info.mde253811.preprod.ubithere.com,,1311086202955.1975990008 is aws03.machine.com:60020
        2011-07-28 08:20:16,311 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
        2011-07-28 08:20:16,324 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=198330658
        2011-07-28 08:20:16,324 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
        2011-07-28 08:20:16,341 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
        2011-07-28 08:20:16,656 [Thread-32] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
        2011-07-28 08:20:16,656 [Thread-32] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
        2011-07-28 08:20:16,693 [Thread-32] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=aws09.machine.com:2222 sessionTimeout=60000 watcher=hconnection
        2011-07-28 08:20:16,694 [Thread-32-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server aws09.machine.com/10.83.1.244:2222
        2011-07-28 08:20:16,695 [Thread-32-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to aws09.machine.com/10.83.1.244:2222, initiating session
        2011-07-28 08:20:16,702 [Thread-32-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server aws09.machine.com/10.83.1.244:2222, sessionid = 0x131617dada60557, negotiated timeout = 60000
        2011-07-28 08:20:16,705 [Thread-32] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@37285252; hsa=aws03.machine.com:60020
        2011-07-28 08:20:16,709 [Thread-32] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
        2011-07-28 08:20:16,710 [Thread-32] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=info.mde253811.preprod.ubithere.com,,00000000000000 for max=10 rows
        2011-07-28 08:20:16,714 [Thread-32] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for info.mde253811.preprod.ubithere.com,,1311086202955.1975990008 is aws03.machine.com:60020
        2011-07-28 08:20:16,716 [Thread-32] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=aws09.machine.com:2222 sessionTimeout=60000 watcher=hconnection
        2011-07-28 08:20:16,717 [Thread-32-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server aws09.machine.com/10.83.1.244:2222
        2011-07-28 08:20:16,718 [Thread-32-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to aws09.machine.com/10.83.1.244:2222, initiating session
        2011-07-28 08:20:16,720 [Thread-32-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server aws09.machine.com/10.83.1.244:2222, sessionid = 0x131617dada60558, negotiated timeout = 60000
        2011-07-28 08:20:16,723 [Thread-32] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7418e252; hsa=aws03.machine.com:60020
        2011-07-28 08:20:16,726 [Thread-32] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
        2011-07-28 08:20:16,727 [Thread-32] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=info.mde253811.preprod.ubithere.com,,00000000000000 for max=10 rows
        2011-07-28 08:20:16,730 [Thread-32] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for info.mde253811.preprod.ubithere.com,,1311086202955.1975990008 is aws03.machine.com:60020
        2011-07-28 08:20:16,732 [Thread-32] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=info.mde253811.preprod.ubithere.com,,00000000000000 for max=2147483647 rows
        2011-07-28 08:20:16,772 [Thread-32] DEBUG org.apache.hadoop.hbase.mapreduce.TableInputFormatBase - getSplits: split -> 0 -> aws03.machine.com:,
        2011-07-28 08:20:16,772 [Thread-32] INFO org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat - Got 1 splits.
        2011-07-28 08:20:16,772 [Thread-32] INFO org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat - Returning 1 splits.
        2011-07-28 08:20:17,500 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201107251336_0315
        2011-07-28 08:20:17,500 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://aws09.preprod.ubithere.com:50030/jobdetails.jsp?jobid=job_201107251336_0315
        2011-07-28 08:20:28,075 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 37% complete
        2011-07-28 08:20:34,106 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 41% complete
        2011-07-28 08:20:37,124 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
        2011-07-28 08:20:46,168 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 51% complete
        2011-07-28 08:20:49,183 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 61% complete
        2011-07-28 08:20:52,198 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 62% complete
        2011-07-28 08:20:55,214 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 64% complete
        2011-07-28 08:21:01,244 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
        2011-07-28 08:21:07,311 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
        2011-07-28 08:21:07,312 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
        2011-07-28 08:21:08,770 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
        2011-07-28 08:21:08,778 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
        2011-07-28 08:21:08,910 [Thread-47] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
        2011-07-28 08:21:08,910 [Thread-47] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
        2011-07-28 08:21:08,911 [Thread-47] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
        2011-07-28 08:21:09,280 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201107251336_0316
        2011-07-28 08:21:09,280 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://aws09.preprod.ubithere.com:50030/jobdetails.jsp?jobid=job_201107251336_0316
        2011-07-28 08:21:16,321 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 83% complete
        2011-07-28 08:21:34,439 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
        2011-07-28 08:21:34,441 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:

        HadoopVersion PigVersion UserId StartedAt FinishedAt Features
        0.20-append 0.8.1-SNAPSHOT root 2011-07-28 08:17:51 2011-07-28 08:21:34 HASH_JOIN,FILTER,LIMIT

        Success!

        Job Stats (time in seconds):
        JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs
        job_201107251336_0314 2 1 75 66 70 63 63 63 end_sessions,sessions,start_sessions HASH_JOIN
        job_201107251336_0315 4 1 15 6 12 24 24 24 infos,sessions HASH_JOIN
        job_201107251336_0316 1 1 3 3 3 12 12 12 hdfs://aws09.preprod.ubithere.com:9000/tmp/temp-1404953096/tmp819396740,

        Input(s):
        Successfully read 2069446 records from: "endSession.mde253811.preprod.ubithere.com"
        Successfully read 2072419 records from: "startSession.mde253811.preprod.ubithere.com"
        Successfully read 19441 records from: "info.mde253811.preprod.ubithere.com"

        Output(s):
        Successfully stored 0 records in: "hdfs://aws09.preprod.ubithere.com:9000/tmp/temp-1404953096/tmp819396740"

        Counters:
        Total records written : 0
        Total bytes written : 0
        Spillable Memory Manager spill count : 0
        Total bags proactively spilled: 1
        Total records proactively spilled: 1944943

        Job DAG:
        job_201107251336_0314 -> job_201107251336_0315,
        job_201107251336_0315 -> job_201107251336_0316,
        job_201107251336_0316

        2011-07-28 08:21:34,472 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
        2011-07-28 08:21:34,500 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
        2011-07-28 08:21:34,501 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
        grunt>

        Show
        Dmitriy V. Ryaboy added a comment - Moving the trace from ticket description to a comment (long descriptions make the ticket hard to work with in jira). Here is the PIG dump for the HBase version: aws09:~# pig 2011-07-28 08:17:36,329 [main] INFO org.apache.pig.Main - Logging error messages to: /root/pig_1311841056328.log 2011-07-28 08:17:36,641 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://aws09.preprod.ubithere.com:9000 2011-07-28 08:17:36,923 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: aws09.preprod.ubithere.com:9001 grunt> start_sessions = LOAD 'startSession.mde253811.preprod.ubithere.com' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:infoid meta:imei meta:timestamp') AS (sid:chararray, infoid:chararray, imei:chararray, start:long); grunt> end_sessions = LOAD 'endSession.mde253811.preprod.ubithere.com' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:timestamp meta:locid') AS (sid:chararray, end:long, locid:chararray); grunt> infos = LOAD 'info.mde253811.preprod.ubithere.com' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:infoid data:networkType data:networkSubtype data:locale data:applicationVersionName data:carrierCountry data:carrierName data:phoneManufacturer data:phoneModel data:firmwareVersion data:firmwareName') AS (infoid:chararray, network_type:chararray, network_subtype:chararray, locale:chararray, version_name:chararray, carrier_country:chararray, carrier_name:chararray, phone_manufacturer:chararray, phone_model:chararray, firmware_version:chararray, firmware_name:chararray); grunt> sessions = JOIN start_sessions BY sid, end_sessions BY sid; grunt> sessions = FILTER sessions BY end > start AND end - start < 86400000L; grunt> sessions = JOIN sessions BY infoid, infos BY infoid; grunt> sessions = LIMIT sessions 100; grunt> dump sessions; 2011-07-28 08:17:50,275 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: HASH_JOIN,FILTER,LIMIT 2011-07-28 08:17:50,275 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used. 2011-07-28 08:17:51,213 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: sessions: Store(hdfs://aws09.preprod.ubithere.com:9000/tmp/temp-1404953096/tmp819396740:org.apache.pig.impl.io.InterStorage) - scope-93 Operator Key: scope-93) 2011-07-28 08:17:51,225 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false 2011-07-28 08:17:51,281 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer - Rewrite: POPackage->POForEach to POJoinPackage 2011-07-28 08:17:51,281 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer - Rewrite: POPackage->POForEach to POJoinPackage 2011-07-28 08:17:51,350 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3 2011-07-28 08:17:51,350 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3 2011-07-28 08:17:51,402 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job 2011-07-28 08:17:51,411 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:zookeeper.version=3.3.2-1031432, built on 11/05/2010 05:32 GMT 2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:host.name=aws09.machine.com 2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.version=1.6.0_22 2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.vendor=Sun Microsystems Inc. 2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.home=/usr/lib/jvm/java-6-sun-1.6.0.22/jre 2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.class.path=/opt/pig/bin/../conf:/usr/lib/jvm/java-6-sun/jre/lib/tools.jar:/opt/pig/bin/../pig-0.8.1-core.jar:/opt/pig/bin/../build/pig-*-SNAPSHOT.jar:/opt/pig/bin/../lib/commons-el-1.0.jar:/opt/pig/bin/../lib/commons-lang-2.4.jar:/opt/pig/bin/../lib/commons-logging-1.1.1.jar:/opt/pig/bin/../lib/guava-r06.jar:/opt/pig/bin/../lib/hbase-0.90.3.jar:/opt/pig/bin/../lib/hsqldb-1.8.0.10.jar:/opt/pig/bin/../lib/jackson-core-asl-1.0.1.jar:/opt/pig/bin/../lib/jackson-mapper-asl-1.0.1.jar:/opt/pig/bin/../lib/javacc-4.2.jar:/opt/pig/bin/../lib/javacc.jar:/opt/pig/bin/../lib/jetty-util-6.1.14.jar:/opt/pig/bin/../lib/jline-0.9.94.jar:/opt/pig/bin/../lib/joda-time-1.6.jar:/opt/pig/bin/../lib/jsch-0.1.38.jar:/opt/pig/bin/../lib/junit-4.5.jar:/opt/pig/bin/../lib/jython-2.5.0.jar:/opt/pig/bin/../lib/log4j-1.2.14.jar:/opt/pig/bin/../lib/pigudfs.jar:/opt/pig/bin/../lib/slf4j-log4j12-1.4.3.jar:/opt/pig/bin/../lib/zookeeper-3.3.2.jar:/opt/hadoop/conf_computation:/opt/hbase/conf:/opt/pig/lib/hadoop-0.20-append-core.jar 2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.library.path=/usr/lib/jvm/java-6-sun-1.6.0.22/jre/lib/amd64/server:/usr/lib/jvm/java-6-sun-1.6.0.22/jre/lib/amd64:/usr/lib/jvm/java-6-sun-1.6.0.22/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib 2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.io.tmpdir=/tmp 2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.compiler=<NA> 2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.name=Linux 2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.arch=amd64 2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:os.version=2.6.21.7-2.fc8xen-ec2-v1.0 2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:user.name=root 2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:user.home=/root 2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:user.dir=/root 2011-07-28 08:17:51,471 [main] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=aws09.machine.com:2222 sessionTimeout=60000 watcher=hconnection 2011-07-28 08:17:51,493 [main-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server aws09.machine.com/10.83.1.244:2222 2011-07-28 08:17:51,499 [main-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to aws09.machine.com/10.83.1.244:2222, initiating session 2011-07-28 08:17:51,508 [main-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server aws09.machine.com/10.83.1.244:2222, sessionid = 0x131617dada6054b, negotiated timeout = 60000 2011-07-28 08:17:51,575 [main] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@ef894ce; hsa=aws03.machine.com:60020 2011-07-28 08:17:51,687 [main] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020 2011-07-28 08:17:51,696 [main] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=endSession.mde253811.preprod.ubithere.com,,00000000000000 for max=10 rows 2011-07-28 08:17:51,700 [main] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for endSession.mde253811.preprod.ubithere.com,,1311086199483.706685579 is aws03.machine.com:60020 2011-07-28 08:17:51,726 [main] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=startSession.mde253811.preprod.ubithere.com,,00000000000000 for max=10 rows 2011-07-28 08:17:51,729 [main] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for startSession.mde253811.preprod.ubithere.com,,1311086198252.1334391323 is aws03.machine.com:60020 2011-07-28 08:17:53,328 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job 2011-07-28 08:17:53,335 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=0 2011-07-28 08:17:53,335 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1 2011-07-28 08:17:53,442 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission. 2011-07-28 08:17:53,944 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete 2011-07-28 08:17:53,989 [Thread-13] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=aws09.machine.com:2222 sessionTimeout=60000 watcher=hconnection 2011-07-28 08:17:53,990 [Thread-13-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server aws09.machine.com/10.83.1.244:2222 2011-07-28 08:17:53,991 [Thread-13-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to aws09.machine.com/10.83.1.244:2222, initiating session 2011-07-28 08:17:53,996 [Thread-13-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server aws09.machine.com/10.83.1.244:2222, sessionid = 0x131617dada6054c, negotiated timeout = 60000 2011-07-28 08:17:54,000 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2d9f90e3; hsa=aws03.machine.com:60020 2011-07-28 08:17:54,005 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020 2011-07-28 08:17:54,006 [Thread-13] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=endSession.mde253811.preprod.ubithere.com,,00000000000000 for max=10 rows 2011-07-28 08:17:54,011 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for endSession.mde253811.preprod.ubithere.com,,1311086199483.706685579 is aws03.machine.com:60020 2011-07-28 08:17:54,017 [Thread-13] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=aws09.machine.com:2222 sessionTimeout=60000 watcher=hconnection 2011-07-28 08:17:54,017 [Thread-13-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server aws09.machine.com/10.83.1.244:2222 2011-07-28 08:17:54,018 [Thread-13-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to aws09.machine.com/10.83.1.244:2222, initiating session 2011-07-28 08:17:54,025 [Thread-13-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server aws09.machine.com/10.83.1.244:2222, sessionid = 0x131617dada6054d, negotiated timeout = 60000 2011-07-28 08:17:54,029 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@41f6321; hsa=aws03.machine.com:60020 2011-07-28 08:17:54,032 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020 2011-07-28 08:17:54,033 [Thread-13] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=endSession.mde253811.preprod.ubithere.com,,00000000000000 for max=10 rows 2011-07-28 08:17:54,037 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for endSession.mde253811.preprod.ubithere.com,,1311086199483.706685579 is aws03.machine.com:60020 2011-07-28 08:17:54,039 [Thread-13] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=endSession.mde253811.preprod.ubithere.com,,00000000000000 for max=2147483647 rows 2011-07-28 08:17:54,067 [Thread-13] DEBUG org.apache.hadoop.hbase.mapreduce.TableInputFormatBase - getSplits: split -> 0 -> aws03.machine.com:, 2011-07-28 08:17:54,068 [Thread-13] INFO org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat - Got 1 splits. 2011-07-28 08:17:54,068 [Thread-13] INFO org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat - Returning 1 splits. 2011-07-28 08:17:54,109 [Thread-13] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=aws09.machine.com:2222 sessionTimeout=60000 watcher=hconnection 2011-07-28 08:17:54,110 [Thread-13-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server aws09.machine.com/10.83.1.244:2222 2011-07-28 08:17:54,111 [Thread-13-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to aws09.machine.com/10.83.1.244:2222, initiating session 2011-07-28 08:17:54,119 [Thread-13-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server aws09.machine.com/10.83.1.244:2222, sessionid = 0x131617dada6054e, negotiated timeout = 60000 2011-07-28 08:17:54,123 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@20c3e967; hsa=aws03.machine.com:60020 2011-07-28 08:17:54,140 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020 2011-07-28 08:17:54,142 [Thread-13] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=startSession.mde253811.preprod.ubithere.com,,00000000000000 for max=10 rows 2011-07-28 08:17:54,148 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for startSession.mde253811.preprod.ubithere.com,,1311086198252.1334391323 is aws03.machine.com:60020 2011-07-28 08:17:54,154 [Thread-13] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=aws09.machine.com:2222 sessionTimeout=60000 watcher=hconnection 2011-07-28 08:17:54,158 [Thread-13-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server aws09.machine.com/10.83.1.244:2222 2011-07-28 08:17:54,159 [Thread-13-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to aws09.machine.com/10.83.1.244:2222, initiating session 2011-07-28 08:17:54,161 [Thread-13-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server aws09.machine.com/10.83.1.244:2222, sessionid = 0x131617dada6054f, negotiated timeout = 60000 2011-07-28 08:17:54,164 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@5ee771f3; hsa=aws03.machine.com:60020 2011-07-28 08:17:54,167 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020 2011-07-28 08:17:54,169 [Thread-13] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=startSession.mde253811.preprod.ubithere.com,,00000000000000 for max=10 rows 2011-07-28 08:17:54,172 [Thread-13] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for startSession.mde253811.preprod.ubithere.com,,1311086198252.1334391323 is aws03.machine.com:60020 2011-07-28 08:17:54,173 [Thread-13] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=startSession.mde253811.preprod.ubithere.com,,00000000000000 for max=2147483647 rows 2011-07-28 08:17:54,180 [Thread-13] DEBUG org.apache.hadoop.hbase.mapreduce.TableInputFormatBase - getSplits: split -> 0 -> aws03.machine.com:, 2011-07-28 08:17:54,180 [Thread-13] INFO org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat - Got 1 splits. 2011-07-28 08:17:54,180 [Thread-13] INFO org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat - Returning 1 splits. 2011-07-28 08:17:55,037 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201107251336_0314 2011-07-28 08:17:55,037 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://aws09.preprod.ubithere.com:50030/jobdetails.jsp?jobid=job_201107251336_0314 2011-07-28 08:19:06,924 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 8% complete 2011-07-28 08:19:15,971 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 16% complete 2011-07-28 08:19:18,985 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 19% complete 2011-07-28 08:19:25,035 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 27% complete 2011-07-28 08:20:14,810 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job 2011-07-28 08:20:14,812 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 2011-07-28 08:20:14,830 [main] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=aws09.machine.com:2222 sessionTimeout=60000 watcher=hconnection 2011-07-28 08:20:14,831 [main-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server aws09.machine.com/10.83.1.244:2222 2011-07-28 08:20:14,832 [main-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to aws09.machine.com/10.83.1.244:2222, initiating session 2011-07-28 08:20:14,838 [main-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server aws09.machine.com/10.83.1.244:2222, sessionid = 0x131617dada60556, negotiated timeout = 60000 2011-07-28 08:20:14,842 [main] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@5d4fa79d; hsa=aws03.machine.com:60020 2011-07-28 08:20:14,847 [main] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020 2011-07-28 08:20:14,849 [main] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=info.mde253811.preprod.ubithere.com,,00000000000000 for max=10 rows 2011-07-28 08:20:14,852 [main] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for info.mde253811.preprod.ubithere.com,,1311086202955.1975990008 is aws03.machine.com:60020 2011-07-28 08:20:16,311 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job 2011-07-28 08:20:16,324 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=198330658 2011-07-28 08:20:16,324 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1 2011-07-28 08:20:16,341 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission. 2011-07-28 08:20:16,656 [Thread-32] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1 2011-07-28 08:20:16,656 [Thread-32] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1 2011-07-28 08:20:16,693 [Thread-32] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=aws09.machine.com:2222 sessionTimeout=60000 watcher=hconnection 2011-07-28 08:20:16,694 [Thread-32-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server aws09.machine.com/10.83.1.244:2222 2011-07-28 08:20:16,695 [Thread-32-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to aws09.machine.com/10.83.1.244:2222, initiating session 2011-07-28 08:20:16,702 [Thread-32-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server aws09.machine.com/10.83.1.244:2222, sessionid = 0x131617dada60557, negotiated timeout = 60000 2011-07-28 08:20:16,705 [Thread-32] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@37285252; hsa=aws03.machine.com:60020 2011-07-28 08:20:16,709 [Thread-32] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020 2011-07-28 08:20:16,710 [Thread-32] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=info.mde253811.preprod.ubithere.com,,00000000000000 for max=10 rows 2011-07-28 08:20:16,714 [Thread-32] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for info.mde253811.preprod.ubithere.com,,1311086202955.1975990008 is aws03.machine.com:60020 2011-07-28 08:20:16,716 [Thread-32] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=aws09.machine.com:2222 sessionTimeout=60000 watcher=hconnection 2011-07-28 08:20:16,717 [Thread-32-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server aws09.machine.com/10.83.1.244:2222 2011-07-28 08:20:16,718 [Thread-32-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to aws09.machine.com/10.83.1.244:2222, initiating session 2011-07-28 08:20:16,720 [Thread-32-SendThread(aws09.machine.com:2222)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server aws09.machine.com/10.83.1.244:2222, sessionid = 0x131617dada60558, negotiated timeout = 60000 2011-07-28 08:20:16,723 [Thread-32] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7418e252; hsa=aws03.machine.com:60020 2011-07-28 08:20:16,726 [Thread-32] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020 2011-07-28 08:20:16,727 [Thread-32] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=info.mde253811.preprod.ubithere.com,,00000000000000 for max=10 rows 2011-07-28 08:20:16,730 [Thread-32] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Cached location for info.mde253811.preprod.ubithere.com,,1311086202955.1975990008 is aws03.machine.com:60020 2011-07-28 08:20:16,732 [Thread-32] DEBUG org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at row=info.mde253811.preprod.ubithere.com,,00000000000000 for max=2147483647 rows 2011-07-28 08:20:16,772 [Thread-32] DEBUG org.apache.hadoop.hbase.mapreduce.TableInputFormatBase - getSplits: split -> 0 -> aws03.machine.com:, 2011-07-28 08:20:16,772 [Thread-32] INFO org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat - Got 1 splits. 2011-07-28 08:20:16,772 [Thread-32] INFO org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat - Returning 1 splits. 2011-07-28 08:20:17,500 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201107251336_0315 2011-07-28 08:20:17,500 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://aws09.preprod.ubithere.com:50030/jobdetails.jsp?jobid=job_201107251336_0315 2011-07-28 08:20:28,075 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 37% complete 2011-07-28 08:20:34,106 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 41% complete 2011-07-28 08:20:37,124 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete 2011-07-28 08:20:46,168 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 51% complete 2011-07-28 08:20:49,183 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 61% complete 2011-07-28 08:20:52,198 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 62% complete 2011-07-28 08:20:55,214 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 64% complete 2011-07-28 08:21:01,244 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete 2011-07-28 08:21:07,311 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job 2011-07-28 08:21:07,312 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 2011-07-28 08:21:08,770 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job 2011-07-28 08:21:08,778 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission. 2011-07-28 08:21:08,910 [Thread-47] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1 2011-07-28 08:21:08,910 [Thread-47] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1 2011-07-28 08:21:08,911 [Thread-47] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1 2011-07-28 08:21:09,280 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201107251336_0316 2011-07-28 08:21:09,280 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://aws09.preprod.ubithere.com:50030/jobdetails.jsp?jobid=job_201107251336_0316 2011-07-28 08:21:16,321 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 83% complete 2011-07-28 08:21:34,439 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete 2011-07-28 08:21:34,441 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics: HadoopVersion PigVersion UserId StartedAt FinishedAt Features 0.20-append 0.8.1-SNAPSHOT root 2011-07-28 08:17:51 2011-07-28 08:21:34 HASH_JOIN,FILTER,LIMIT Success! Job Stats (time in seconds): JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs job_201107251336_0314 2 1 75 66 70 63 63 63 end_sessions,sessions,start_sessions HASH_JOIN job_201107251336_0315 4 1 15 6 12 24 24 24 infos,sessions HASH_JOIN job_201107251336_0316 1 1 3 3 3 12 12 12 hdfs://aws09.preprod.ubithere.com:9000/tmp/temp-1404953096/tmp819396740, Input(s): Successfully read 2069446 records from: "endSession.mde253811.preprod.ubithere.com" Successfully read 2072419 records from: "startSession.mde253811.preprod.ubithere.com" Successfully read 19441 records from: "info.mde253811.preprod.ubithere.com" Output(s): Successfully stored 0 records in: "hdfs://aws09.preprod.ubithere.com:9000/tmp/temp-1404953096/tmp819396740" Counters: Total records written : 0 Total bytes written : 0 Spillable Memory Manager spill count : 0 Total bags proactively spilled: 1 Total records proactively spilled: 1944943 Job DAG: job_201107251336_0314 -> job_201107251336_0315, job_201107251336_0315 -> job_201107251336_0316, job_201107251336_0316 2011-07-28 08:21:34,472 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success! 2011-07-28 08:21:34,500 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1 2011-07-28 08:21:34,501 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1 grunt>
        Hide
        Dmitriy V. Ryaboy added a comment -

        I have this nagging feeling that there was a reason I put the early termination in there (which is what appears to be causing the problem). Raghu, can you add a unit test for this?

        Show
        Dmitriy V. Ryaboy added a comment - I have this nagging feeling that there was a reason I put the early termination in there (which is what appears to be causing the problem). Raghu, can you add a unit test for this?
        Hide
        Raghu Angadi added a comment -

        This was introduced in PIG-1870.
        TestHBaseStorage passes with the patch.. though I expected projection tests to fail.

        Show
        Raghu Angadi added a comment - This was introduced in PIG-1870 . TestHBaseStorage passes with the patch.. though I expected projection tests to fail.
        Hide
        Raghu Angadi added a comment -

        Our tests verify that we return the correct columns to users with projections.
        How can we verify that HBase scanner actually fetches only the projected columns? It seems hard to get hold of the Scan object.

        Show
        Raghu Angadi added a comment - Our tests verify that we return the correct columns to users with projections. How can we verify that HBase scanner actually fetches only the projected columns? It seems hard to get hold of the Scan object.
        Hide
        Raghu Angadi added a comment -

        patch for trunk is attached.

        • This is the complete fix.
        • includes a unit test
        • earlier simple patch works, but does not set the project before the scan is initialized.
        • removed 'initialized' variable since we know when schema and projections are set:
          • store schema is set inside checkSchema()
          • projection is set in pushProjection()
          • on the backend, projection is handled in side setLocation().
          • on the backend store schema is set inside setStoreLocation().
        Show
        Raghu Angadi added a comment - patch for trunk is attached. This is the complete fix. includes a unit test earlier simple patch works, but does not set the project before the scan is initialized. removed 'initialized' variable since we know when schema and projections are set: store schema is set inside checkSchema() projection is set in pushProjection() on the backend, projection is handled in side setLocation(). on the backend store schema is set inside setStoreLocation().
        Hide
        Dmitriy V. Ryaboy added a comment -

        I will review.

        Show
        Dmitriy V. Ryaboy added a comment - I will review.
        Hide
        Vincent BARAT added a comment -

        We have tested this patch on the PIG trunk with success.
        Big thanks for your help guys!
        I guess we will use PIG 0.9 + this patch in production. Or maybe wait for PIG 0.9.1 if this patch is to be included in.

        Show
        Vincent BARAT added a comment - We have tested this patch on the PIG trunk with success. Big thanks for your help guys! I guess we will use PIG 0.9 + this patch in production. Or maybe wait for PIG 0.9.1 if this patch is to be included in.
        Hide
        Dmitriy V. Ryaboy added a comment -

        +1 will commit to 0.9 and trunk

        Show
        Dmitriy V. Ryaboy added a comment - +1 will commit to 0.9 and trunk

          People

          • Assignee:
            Raghu Angadi
            Reporter:
            Vincent BARAT
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development