Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-1068

COGROUP fails with 'Type mismatch in key from map: expected org.apache.pig.impl.io.NullableText, recieved org.apache.pig.impl.io.NullableTuple'

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.4.0
    • 0.6.0
    • None
    • None

    Description

      The COGROUP in the following script fails in its map:

      logs = LOAD '$LOGS' USING PigStorage() AS (ts:int, id:chararray, command:chararray, comments:chararray);                                                                                                       
                                                                                                                                                                                                                     
      SPLIT logs INTO logins IF command == 'login', all_quits IF command == 'quit';                                                                                                                                  
                                                                                                                                                                                                                     
      -- Project login clients and count them by ID.                                                                                                                                                                 
      login_info = FOREACH logins {                                                                                                                                                                                  
          GENERATE id as id,                                                                                                                                                                                         
          comments AS client;                                                                                                                                                                                        
      };                                                                                                                                                                                                             
                                                                                                                                                                                                                     
      logins_grouped = GROUP login_info BY (id, client);                                                                                                                                                             
                                                                                                                                                                                                                     
      count_logins_by_client = FOREACH logins_grouped {                                                                                                                                                              
          generate group.id AS id, group.client AS client, COUNT($1) AS count;                                                                                                                                       
      }                                                                                                                                                                                                              
                                                                                                                                                                                                                     
      -- Get the first quit.                                                                                                                                                                                         
      all_quits_grouped = GROUP all_quits BY id;                                                                                                                                                                     
                                                                                                                                                                                                                     
      quits = FOREACH all_quits_grouped {                                                                                                                                                                            
          ordered = ORDER all_quits BY ts ASC;                                                                                                                                                                       
          last_quit = LIMIT ordered 1;                                                                                                                                                                               
          GENERATE FLATTEN(last_quit);                                                                                                                                                                               
      }                                                                                                                                                                                                              
                                                                                                                                                                                                                     
      -- Now, group all the info together.                                                                                                                                                                           
      joined_session_info = COGROUP quits BY id, count_logins_by_client BY id;                                                                                                                                       
                                                                                                                                                                                                                     
      DUMP joined_session_info;
      

      Here's the stack trace:

      java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableText, recieved org.apache.pig.impl.io.NullableTuple
              at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
              at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
              at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:229)
              at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
              at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
              at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
              at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:157)
      

      Attachments

        1. PIG-1068.patch
          20 kB
          Richard Ding
        2. log
          0.1 kB
          Vikram Oberoi
        3. cogroup-bug.pig
          0.8 kB
          Vikram Oberoi

        Activity

          People

            rding Richard Ding
            voberoi Vikram Oberoi
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: