PIG-2611

HBaseStorage not casting correctly


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 0.9.2
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Environment: Ubuntu 11.10, Hadoop 0.20.2, HBase 0.92.0

      Description

      When storing data into HBase with HBaseStorage, I see unexpected behavior: the type declared in a relation's schema is not applied to the actual field value, and the store fails with a ClassCastException.

      Here is the relevant code snippet:

      B = group A by (time_tuple, some_scalar);
      C = foreach B {
      	-- UDF to generate id (bytearray)
      	generate id, flatten(group.$0), COUNT(A);
      }
      

      At this point the schema for C is unknown, so I declare a schema with a foreach statement:

      D = foreach C generate $0 as id:bytearray, $1 as year:int, $2 as month:int, $3 as date:int, $4 as count:int;
      

      Even though I've declared C.$4 as an int, the value is still a long (from the COUNT). When I store D into HBase I get a ClassCastException, since the schema type (int) does not match the actual tuple value (long). I can fix this by explicitly casting when I declare the schema:

      D = foreach C generate $0 as id:bytearray, $1 as year:int, $2 as month:int, $3 as date:int, (int)$4 as count:int;
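
The mismatch described above can be reproduced in plain Java. This is only a minimal sketch, assuming that `as count:int` relabels the field while the runtime value stays a `java.lang.Long`. The class and method names are hypothetical and are not taken from Pig or HBaseStorage.

```java
public class SchemaCastSketch {
    // Hypothetical stand-in for a storer that trusts the declared schema type (int)
    // and casts the field without checking what the tuple actually holds.
    public static int readAsDeclaredInt(Object field) {
        return (Integer) field; // throws ClassCastException when field is a Long
    }

    // Rough equivalent of the explicit (int) cast in the script:
    // the value itself is converted, not just its label.
    public static int convertToInt(Object field) {
        return ((Number) field).intValue();
    }

    public static void main(String[] args) {
        Object count = Long.valueOf(3L); // COUNT yields a long

        // Converting the value works regardless of the boxed type.
        System.out.println(convertToInt(count));

        // Trusting the declared type fails because the object is still a Long.
        try {
            readAsDeclaredInt(count);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

If that assumption holds, it would explain why adding `(int)$4` makes the store succeed: the cast changes the value, whereas the `as` clause alone only changes what the schema claims.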
      

      Is this expected behavior? If not, is this an HBaseStorage issue, where it casts field values without first honoring the declared schema?

      Cheers,
      David


            People

            • Assignee: Unassigned
            • Reporter: David Arthur (mumrah)
