  1. Pig
  2. PIG-2611

HBaseStorage not casting correctly

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 0.9.2
    • None
    • None
    • Environment: Ubuntu 11.10, Hadoop 0.20.2, HBase 0.92.0

    Description

      When storing data into HBase with HBaseStorage, I see unexpected behavior around the record schema and casting.

      Here is the relevant code snippet:

      B = group A by (time_tuple, some_scalar);
      C = foreach B {
      	-- UDF to generate id (bytearray)
      	generate id, flatten(group.$0), COUNT(A);
      }
      

      At this point the schema for C is unknown, so I declare one with a foreach statement:

      D = foreach C generate $0 as id:bytearray, $1 as year:int, $2 as month:int, $3 as date:int, $4 as count:int;
      

      Even though I've declared $4 as an int, the value is still a long (from the COUNT). When I store into HBase I get a ClassCastException, since the declared schema (int) does not match the actual tuple value (long). I can fix this by casting explicitly when I declare the schema:

      D = foreach C generate $0 as id:bytearray, $1 as year:int, $2 as month:int, $3 as date:int, (int)$4 as count:int;
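
The mismatch can be illustrated with a minimal Java sketch. It assumes the storer casts each field to the Java type implied by the declared schema. The class and method names are hypothetical, not Pig or HBaseStorage APIs; the sketch only shows why a field declared as int but holding a long fails at store time, and why an explicit cast avoids it.

```java
// Hypothetical sketch: this is not Pig's or HBaseStorage's actual code.
final class SchemaCastSketch {

    // Mimics a storer that trusts the declared type (int) and casts the raw field.
    static int readAsDeclaredInt(Object field) {
        return (Integer) field; // throws ClassCastException when field is a Long
    }

    // Mimics an explicit (int) cast applied before storing: the value itself is
    // converted. Number.intValue() narrows, so out-of-range longs are truncated.
    static Object castToInt(Object field) {
        return ((Number) field).intValue();
    }

    // Returns true when reading the field as the declared int type fails.
    static boolean failsAsInt(Object field) {
        try {
            readAsDeclaredInt(field);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object count = Long.valueOf(3L); // COUNT produces a long
        System.out.println("declared only: fails = " + failsAsInt(count));
        System.out.println("explicit cast: fails = " + failsAsInt(castToInt(count)));
    }
}
```

In this sketch, declaring a type changes only what the reader expects, while the cast changes the value actually carried in the tuple.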
      

      Is this expected behavior? If not, is this an HBaseStorage issue - not honoring the schema before going off casting things?

      Cheers,
      David

          People

            Assignee: Unassigned
            Reporter: David Arthur (mumrah)