Pig / PIG-2611

HBaseStorage not casting correctly


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 0.9.2
    • None
    • None
    • Environment: Ubuntu 11.10, Hadoop 0.20.2, HBase 0.92.0

    Description

      When loading data into HBase with HBaseStorage, there is unexpected behavior regarding record schema and casting.

      Here is the relevant code snippet:

      B = group A by (time_tuple, some_scalar);
      C = foreach B {
          -- UDF to generate id (bytearray)
          generate id, flatten(group.$0), COUNT(A);
      }
      

      At this point the schema for C is unknown, so I declare one with a foreach statement:

      D = foreach C generate $0 as id:bytearray, $1 as year:int, $2 as month:int, $3 as date:int, $4 as count:int;
      

      Even though I've declared C.$4 as an int, the value is still a long (from the COUNT). When I store into HBase I get a ClassCastException, because the declared schema type (int) does not match the actual tuple value (long). I can fix this by casting explicitly when I declare the schema:

      D = foreach C generate $0 as id:bytearray, $1 as year:int, $2 as month:int, $3 as date:int, (int)$4 as count:int;
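
      For illustration only, the mismatch described above can be sketched in plain Java. This is an analogy, not HBaseStorage's actual code: the class and method names below are hypothetical, and it only assumes that the storage layer casts each field to the Java type implied by the declared schema. A value that is really a Long cannot be cast to Integer, whereas converting it first (the analogue of the (int)$4 cast) works.

```java
public class SchemaCastSketch {
    // Hypothetical stand-in for a storer that trusts the declared type (int)
    // and casts the field object directly, without converting it first.
    public static int storeAsInt(Object field) {
        return (Integer) field; // throws ClassCastException if field is a Long
    }

    // Analogue of the explicit (int)$4 cast: convert the value itself
    // before handing it to the storer.
    public static int convertThenStore(Object field) {
        return storeAsInt(((Number) field).intValue());
    }

    public static void main(String[] args) {
        Object count = Long.valueOf(3L); // COUNT produces a long

        try {
            storeAsInt(count);
        } catch (ClassCastException e) {
            System.out.println("declared int, actual long: ClassCastException");
        }

        System.out.println("after explicit conversion: " + convertThenStore(count));
    }
}
```

      In this sketch, labeling the value as an int changes nothing about the object itself; only the explicit conversion produces a value of the declared type.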
      

      Is this expected behavior? If not, is this an HBaseStorage issue - not honoring the schema before going off casting things?

      Cheers,
      David

          People

            Assignee: Unassigned
            Reporter: David Arthur (mumrah)
            Votes: 0
            Watchers: 1
