PIG-2611

HBaseStorage not casting correctly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 0.9.2
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Environment: Ubuntu 11.10, Hadoop 0.20.2, HBase 0.92.0

      Description

      When storing data into HBase with HBaseStorage, I see unexpected behavior: the declared record schema and the actual types of the tuple values can disagree, and the store fails on the cast.

      Here is the relevant code snippet:

      B = group A by (time_tuple, some_scalar);
      C = foreach B {
      	-- UDF to generate id (bytearray)
      	generate id, flatten(group.$0), COUNT(A);
      }
      

      At this point the schema for C is unknown, so I declare a schema with a foreach statement:

      D = foreach C generate $0 as id:bytearray, $1 as year:int, $2 as month:int, $3 as date:int, $4 as count:int;
      

      Even though I've declared C.$4 as an int, the value is still a long (from the COUNT). When I store into HBase I get a ClassCastException, since the declared schema type (int) does not match the actual tuple value (long). I can fix this by explicitly casting when I declare the schema:

      D = foreach C generate $0 as id:bytearray, $1 as year:int, $2 as month:int, $3 as date:int, (int)$4 as count:int;
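
      A minimal Java sketch of why this mismatch would surface as a ClassCastException, assuming the storage layer casts each field to the Java type implied by the declared schema. The class and method names here are hypothetical illustrations, not HBaseStorage's actual code:

```java
import java.nio.ByteBuffer;

public class DeclaredTypeSketch {

    // Hypothetical stand-in for a store function that trusts the declared
    // schema type (int) and casts the runtime value to match it.
    public static byte[] toBytesAsInt(Object value) {
        // Throws ClassCastException when value is really a Long.
        Integer i = (Integer) value;
        return ByteBuffer.allocate(4).putInt(i).array();
    }

    // Rough equivalent of applying an explicit (int) cast before storing.
    // Note that intValue() narrows, so values beyond the int range would wrap.
    public static Object explicitIntCast(Object value) {
        return Integer.valueOf(((Number) value).intValue());
    }

    public static void main(String[] args) {
        // COUNT yields a long, whatever name or type the schema gives the field.
        Object count = Long.valueOf(42L);

        try {
            toBytesAsInt(count);
            System.out.println("stored without error");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: declared int, got "
                    + count.getClass().getSimpleName());
        }

        // After an explicit conversion the runtime type matches the schema.
        byte[] stored = toBytesAsInt(explicitIntCast(count));
        System.out.println("stored " + stored.length + " bytes after explicit cast");
    }
}
```

      Under that assumption, declaring the type changes only what the schema says; the explicit (int) cast is what changes the value itself.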
      

      Is this expected behavior? If not, is this an HBaseStorage issue, where it casts values according to the declared schema without checking that they actually have that type?

      Cheers,
      David

        Activity

        David Arthur created issue
        Dmitriy V. Ryaboy made changes:
          Field       Original Value   New Value
          Status      Open [ 1 ]       Resolved [ 5 ]
          Resolution                   Invalid [ 6 ]

          People

          • Assignee: Unassigned
          • Reporter: David Arthur
          • Votes: 0
          • Watchers: 1
