Pig / PIG-1031

PigStorage interpreting chararray/bytearray for a tuple element inside a bag as float or double

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.7.0
    • Component/s: impl
    • Labels: None

    Description

      I have data stored in a text file as:

      {(4153E765)}
      {(AF533765)}


      I try reading it using PigStorage as:
      A = load 'pigstoragebroken.dat' using PigStorage() as (intersectionBag:bag{T:tuple(term:bytearray)});
      dump A;
      


      I get the following results:


      ({(Infinity)})
      ({(AF533765)})

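      The problem seems to be in the method parseFromBytes(byte[] b) of class Utf8StorageConverter. This method uses TextDataParser (a class generated from a .jjt grammar) to infer the type of each value from its content, even though the schema declares it to be a bytearray. The value 4153E765 happens to look like a number in exponent notation (4153 followed by the exponent E765), which is too large for a double and so becomes Infinity, while AF533765 starts with a letter and is left alone.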
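
      As a rough illustration of why content-based guessing produces Infinity here, the sketch below uses plain Java number parsing in place of Pig's generated parser. The class and method names are hypothetical and this is not the Utf8StorageConverter code; it only assumes that a value which looks numeric ends up being parsed as a double.

```java
class ContentSniffingDemo {
    // Hypothetical helper: guess a value's type from its text alone,
    // ignoring whatever type the load schema declared for the field.
    static Object guessFromContent(String raw) {
        try {
            // "4153E765" is read as 4153 x 10^765, which overflows a double.
            return Double.parseDouble(raw);
        } catch (NumberFormatException e) {
            // Not numeric (for example "AF533765"): keep the text as it is.
            return raw;
        }
    }

    public static void main(String[] args) {
        System.out.println(guessFromContent("4153E765")); // prints Infinity
        System.out.println(guessFromContent("AF533765")); // prints AF533765
    }
}
```

      Any hexadecimal-looking identifier made only of digits and a single E is at risk in the same way, which is why the declared type needs to take precedence over the content.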

      TextDataParser.jjt sample code

      TOKEN :
      {
      ...
       < DOUBLENUMBER: (["-","+"])? <FLOATINGPOINT> ( ["e","E"] ([ "-","+"])? <FLOATINGPOINT> )?>
       < FLOATNUMBER: <DOUBLENUMBER> (["f","F"])? >
      ...
      }
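
      To check which inputs the DOUBLENUMBER token would accept, the token can be approximated with a regular expression. The definition of <FLOATINGPOINT> is elided in the excerpt above, so the pattern below assumes it means digits with an optional fractional part (or a leading dot); the class and method names are made up for this sketch.

```java
import java.util.regex.Pattern;

class DoubleTokenSketch {
    // ASSUMPTION: <FLOATINGPOINT> is digits with an optional fraction, or ".digits".
    static final String FLOATINGPOINT = "(?:\\d+(?:\\.\\d+)?|\\.\\d+)";

    // Mirrors the DOUBLENUMBER token: optional sign, mantissa, optional exponent part.
    static final Pattern DOUBLENUMBER = Pattern.compile(
        "[-+]?" + FLOATINGPOINT + "(?:[eE][-+]?" + FLOATINGPOINT + ")?");

    // True when the whole input matches the approximated token.
    static boolean looksLikeDouble(String raw) {
        return DOUBLENUMBER.matcher(raw).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDouble("4153E765")); // true
        System.out.println(looksLikeDouble("AF533765")); // false
    }
}
```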
      

      I also tried the following alternatives, but neither works, because loading the bag still has to call bytesToBag(byte[] b) in the Utf8StorageConverter class.

      A = load 'pigstoragebroken.dat' using PigStorage() as (intersectionBag:bag{T:tuple(term)});
      A = load 'pigstoragebroken.dat' using PigStorage() as (intersectionBag:bag{T:tuple(term:chararray)});
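
      For comparison, here is a minimal sketch of what schema-driven conversion could look like, where the declared field type decides how each tuple element is converted. Everything in it (the FieldType enum, convert, parseBag) is hypothetical and is not Pig's API; it handles only flat bags such as {(a),(b)} and keeps bytearray values as strings for readability.

```java
import java.util.ArrayList;
import java.util.List;

class SchemaAwareBagSketch {
    // Hypothetical stand-in for the field type declared in the load schema.
    enum FieldType { BYTEARRAY, CHARARRAY, DOUBLE }

    // Convert one field according to the declared type, never by inspecting content.
    static Object convert(String raw, FieldType declared) {
        switch (declared) {
            case DOUBLE:
                return Double.parseDouble(raw);
            default:
                return raw; // BYTEARRAY and CHARARRAY: leave the text untouched
        }
    }

    // Parse a flat, well-formed bag like "{(a),(b,c)}"; nested bags are not supported.
    static List<List<Object>> parseBag(String text, FieldType declared) {
        List<List<Object>> bag = new ArrayList<>();
        String body = text.trim();
        body = body.substring(1, body.length() - 1); // strip the outer { }
        int i = 0;
        while ((i = body.indexOf('(', i)) >= 0) {
            int close = body.indexOf(')', i);
            List<Object> tuple = new ArrayList<>();
            for (String field : body.substring(i + 1, close).split(",")) {
                tuple.add(convert(field.trim(), declared));
            }
            bag.add(tuple);
            i = close + 1;
        }
        return bag;
    }

    public static void main(String[] args) {
        System.out.println(parseBag("{(4153E765)}", FieldType.BYTEARRAY)); // [[4153E765]]
        System.out.println(parseBag("{(4153E765)}", FieldType.DOUBLE));    // [[Infinity]]
    }
}
```

      With the declared type passed down to the element level, the first record survives unchanged when the schema says bytearray or chararray, and becomes Infinity only when the schema actually asks for a double.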
      

      Viraj

          People

            Assignee: Daniel Dai (daijy)
            Reporter: Viraj Bhat (viraj)