PIG-798

Schema errors when using PigStorage and none when using BinStorage in FOREACH??

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.6.0, 0.7.0, 0.8.0
    • Fix Version/s: None
    • Component/s: impl
    • Labels:
      None

      Description

      In the following script I have a tab-separated text file, which I load using PigStorage() and store using BinStorage().

      A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, url:chararray, time:chararray);
      
      B = group A by name;
      
      store B into '/user/viraj/binstoragecreateop' using BinStorage();
      
      dump B;
      

      I later load file 'binstoragecreateop' in the following way.

      
      A = load '/user/viraj/binstoragecreateop' using BinStorage();
      
      B = foreach A generate $0 as name:chararray;
      
      dump B;
      

      Result
      =======================================================================
      (Amy)
      (Fred)
      =======================================================================
      The above code works properly and returns the right results. If I use PigStorage() to achieve the same, I get the following error.

      A = load '/user/viraj/visits.txt' using PigStorage();
      
      B = foreach A generate $0 as name:chararray;
      
      dump B;
      
      

      =======================================================================

      2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other Field Schema: name: chararray
      Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
      

      =======================================================================
      So why should the semantics of BinStorage() be different from those of PigStorage(), when in both cases it is OK not to specify a schema? Should the behavior not be consistent across both?
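
      Until the inconsistency is resolved, two possible workarounds for the PigStorage() case are sketched here. They assume the mismatch arises because fields loaded by PigStorage() without a schema are typed bytearray, as the error message indicates. The aliases A2 and B2 are illustrative names, not part of the attached script.

      ```pig
      -- Sketch 1 (assumption: an explicit cast satisfies the type checker):
      -- cast the bytearray field before naming and typing it.
      A = load '/user/viraj/visits.txt' using PigStorage();
      B = foreach A generate (chararray)$0 as name:chararray;
      dump B;

      -- Sketch 2: declare the schema at load time, as in the first script,
      -- so that name is already a chararray when it reaches the FOREACH.
      A2 = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, url:chararray, time:chararray);
      B2 = foreach A2 generate name;
      dump B2;
      ```

      Neither sketch answers the question of why the two storage functions behave differently; they only avoid the ERROR 1022 type mismatch.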

      1. binstoragecreateop
        0.3 kB
        Viraj Bhat
      2. visits.txt
        0.2 kB
        Viraj Bhat
      3. schemaerr.pig
        0.5 kB
        Viraj Bhat

        Activity

        Alan Gates made changes -
        Fix Version/s 0.9.0 [ 12315191 ]
        Alan Gates made changes -
        Assignee Alan Gates [ alangates ]
        Olga Natkovich made changes -
        Fix Version/s 0.9.0 [ 12315191 ]
        Viraj Bhat made changes -
        Affects Version/s 0.6.0 [ 12314214 ]
        Affects Version/s 0.5.0 [ 12314213 ]
        Affects Version/s 0.4.0 [ 12314042 ]
        Affects Version/s 0.3.0 [ 12313785 ]
        Affects Version/s 0.7.0 [ 12314397 ]
        Affects Version/s 0.8.0 [ 12314562 ]
        Olga Natkovich made changes -
        Fix Version/s 0.2.0 [ 12313783 ]
        Viraj Bhat made changes -
        Summary changed from "Schema errors when using PigStorage and none when using BinStorage??" to "Schema errors when using PigStorage and none when using BinStorage in FOREACH??"
        Viraj Bhat made changes -
        Attachment visits.txt [ 12407063 ]
        Attachment binstoragecreateop [ 12407064 ]
        Attachment schemaerr.pig [ 12407062 ]
        Viraj Bhat made changes -
        Description (error log wrapped in {code} markup; text otherwise unchanged)
        Viraj Bhat created issue -

          People

          • Assignee: Alan Gates
          • Reporter: Viraj Bhat
          • Votes: 1
          • Watchers: 2
