Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.6.0, 0.7.0, 0.8.0
-
None
-
None
Description
In the following script I have a tab separated text file, which I load using PigStorage() and store using BinStorage()
A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, url:chararray, time:chararray); B = group A by name; store B into '/user/viraj/binstoragecreateop' using BinStorage(); dump B;
I later load file 'binstoragecreateop' in the following way.
A = load '/user/viraj/binstoragecreateop' using BinStorage();
B = foreach A generate $0 as name:chararray;
dump B;
Result
=======================================================================
(Amy)
(Fred)
=======================================================================
The above code work properly and returns the right results. If I use PigStorage() to achieve the same, I get the following error.
A = load '/user/viraj/visits.txt' using PigStorage();
B = foreach A generate $0 as name:chararray;
dump B;
=======================================================================
2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other Field Schema: name: chararray Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
=======================================================================
So why should the semantics of BinStorage() be different from PigStorage() where is ok not to specify a schema??? Should it not be consistent across both.