Pig / PIG-1718

Cannot directly cast output of UDF

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.7.0
    • Fix Version/s: None
    • Component/s: impl
    • Labels:
      None
    • Environment:

      Macbook Pro 6.2, Ubuntu 10.04 AMD64, CDH3 beta 3

      Description

      I'm in the process of writing a suite of UDFs to deal with nested JSON data inside of Pig. In one case, I created a UDF of type EvalFunc<String> and wanted to use it like so:

      RAW = load 'input.tsv' using PigStorage as ( id: int, json: chararray );
      IN = foreach RAW generate id, ExtractString(json, 'count') as count:int;
      

      When I do this, I get the following error:

      ERROR 1022: Type mismatch merging schema prefix. Field Schema: chararray. Other Field Schema: count: int

      I can work around it by adding another projection that contains only a cast (as below), but I'd prefer that the first form simply worked.

      RAW = load 'input.tsv' using PigStorage as ( id: int, json: chararray );
      MID = foreach RAW generate id, ExtractString(json, 'count') as count;
      IN = foreach MID generate id, (int)count;
      

      I'd prefer not to need a separate ExtractInteger extends EvalFunc<Integer> if I can avoid it. In our case, it gets even more cumbersome because we want something like ExtractStringTuple extends EvalFunc<Tuple>, which returns a tuple of strings so that the JSON does not have to be parsed over and over again:

      RAW = load 'input.tsv' using PigStorage as ( id: int, json: chararray );
      IN = foreach RAW generate id, ExtractStringTuple(json, 'name', 'count', 'mean') as (name, count:int, mean:double);
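
      To make the request concrete, here is a rough plain-Java sketch of the per-field conversion that an `as (name, count:int, mean:double)` clause would imply for a tuple of strings. It does not use Pig's classes: `TupleCastSketch`, `FieldType`, `castField` and `castTuple` are hypothetical names invented for illustration, and returning null for text that does not parse is an assumption of this sketch rather than a statement about Pig's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TupleCastSketch {
    /** Hypothetical target types, standing in for chararray, int and double. */
    enum FieldType { CHARARRAY, INT, DOUBLE }

    /** Converts one string field; yields null if the input is null or does not parse (assumption). */
    static Object castField(String value, FieldType type) {
        if (value == null) {
            return null;
        }
        try {
            switch (type) {
                case INT:
                    return Integer.valueOf(value.trim());
                case DOUBLE:
                    return Double.valueOf(value.trim());
                default:
                    return value; // CHARARRAY: keep the string unchanged
            }
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Applies the declared type of each position to the matching string field. */
    static List<Object> castTuple(List<String> values, List<FieldType> types) {
        if (values.size() != types.size()) {
            throw new IllegalArgumentException("field count does not match declared schema");
        }
        List<Object> out = new ArrayList<>(values.size());
        for (int i = 0; i < values.size(); i++) {
            out.add(castField(values.get(i), types.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Strings as a UDF like ExtractStringTuple might return them for 'name', 'count', 'mean'.
        List<String> extracted = Arrays.asList("widget", "42", "3.5");
        List<FieldType> declared =
                Arrays.asList(FieldType.CHARARRAY, FieldType.INT, FieldType.DOUBLE);
        System.out.println(castTuple(extracted, declared));
    }
}
```

      The point of the sketch is only that the conversion is mechanical once the target types are declared, which is why it would be convenient for the `as` clause to perform it instead of requiring a second projection.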
      

      As indicated, I have tested this with Pig 0.7.0. My apologies if this is already fixed in 0.8; I was not able to test with a newer version.


          People

          • Assignee:
            Unassigned
            Reporter:
            Mike Dillon