Pig / PIG-2691

Duplicate TOKENIZE schema


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11
    • Component/s: None
    • Hadoop Flags: Incompatible change
    • Release Note:
      TOKENIZE: the default name of the field in the schema produced by this UDF now depends on the input field. This change could break your script if you were relying on the field being called "bag_of_tokenTuples" (i.e. you were not using an AS clause to rename the field).

    Description

      TOKENIZE produces a schema with a fixed field name (bag_of_tokenTuples), so calling it more than once in the same GENERATE statement yields duplicate schema aliases.
      We could parameterize the schema on the name of the field being tokenized.

      grunt> q = LOAD 'file' AS (source:chararray, target:chararray);
      grunt> e = FOREACH q GENERATE TOKENIZE(source), TOKENIZE(target);
      2012-05-09 20:18:37,235 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1108: <line 2, column 14> Duplicate schema alias: bag_of_tokenTuples
      grunt> e = FOREACH q GENERATE TOKENIZE(source) as s_entities, TOKENIZE(target) as t_entities;
      grunt> describe e
      e: {s_entities: {tuple_of_tokens: (token: chararray)},t_entities: {tuple_of_tokens: (token: chararray)}}
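      To make the failure and the proposed fix concrete, here is a minimal Java sketch that models only the duplicate-alias check, not Pig itself. The class and method names are hypothetical, and the "bag_of_tokenTuples_from_" prefix is an assumed naming scheme used for illustration; the description only proposes deriving the name from the tokenized field.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical, Pig-free model of the alias check that fails in the
// GENERATE statement above; it does not use Pig's Schema classes.
final class TokenizeSchemaSketch {

    // Old behaviour: every TOKENIZE call reports the same field name.
    static String fixedAlias(String inputAlias) {
        return "bag_of_tokenTuples";
    }

    // Proposed behaviour (assumed naming scheme): append the input field's alias.
    static String parameterizedAlias(String inputAlias) {
        return "bag_of_tokenTuples_from_" + inputAlias;
    }

    // Applies a naming rule to each tokenized input field, in GENERATE order.
    static List<String> outputAliases(List<String> inputs, UnaryOperator<String> rule) {
        List<String> out = new ArrayList<>();
        for (String in : inputs) {
            out.add(rule.apply(in));
        }
        return out;
    }

    // Returns the first alias that appears twice, or null if all are unique,
    // mirroring the "Duplicate schema alias" error (ERROR 1108).
    static String firstDuplicate(List<String> aliases) {
        Set<String> seen = new HashSet<>();
        for (String alias : aliases) {
            if (!seen.add(alias)) {
                return alias;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> inputs = List.of("source", "target");
        // Fixed name: both TOKENIZE calls collide on "bag_of_tokenTuples".
        System.out.println(firstDuplicate(outputAliases(inputs, TokenizeSchemaSketch::fixedAlias)));
        // Parameterized name: the aliases differ, so no duplicate is found (prints "null").
        System.out.println(firstDuplicate(outputAliases(inputs, TokenizeSchemaSketch::parameterizedAlias)));
    }
}
```

      Under this assumed scheme, tokenizing the same field twice (TOKENIZE(source), TOKENIZE(source)) would still produce identical aliases, so an AS clause, as in the last GENERATE above, would remain necessary in that case.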
      

      Attachments

        1. PIG-2691.patch
          0.7 kB
          Jie Li
        2. PIG-2691.patch.2
          2 kB
          Jie Li


          People

            Assignee: Jie Li (jay23jack)
            Reporter: Gianmarco De Francisci Morales (azaroth)
            Votes: 0
            Watchers: 3
