Pig / PIG-1581

Parser fails to recognize semicolons in quoted strings


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.9.0
    • Component/s: grunt
    • Labels: None
    • Environment: CentOS 5.5

    Description

      In some contexts, the parser does not treat a semicolon inside a quoted string literal as part of the string; instead, it treats the semicolon as the end of the statement.
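The expected behavior, where only semicolons outside quoted strings terminate a statement, can be sketched as a small quote-aware splitter. This is a hypothetical Python illustration, not Pig's actual lexer or Grunt code. It assumes string literals are delimited by single quotes (as in Pig Latin) and that a backslash escapes the next character inside a literal. The `split_statements` function and the sample statements are made up for illustration.

```python
def split_statements(script):
    """Split a script on semicolons that appear outside single-quoted strings.

    Hypothetical sketch: assumes single-quoted literals in which a backslash
    escapes the following character. Empty statements are dropped.
    """
    statements = []
    current = []
    in_quote = False
    escaped = False
    for ch in script:
        if in_quote:
            current.append(ch)
            if escaped:
                escaped = False          # this character was escaped; keep it verbatim
            elif ch == "\\":
                escaped = True           # next character is part of the literal
            elif ch == "'":
                in_quote = False         # closing quote ends the literal
        elif ch == "'":
            in_quote = True              # opening quote starts a literal
            current.append(ch)
        elif ch == ";":
            # A semicolon outside any literal terminates the statement.
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


# Hypothetical script: the semicolon inside '.*;.*' must not end the statement.
script = """lines = LOAD '/test1.txt' AS (line:chararray);
semi = FILTER lines BY line MATCHES '.*;.*';
DUMP semi;"""

for stmt in split_statements(script):
    print(stmt)
```

With this approach the second statement keeps its quoted pattern intact, which is the behavior the reporter expected from the parser.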

      Given an input file:

      /test1.txt (in HDFS):
      1;a
      2;b
      3;c
      4;d
      5;e

      And the following Pig script:

      REGISTER /tmp/piggybank.jar ;
      DEFINE REGEXEXTRACTALL org.apache.pig.piggybank.evaluation.string.RegexExtractAll();
      lines = LOAD '/test1.txt' AS (line:chararray);
      delimited = FOREACH lines GENERATE FLATTEN (
      REGEXEXTRACTALL(line, '^(\\d+);(\\w+)$')
      ) AS (
      digit:int,
      word:chararray
      );
      DUMP delimited;

      I receive the following error:

      ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Lexical error at line 5, column 40. Encountered: <EOF> after : "\'^(\\\\d+);"
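For reference, the extraction the script is meant to perform can be checked outside Pig. The sketch below is a Python stand-in, not Pig or piggybank code: it applies the same pattern to the sample lines from /test1.txt and converts the first group to an integer, mirroring the `digit:int, word:chararray` schema. The `extract` helper is hypothetical.

```python
import re

# Sample lines from /test1.txt, as given above.
lines = ["1;a", "2;b", "3;c", "4;d", "5;e"]

# Same pattern as the Pig script, written as a Python raw string so that
# \d and \w reach the regex engine with a single backslash.
pattern = re.compile(r"^(\d+);(\w+)$")

def extract(line):
    """Return (digit, word) for a matching line, or None (hypothetical helper)."""
    m = pattern.match(line)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)

delimited = [extract(line) for line in lines]
print(delimited)  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')]
```

The pattern itself is unremarkable; the failure reported here happens before it ever reaches the regex engine, because the parser stops at the semicolon inside the quoted literal.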


            People

              Assignee: Xuefu Zhang (xuefuz)
              Reporter: Christopher Hackman (christopher.hackman)
              Votes: 0
              Watchers: 2
