Pig / PIG-1991

Leading Underscore (_) not allowed in schema names

    Details

    • Type: Wish
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.9.0
    • Fix Version/s: None
    • Component/s: grunt
    • Labels:
      None

      Description

      I have a Pig script that uses a leading underscore in a schema field name (_a):

      a = load 'test.txt' as (_a:long, b:chararray);
      dump a;
      

      This causes an error in Pig:

      <line 1, column 24> Unexpected character '_'
      2011-04-12 11:58:59,624 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 1, column 24> Unexpected character '_'

      Stack trace:
      Pig Stack Trace
      ---------------
      ERROR 1200: <line 1, column 24> Unexpected character '_'

      Failed to parse: <line 1, column 24> Unexpected character '_'
      at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:83)
      at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1555)
      at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1527)
      at org.apache.pig.PigServer.registerQuery(PigServer.java:582)
      at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:917)
      at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
      at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:176)
      at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:152)
      at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
      at org.apache.pig.Main.run(Main.java:489)
      at org.apache.pig.Main.main(Main.java:108)
      ================================================================================

      Schema names should be allowed to have underscores.

      Viraj

        Activity

        Olga Natkovich added a comment -

        This is a new feature request, and it is too late for 0.9. We can consider it for 0.10, but in general we need strong reasons for changing Pig fundamentals, and this does not strike me as something we need to change.

        Alan Gates added a comment -

        The definition of variable names for Pig is:

        [a-zA-Z][a-zA-Z0-9]*

        I don't see any compelling reason to change that.
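        As an illustration only, the pattern quoted in the comment above can be checked against the field names from the report. This is a minimal sketch that assumes the pattern exactly as written in the comment; it is not the Pig lexer, whose actual grammar may differ, and the helper names `matches_quoted_pattern` and `first_offending_index` are hypothetical.

```python
import re

# Pattern copied verbatim from the comment above.
# Assumption: this is NOT taken from the Pig source; the real lexer may differ.
QUOTED_NAME_PATTERN = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")


def matches_quoted_pattern(name: str) -> bool:
    """Return True if the whole name matches the quoted pattern."""
    return QUOTED_NAME_PATTERN.fullmatch(name) is not None


def first_offending_index(name: str):
    """Return the index where the name stops matching, or None if it matches."""
    prefix = QUOTED_NAME_PATTERN.match(name)
    if prefix is None:
        # Not even the first character is an ASCII letter.
        return 0
    if prefix.end() == len(name):
        return None
    return prefix.end()


if __name__ == "__main__":
    # Field names from the report, plus the Parquet-style `_1` mentioned later.
    for field in ["_a", "b", "_1"]:
        print(field, matches_quoted_pattern(field), first_offending_index(field))
```

        Under that assumption, `_a` is rejected at its first character, which is consistent with the "Unexpected character '_'" error in the report, while `b` is accepted.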

        Edmund Dorsey added a comment -

        Stumbled across this request while trying to figure out why I couldn't access my fields in Avro using Pig 0.11. Not sure if it's appropriate to leave a comment here, but given the seemingly widespread adoption of Avro with Hadoop, not supporting underscores seems to mean the Avro schema cannot use field names with underscores. In our case, all our "reserved" field names start with a leading underscore, and as a result we have not been able to use Pig with Avro, since we can't access any of the fields with the leading underscore (we get the error "Unexpected character '_'").

        It seems like adding underscores as an allowed character in variable names would be completely backwards compatible and it would also bring the variable naming convention closer in line with the Java naming conventions used by Avro.

        (Note that I'm still pretty new to Pig so maybe there is a workaround I'm not aware of that makes this whole point moot)

        Adrien Mogenet added a comment -

        Another comment to say that we're currently building Parquet files through Spark and custom Scala code, and when writing our case classes to HDFS, the associated schema contains fields such as `_1` and `_2`.

        I think there is no reason to forbid that.

        At the very least, there should be a clear explanation of why the script failed, though I think this may be something to raise with the `PigLoader` authors.


          People

          • Assignee: Unassigned
          • Reporter: Viraj Bhat
          • Votes: 0
          • Watchers: 2
