[SPARK-18952] regex strings not properly escaped in codegen for aggregations - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.2
Fix Version/s: 2.1.1, 2.2.0
Component/s: SQL
Labels: None

Description

If I use the function regexp_extract and my regex string contains `\` (the escape character), codegen fails, because the `\` character is not properly escaped when the string is emitted into the generated code.

Example generated code and stack trace:

/* 059 */     private int maxSteps = 2;
/* 060 */     private int numRows = 0;
/* 061 */     private org.apache.spark.sql.types.StructType keySchema = new org.apache.spark.sql.types.StructType().add("date_format(window#325.start, yyyy-MM-dd HH:mm)", org.apache.spark.sql.types.DataTypes.StringType)
/* 062 */     .add("regexp_extract(source#310.description, ([a-zA-Z]+)\[.*, 1)", org.apache.spark.sql.types.DataTypes.StringType);
/* 063 */     private org.apache.spark.sql.types.StructType valueSchema = new org.apache.spark.sql.types.StructType().add("sum", org.apache.spark.sql.types.DataTypes.LongType);
/* 064 */     private Object emptyVBase;

...

org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 62, Column 58: Invalid escape sequence
	at org.codehaus.janino.Scanner.scanLiteralCharacter(Scanner.java:918)
	at org.codehaus.janino.Scanner.produce(Scanner.java:604)
	at org.codehaus.janino.Parser.peekRead(Parser.java:3239)
	at org.codehaus.janino.Parser.parseArguments(Parser.java:3055)
	at org.codehaus.janino.Parser.parseSelector(Parser.java:2914)
	at org.codehaus.janino.Parser.parseUnaryExpression(Parser.java:2617)
	at org.codehaus.janino.Parser.parseMultiplicativeExpression(Parser.java:2573)
	at org.codehaus.janino.Parser.parseAdditiveExpression(Parser.java:2552)

In the codegen'd expression, the literal should use `\\` instead of `\`.
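
To illustrate the kind of escaping that is missing, here is a minimal Java sketch. It is not the code from the linked pull requests; the class `LiteralEscapeSketch` and the helper `escapeForJavaLiteral` are hypothetical names made up for this example. It only shows how a field name containing `\[` would need to be escaped before being placed between double quotes in generated Java source.

```java
// Hypothetical illustration only; not Spark's actual fix.
final class LiteralEscapeSketch {

    // Escape a raw string so it can be embedded inside a double-quoted
    // Java string literal in generated source code.
    static String escapeForJavaLiteral(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 8);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break; // one backslash becomes two
                case '"':  sb.append("\\\""); break; // quote becomes backslash + quote
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The field name from line 062 of the generated code above; at runtime
        // this string contains a single backslash before '['.
        String fieldName = "regexp_extract(source#310.description, ([a-zA-Z]+)\\[.*, 1)";

        // Without escaping, the emitted source would contain the invalid escape \[ .
        String broken = ".add(\"" + fieldName + "\", ...)";
        // With escaping, the emitted source contains \\[ and compiles as a literal.
        String fixed = ".add(\"" + escapeForJavaLiteral(fieldName) + "\", ...)";

        System.out.println(broken);
        System.out.println(fixed);
    }
}
```

The first printed line reproduces the text that Janino rejects at Line 62; the second is the same fragment with the backslash doubled.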

Issue Links

duplicates

SPARK-18866 Codegen fails with cryptic error if regexp_replace() output column is not aliased

Resolved

is duplicated by

SPARK-20556 codehaus fails to generate code because of unescaped strings

Closed

links to

[Github] Pull Request #16361 (brkyvz)

[Github] Pull Request #16518 (brkyvz)

People

Assignee: Burak Yavuz

Reporter: Burak Yavuz

Votes: 0

Watchers: 4

Dates

Created: 20/Dec/16 21:59

Updated: 02/May/17 18:30

Resolved: 09/Jan/17 23:18