Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-12451

Regexp functions don't support patterns containing '*/'

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Duplicate
    • 1.5.2
    • None
    • SQL
    • None

    Description

      When using the regexp functions in Spark SQL, patterns containing '*/' create runtime errors in the auto generated code. This is due to the fact that the code generator creates a multiline comment containing, amongst other things, the pattern.

      Here is an excerpt from my stacktrace to illustrate: (Helpfully, the stack trace includes all of the auto-generated code)

      Caused by: org.codehaus.commons.compiler.CompileException: Line 232, Column 54: Unexpected token "," in primary
      	at org.codehaus.janino.Parser.compileException(Parser.java:3125)
      	at org.codehaus.janino.Parser.parsePrimary(Parser.java:2512)
      	at org.codehaus.janino.Parser.parseUnaryExpression(Parser.java:2252)
      	at org.codehaus.janino.Parser.parseMultiplicativeExpression(Parser.java:2211)
      	at org.codehaus.janino.Parser.parseAdditiveExpression(Parser.java:2190)
      	at org.codehaus.janino.Parser.parseShiftExpression(Parser.java:2169)
      	at org.codehaus.janino.Parser.parseRelationalExpression(Parser.java:2072)
      	at org.codehaus.janino.Parser.parseEqualityExpression(Parser.java:2046)
      	at org.codehaus.janino.Parser.parseAndExpression(Parser.java:2025)
      	at org.codehaus.janino.Parser.parseExclusiveOrExpression(Parser.java:2004)
      	at org.codehaus.janino.Parser.parseInclusiveOrExpression(Parser.java:1983)
      	at org.codehaus.janino.Parser.parseConditionalAndExpression(Parser.java:1962)
      	at org.codehaus.janino.Parser.parseConditionalOrExpression(Parser.java:1941)
      	at org.codehaus.janino.Parser.parseConditionalExpression(Parser.java:1922)
      	at org.codehaus.janino.Parser.parseAssignmentExpression(Parser.java:1901)
      	at org.codehaus.janino.Parser.parseExpression(Parser.java:1886)
      	at org.codehaus.janino.Parser.parseBlockStatement(Parser.java:1149)
      	at org.codehaus.janino.Parser.parseBlockStatements(Parser.java:1085)
      	at org.codehaus.janino.Parser.parseMethodDeclarationRest(Parser.java:938)
      	at org.codehaus.janino.Parser.parseClassBodyDeclaration(Parser.java:620)
      	at org.codehaus.janino.Parser.parseClassBody(Parser.java:515)
      	at org.codehaus.janino.Parser.parseClassDeclarationRest(Parser.java:481)
      	at org.codehaus.janino.Parser.parseClassBodyDeclaration(Parser.java:577)
      	at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:229)
      	at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:192)
      	at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:84)
      	at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:77)
      	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:387)
      
      ... line 232 ...
      
          /* regexp_replace(input[46, StringType],^.*/,) */
          
          /* input[46, StringType] */
          
          boolean isNull31 = i.isNullAt(46);
          UTF8String primitive32 = isNull31 ? null : (i.getUTF8String(46));
          
          boolean isNull24 = true;
          UTF8String primitive25 = null;
          if (!isNull31) {
            /* ^.*/ */
            
            /* expression: ^.*/ */
            Object obj35 = expressions[4].eval(i);
            boolean isNull33 = obj35 == null;
            UTF8String primitive34 = null;
            if (!isNull33) {
              primitive34 = (UTF8String) obj35;
            }
      ...
      

      Note the multiple multiline comments, these obviously break when the regex pattern contains the end-of-comment token '*/'

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              willrdee William Dee
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: