Flink / FLINK-15573

Let Flink SQL PlannerExpressionParserImpl#fieldReference use Unicode as its default charset


Details

    • Type: Improvement
    • Status: Reopened
    • Priority: Not a Priority
    • Resolution: Unresolved
    • Component/s: Table SQL / Planner

    Description

       UPDATE:

        Flink now uses Calcite for its SQL planner. Calcite currently supports only the ISO-8859-1 charset, and the charset cannot be configured. Even so, from my perspective we still need to change the charset of PlannerExpressionParserImpl#fieldReference, because JavaIdentifier does not meet our needs either.

        As for the implementation, PlannerExpressionParserImpl uses Scala's native parser tooling, which reads and consumes `scala.Char` values (which can be regarded as the Java char type). Dealing with chars alone is enough for us, so in this case the implementation does not need to care about the charset at all. This leads to a simple and backwards-compatible solution.

        The implementation is almost the same as what is shown below. I have actually made this change in my company's internal branch and deployed it. It works well.

       

      **************************************************************************************

      Now I am talking about `PlannerExpressionParserImpl`.

          For now, the charset of fieldReference is JavaIdentifier. Why not change it to UnicodeIdentifier?

          My team actually runs into this problem today. For instance, data from Elasticsearch (ES) always contains an `@timestamp` field, which JavaIdentifier does not accept. So what we did was simply let the fieldReference charset use Unicode.

       

       lazy val extensionIdent: Parser[String] = (
         "" ~> // handle whitespace
           rep1(
             acceptIf(Character.isUnicodeIdentifierStart)("identifier expected but '" + _ + "' found"),
             elem("identifier part", Character.isUnicodeIdentifierPart(_: Char))
           ) ^^ (_.mkString)
       )

       lazy val fieldReference: PackratParser[UnresolvedReferenceExpression] =
         (STAR | ident | extensionIdent) ^^ { sym => unresolvedRef(sym) }
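
To see which field names each character class admits, the two sets of predicates can be compared directly. The sketch below is plain Java against `java.lang.Character`. The class `IdentifierClassCheck` and its two helpers are hypothetical names used only for illustration and are not part of Flink. It assumes that `ident` follows the Java identifier rules and that `extensionIdent` follows the Unicode identifier rules, as in the snippet above.

```java
class IdentifierClassCheck {

    // Hypothetical helper: true if `name` is non-empty, starts with a Java
    // identifier start character, and continues with Java identifier parts.
    static boolean isJavaIdentifier(String name) {
        if (name.isEmpty() || !Character.isJavaIdentifierStart(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!Character.isJavaIdentifierPart(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical helper: the same check using the Unicode identifier
    // predicates that extensionIdent relies on.
    static boolean isUnicodeIdentifier(String name) {
        if (name.isEmpty() || !Character.isUnicodeIdentifierStart(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!Character.isUnicodeIdentifierPart(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Sample field names; "caf\u00e9" contains a non-ASCII letter.
        String[] samples = {"timestamp", "@timestamp", "user_id", "$price", "caf\u00e9"};
        for (String s : samples) {
            System.out.println(s
                + " java=" + isJavaIdentifier(s)
                + " unicode=" + isUnicodeIdentifier(s));
        }
    }
}
```

Running it on real field names such as `@timestamp` before adopting the change shows whether the Unicode class is wide enough, or whether extra characters need to be accepted explicitly.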

       

      It is simple, but it really makes sense.

       

      MySQL supports Unicode in field names; see the picture below, which shows a field called `@@`.

      Looking forward to any opinions.

       

      Attachments

        1. image-2020-01-15-21-49-19-373.png
          23 kB
          Lsw_aka_laplace

          People

            Assignee: Unassigned
            Reporter: Lsw_aka_laplace