Details
- Type: Improvement
- Status: Reopened
- Priority: Not a Priority
- Resolution: Unresolved
Description
UPDATE:
Flink now uses Calcite as its SQL planner. Calcite currently only supports the ISO-8859-1 charset, and the charset is not configurable either. Even so, from my perspective we still need to change the character set accepted by PlannerExpressionParserImpl#fieldReference, because the Java identifier rules still cannot cover our field names.
Regarding the implementation: PlannerExpressionParserImpl is built on Scala's native parser combinators, which read and consume `scala.Char` (effectively the Java `char` type). For our purposes, working at the char level is enough; in this case we do not even need to care about the charset problem, which leads to a simple and backwards-compatible solution.
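To illustrate why char-level scanning sidesteps byte-charset concerns, here is a minimal standalone sketch (my own illustration, not Flink's actual parser): once the input is a JVM `String`, it is already decoded, so an identifier scanner only needs the `java.lang.Character` predicates.

```scala
// Illustrative sketch only, not Flink code: scanning an identifier one
// char at a time using the JDK's Unicode predicates. A JVM String is
// already decoded UTF-16, so no byte-level charset handling is needed.
object CharLevelScan {
  def scanIdent(input: String): Option[String] =
    if (input.nonEmpty && Character.isUnicodeIdentifierStart(input.head))
      Some(input.takeWhile(Character.isUnicodeIdentifierPart))
    else
      None

  def main(args: Array[String]): Unit = {
    println(scanIdent("时间戳 + 1")) // Some(时间戳)
    println(scanIdent("+x"))         // None: '+' cannot start an identifier
  }
}
```

The same predicate-based approach is what the combinator-based change below relies on.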
The implementation is almost exactly the code shown below. I have already made this change in my company's internal branch and deployed it, and it works well.
**************************************************************************************
Now I am talking about `PlannerExpressionParserImpl`.

Currently fieldReference's character set is restricted to Java identifiers; why not change it to Unicode identifiers?

In my team we actually hit this problem. For instance, data from Elasticsearch always contains an `@timestamp` field, which the Java identifier rules cannot accept. So what we did was simply let fieldReference use Unicode identifier rules instead.
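For reference, the difference can be checked directly against the JDK predicates (a standalone demo of `java.lang.Character`, not Flink code): `@` is rejected as a Java identifier start, which is exactly why `@timestamp` fails today.

```scala
// Standalone demo of the JDK identifier predicates (not Flink code).
object IdentifierDemo {
  def isJavaIdent(s: String): Boolean =
    s.nonEmpty &&
      Character.isJavaIdentifierStart(s.head) &&
      s.tail.forall(Character.isJavaIdentifierPart)

  def isUnicodeIdent(s: String): Boolean =
    s.nonEmpty &&
      Character.isUnicodeIdentifierStart(s.head) &&
      s.tail.forall(Character.isUnicodeIdentifierPart)

  def main(args: Array[String]): Unit = {
    println(isJavaIdent("timestamp"))  // true
    println(isJavaIdent("@timestamp")) // false: '@' is not a Java identifier start
    println(isUnicodeIdent("时间戳"))  // true: CJK letters are Unicode identifier chars
  }
}
```

One caveat worth checking: on my reading of the JDK docs, `Character.isUnicodeIdentifierStart('@')` is also false (`@` is punctuation, not a letter), so `@` may need to be special-cased on top of the Unicode rules.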
```scala
lazy val extensionIdent: Parser[String] =
  "" ~> // handle whitespace
    rep1(
      acceptIf(Character.isUnicodeIdentifierStart)(
        "identifier expected but '" + _ + "' found"),
      elem("identifier part", Character.isUnicodeIdentifierPart(_: Char))
    ) ^^ (_.mkString)

lazy val fieldReference: PackratParser[UnresolvedReferenceExpression] =
  (STAR | ident | extensionIdent) ^^ { sym => unresolvedRef(sym) }
```
It is simple, but really effective.
MySQL supports Unicode field names as well; for example, a field can be called `@@`.
Looking forward to any opinions.