[FLINK-15573] Let Flink SQL PlannerExpressionParserImpl#FieldRefrence use Unicode as its default charset - ASF JIRA

Details

Type: Improvement
Status: Reopened
Priority: Not a Priority
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Table SQL / Planner
Labels: None

Description

UPDATE:

Flink now uses Calcite for its SQL planner. Calcite currently supports only the ISO-8859-1 charset, and that charset cannot be configured. Even so, from my perspective, we still need to change the character set accepted by `PlannerExpressionParserImpl#fieldReference`, because JavaIdentifier does not meet our needs either.

Regarding the implementation, `PlannerExpressionParserImpl` uses Scala's native parser combinators, which read and consume `scala.Char` values (equivalent to Java's `char` type). We only need to work at the `char` level, so the implementation does not have to deal with the charset problem at all. This leads to a simple and backwards-compatible solution.
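The point about working at the `char` level can be illustrated with a minimal sketch. It is not Flink code: the object name `CharLevelCheck` and the helper `isUnicodeIdentifier` are hypothetical, and the helper only mirrors the "one start character followed by part characters" shape of an identifier rule. It decodes the same field name from two different byte encodings and applies the same check to the resulting `Char` sequence.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical helper for illustration only; not part of Flink.
object CharLevelCheck {
  // The first char must be a Unicode identifier start,
  // every following char must be a Unicode identifier part.
  def isUnicodeIdentifier(s: String): Boolean =
    s.nonEmpty &&
      Character.isUnicodeIdentifierStart(s.head) &&
      s.tail.forall(c => Character.isUnicodeIdentifierPart(c))

  def main(args: Array[String]): Unit = {
    // Two CJK letters followed by a digit, written with escapes so the
    // source file encoding does not matter.
    val name = "\u5b57\u6bb51"

    // Round-trip the name through two different byte encodings.
    val fromUtf8 =
      new String(name.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8)
    val fromUtf16 =
      new String(name.getBytes(StandardCharsets.UTF_16), StandardCharsets.UTF_16)

    // The check sees only Char values, so both decoded strings are
    // treated identically.
    println(isUnicodeIdentifier(fromUtf8))
    println(isUnicodeIdentifier(fromUtf16))
  }
}
```

Once the input has been decoded into a `String`, the byte-level charset no longer plays a role in the check, which is the property the proposal relies on.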

The implementation is almost the same as the picture below shows. I have actually made this change in my company's internal branch and deployed it, and it works well.

**************************************************************************************

This issue concerns `PlannerExpressionParserImpl`.

Currently `fieldReference` accepts only Java identifiers (JavaIdentifier). Why not change it to accept Unicode identifiers (UnicodeIdentifier)?

My team actually has this problem. For instance, data from Elasticsearch (ES) always contains an `@timestamp` field, which JavaIdentifier cannot match. So what we did is simply let `fieldReference` use the Unicode identifier rules:

    lazy val extensionIdent: Parser[String] = (
      "" ~> // handle whitespace
        rep1(
          acceptIf(Character.isUnicodeIdentifierStart)("identifier expected but '" + _ + "' found"),
          elem("identifier part", Character.isUnicodeIdentifierPart(_: Char))
        ) ^^ (_.mkString)
    )

    lazy val fieldReference: PackratParser[UnresolvedReferenceExpression] =
      (STAR | ident | extensionIdent) ^^ { sym => unresolvedRef(sym) }
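To see what swapping the predicates changes for individual characters, the Java and Unicode identifier-start checks from `java.lang.Character` can be compared side by side. This is a small sketch, not Flink code: the object name `IdentifierPredicates` and the helper `startFlags` are hypothetical and exist only for illustration.

```scala
// Hypothetical helper for illustration only; not part of Flink.
object IdentifierPredicates {
  // Returns (accepted as Java identifier start, accepted as Unicode identifier start).
  def startFlags(c: Char): (Boolean, Boolean) =
    (Character.isJavaIdentifierStart(c), Character.isUnicodeIdentifierStart(c))

  def main(args: Array[String]): Unit = {
    // An ASCII letter, underscore, dollar sign, at sign, and a CJK letter.
    for (c <- Seq('a', '_', '$', '@', '\u4e2d')) {
      val (javaStart, unicodeStart) = startFlags(c)
      println(s"'$c' javaStart=$javaStart unicodeStart=$unicodeStart")
    }
  }
}
```

On the JDK, `'@'` is rejected as an identifier start by both predicate families, so a name such as `@timestamp` may need an additional rule on top of this swap. Because `fieldReference` tries `ident` before `extensionIdent`, names that were already accepted keep parsing as before.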

It is simple, but it really makes sense.

MySQL supports Unicode identifiers; see the picture below, which shows a field called `@@`.

Looking forward to any opinions.

Attachments

image-2020-01-15-21-49-19-373.png
15/Jan/20 13:49
23 kB
Lsw_aka_laplace

People

Assignee: Unassigned

Reporter: Lsw_aka_laplace

Votes: 0

Watchers: 5

Dates

Created: 13/Jan/20 11:31

Updated: 29/Jul/21 07:35