Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-14079 Full data type support in planner
  3. FLINK-10257

Incorrect CHAR type support in Flink SQL and Table API

    XMLWordPrintableJSON

Details

    Description

      Despite that we officially do not support CHAR type, this type is visible and accessible for the users. First of all, string literals have default type of CHAR in SQL. Secondly users can always cast expressions/columns to CHAR.

      Problem is that we do not support CHAR correctly. We mishandle it in:

      1. comparisons and functions
      2. writing values to sinks

      According to SQL standard (and as almost all of the other databases do), CHAR comparisons should ignore white spaces. On the other hand, functions like CONCAT or LENGTH shouldn't: http://troels.arvin.dk/db/rdbms/#data_types-char .

      Currently in In Flink we completely ignore those rules. Sometimes we store internally CHAR with padded spaces sometimes without. This results with semi random behaviour with respect to comparisons/functions/writing to sinks. For example following query:

      tEnv.sqlQuery("SELECT CAST(s AS CHAR(10)) FROM sourceTable").insertInto("targetTable")
      env.execute()
      

      Where `sourceTable` has single VARCHAR(10) column with values: "Hi", "Hello", "Hello world", writes to sink not padded strings (correctly), but following query:

      tEnv.sqlQuery("SELECT * FROM (SELECT CAST(s AS CHAR(10)) c FROM sourceTable) WHERE c = 'Hi'")
        .insertInto("targetTable")
      env.execute()

      Incorrectly filters out all of the results, because CAST(s AS CHAR(10)) is a NOOP in Flink, while 'Hi' constant handed by Calcite to us will be padded with 8 spaces.

      On the other hand following query produces strings padded with spaces:

      tEnv.sqlQuery("SELECT CASE l WHEN 1 THEN 'GERMANY' WHEN 2 THEN 'POLAND' ELSE 'this should not happen' END FROM sourceTable")
        .insertInto("targetTable")
      env.execute()
      
      val expected = Seq(
        "GERMANY",
        "POLAND",
        "POLAND").mkString("\n")
      
      org.junit.ComparisonFailure: Different elements in arrays: expected 3 elements and received 3
      expected: [GERMANY, POLAND, POLAND]
      received: [GERMANY , POLAND , POLAND ]
      

      To make matter even worse, Calcite's constant folding correctly performs comparisons, while if same comparisons are performed by Flink, they yield different results. In other words in SQL:

      SELECT 'POLAND' = 'POLAND    '
      

      return true, but same expression performed on columns

      SELECT CAST(country as CHAR(10)) = CAST(country_padded as CHAR(10)) FROM countries

      returns false.

      To further complicated things, in SQL our string literals have CHAR type, while in Table API our literals have String type (effectively VARCHAR) making results inconsistent between those two APIs.

       

      CC twalthr fhueske hequn8128

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              pnowojski Piotr Nowojski
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m