Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-7493

convert_fromJSON and unicode

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.16.0
    • Fix Version/s: None
    • Component/s: Functions - Drill
    • Labels:
      None

      Description

      transform a json string (with \uxxxx char) into json struct

      apache drill> SELECT x_str, convert_fromJSON(x_str) AS x_array 
      FROM (SELECT '["test=\u0014=test"]' x_str);
      +----------------------+----------------------+
      |        x_str         |       x_array        |
      +----------------------+----------------------+
      | ["test=\u0014=test"] | ["test=\u0014=test"] |
      +----------------------+----------------------+
      

      Use json struct :

      apache drill> SELECT x_str
      , x_array
      , x_array[0] AS x_array0 
      FROM(SELECT x_str, convert_fromJSON(x_str) AS x_array
      FROM (SELECT '["test=\u0014=test"]' x_str));
      +----------------------+----------------------+-------------+
      |        x_str         |       x_array        |  x_array0   |
      +----------------------+----------------------+-------------+
      | ["test=\u0014=test"] | ["test=\u0014=test"] | test==test |
      +----------------------+----------------------+-------------+
      

      Note that the char \u0014 is interpreted in x_array0

      if using split function on x_array0, an array is built with non interpreted \uxxxx

      apache drill> SELECT x_str
      , x_array
      , x_array[0] AS x_array0
      , split(x_array[0],',') AS x_array0_split 
      FROM(SELECT x_str, convert_fromJSON(x_str) AS x_array 
      FROM (SELECT '["test=\u0014=test"]' x_str));
      +----------------------+----------------------+-------------+----------------------+
      |        x_str         |       x_array        |  x_array0   |    x_array0_split    |
      +----------------------+----------------------+-------------+----------------------+
      | ["test=\u0014=test"] | ["test=\u0014=test"] | test==test | ["test=\u0014=test"] |
      +----------------------+----------------------+-------------+----------------------+
      

      It's not possible to use convert_fromJSON on the interpreted \uxxxx

      SELECT x_str
      , x_array
      , x_array[0] AS x_array0
      , split(x_array[0],',') AS x_array0_split
      , convert_fromJSON('["' || x_array[0] || '"]') AS convertJSONerror 
      FROM(SELECT x_str, convert_fromJSON(x_str) AS x_array 
      FROM (SELECT '["test=\u0014=test"]' x_str));
      Error: DATA_READ ERROR: Illegal unquoted character ((CTRL-CHAR, code 20)): has to be escaped using backslash to be included in string value
       at [Source: (org.apache.drill.exec.vector.complex.fn.DrillBufInputStream); line: 1, column: 9]
      

      don't work although the string is the same as the origin but \uxxxx is unfortunatly interpreted

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              benj641 benj
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated: