HIVE-27370

SUBSTR UDF returns '?' for 4-byte characters

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: All Versions
    • Fix Version/s: None
    • Component/s: UDF

    Description

      SUBSTR does not appear to handle 4-byte (UTF-8) characters correctly. The problem also occurs on the master branch. It does not occur in vectorized mode, so it appears to be specific to non-vectorized execution. For example:

      -- vectorized mode
      create temporary table foo (str string) stored as orc;
      insert into foo values('安佐町大字久地字野𨵱4614番地'), ('あa🤎いiうu');
      SELECT
        SUBSTR(str, 1, 10) as a1,
        SUBSTR(str, 10, 3) as a2,
        SUBSTR(str, -7) as a3,
        substr(str, 1, 3) as b1,
        substr(str, 3) as b2,
        substr(str, -5) as b3
      from foo
      ;
      安佐町大字久地字野𨵱  𨵱4614番地  安佐町       町大字久地字野𨵱4614番地     614番地
      あa🤎             あa🤎いiうu        あa🤎        🤎いiうu    🤎いiうu 
      -- non-vectorized mode
      SELECT
        SUBSTR('安佐町大字久地字野𨵱4614番地', 1, 10) as a1,
        SUBSTR('安佐町大字久地字野𨵱4614番地', 10, 3) as a2,
        SUBSTR('安佐町大字久地字野𨵱4614番地', -7) as a3,
        substr('あa🤎いiうu', 1, 3) as b1,
        substr('あa🤎いiうu', 3) as b2,
        substr('あa🤎いiうu', -5) as b3
      ; 
      安佐町大字久地字野?    �4   ?4614番地     あa?   �いiうu    ?いiうu
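The truncated a1 and b1 results above ('安佐町大字久地字野?' and 'あa?') are consistent with a substring that counts UTF-16 code units instead of code points, although the report does not state the cause. The following is a minimal Java sketch of that difference, not Hive's actual implementation. The class `SubstrSketch` and both method names are hypothetical, and the sketch assumes a 1-based `start >= 1` and `len >= 0` (negative starts are not handled).

```java
final class SubstrSketch {
    // Hypothetical naive version: positions are UTF-16 code units, so a
    // character outside the BMP (two code units) can be cut in half,
    // leaving a lone surrogate that typically encodes as '?'.
    static String substrByCodeUnits(String s, int start, int len) {
        int begin = Math.min(start - 1, s.length());
        int end = Math.min(begin + len, s.length());
        return s.substring(begin, end);
    }

    // Hypothetical code-point-aware version: positions are Unicode code
    // points, so a surrogate pair is always kept or dropped as a whole.
    static String substrByCodePoints(String s, int start, int len) {
        int total = s.codePointCount(0, s.length());
        int begin = Math.min(start - 1, total);
        int end = Math.min(begin + len, total);
        // Translate code point positions into char indexes.
        int from = s.offsetByCodePoints(0, begin);
        int to = s.offsetByCodePoints(from, end - begin);
        return s.substring(from, to);
    }
}
```

With the string 'あa🤎いiうu' and arguments (1, 3), the code-unit version returns 'あa' followed by a lone high surrogate, while the code-point version returns 'あa🤎', which matches the vectorized b1 result.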

       

          People

            Assignee: Ryu Kobayashi (ryukobayashi)
            Reporter: Ryu Kobayashi (ryu_kobayashi)
            Votes: 0
            Watchers: 1
