Details
Type: Bug
Status: Patch Available
Priority: Major
Resolution: Unresolved
Affects Version/s: All Versions
Fix Version/s: None
Description
SUBSTR does not appear to handle 4-byte UTF-8 characters correctly. The problem also occurs on the master branch. It does not occur in vectorized mode, so it is specific to non-vectorized mode. An example is below:
-- vectorized mode
create temporary table foo (str string) stored as orc;
insert into foo values('安佐町大字久地字野𨵱4614番地'), ('あa🤎いiうu');
SELECT
  SUBSTR(str, 1, 10) as a1,
  SUBSTR(str, 10, 3) as a2,
  SUBSTR(str, -7) as a3,
  substr(str, 1, 3) as b1,
  substr(str, 3) as b2,
  substr(str, -5) as b3
from foo;

安佐町大字久地字野𨵱 𨵱4614番地 安佐町 町大字久地字野𨵱4614番地 614番地
あa🤎 あa🤎いiうu あa🤎 🤎いiうu 🤎いiうu
-- non-vectorized
SELECT
  SUBSTR('安佐町大字久地字野𨵱4614番地', 1, 10) as a1,
  SUBSTR('安佐町大字久地字野𨵱4614番地', 10, 3) as a2,
  SUBSTR('安佐町大字久地字野𨵱4614番地', -7) as a3,
  substr('あa🤎いiうu', 1, 3) as b1,
  substr('あa🤎いiうu', 3) as b2,
  substr('あa🤎いiうu', -5) as b3;

安佐町大字久地字野? �4 ?4614番地 あa? �いiうu ?いiうu
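The `?` and `�` in the non-vectorized output look like halves of a split UTF-16 surrogate pair. One plausible explanation, not confirmed in this report, is that the non-vectorized path indexes Java `char` values (UTF-16 code units) rather than code points. The sketch below is not Hive's code: `SubstrSketch`, `substrByCodeUnits` and `substrByCodePoints` are hypothetical names, and both helpers assume a 1-based `start >= 1` and `length >= 0`. It only illustrates how the two indexing schemes diverge on a string that contains a 4-byte character.

```java
public class SubstrSketch {

    // Hypothetical naive version: positions count UTF-16 code units (Java chars).
    // A character outside the BMP occupies two code units, so a cut can land
    // between the high and low surrogate.
    public static String substrByCodeUnits(String s, int start, int length) {
        int begin = Math.max(0, Math.min(start - 1, s.length()));
        int end = Math.min(begin + length, s.length());
        return s.substring(begin, end);
    }

    // Hypothetical code-point-aware version: positions count Unicode code points,
    // then get translated to code-unit offsets before slicing.
    public static String substrByCodePoints(String s, int start, int length) {
        int total = s.codePointCount(0, s.length());
        int beginCp = Math.max(0, Math.min(start - 1, total));
        int endCp = Math.min(beginCp + length, total);
        int begin = s.offsetByCodePoints(0, beginCp);
        int end = s.offsetByCodePoints(begin, endCp - beginCp);
        return s.substring(begin, end);
    }

    public static void main(String[] args) {
        // Same text as the second test string in the report, written with
        // escapes so the source compiles regardless of file encoding.
        // U+1F90E is the 4-byte character.
        String s = "\u3042a" + new String(Character.toChars(0x1F90E)) + "\u3044i\u3046u";

        String naive = substrByCodeUnits(s, 1, 3);
        String aware = substrByCodePoints(s, 1, 3);

        // The naive result ends in a lone high surrogate, which cannot be
        // encoded on output and is typically shown as '?' or U+FFFD.
        System.out.println(Character.isHighSurrogate(naive.charAt(naive.length() - 1)));
        // The code-point-aware result holds three complete characters.
        System.out.println(aware.codePointCount(0, aware.length()));
    }
}
```

If the non-vectorized path does behave like `substrByCodeUnits`, that would be consistent with `substr('あa🤎いiうu', 1, 3)` returning `あa?` above, while the vectorized path returns `あa🤎`.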