Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-11112

ISO-8859-1 text output has fragments of previous longer rows appended

    Details

      Description

      If a LazySimpleSerDe table is created using ISO 8859-1 encoding, query results for a string column are incorrect for any row that was preceded by a row containing a longer string.

      Example steps to reproduce:

      1. Create a table using ISO 8859-1 encoding:

      CREATE TABLE person_lat1 (name STRING)
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1');
      

      2. Copy an ISO-8859-1 encoded text file into the appropriate warehouse folder in HDFS. I'll attach an example file containing the following text:

      Müller,Thomas
      Jørgensen,Jørgen
      Peña,Andrés
      Nåm,Fæk
      

      3. Execute SELECT * FROM person_lat1

      Result - The following output appears:

      +-------------------+--+
      | person_lat1.name |
      +-------------------+--+
      | Müller,Thomas |
      | Jørgensen,Jørgen |
      | Peña,Andrésørgen |
      | Nåm,Fækdrésørgen |
      +-------------------+--+
      

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                ychena Yongzhi Chen
                Reporter:
                ychena Yongzhi Chen
              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: