Hive / HIVE-3245

UTF encoded data not displayed correctly by Hive driver


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.8.0
    • Fix Version/s: None
    • Component/s: JDBC
    • Labels: None

    Description

      Data in various languages (e.g. Japanese, Thai) is loaded into string columns from tab-delimited text files. A simple projection of those columns does not display the data correctly. Exporting the data from Hive and inspecting the resulting files suggests the data itself was loaded correctly, so this appears to be an encoding issue in the driver. However, I am not aware of any encoding-related URL connection properties that the Hive JDBC driver requires.
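The driver-side encoding hypothesis can be illustrated with a small, self-contained Java sketch. It assumes, without confirmation from this report, that the symptom comes from a client decoding UTF-8 bytes with a single-byte charset such as ISO-8859-1 (for example, a platform default). Decoding the bytes as UTF-8 round-trips. Decoding them as ISO-8859-1 produces garbled text. Re-encoding that garbled string as ISO-8859-1 and decoding the result as UTF-8 recovers the original. The class name `EncodingCheck` and the sample strings are hypothetical and illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of Hive or its JDBC driver.
public class EncodingCheck {

    // Decode raw bytes with the given charset, as a client would if it
    // assumed that charset for the bytes it received.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    // ISO-8859-1 maps every byte 0x00-0xFF to exactly one char, so a string
    // produced by decoding UTF-8 bytes as ISO-8859-1 can be turned back into
    // the original bytes and then decoded correctly as UTF-8.
    static String repairLatin1Misdecode(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample Japanese and Thai text, written as escapes so the source
        // compiles the same way regardless of the compiler's file encoding.
        String[] samples = { "\u65E5\u672C\u8A9E", "\u0E44\u0E17\u0E22" };
        for (String text : samples) {
            byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
            String asUtf8 = decode(utf8, StandardCharsets.UTF_8);
            String asLatin1 = decode(utf8, StandardCharsets.ISO_8859_1);
            System.out.println("chars=" + text.length()
                    + " utf8Bytes=" + utf8.length
                    + " utf8RoundTrip=" + asUtf8.equals(text)
                    + " latin1RoundTrip=" + asLatin1.equals(text)
                    + " repaired=" + repairLatin1Misdecode(asLatin1).equals(text));
        }
    }
}
```

Both samples print `chars=3 utf8Bytes=9 utf8RoundTrip=true latin1RoundTrip=false repaired=true`. If text returned through the driver can be repaired the same way, that would point to a charset mismatch rather than bad data in the table.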

      create table if not exists CERT.TLJA_JP_E ( RNUM int , C1 string, ORD int)
      row format delimited
      fields terminated by '\t'
      stored as textfile;

      create table if not exists CERT.TLJA_JP ( RNUM int , C1 string, ORD int)
      stored as sequencefile;

      load data local inpath '/home/hadoopadmin/jdbc-cert/CERT/CERT.TLJA_JP.txt'
      overwrite into table CERT.TLJA_JP_E;
      insert overwrite table CERT.TLJA_JP select * from CERT.TLJA_JP_E;
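One way to separate a driver problem from a display problem is to print the Unicode code points of each `C1` value rather than the text itself. This rules out the console or client tool's font and encoding as the cause. The following is a hypothetical Java client that uses only `java.sql` and takes the JDBC URL as its first argument. For the HiveServer driver of this era, that URL would typically look like `jdbc:hive://host:10000/default`, with `org.apache.hadoop.hive.jdbc.HiveDriver` on the classpath. Host, port, and database are assumptions, and older drivers may need an explicit `Class.forName(...)` before connecting. If the printed code points already differ from those in the source file, the `String` values are wrong when they leave the driver.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical diagnostic client, not part of Hive.
public class CodePointDump {

    // Render each code point as U+XXXX so the value returned by the driver
    // can be checked independently of how the console displays it.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) throws SQLException {
        // Without a URL, just show the output format for a known sample.
        if (args.length == 0) {
            System.out.println("usage: CodePointDump <jdbc-url>");
            System.out.println(codePoints("\u65E5\u672C\u8A9E"));
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0]);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "select RNUM, C1 from CERT.TLJA_JP order by RNUM")) {
            while (rs.next()) {
                String c1 = rs.getString(2);
                System.out.println(rs.getInt(1) + "\t"
                        + (c1 == null ? "NULL" : codePoints(c1)));
            }
        }
    }
}
```

Run without arguments, it prints the usage line followed by `U+65E5 U+672C U+8A9E`.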

      Attachments

        1. CERT.TLJA.txt
          3 kB
          N Campbell
        2. ASF.LICENSE.NOT.GRANTED--screenshot-1.jpg
          48 kB
          N Campbell

            People

              Assignee: Szehon Ho
              Reporter: N Campbell
              Votes: 0
              Watchers: 4
