  1. Sqoop (Retired)
  2. SQOOP-1692

Garbled characters (mojibake) when importing data from MySQL to HBase

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4.4
    • Fix Version/s: 1.4.4
    • Component/s: hbase-integration
    • Labels: None

    Description

      If the MySQL character set is latin1 (the default) and the tables contain Chinese characters, importing data from MySQL into HBase produces garbled characters. One suggested explanation is that MySQL's "latin1" character set (which is similar to cp1252) is not the standard latin1 (ISO-8859-1): ISO-8859-1 latin1 treats the code points between 0x80 and 0x9f as “undefined”.

      For details:
      latin1 is the default character set. MySQL's latin1 is the same as the Windows cp1252 character set. This means it is the same as the official ISO 8859-1 or IANA (Internet Assigned Numbers Authority) latin1, except that IANA latin1 treats the code points between 0x80 and 0x9f as “undefined,” whereas cp1252, and therefore MySQL's latin1, assign characters for those positions. For example, 0x80 is the Euro sign. For the “undefined” entries in cp1252, MySQL translates 0x81 to Unicode 0x0081, 0x8d to 0x008d, 0x8f to 0x008f, 0x90 to 0x0090, and 0x9d to 0x009d.
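
The difference described above can be sketched in a few lines of Python. This is only an illustration of the character-set mismatch, not a reproduction of the Sqoop or JDBC code path, and it assumes (the issue does not state this) that the Chinese text was stored as UTF-8 bytes in a latin1 column. The helpers `mysql_latin1_decode` and `mysql_latin1_encode` are hypothetical names that emulate the mapping quoted above: cp1252, plus an identity mapping for the five bytes that cp1252 leaves undefined.

```python
# Bytes that cp1252 leaves undefined. Per the text quoted above, MySQL's
# latin1 maps each of them to the Unicode code point with the same value.
CP1252_UNDEFINED = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def mysql_latin1_decode(data: bytes) -> str:
    """Hypothetical helper: decode bytes as MySQL's latin1 is described above."""
    return "".join(
        chr(b) if b in CP1252_UNDEFINED else bytes([b]).decode("cp1252")
        for b in data
    )


def mysql_latin1_encode(text: str) -> bytes:
    """Hypothetical helper: inverse of mysql_latin1_decode."""
    out = bytearray()
    for ch in text:
        if ord(ch) in CP1252_UNDEFINED:
            out.append(ord(ch))
        else:
            out += ch.encode("cp1252")
    return bytes(out)


if __name__ == "__main__":
    # Assumption: two Chinese characters (U+4E2D, U+6587) stored as UTF-8 bytes.
    raw = "\u4e2d\u6587".encode("utf-8")  # e4 b8 ad e6 96 87

    # Bytes 0x96 and 0x87 fall in the 0x80-0x9f range, so MySQL's latin1
    # turns them into an en dash and a double dagger, not C1 control codes.
    text = mysql_latin1_decode(raw)

    # Re-encoding with the same mapping recovers the original bytes ...
    print(mysql_latin1_encode(text) == raw)

    # ... but re-encoding as ISO-8859-1 cannot represent those characters,
    # so the bytes are lost and the UTF-8 text can no longer be recovered.
    print(text.encode("iso-8859-1", errors="replace") == raw)
```

If this assumption holds, any step in the pipeline that treats the column as ISO-8859-1 rather than cp1252 would corrupt every multi-byte sequence containing a byte in the 0x80 to 0x9f range, which is consistent with the symptom reported here.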

          People

            Assignee: Unassigned
            Reporter: Eric Huang (alipayhuber)