Sqoop
  1. Sqoop
  2. SQOOP-463

Sqoop User Guide's troubleshooting section should explain how to override the column mapping when importing a table from MySQL into Hive

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.1-incubating
    • Fix Version/s: 1.4.2
    • Component/s: docs
    • Labels:
      None

      Description

      When importing a table from MySQL into Hive where one of the columns is of datatype TINYINT(1), Sqoop will automatically map this to the Hive datatype boolean. This is because the MySQL JDBC connector maps the TINYINT(1) datatype to java.sql.Types.BIT, which Sqoop by default maps to the Hive type Boolean. Consequently, if you have values such as 1 or 0 in this column, they will fail to parse correctly in Hive, instead appearing as all NULL values.

      1. SQOOP-463-2.patch
        1 kB
        Kathleen Ting
      2. SQOOP-463-1.patch
        1 kB
        Kathleen Ting
      3. SQOOP-463.patch
        1 kB
        Kathleen Ting

        Activity

        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4357/
        -----------------------------------------------------------

        Review request for Sqoop and Arvind Prabhakar.

        Summary
        -------

        Updated documentation.

        This addresses bug SQOOP-463.
        https://issues.apache.org/jira/browse/SQOOP-463

        Diffs


        ./src/docs/user/troubleshooting.txt 1300726

        Diff: https://reviews.apache.org/r/4357/diff

        Testing
        -------

        Built docs and verified.

        Thanks,

        Kathleen

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4357/ ----------------------------------------------------------- Review request for Sqoop and Arvind Prabhakar. Summary ------- Updated documentation. This addresses bug SQOOP-463 . https://issues.apache.org/jira/browse/SQOOP-463 Diffs ./src/docs/user/troubleshooting.txt 1300726 Diff: https://reviews.apache.org/r/4357/diff Testing ------- Built docs and verified. Thanks, Kathleen
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4357/#review5969
        -----------------------------------------------------------

        Ship it!

        +1

        ./src/docs/user/troubleshooting.txt
        <https://reviews.apache.org/r/4357/#comment12943>

        It looks like space is missing between TINYINT(1) and datatype.

        • Cheolsoo

        On 2012-03-14 22:47:39, Kathleen Ting wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/4357/

        -----------------------------------------------------------

        (Updated 2012-03-14 22:47:39)

        Review request for Sqoop and Arvind Prabhakar.

        Summary

        -------

        Updated documentation.

        This addresses bug SQOOP-463.

        https://issues.apache.org/jira/browse/SQOOP-463

        Diffs

        -----

        ./src/docs/user/troubleshooting.txt 1300726

        Diff: https://reviews.apache.org/r/4357/diff

        Testing

        -------

        Built docs and verified.

        Thanks,

        Kathleen

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4357/#review5969 ----------------------------------------------------------- Ship it! +1 ./src/docs/user/troubleshooting.txt < https://reviews.apache.org/r/4357/#comment12943 > It looks like space is missing between TINYINT(1) and datatype. Cheolsoo On 2012-03-14 22:47:39, Kathleen Ting wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4357/ ----------------------------------------------------------- (Updated 2012-03-14 22:47:39) Review request for Sqoop and Arvind Prabhakar. Summary ------- Updated documentation. This addresses bug SQOOP-463 . https://issues.apache.org/jira/browse/SQOOP-463 Diffs ----- ./src/docs/user/troubleshooting.txt 1300726 Diff: https://reviews.apache.org/r/4357/diff Testing ------- Built docs and verified. Thanks, Kathleen
        Hide
        Kathleen Ting added a comment -

        Fixed typo (missing space between two words). Thanks to Cheolsoo for catching that.

        Show
        Kathleen Ting added a comment - Fixed typo (missing space between two words). Thanks to Cheolsoo for catching that.
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4357/
        -----------------------------------------------------------

        (Updated 2012-03-14 23:08:50.341102)

        Review request for Sqoop and Arvind Prabhakar.

        Summary
        -------

        Updated documentation.

        This addresses bug SQOOP-463.
        https://issues.apache.org/jira/browse/SQOOP-463

        Diffs (updated)


        ./src/docs/user/troubleshooting.txt 1300726

        Diff: https://reviews.apache.org/r/4357/diff

        Testing
        -------

        Built docs and verified.

        Thanks,

        Kathleen

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4357/ ----------------------------------------------------------- (Updated 2012-03-14 23:08:50.341102) Review request for Sqoop and Arvind Prabhakar. Summary ------- Updated documentation. This addresses bug SQOOP-463 . https://issues.apache.org/jira/browse/SQOOP-463 Diffs (updated) ./src/docs/user/troubleshooting.txt 1300726 Diff: https://reviews.apache.org/r/4357/diff Testing ------- Built docs and verified. Thanks, Kathleen
        Hide
        Jarek Jarcec Cecho added a comment - - edited

        I personally believe that mapping TINYINT(1) to java.sql.Types.BIT might be a bug in MySQL JDBC connector. As far as I know, the length field in numeric types in MySQL is only used for setting default display width and it's not related to actual data size. So that column with TINYINT(1) might contain higher values than 1. I was able to verify this piece of information in MySQL manual (see [1]) and live in running server (see [2]).

        [1] http://dev.mysql.com/doc/refman/5.6/en/numeric-type-overview.html

        [2] MySQL calls:

        mysql> create table pokus(col tinyint(1));
        Query OK, 0 rows affected (0.06 sec)

        mysql> describe pokus;
        ------------------------------------+

        Field Type Null Key Default Extra

        ------------------------------------+

        col tinyint(1) YES   NULL  

        ------------------------------------+
        1 row in set (0.00 sec)

        mysql> insert into pokus values(1);
        Query OK, 1 row affected (0.00 sec)

        mysql> insert into pokus values(2);
        Query OK, 1 row affected (0.00 sec)

        mysql> insert into pokus values(3);
        Query OK, 1 row affected (0.00 sec)

        mysql> insert into pokus values(10);
        Query OK, 1 row affected (0.00 sec)

        mysql> select * from pokus;
        ------

        col

        ------

        1
        2
        3
        10

        ------
        4 rows in set (0.00 sec)

        Show
        Jarek Jarcec Cecho added a comment - - edited I personally believe that mapping TINYINT(1) to java.sql.Types.BIT might be a bug in MySQL JDBC connector. As far as I know, the length field in numeric types in MySQL is only used for setting default display width and it's not related to actual data size. So that column with TINYINT(1) might contain higher values than 1. I was able to verify this piece of information in MySQL manual (see [1] ) and live in running server (see [2] ). [1] http://dev.mysql.com/doc/refman/5.6/en/numeric-type-overview.html [2] MySQL calls: mysql> create table pokus(col tinyint(1)); Query OK, 0 rows affected (0.06 sec) mysql> describe pokus; ------ ---------- ---- --- ------- ------+ Field Type Null Key Default Extra ------ ---------- ---- --- ------- ------+ col tinyint(1) YES   NULL   ------ ---------- ---- --- ------- ------+ 1 row in set (0.00 sec) mysql> insert into pokus values(1); Query OK, 1 row affected (0.00 sec) mysql> insert into pokus values(2); Query OK, 1 row affected (0.00 sec) mysql> insert into pokus values(3); Query OK, 1 row affected (0.00 sec) mysql> insert into pokus values(10); Query OK, 1 row affected (0.00 sec) mysql> select * from pokus; ------ col ------ 1 2 3 10 ------ 4 rows in set (0.00 sec)
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4357/
        -----------------------------------------------------------

        (Updated 2012-03-20 20:10:41.312155)

        Review request for Sqoop and Jarek Cecho.

        Summary
        -------

        Updated documentation.

        This addresses bug SQOOP-463.
        https://issues.apache.org/jira/browse/SQOOP-463

        Diffs


        ./src/docs/user/troubleshooting.txt 1300726

        Diff: https://reviews.apache.org/r/4357/diff

        Testing
        -------

        Built docs and verified.

        Thanks,

        Kathleen

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4357/ ----------------------------------------------------------- (Updated 2012-03-20 20:10:41.312155) Review request for Sqoop and Jarek Cecho. Summary ------- Updated documentation. This addresses bug SQOOP-463 . https://issues.apache.org/jira/browse/SQOOP-463 Diffs ./src/docs/user/troubleshooting.txt 1300726 Diff: https://reviews.apache.org/r/4357/diff Testing ------- Built docs and verified. Thanks, Kathleen
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4357/#review6138
        -----------------------------------------------------------

        I did not yet try how MySQL JDBC driver will behave myself, but column defined as TINYINT(1) might contain valid values in range 0-255 (in unsigned form). So if the driver is reporting type BIT for such columns, then there must be some data truncation. Therefore I would suggest extending this new section also for normal (non-hive) import case.

        • Jarek

        On 2012-03-20 20:10:41, Kathleen Ting wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/4357/

        -----------------------------------------------------------

        (Updated 2012-03-20 20:10:41)

        Review request for Sqoop and Jarek Cecho.

        Summary

        -------

        Updated documentation.

        This addresses bug SQOOP-463.

        https://issues.apache.org/jira/browse/SQOOP-463

        Diffs

        -----

        ./src/docs/user/troubleshooting.txt 1300726

        Diff: https://reviews.apache.org/r/4357/diff

        Testing

        -------

        Built docs and verified.

        Thanks,

        Kathleen

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4357/#review6138 ----------------------------------------------------------- I did not yet try how MySQL JDBC driver will behave myself, but column defined as TINYINT(1) might contain valid values in range 0-255 (in unsigned form). So if the driver is reporting type BIT for such columns, then there must be some data truncation. Therefore I would suggest extending this new section also for normal (non-hive) import case. Jarek On 2012-03-20 20:10:41, Kathleen Ting wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4357/ ----------------------------------------------------------- (Updated 2012-03-20 20:10:41) Review request for Sqoop and Jarek Cecho. Summary ------- Updated documentation. This addresses bug SQOOP-463 . https://issues.apache.org/jira/browse/SQOOP-463 Diffs ----- ./src/docs/user/troubleshooting.txt 1300726 Diff: https://reviews.apache.org/r/4357/diff Testing ------- Built docs and verified. Thanks, Kathleen
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4357/
        -----------------------------------------------------------

        (Updated 2012-03-21 01:49:19.048841)

        Review request for Sqoop and Jarek Cecho.

        Changes
        -------

        Added use case of non-Hive import to HDFS.

        Summary
        -------

        Updated documentation.

        This addresses bug SQOOP-463.
        https://issues.apache.org/jira/browse/SQOOP-463

        Diffs (updated)


        ./src/docs/user/troubleshooting.txt 1300726

        Diff: https://reviews.apache.org/r/4357/diff

        Testing
        -------

        Built docs and verified.

        Thanks,

        Kathleen

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4357/ ----------------------------------------------------------- (Updated 2012-03-21 01:49:19.048841) Review request for Sqoop and Jarek Cecho. Changes ------- Added use case of non-Hive import to HDFS. Summary ------- Updated documentation. This addresses bug SQOOP-463 . https://issues.apache.org/jira/browse/SQOOP-463 Diffs (updated) ./src/docs/user/troubleshooting.txt 1300726 Diff: https://reviews.apache.org/r/4357/diff Testing ------- Built docs and verified. Thanks, Kathleen
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4357/#review6167
        -----------------------------------------------------------

        Ship it!

        Thank you for the change Kate!

        • Jarek

        On 2012-03-21 01:49:19, Kathleen Ting wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/4357/

        -----------------------------------------------------------

        (Updated 2012-03-21 01:49:19)

        Review request for Sqoop and Jarek Cecho.

        Summary

        -------

        Updated documentation.

        This addresses bug SQOOP-463.

        https://issues.apache.org/jira/browse/SQOOP-463

        Diffs

        -----

        ./src/docs/user/troubleshooting.txt 1300726

        Diff: https://reviews.apache.org/r/4357/diff

        Testing

        -------

        Built docs and verified.

        Thanks,

        Kathleen

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4357/#review6167 ----------------------------------------------------------- Ship it! Thank you for the change Kate! Jarek On 2012-03-21 01:49:19, Kathleen Ting wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4357/ ----------------------------------------------------------- (Updated 2012-03-21 01:49:19) Review request for Sqoop and Jarek Cecho. Summary ------- Updated documentation. This addresses bug SQOOP-463 . https://issues.apache.org/jira/browse/SQOOP-463 Diffs ----- ./src/docs/user/troubleshooting.txt 1300726 Diff: https://reviews.apache.org/r/4357/diff Testing ------- Built docs and verified. Thanks, Kathleen
        Hide
        Kathleen Ting added a comment -

        Added use case of non-Hive import to HDFS.

        Show
        Kathleen Ting added a comment - Added use case of non-Hive import to HDFS.
        Hide
        Jarek Jarcec Cecho added a comment -

        Patch committed, thank you Kate!

        Show
        Jarek Jarcec Cecho added a comment - Patch committed, thank you Kate!
        Hide
        Hudson added a comment -

        Integrated in Sqoop-ant-jdk-1.6 #99 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6/99/)
        SQOOP-463. Sqoop User Guide's troubleshooting section should explain how to override the column mapping when importing a table from MySQL into Hive.

        (Kathleen Ting via Jarek Jarcec Cecho) (Revision 1303911)

        Result = SUCCESS
        jarcec : http://svn.apache.org/viewvc/?view=rev&rev=1303911
        Files :

        • /incubator/sqoop/trunk/src/docs/user/troubleshooting.txt
        Show
        Hudson added a comment - Integrated in Sqoop-ant-jdk-1.6 #99 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6/99/ ) SQOOP-463 . Sqoop User Guide's troubleshooting section should explain how to override the column mapping when importing a table from MySQL into Hive. (Kathleen Ting via Jarek Jarcec Cecho) (Revision 1303911) Result = SUCCESS jarcec : http://svn.apache.org/viewvc/?view=rev&rev=1303911 Files : /incubator/sqoop/trunk/src/docs/user/troubleshooting.txt

          People

          • Assignee:
            Kathleen Ting
            Reporter:
            Kathleen Ting
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development