Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-685

Sqoop will fail with OutOfMemory on large tables using mysql

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The default MySQL JDBC client behavior is to buffer the entire ResultSet in the client before allowing the user to use the ResultSet object. On large SELECTs, this can cause OutOfMemory exceptions, even when the client intends to close the ResultSet after reading only a few rows. The MySQL ConnManager should configure its connection to use row-at-a-time delivery of results to the client.

      1. MAPREDUCE-685.3.patch
        17 kB
        Aaron Kimball
      2. MAPREDUCE-685.patch.2
        17 kB
        Aaron Kimball
      3. MAPREDUCE-685.patch
        15 kB
        Aaron Kimball

        Activity

        Aaron Kimball created issue -
        Aaron Kimball made changes -
        Field Original Value New Value
        Attachment MAPREDUCE-685.patch [ 12412206 ]
        Aaron Kimball made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Aaron Kimball added a comment -

        Added patch that fixes this issue. Also includes some other performance enhancements:

        • MySQL now uses "LIMIT 1" when making SELECTs against tables for metadata-reading purposes.
        • Transactions are no longer opened with TRANSACTION_SERIALIZABLE since it's unnecessary for metadata reads

        No new tests for this included since Hadoop testing doesn't mesh well with MySQL. I tested locally by building a 1.7 GB table in mysql and reading into a local HDFS instance. This failed before applying the patch, and succeeds afterwards.

        Show
        Aaron Kimball added a comment - Added patch that fixes this issue. Also includes some other performance enhancements: MySQL now uses "LIMIT 1" when making SELECTs against tables for metadata-reading purposes. Transactions are no longer opened with TRANSACTION_SERIALIZABLE since it's unnecessary for metadata reads No new tests for this included since Hadoop testing doesn't mesh well with MySQL. I tested locally by building a 1.7 GB table in mysql and reading into a local HDFS instance. This failed before applying the patch, and succeeds afterwards.
        Hide
        Todd Lipcon added a comment -

        Couple notes:

        • The SQL_BIG_RESULT hint I mentioned (offline) was meant for the query that actually returns lots of rows. If you're doing LIMIT 1 you don't need it.
        • Why check against null stmt in execute()? Isn't it assumed that passing null here would throw an NPE?
        • Also, why return null here instead of letting the SQLException fall through?
        Show
        Todd Lipcon added a comment - Couple notes: The SQL_BIG_RESULT hint I mentioned (offline) was meant for the query that actually returns lots of rows. If you're doing LIMIT 1 you don't need it. Why check against null stmt in execute()? Isn't it assumed that passing null here would throw an NPE? Also, why return null here instead of letting the SQLException fall through?
        Aaron Kimball made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Aaron Kimball made changes -
        Attachment MAPREDUCE-685.patch.2 [ 12412298 ]
        Aaron Kimball made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Aaron Kimball added a comment -

        Removed SQL_BIG_RESULT. Also, good call re. the null check; no reason not to pass straight through. I've modified the API for execute() accordingly.

        Show
        Aaron Kimball added a comment - Removed SQL_BIG_RESULT. Also, good call re. the null check; no reason not to pass straight through. I've modified the API for execute() accordingly.
        Aaron Kimball made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Aaron Kimball made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12412298/MAPREDUCE-685.patch.2
        against trunk revision 790971.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12412298/MAPREDUCE-685.patch.2 against trunk revision 790971. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/console This message is automatically generated.
        Hide
        Aaron Kimball added a comment -

        These test errors are unrelated to this patch.

        Show
        Aaron Kimball added a comment - These test errors are unrelated to this patch.
        Hide
        Tom White added a comment -

        The latest patch no longer applies. Can you please regenerate it Aaron?

        Show
        Tom White added a comment - The latest patch no longer applies. Can you please regenerate it Aaron?
        Aaron Kimball made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Aaron Kimball made changes -
        Attachment MAPREDUCE-685.3.patch [ 12412766 ]
        Aaron Kimball made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Aaron Kimball added a comment -

        Attaching rebased patch.

        Show
        Aaron Kimball added a comment - Attaching rebased patch.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12412766/MAPREDUCE-685.3.patch
        against trunk revision 791909.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/364/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/364/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/364/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/364/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12412766/MAPREDUCE-685.3.patch against trunk revision 791909. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/364/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/364/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/364/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/364/console This message is automatically generated.
        Hide
        Aaron Kimball added a comment -

        New patch test failures are unrelated to this patch.

        Show
        Aaron Kimball added a comment - New patch test failures are unrelated to this patch.
        Hide
        Tom White added a comment -

        I've just committed this. Thanks Aaron!

        Show
        Tom White added a comment - I've just committed this. Thanks Aaron!
        Tom White made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Fix Version/s 0.21.0 [ 12314045 ]
        Resolution Fixed [ 1 ]
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #20 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/20/)

        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #20 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/20/ )
        Hide
        Martin Dittus added a comment -

        We just found that PostgreSQL shows the same behaviour. What do you think of making this a generic fix instead? It seems Postgres has the same mechanism to enable streaming of ResultSets:

        http://jdbc.postgresql.org/documentation/83/query.html – "Changing code to cursor mode is as simple as setting the fetch size of the Statement to the appropriate size. Setting the fetch size back to 0 will cause all rows to be cached (the default behaviour)."

        Show
        Martin Dittus added a comment - We just found that PostgreSQL shows the same behaviour. What do you think of making this a generic fix instead? It seems Postgres has the same mechanism to enable streaming of ResultSets: http://jdbc.postgresql.org/documentation/83/query.html – "Changing code to cursor mode is as simple as setting the fetch size of the Statement to the appropriate size. Setting the fetch size back to 0 will cause all rows to be cached (the default behaviour)."
        Hide
        Aaron Kimball added a comment -

        Because it's not actually the same fix Postgresql wants you to do statement.setFetchSize(something_reasonable) e.g., 40.

        MySQL wants you to do statement.setFetchSize(INT_MIN). The only cursor modes MySQL supports are fully buffered (fetch size = 0) and fully row-wise cursors (fetch_size = INT_MIN).

        That having been said, I have just finished a postgresql patch ready to post up here this week Just waiting for some existing patches to get committed first so that it applies cleanly.

        Show
        Aaron Kimball added a comment - Because it's not actually the same fix Postgresql wants you to do statement.setFetchSize(something_reasonable) e.g., 40. MySQL wants you to do statement.setFetchSize(INT_MIN) . The only cursor modes MySQL supports are fully buffered (fetch size = 0) and fully row-wise cursors (fetch_size = INT_MIN). That having been said, I have just finished a postgresql patch ready to post up here this week Just waiting for some existing patches to get committed first so that it applies cleanly.
        Tom White made changes -
        Fix Version/s 0.21.0 [ 12314045 ]
        Chris Douglas made changes -
        Component/s contrib/sqoop [ 12312930 ]
        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Patch Available Patch Available Open Open
        6d 20h 39m 3 Aaron Kimball 07/Jul/09 18:55
        Open Open Patch Available Patch Available
        1m 10s 4 Aaron Kimball 07/Jul/09 18:56
        Patch Available Patch Available Resolved Resolved
        1d 18h 21m 1 Tom White 09/Jul/09 13:18

          People

          • Assignee:
            Aaron Kimball
            Reporter:
            Aaron Kimball
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development