Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-685

Sqoop will fail with OutOfMemory on large tables using mysql

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The default MySQL JDBC client behavior is to buffer the entire ResultSet in the client before allowing the user to use the ResultSet object. On large SELECTs, this can cause OutOfMemory exceptions, even when the client intends to close the ResultSet after reading only a few rows. The MySQL ConnManager should configure its connection to use row-at-a-time delivery of results to the client.

      1. MAPREDUCE-685.3.patch
        17 kB
        Aaron Kimball
      2. MAPREDUCE-685.patch
        15 kB
        Aaron Kimball
      3. MAPREDUCE-685.patch.2
        17 kB
        Aaron Kimball

        Activity

        Chris Douglas made changes -
        Component/s contrib/sqoop [ 12312930 ]
        Tom White made changes -
        Fix Version/s 0.21.0 [ 12314045 ]
        Hide
        Aaron Kimball added a comment -

        Because it's not actually the same fix Postgresql wants you to do statement.setFetchSize(something_reasonable) e.g., 40.

        MySQL wants you to do statement.setFetchSize(INT_MIN). The only cursor modes MySQL supports are fully buffered (fetch size = 0) and fully row-wise cursors (fetch_size = INT_MIN).

        That having been said, I have just finished a postgresql patch ready to post up here this week Just waiting for some existing patches to get committed first so that it applies cleanly.

        Show
        Aaron Kimball added a comment - Because it's not actually the same fix Postgresql wants you to do statement.setFetchSize(something_reasonable) e.g., 40. MySQL wants you to do statement.setFetchSize(INT_MIN) . The only cursor modes MySQL supports are fully buffered (fetch size = 0) and fully row-wise cursors (fetch_size = INT_MIN). That having been said, I have just finished a postgresql patch ready to post up here this week Just waiting for some existing patches to get committed first so that it applies cleanly.
        Hide
        Martin Dittus added a comment -

        We just found that PostgreSQL shows the same behaviour. What do you think of making this a generic fix instead? It seems Postgres has the same mechanism to enable streaming of ResultSets:

        http://jdbc.postgresql.org/documentation/83/query.html – "Changing code to cursor mode is as simple as setting the fetch size of the Statement to the appropriate size. Setting the fetch size back to 0 will cause all rows to be cached (the default behaviour)."

        Show
        Martin Dittus added a comment - We just found that PostgreSQL shows the same behaviour. What do you think of making this a generic fix instead? It seems Postgres has the same mechanism to enable streaming of ResultSets: http://jdbc.postgresql.org/documentation/83/query.html – "Changing code to cursor mode is as simple as setting the fetch size of the Statement to the appropriate size. Setting the fetch size back to 0 will cause all rows to be cached (the default behaviour)."
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #20 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/20/)

        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #20 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/20/ )
        Tom White made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Fix Version/s 0.21.0 [ 12314045 ]
        Resolution Fixed [ 1 ]
        Hide
        Tom White added a comment -

        I've just committed this. Thanks Aaron!

        Show
        Tom White added a comment - I've just committed this. Thanks Aaron!
        Hide
        Aaron Kimball added a comment -

        New patch test failures are unrelated to this patch.

        Show
        Aaron Kimball added a comment - New patch test failures are unrelated to this patch.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12412766/MAPREDUCE-685.3.patch
        against trunk revision 791909.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/364/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/364/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/364/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/364/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12412766/MAPREDUCE-685.3.patch against trunk revision 791909. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/364/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/364/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/364/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/364/console This message is automatically generated.
        Hide
        Aaron Kimball added a comment -

        Attaching rebased patch.

        Show
        Aaron Kimball added a comment - Attaching rebased patch.
        Aaron Kimball made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Aaron Kimball made changes -
        Attachment MAPREDUCE-685.3.patch [ 12412766 ]
        Aaron Kimball made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Hide
        Tom White added a comment -

        The latest patch no longer applies. Can you please regenerate it Aaron?

        Show
        Tom White added a comment - The latest patch no longer applies. Can you please regenerate it Aaron?
        Hide
        Aaron Kimball added a comment -

        These test errors are unrelated to this patch.

        Show
        Aaron Kimball added a comment - These test errors are unrelated to this patch.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12412298/MAPREDUCE-685.patch.2
        against trunk revision 790971.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12412298/MAPREDUCE-685.patch.2 against trunk revision 790971. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/console This message is automatically generated.
        Aaron Kimball made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Aaron Kimball made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Hide
        Aaron Kimball added a comment -

        Removed SQL_BIG_RESULT. Also, good call re. the null check; no reason not to pass straight through. I've modified the API for execute() accordingly.

        Show
        Aaron Kimball added a comment - Removed SQL_BIG_RESULT. Also, good call re. the null check; no reason not to pass straight through. I've modified the API for execute() accordingly.
        Aaron Kimball made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Aaron Kimball made changes -
        Attachment MAPREDUCE-685.patch.2 [ 12412298 ]
        Aaron Kimball made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Hide
        Todd Lipcon added a comment -

        Couple notes:

        • The SQL_BIG_RESULT hint I mentioned (offline) was meant for the query that actually returns lots of rows. If you're doing LIMIT 1 you don't need it.
        • Why check against null stmt in execute()? Isn't it assumed that passing null here would throw an NPE?
        • Also, why return null here instead of letting the SQLException fall through?
        Show
        Todd Lipcon added a comment - Couple notes: The SQL_BIG_RESULT hint I mentioned (offline) was meant for the query that actually returns lots of rows. If you're doing LIMIT 1 you don't need it. Why check against null stmt in execute()? Isn't it assumed that passing null here would throw an NPE? Also, why return null here instead of letting the SQLException fall through?
        Hide
        Aaron Kimball added a comment -

        Added patch that fixes this issue. Also includes some other performance enhancements:

        • MySQL now uses "LIMIT 1" when making SELECTs against tables for metadata-reading purposes.
        • Transactions are no longer opened with TRANSACTION_SERIALIZABLE since it's unnecessary for metadata reads

        No new tests for this included since Hadoop testing doesn't mesh well with MySQL. I tested locally by building a 1.7 GB table in mysql and reading into a local HDFS instance. This failed before applying the patch, and succeeds afterwards.

        Show
        Aaron Kimball added a comment - Added patch that fixes this issue. Also includes some other performance enhancements: MySQL now uses "LIMIT 1" when making SELECTs against tables for metadata-reading purposes. Transactions are no longer opened with TRANSACTION_SERIALIZABLE since it's unnecessary for metadata reads No new tests for this included since Hadoop testing doesn't mesh well with MySQL. I tested locally by building a 1.7 GB table in mysql and reading into a local HDFS instance. This failed before applying the patch, and succeeds afterwards.
        Aaron Kimball made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Aaron Kimball made changes -
        Field Original Value New Value
        Attachment MAPREDUCE-685.patch [ 12412206 ]
        Aaron Kimball created issue -

          People

          • Assignee:
            Aaron Kimball
            Reporter:
            Aaron Kimball
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development