Sqoop
  1. Sqoop
  2. SQOOP-387

Use TRUNCATE for PostgreSQLManager

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4.2
    • Fix Version/s: None
    • Component/s: connectors/postgresql
    • Labels:

      Description

      PostgreSQLManager uses "DELETE" sentence for procesing in staging-table.
      When it does "EXPORT" many times, the processing time is prolonged in "DELETE".
      Therefore replace it with "TRUNCATE" sentence and improve performance.

      1. SQOOP-387.patch
        3 kB
        Shinichi Yamashita
      2. SQOOP-387.patch
        3 kB
        Shinichi Yamashita

        Activity

        Hide
        Shinichi Yamashita added a comment -

        Override migrateData and deleteAllRecords methods to use in staging-table.

        Show
        Shinichi Yamashita added a comment - Override migrateData and deleteAllRecords methods to use in staging-table.
        Hide
        Bilung Lee added a comment -

        Thanks for your contribution! One comment. Instead of overriding the whole migrateData and deleteAllRecords methods, how about having a getDeleteQuery method to return the DELETE statement in SqlManager? Then, use it in migrateData/deleteAllRecords methods of SqlManager and only override that in PostgreSQLManager to return TRUNCATE statement?

        Show
        Bilung Lee added a comment - Thanks for your contribution! One comment. Instead of overriding the whole migrateData and deleteAllRecords methods, how about having a getDeleteQuery method to return the DELETE statement in SqlManager? Then, use it in migrateData/deleteAllRecords methods of SqlManager and only override that in PostgreSQLManager to return TRUNCATE statement?
        Hide
        Shinichi Yamashita added a comment -

        The reason that overridden only PostgreSQLManager (not SqlManager) is because PostgreSQL can handle truncate in transaction.
        In other words TRUNCATE of PostgreSQL can have rollback mechanism.

        In addition, PostgreSQL doesn't release disk space until it executes "VACUUM".
        Therefore the implementation of current PostgreSQL causes performance degradation in the long term.

        Show
        Shinichi Yamashita added a comment - The reason that overridden only PostgreSQLManager (not SqlManager) is because PostgreSQL can handle truncate in transaction. In other words TRUNCATE of PostgreSQL can have rollback mechanism. In addition, PostgreSQL doesn't release disk space until it executes "VACUUM". Therefore the implementation of current PostgreSQL causes performance degradation in the long term.
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4272/
        -----------------------------------------------------------

        Review request for Sqoop and Bilung Lee.

        Summary
        -------

        I fixed SQOOP-387 patch.

        1. I add getDeleteQuery method, which returns DELETE statement in SqlManager by default and returns TRUNCATE statement in PostgreSQLManager.
        2. Modify deleteAllRecords method and migrateData method of SqlManager to use getDeleteQuery method.

        This addresses bug SQOOP-387.
        https://issues.apache.org/jira/browse/SQOOP-387

        Diffs


        http://svn.apache.org/repos/asf/incubator/sqoop/trunk/src/java/org/apache/sqoop/manager/PostgresqlManager.java 1298894
        http://svn.apache.org/repos/asf/incubator/sqoop/trunk/src/java/org/apache/sqoop/manager/SqlManager.java 1298894

        Diff: https://reviews.apache.org/r/4272/diff

        Testing
        -------

        ant test
        I confirmed "TRUNCATE TABLE" query was executed in PostgreSQL.

        Thanks,

        Shinichi

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4272/ ----------------------------------------------------------- Review request for Sqoop and Bilung Lee. Summary ------- I fixed SQOOP-387 patch. 1. I add getDeleteQuery method, which returns DELETE statement in SqlManager by default and returns TRUNCATE statement in PostgreSQLManager. 2. Modify deleteAllRecords method and migrateData method of SqlManager to use getDeleteQuery method. This addresses bug SQOOP-387 . https://issues.apache.org/jira/browse/SQOOP-387 Diffs http://svn.apache.org/repos/asf/incubator/sqoop/trunk/src/java/org/apache/sqoop/manager/PostgresqlManager.java 1298894 http://svn.apache.org/repos/asf/incubator/sqoop/trunk/src/java/org/apache/sqoop/manager/SqlManager.java 1298894 Diff: https://reviews.apache.org/r/4272/diff Testing ------- ant test I confirmed "TRUNCATE TABLE" query was executed in PostgreSQL. Thanks, Shinichi
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4272/#review5803
        -----------------------------------------------------------

        Seem that the TRUNCATE statement may not return the number of affected rows like the DELETE statement does. You may want to consider removing dependency on the return count as pointed out below.

        An alternative option is to factor out more code to be overridden if the dependency is to be kept for other databases.

        http://svn.apache.org/repos/asf/incubator/sqoop/trunk/src/java/org/apache/sqoop/manager/SqlManager.java
        <https://reviews.apache.org/r/4272/#comment12649>

        The return code seems to be 0 for a successful TRUNCATE.

        http://svn.apache.org/repos/asf/incubator/sqoop/trunk/src/java/org/apache/sqoop/manager/SqlManager.java
        <https://reviews.apache.org/r/4272/#comment12650>

        The return count won't make sense for TRUNCATE here.

        http://svn.apache.org/repos/asf/incubator/sqoop/trunk/src/java/org/apache/sqoop/manager/SqlManager.java
        <https://reviews.apache.org/r/4272/#comment12653>

        The return count would be 0 for a successful TRUNCATE.

        http://svn.apache.org/repos/asf/incubator/sqoop/trunk/src/java/org/apache/sqoop/manager/SqlManager.java
        <https://reviews.apache.org/r/4272/#comment12655>

        Due to the reason mentioned above, this checking would fail for PostgreSQL now.

        • Bilung

        On 2012-03-09 17:31:41, Shinichi Yamashita wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/4272/

        -----------------------------------------------------------

        (Updated 2012-03-09 17:31:41)

        Review request for Sqoop and Bilung Lee.

        Summary

        -------

        I fixed SQOOP-387 patch.

        1. I add getDeleteQuery method, which returns DELETE statement in SqlManager by default and returns TRUNCATE statement in PostgreSQLManager.

        2. Modify deleteAllRecords method and migrateData method of SqlManager to use getDeleteQuery method.

        This addresses bug SQOOP-387.

        https://issues.apache.org/jira/browse/SQOOP-387

        Diffs

        -----

        http://svn.apache.org/repos/asf/incubator/sqoop/trunk/src/java/org/apache/sqoop/manager/PostgresqlManager.java 1298894

        http://svn.apache.org/repos/asf/incubator/sqoop/trunk/src/java/org/apache/sqoop/manager/SqlManager.java 1298894

        Diff: https://reviews.apache.org/r/4272/diff

        Testing

        -------

        ant test

        I confirmed "TRUNCATE TABLE" query was executed in PostgreSQL.

        Thanks,

        Shinichi

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4272/#review5803 ----------------------------------------------------------- Seem that the TRUNCATE statement may not return the number of affected rows like the DELETE statement does. You may want to consider removing dependency on the return count as pointed out below. An alternative option is to factor out more code to be overridden if the dependency is to be kept for other databases. http://svn.apache.org/repos/asf/incubator/sqoop/trunk/src/java/org/apache/sqoop/manager/SqlManager.java < https://reviews.apache.org/r/4272/#comment12649 > The return code seems to be 0 for a successful TRUNCATE. http://svn.apache.org/repos/asf/incubator/sqoop/trunk/src/java/org/apache/sqoop/manager/SqlManager.java < https://reviews.apache.org/r/4272/#comment12650 > The return count won't make sense for TRUNCATE here. http://svn.apache.org/repos/asf/incubator/sqoop/trunk/src/java/org/apache/sqoop/manager/SqlManager.java < https://reviews.apache.org/r/4272/#comment12653 > The return count would be 0 for a successful TRUNCATE. http://svn.apache.org/repos/asf/incubator/sqoop/trunk/src/java/org/apache/sqoop/manager/SqlManager.java < https://reviews.apache.org/r/4272/#comment12655 > Due to the reason mentioned above, this checking would fail for PostgreSQL now. Bilung On 2012-03-09 17:31:41, Shinichi Yamashita wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4272/ ----------------------------------------------------------- (Updated 2012-03-09 17:31:41) Review request for Sqoop and Bilung Lee. Summary ------- I fixed SQOOP-387 patch. 1. I add getDeleteQuery method, which returns DELETE statement in SqlManager by default and returns TRUNCATE statement in PostgreSQLManager. 2. Modify deleteAllRecords method and migrateData method of SqlManager to use getDeleteQuery method. This addresses bug SQOOP-387 . https://issues.apache.org/jira/browse/SQOOP-387 Diffs ----- http://svn.apache.org/repos/asf/incubator/sqoop/trunk/src/java/org/apache/sqoop/manager/PostgresqlManager.java 1298894 http://svn.apache.org/repos/asf/incubator/sqoop/trunk/src/java/org/apache/sqoop/manager/SqlManager.java 1298894 Diff: https://reviews.apache.org/r/4272/diff Testing ------- ant test I confirmed "TRUNCATE TABLE" query was executed in PostgreSQL. Thanks, Shinichi
        Hide
        Shinichi Yamashita added a comment -

        Hi,

        I forgot to tell that TRUNCATE of PostgreSQL did not return a value such as DELETE to you.
        Therefore I think that I apply the patch which I attached first and use it.
        If a count of the number of deletion is necessary, I want to make modifications to count it with the first patch.

        Regards,
        Shinichi

        Show
        Shinichi Yamashita added a comment - Hi, I forgot to tell that TRUNCATE of PostgreSQL did not return a value such as DELETE to you. Therefore I think that I apply the patch which I attached first and use it. If a count of the number of deletion is necessary, I want to make modifications to count it with the first patch. Regards, Shinichi
        Hide
        Shinichi Yamashita added a comment -

        attach patch file

        Show
        Shinichi Yamashita added a comment - attach patch file
        Hide
        Shinichi Yamashita added a comment -

        I created this patch from Sqoop trunk.

        Show
        Shinichi Yamashita added a comment - I created this patch from Sqoop trunk.
        Hide
        Jarek Jarcec Cecho added a comment -

        Hi Shinichi,
        please accept my apology that your patch was lying without any notice. Would you mind uploading it to Apache Review board (https://reviews.apache.org/) to repository sqoop-trunk?

        Jarcec

        Show
        Jarek Jarcec Cecho added a comment - Hi Shinichi, please accept my apology that your patch was lying without any notice. Would you mind uploading it to Apache Review board ( https://reviews.apache.org/ ) to repository sqoop-trunk? Jarcec
        Hide
        Jarek Jarcec Cecho added a comment -

        I'm cancelling "Patch Available" status for now as the patch is quite old and we're waiting on update.

        Jarcec

        Show
        Jarek Jarcec Cecho added a comment - I'm cancelling "Patch Available" status for now as the patch is quite old and we're waiting on update. Jarcec
        Hide
        Michael Bush added a comment -

        The same applied to Oracle. In SqlManager.java - deleteAllRecords should be changed from

        String deleteQuery = "DELETE FROM " + tableName;

        to

        String deleteQuery = "TRUNCATE TABLE " + tableName;

        However, for Oracle, TRUNCATE is only available if you have elevated database privileges or are the schema owner for the table. So try the truncate first and if there are no privileges to truncate, then just use the delete.

        Show
        Michael Bush added a comment - The same applied to Oracle. In SqlManager.java - deleteAllRecords should be changed from String deleteQuery = "DELETE FROM " + tableName; to String deleteQuery = "TRUNCATE TABLE " + tableName; However, for Oracle, TRUNCATE is only available if you have elevated database privileges or are the schema owner for the table. So try the truncate first and if there are no privileges to truncate, then just use the delete.

          People

          • Assignee:
            Unassigned
            Reporter:
            Shinichi Yamashita
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:

              Development