Sqoop
  1. Sqoop
  2. SQOOP-331

Support boundary query on the command line

    Details

    • Type: New Feature New Feature
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0-incubating
    • Fix Version/s: 1.4.0-incubating
    • Component/s: tools
    • Labels:
      None

      Description

      It would be nice if the sqoop would have ability to specify query that will fetch minimal and maximal value for creating splits in DataDrivenDBInputFormat from the command line.

      Normally sqoop will generate query to get maximal and minimal value for creating splits in following form: SELECT min($split_by_column), max($split_by_column) FROM $table WHERE $cmd_where. In my use case, I needed to import only portion of data with ranges based on the split_by_column that I already have preselected and that are available in special table that holds data ranges and appropriate primary key values. So my auto generated query looked like this: SELECT min(id), max(id) FROM table WHERE id => min_id and id <= max_id. That query is obviously useless and is just creating unnecessary load on the database server. It would be nice to supply my own boundary query that will use the extra table with data ranges.

      1. SQOOP-331.patch
        5 kB
        Jarek Jarcec Cecho
      2. SQOOP-331.patch
        15 kB
        Jarek Jarcec Cecho

        Activity

        Hide
        Jarek Jarcec Cecho added a comment -

        I've add new parameter for boundary query and propagated it to all places where it looked needed. However I'm not sure that I've propagated everywhere, so it would be nice to get some sort of feedback here.

        Also since boundary query is only optional configuration property and might not be used at all, I'm not sure how to construct tests for it. So far I've included only test for parsing parameters.

        Any sort of feedback would be greatly appreciated.

        Show
        Jarek Jarcec Cecho added a comment - I've add new parameter for boundary query and propagated it to all places where it looked needed. However I'm not sure that I've propagated everywhere, so it would be nice to get some sort of feedback here. Also since boundary query is only optional configuration property and might not be used at all, I'm not sure how to construct tests for it. So far I've included only test for parsing parameters. Any sort of feedback would be greatly appreciated.
        Hide
        Arvind Prabhakar added a comment -

        Thanks for the excellent patch Jarek. It is almost ready for commit except for a few things:

        • The case for free-form query needs to be handled. This would be in the DataDrivenImportJob class within the else block below your current modifications. Here is a sample diff that I cooked up quickly to convey this point:
          --- src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java	(revision 1165478)
          +++ src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java	(working copy)
          @@ -153,15 +153,26 @@
                   DataDrivenDBInputFormat.setInput(job, DBWritable.class,
                       mgr.escapeTableName(tableName), whereClause,
                       mgr.escapeColName(splitByCol), sqlColNames);
          +
          +        // If user specified boundary query on the command line propagate it to
          +        // the job
          +        if(options.getBoundaryQuery() != null) {
          +          DataDrivenDBInputFormat.setBoundingQuery(job.getConfiguration(),
          +                  options.getBoundaryQuery());
          +        }
                 } else {
                   // Import a free-form query.
                   String inputQuery = options.getSqlQuery();
                   String sanitizedQuery = inputQuery.replace(
                       DataDrivenDBInputFormat.SUBSTITUTE_TOKEN, " (1 = 1) ");
           
          -        String inputBoundingQuery =
          -            mgr.getInputBoundsQuery(splitByCol, sanitizedQuery);
          +        String inputBoundingQuery = options.getBoundaryQuery();
          +
                   if (inputBoundingQuery == null) {
          +          mgr.getInputBoundsQuery(splitByCol, sanitizedQuery);
          +        }
          +
          +        if (inputBoundingQuery == null) {
                       inputBoundingQuery = "SELECT MIN(" + splitByCol + "), MAX("
                               + splitByCol + ") FROM (" + sanitizedQuery + ") AS t1";
                   }
          
          
        • Second: since you have introduced a new command line option, it is necessary that the userguide and man pages be updated. These are located under src/docs directory and can be built using ant docs target. In order to build them though, you would need to have asciidoc isntalled on your machine.
        • Bonus nit: there is a checkstyle violation in ImportTool:530 where the line is longer than 80 characters.

        Apart from that everything looks great. Some suggestions going forward:

        • Usually when introducing a new functionality, it is required to have at least one test that exercises that functionality. The test you have added is good but does not really exercise the functionality.
        • We use Apache Review Board (https://reviews.apache.org/) to post reviews for patches that are longer than a few lines. This helps the reviewers give contextual feedback where necessary.

        Please let me know if you have any questions for me on these suggestions.

        Show
        Arvind Prabhakar added a comment - Thanks for the excellent patch Jarek. It is almost ready for commit except for a few things: The case for free-form query needs to be handled. This would be in the DataDrivenImportJob class within the else block below your current modifications. Here is a sample diff that I cooked up quickly to convey this point: --- src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java (revision 1165478) +++ src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java (working copy) @@ -153,15 +153,26 @@ DataDrivenDBInputFormat.setInput(job, DBWritable.class, mgr.escapeTableName(tableName), whereClause, mgr.escapeColName(splitByCol), sqlColNames); + + // If user specified boundary query on the command line propagate it to + // the job + if(options.getBoundaryQuery() != null) { + DataDrivenDBInputFormat.setBoundingQuery(job.getConfiguration(), + options.getBoundaryQuery()); + } } else { // Import a free-form query. String inputQuery = options.getSqlQuery(); String sanitizedQuery = inputQuery.replace( DataDrivenDBInputFormat.SUBSTITUTE_TOKEN, " (1 = 1) "); - String inputBoundingQuery = - mgr.getInputBoundsQuery(splitByCol, sanitizedQuery); + String inputBoundingQuery = options.getBoundaryQuery(); + if (inputBoundingQuery == null) { + mgr.getInputBoundsQuery(splitByCol, sanitizedQuery); + } + + if (inputBoundingQuery == null) { inputBoundingQuery = "SELECT MIN(" + splitByCol + "), MAX(" + splitByCol + ") FROM (" + sanitizedQuery + ") AS t1"; } Second: since you have introduced a new command line option, it is necessary that the userguide and man pages be updated. These are located under src/docs directory and can be built using ant docs target. In order to build them though, you would need to have asciidoc isntalled on your machine. Bonus nit: there is a checkstyle violation in ImportTool:530 where the line is longer than 80 characters. Apart from that everything looks great. Some suggestions going forward: Usually when introducing a new functionality, it is required to have at least one test that exercises that functionality. The test you have added is good but does not really exercise the functionality. We use Apache Review Board ( https://reviews.apache.org/ ) to post reviews for patches that are longer than a few lines. This helps the reviewers give contextual feedback where necessary. Please let me know if you have any questions for me on these suggestions.
        Hide
        Jarek Jarcec Cecho added a comment -

        Hi Arvind,
        thank you very much for your feedback. I think that I do understand all of your suggestions and I'll try incorporate them into my patch as soon as I can.

        Jarcec

        Show
        Jarek Jarcec Cecho added a comment - Hi Arvind, thank you very much for your feedback. I think that I do understand all of your suggestions and I'll try incorporate them into my patch as soon as I can. Jarcec
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1946/
        -----------------------------------------------------------

        Review request for Sqoop.

        Summary
        -------

        I've incorporated all Arvind's suggestions (hopefully ).

        This addresses bug SQOOP-331.
        https://issues.apache.org/jira/browse/SQOOP-331

        Diffs


        /src/docs/man/import-args.txt 1171925
        /src/docs/user/import.txt 1171925
        /src/java/com/cloudera/sqoop/SqoopOptions.java 1171925
        /src/java/com/cloudera/sqoop/manager/SqlManager.java 1171925
        /src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java 1171925
        /src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 1171925
        /src/java/com/cloudera/sqoop/tool/ImportTool.java 1171925
        /src/test/com/cloudera/sqoop/TestSqoopOptions.java 1171925

        Diff: https://reviews.apache.org/r/1946/diff

        Testing
        -------

        I'm still having troubles to create meaningful tests for this patch. I've came up with two different approaches, but I wasn't able to get running either of them:

        1) Use boundary query for limiting import data (like "select 1, 2"). This is totally wrong usage of this parameter, but I was thinking that It might be fine for the testing purpose. Unfortunately underlying code is using this query only in case that is creating more than one map task and I was not able to forced it create more than one. Which make sense because the -m parameter is also only a hint.

        2) Parse logs. Fortunately class responsible for creating splits is printing used boundary query, so there is possibility to parse those logs and look for used boundary query. But I'm not sure how this can be done in proper fashion.

        Any ideas will be welcomed.

        Jarcec

        Thanks,

        Jarek

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1946/ ----------------------------------------------------------- Review request for Sqoop. Summary ------- I've incorporated all Arvind's suggestions (hopefully ). This addresses bug SQOOP-331 . https://issues.apache.org/jira/browse/SQOOP-331 Diffs /src/docs/man/import-args.txt 1171925 /src/docs/user/import.txt 1171925 /src/java/com/cloudera/sqoop/SqoopOptions.java 1171925 /src/java/com/cloudera/sqoop/manager/SqlManager.java 1171925 /src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java 1171925 /src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 1171925 /src/java/com/cloudera/sqoop/tool/ImportTool.java 1171925 /src/test/com/cloudera/sqoop/TestSqoopOptions.java 1171925 Diff: https://reviews.apache.org/r/1946/diff Testing ------- I'm still having troubles to create meaningful tests for this patch. I've came up with two different approaches, but I wasn't able to get running either of them: 1) Use boundary query for limiting import data (like "select 1, 2"). This is totally wrong usage of this parameter, but I was thinking that It might be fine for the testing purpose. Unfortunately underlying code is using this query only in case that is creating more than one map task and I was not able to forced it create more than one. Which make sense because the -m parameter is also only a hint. 2) Parse logs. Fortunately class responsible for creating splits is printing used boundary query, so there is possibility to parse those logs and look for used boundary query. But I'm not sure how this can be done in proper fashion. Any ideas will be welcomed. Jarcec Thanks, Jarek
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1946/
        -----------------------------------------------------------

        (Updated 2011-09-20 08:29:02.745328)

        Review request for Sqoop and Arvind Prabhakar.

        Changes
        -------

        I've added Arvind to reviewer list.

        Summary
        -------

        I've incorporated all Arvind's suggestions (hopefully ).

        This addresses bug SQOOP-331.
        https://issues.apache.org/jira/browse/SQOOP-331

        Diffs


        /src/docs/man/import-args.txt 1171925
        /src/docs/user/import.txt 1171925
        /src/java/com/cloudera/sqoop/SqoopOptions.java 1171925
        /src/java/com/cloudera/sqoop/manager/SqlManager.java 1171925
        /src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java 1171925
        /src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 1171925
        /src/java/com/cloudera/sqoop/tool/ImportTool.java 1171925
        /src/test/com/cloudera/sqoop/TestSqoopOptions.java 1171925

        Diff: https://reviews.apache.org/r/1946/diff

        Testing
        -------

        I'm still having troubles to create meaningful tests for this patch. I've came up with two different approaches, but I wasn't able to get running either of them:

        1) Use boundary query for limiting import data (like "select 1, 2"). This is totally wrong usage of this parameter, but I was thinking that It might be fine for the testing purpose. Unfortunately underlying code is using this query only in case that is creating more than one map task and I was not able to forced it create more than one. Which make sense because the -m parameter is also only a hint.

        2) Parse logs. Fortunately class responsible for creating splits is printing used boundary query, so there is possibility to parse those logs and look for used boundary query. But I'm not sure how this can be done in proper fashion.

        Any ideas will be welcomed.

        Jarcec

        Thanks,

        Jarek

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1946/ ----------------------------------------------------------- (Updated 2011-09-20 08:29:02.745328) Review request for Sqoop and Arvind Prabhakar. Changes ------- I've added Arvind to reviewer list. Summary ------- I've incorporated all Arvind's suggestions (hopefully ). This addresses bug SQOOP-331 . https://issues.apache.org/jira/browse/SQOOP-331 Diffs /src/docs/man/import-args.txt 1171925 /src/docs/user/import.txt 1171925 /src/java/com/cloudera/sqoop/SqoopOptions.java 1171925 /src/java/com/cloudera/sqoop/manager/SqlManager.java 1171925 /src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java 1171925 /src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 1171925 /src/java/com/cloudera/sqoop/tool/ImportTool.java 1171925 /src/test/com/cloudera/sqoop/TestSqoopOptions.java 1171925 Diff: https://reviews.apache.org/r/1946/diff Testing ------- I'm still having troubles to create meaningful tests for this patch. I've came up with two different approaches, but I wasn't able to get running either of them: 1) Use boundary query for limiting import data (like "select 1, 2"). This is totally wrong usage of this parameter, but I was thinking that It might be fine for the testing purpose. Unfortunately underlying code is using this query only in case that is creating more than one map task and I was not able to forced it create more than one. Which make sense because the -m parameter is also only a hint. 2) Parse logs. Fortunately class responsible for creating splits is printing used boundary query, so there is possibility to parse those logs and look for used boundary query. But I'm not sure how this can be done in proper fashion. Any ideas will be welcomed. Jarcec Thanks, Jarek
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1946/#review2034
        -----------------------------------------------------------

        Jarek, the changes look good. Regarding the test case problem you have mentioned, I don't think that it is really a test case issue. On the contrary, it seems to be a bug to me. Here is why: the purpose of boundary query is to limit the overall size of ingest from the database. It does not and should not matter how many mappers are used to perform the ingest. You can fix this by modifying the DataDrivenDBInputFormat.getSplits() method and ensuring that the single split it generates for the one mapper case uses the boundary query if specified.

        /src/docs/user/import.txt
        <https://reviews.apache.org/r/1946/#comment4582>

        nit:The BOUNDARY need not be all capitalized.

        /src/docs/user/import.txt
        <https://reviews.apache.org/r/1946/#comment4583>

        Please remove the trailing whitespace.

        /src/docs/user/import.txt
        <https://reviews.apache.org/r/1946/#comment4584>

        Trailing whitespace.

        • Arvind

        On 2011-09-20 08:29:02, Jarek Jarcec wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/1946/

        -----------------------------------------------------------

        (Updated 2011-09-20 08:29:02)

        Review request for Sqoop and Arvind Prabhakar.

        Summary

        -------

        I've incorporated all Arvind's suggestions (hopefully ).

        This addresses bug SQOOP-331.

        https://issues.apache.org/jira/browse/SQOOP-331

        Diffs

        -----

        /src/docs/man/import-args.txt 1171925

        /src/docs/user/import.txt 1171925

        /src/java/com/cloudera/sqoop/SqoopOptions.java 1171925

        /src/java/com/cloudera/sqoop/manager/SqlManager.java 1171925

        /src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java 1171925

        /src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 1171925

        /src/java/com/cloudera/sqoop/tool/ImportTool.java 1171925

        /src/test/com/cloudera/sqoop/TestSqoopOptions.java 1171925

        Diff: https://reviews.apache.org/r/1946/diff

        Testing

        -------

        I'm still having troubles to create meaningful tests for this patch. I've came up with two different approaches, but I wasn't able to get running either of them:

        1) Use boundary query for limiting import data (like "select 1, 2"). This is totally wrong usage of this parameter, but I was thinking that It might be fine for the testing purpose. Unfortunately underlying code is using this query only in case that is creating more than one map task and I was not able to forced it create more than one. Which make sense because the -m parameter is also only a hint.

        2) Parse logs. Fortunately class responsible for creating splits is printing used boundary query, so there is possibility to parse those logs and look for used boundary query. But I'm not sure how this can be done in proper fashion.

        Any ideas will be welcomed.

        Jarcec

        Thanks,

        Jarek

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1946/#review2034 ----------------------------------------------------------- Jarek, the changes look good. Regarding the test case problem you have mentioned, I don't think that it is really a test case issue. On the contrary, it seems to be a bug to me. Here is why: the purpose of boundary query is to limit the overall size of ingest from the database. It does not and should not matter how many mappers are used to perform the ingest. You can fix this by modifying the DataDrivenDBInputFormat.getSplits() method and ensuring that the single split it generates for the one mapper case uses the boundary query if specified. /src/docs/user/import.txt < https://reviews.apache.org/r/1946/#comment4582 > nit:The BOUNDARY need not be all capitalized. /src/docs/user/import.txt < https://reviews.apache.org/r/1946/#comment4583 > Please remove the trailing whitespace. /src/docs/user/import.txt < https://reviews.apache.org/r/1946/#comment4584 > Trailing whitespace. Arvind On 2011-09-20 08:29:02, Jarek Jarcec wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1946/ ----------------------------------------------------------- (Updated 2011-09-20 08:29:02) Review request for Sqoop and Arvind Prabhakar. Summary ------- I've incorporated all Arvind's suggestions (hopefully ). This addresses bug SQOOP-331 . https://issues.apache.org/jira/browse/SQOOP-331 Diffs ----- /src/docs/man/import-args.txt 1171925 /src/docs/user/import.txt 1171925 /src/java/com/cloudera/sqoop/SqoopOptions.java 1171925 /src/java/com/cloudera/sqoop/manager/SqlManager.java 1171925 /src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java 1171925 /src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 1171925 /src/java/com/cloudera/sqoop/tool/ImportTool.java 1171925 /src/test/com/cloudera/sqoop/TestSqoopOptions.java 1171925 Diff: https://reviews.apache.org/r/1946/diff Testing ------- I'm still having troubles to create meaningful tests for this patch. I've came up with two different approaches, but I wasn't able to get running either of them: 1) Use boundary query for limiting import data (like "select 1, 2"). This is totally wrong usage of this parameter, but I was thinking that It might be fine for the testing purpose. Unfortunately underlying code is using this query only in case that is creating more than one map task and I was not able to forced it create more than one. Which make sense because the -m parameter is also only a hint. 2) Parse logs. Fortunately class responsible for creating splits is printing used boundary query, so there is possibility to parse those logs and look for used boundary query. But I'm not sure how this can be done in proper fashion. Any ideas will be welcomed. Jarcec Thanks, Jarek
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1946/
        -----------------------------------------------------------

        (Updated 2011-09-28 14:21:32.994622)

        Review request for Sqoop and Arvind Prabhakar.

        Changes
        -------

        After some thinking and looking into the code, I do agree with you Arvind

        I've changed the DataDrivenDBInputFormat so that boundary query will be always used in case that is specified on command line. This version also ships fully featured test.

        Summary
        -------

        I've incorporated all Arvind's suggestions (hopefully ).

        This addresses bug SQOOP-331.
        https://issues.apache.org/jira/browse/SQOOP-331

        Diffs (updated)


        /src/docs/man/import-args.txt 1176793
        /src/docs/user/import.txt 1176793
        /src/java/com/cloudera/sqoop/SqoopOptions.java 1176793
        /src/java/com/cloudera/sqoop/manager/SqlManager.java 1176793
        /src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java 1176793
        /src/java/com/cloudera/sqoop/mapreduce/db/DataDrivenDBInputFormat.java 1176793
        /src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 1176793
        /src/java/com/cloudera/sqoop/tool/ImportTool.java 1176793
        /src/test/com/cloudera/sqoop/TestBoundaryQuery.java PRE-CREATION
        /src/test/com/cloudera/sqoop/TestSqoopOptions.java 1176793

        Diff: https://reviews.apache.org/r/1946/diff

        Testing
        -------

        I'm still having troubles to create meaningful tests for this patch. I've came up with two different approaches, but I wasn't able to get running either of them:

        1) Use boundary query for limiting import data (like "select 1, 2"). This is totally wrong usage of this parameter, but I was thinking that It might be fine for the testing purpose. Unfortunately underlying code is using this query only in case that is creating more than one map task and I was not able to forced it create more than one. Which make sense because the -m parameter is also only a hint.

        2) Parse logs. Fortunately class responsible for creating splits is printing used boundary query, so there is possibility to parse those logs and look for used boundary query. But I'm not sure how this can be done in proper fashion.

        Any ideas will be welcomed.

        Jarcec

        Thanks,

        Jarek

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1946/ ----------------------------------------------------------- (Updated 2011-09-28 14:21:32.994622) Review request for Sqoop and Arvind Prabhakar. Changes ------- After some thinking and looking into the code, I do agree with you Arvind I've changed the DataDrivenDBInputFormat so that boundary query will be always used in case that is specified on command line. This version also ships fully featured test. Summary ------- I've incorporated all Arvind's suggestions (hopefully ). This addresses bug SQOOP-331 . https://issues.apache.org/jira/browse/SQOOP-331 Diffs (updated) /src/docs/man/import-args.txt 1176793 /src/docs/user/import.txt 1176793 /src/java/com/cloudera/sqoop/SqoopOptions.java 1176793 /src/java/com/cloudera/sqoop/manager/SqlManager.java 1176793 /src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java 1176793 /src/java/com/cloudera/sqoop/mapreduce/db/DataDrivenDBInputFormat.java 1176793 /src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 1176793 /src/java/com/cloudera/sqoop/tool/ImportTool.java 1176793 /src/test/com/cloudera/sqoop/TestBoundaryQuery.java PRE-CREATION /src/test/com/cloudera/sqoop/TestSqoopOptions.java 1176793 Diff: https://reviews.apache.org/r/1946/diff Testing ------- I'm still having troubles to create meaningful tests for this patch. I've came up with two different approaches, but I wasn't able to get running either of them: 1) Use boundary query for limiting import data (like "select 1, 2"). This is totally wrong usage of this parameter, but I was thinking that It might be fine for the testing purpose. Unfortunately underlying code is using this query only in case that is creating more than one map task and I was not able to forced it create more than one. Which make sense because the -m parameter is also only a hint. 2) Parse logs. Fortunately class responsible for creating splits is printing used boundary query, so there is possibility to parse those logs and look for used boundary query. But I'm not sure how this can be done in proper fashion. Any ideas will be welcomed. Jarcec Thanks, Jarek
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1946/#review2132
        -----------------------------------------------------------

        Ship it!

        Changes look good Jarek. There are a couple of checkstyle violations noted below. Other than that, it is good to go. Please address these checkstyle violations and attach the patch to the JIRA.

        /src/java/com/cloudera/sqoop/mapreduce/db/DataDrivenDBInputFormat.java
        <https://reviews.apache.org/r/1946/#comment4951>

        Checkstyle violation: ')' is preceded with whitespace.

        /src/test/com/cloudera/sqoop/TestBoundaryQuery.java
        <https://reviews.apache.org/r/1946/#comment4952>

        Checkstyle violation: trailing whitespace.

        • Arvind

        On 2011-09-28 14:21:32, Jarek Jarcec wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/1946/

        -----------------------------------------------------------

        (Updated 2011-09-28 14:21:32)

        Review request for Sqoop and Arvind Prabhakar.

        Summary

        -------

        I've incorporated all Arvind's suggestions (hopefully ).

        This addresses bug SQOOP-331.

        https://issues.apache.org/jira/browse/SQOOP-331

        Diffs

        -----

        /src/docs/man/import-args.txt 1176793

        /src/docs/user/import.txt 1176793

        /src/java/com/cloudera/sqoop/SqoopOptions.java 1176793

        /src/java/com/cloudera/sqoop/manager/SqlManager.java 1176793

        /src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java 1176793

        /src/java/com/cloudera/sqoop/mapreduce/db/DataDrivenDBInputFormat.java 1176793

        /src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 1176793

        /src/java/com/cloudera/sqoop/tool/ImportTool.java 1176793

        /src/test/com/cloudera/sqoop/TestBoundaryQuery.java PRE-CREATION

        /src/test/com/cloudera/sqoop/TestSqoopOptions.java 1176793

        Diff: https://reviews.apache.org/r/1946/diff

        Testing

        -------

        I'm still having troubles to create meaningful tests for this patch. I've came up with two different approaches, but I wasn't able to get running either of them:

        1) Use boundary query for limiting import data (like "select 1, 2"). This is totally wrong usage of this parameter, but I was thinking that It might be fine for the testing purpose. Unfortunately underlying code is using this query only in case that is creating more than one map task and I was not able to forced it create more than one. Which make sense because the -m parameter is also only a hint.

        2) Parse logs. Fortunately class responsible for creating splits is printing used boundary query, so there is possibility to parse those logs and look for used boundary query. But I'm not sure how this can be done in proper fashion.

        Any ideas will be welcomed.

        Jarcec

        Thanks,

        Jarek

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1946/#review2132 ----------------------------------------------------------- Ship it! Changes look good Jarek. There are a couple of checkstyle violations noted below. Other than that, it is good to go. Please address these checkstyle violations and attach the patch to the JIRA. /src/java/com/cloudera/sqoop/mapreduce/db/DataDrivenDBInputFormat.java < https://reviews.apache.org/r/1946/#comment4951 > Checkstyle violation: ')' is preceded with whitespace. /src/test/com/cloudera/sqoop/TestBoundaryQuery.java < https://reviews.apache.org/r/1946/#comment4952 > Checkstyle violation: trailing whitespace. Arvind On 2011-09-28 14:21:32, Jarek Jarcec wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1946/ ----------------------------------------------------------- (Updated 2011-09-28 14:21:32) Review request for Sqoop and Arvind Prabhakar. Summary ------- I've incorporated all Arvind's suggestions (hopefully ). This addresses bug SQOOP-331 . https://issues.apache.org/jira/browse/SQOOP-331 Diffs ----- /src/docs/man/import-args.txt 1176793 /src/docs/user/import.txt 1176793 /src/java/com/cloudera/sqoop/SqoopOptions.java 1176793 /src/java/com/cloudera/sqoop/manager/SqlManager.java 1176793 /src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java 1176793 /src/java/com/cloudera/sqoop/mapreduce/db/DataDrivenDBInputFormat.java 1176793 /src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 1176793 /src/java/com/cloudera/sqoop/tool/ImportTool.java 1176793 /src/test/com/cloudera/sqoop/TestBoundaryQuery.java PRE-CREATION /src/test/com/cloudera/sqoop/TestSqoopOptions.java 1176793 Diff: https://reviews.apache.org/r/1946/diff Testing ------- I'm still having troubles to create meaningful tests for this patch. I've came up with two different approaches, but I wasn't able to get running either of them: 1) Use boundary query for limiting import data (like "select 1, 2"). This is totally wrong usage of this parameter, but I was thinking that It might be fine for the testing purpose. Unfortunately underlying code is using this query only in case that is creating more than one map task and I was not able to forced it create more than one. Which make sense because the -m parameter is also only a hint. 2) Parse logs. Fortunately class responsible for creating splits is printing used boundary query, so there is possibility to parse those logs and look for used boundary query. But I'm not sure how this can be done in proper fashion. Any ideas will be welcomed. Jarcec Thanks, Jarek
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1946/
        -----------------------------------------------------------

        (Updated 2011-09-28 16:53:44.364392)

        Review request for Sqoop and Arvind Prabhakar.

        Changes
        -------

        Code style violation corrections.

        Summary
        -------

        I've incorporated all Arvind's suggestions (hopefully ).

        This addresses bug SQOOP-331.
        https://issues.apache.org/jira/browse/SQOOP-331

        Diffs (updated)


        /src/test/com/cloudera/sqoop/TestSqoopOptions.java 1176793
        /src/test/com/cloudera/sqoop/TestBoundaryQuery.java PRE-CREATION
        /src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java 1176793
        /src/java/com/cloudera/sqoop/mapreduce/db/DataDrivenDBInputFormat.java 1176793
        /src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 1176793
        /src/java/com/cloudera/sqoop/tool/ImportTool.java 1176793
        /src/java/com/cloudera/sqoop/manager/SqlManager.java 1176793
        /src/docs/man/import-args.txt 1176793
        /src/docs/user/import.txt 1176793
        /src/java/com/cloudera/sqoop/SqoopOptions.java 1176793

        Diff: https://reviews.apache.org/r/1946/diff

        Testing
        -------

        I'm still having troubles to create meaningful tests for this patch. I've came up with two different approaches, but I wasn't able to get running either of them:

        1) Use boundary query for limiting import data (like "select 1, 2"). This is totally wrong usage of this parameter, but I was thinking that It might be fine for the testing purpose. Unfortunately underlying code is using this query only in case that is creating more than one map task and I was not able to forced it create more than one. Which make sense because the -m parameter is also only a hint.

        2) Parse logs. Fortunately class responsible for creating splits is printing used boundary query, so there is possibility to parse those logs and look for used boundary query. But I'm not sure how this can be done in proper fashion.

        Any ideas will be welcomed.

        Jarcec

        Thanks,

        Jarek

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1946/ ----------------------------------------------------------- (Updated 2011-09-28 16:53:44.364392) Review request for Sqoop and Arvind Prabhakar. Changes ------- Code style violation corrections. Summary ------- I've incorporated all Arvind's suggestions (hopefully ). This addresses bug SQOOP-331 . https://issues.apache.org/jira/browse/SQOOP-331 Diffs (updated) /src/test/com/cloudera/sqoop/TestSqoopOptions.java 1176793 /src/test/com/cloudera/sqoop/TestBoundaryQuery.java PRE-CREATION /src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java 1176793 /src/java/com/cloudera/sqoop/mapreduce/db/DataDrivenDBInputFormat.java 1176793 /src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 1176793 /src/java/com/cloudera/sqoop/tool/ImportTool.java 1176793 /src/java/com/cloudera/sqoop/manager/SqlManager.java 1176793 /src/docs/man/import-args.txt 1176793 /src/docs/user/import.txt 1176793 /src/java/com/cloudera/sqoop/SqoopOptions.java 1176793 Diff: https://reviews.apache.org/r/1946/diff Testing ------- I'm still having troubles to create meaningful tests for this patch. I've came up with two different approaches, but I wasn't able to get running either of them: 1) Use boundary query for limiting import data (like "select 1, 2"). This is totally wrong usage of this parameter, but I was thinking that It might be fine for the testing purpose. Unfortunately underlying code is using this query only in case that is creating more than one map task and I was not able to forced it create more than one. Which make sense because the -m parameter is also only a hint. 2) Parse logs. Fortunately class responsible for creating splits is printing used boundary query, so there is possibility to parse those logs and look for used boundary query. But I'm not sure how this can be done in proper fashion. Any ideas will be welcomed. Jarcec Thanks, Jarek
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1946/#review2136
        -----------------------------------------------------------

        Ship it!

        +1

        • Arvind

        On 2011-09-28 16:53:44, Jarek Jarcec wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/1946/

        -----------------------------------------------------------

        (Updated 2011-09-28 16:53:44)

        Review request for Sqoop and Arvind Prabhakar.

        Summary

        -------

        I've incorporated all Arvind's suggestions (hopefully ).

        This addresses bug SQOOP-331.

        https://issues.apache.org/jira/browse/SQOOP-331

        Diffs

        -----

        /src/test/com/cloudera/sqoop/TestSqoopOptions.java 1176793

        /src/test/com/cloudera/sqoop/TestBoundaryQuery.java PRE-CREATION

        /src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java 1176793

        /src/java/com/cloudera/sqoop/mapreduce/db/DataDrivenDBInputFormat.java 1176793

        /src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 1176793

        /src/java/com/cloudera/sqoop/tool/ImportTool.java 1176793

        /src/java/com/cloudera/sqoop/manager/SqlManager.java 1176793

        /src/docs/man/import-args.txt 1176793

        /src/docs/user/import.txt 1176793

        /src/java/com/cloudera/sqoop/SqoopOptions.java 1176793

        Diff: https://reviews.apache.org/r/1946/diff

        Testing

        -------

        I'm still having troubles to create meaningful tests for this patch. I've came up with two different approaches, but I wasn't able to get running either of them:

        1) Use boundary query for limiting import data (like "select 1, 2"). This is totally wrong usage of this parameter, but I was thinking that It might be fine for the testing purpose. Unfortunately underlying code is using this query only in case that is creating more than one map task and I was not able to forced it create more than one. Which make sense because the -m parameter is also only a hint.

        2) Parse logs. Fortunately class responsible for creating splits is printing used boundary query, so there is possibility to parse those logs and look for used boundary query. But I'm not sure how this can be done in proper fashion.

        Any ideas will be welcomed.

        Jarcec

        Thanks,

        Jarek

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1946/#review2136 ----------------------------------------------------------- Ship it! +1 Arvind On 2011-09-28 16:53:44, Jarek Jarcec wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1946/ ----------------------------------------------------------- (Updated 2011-09-28 16:53:44) Review request for Sqoop and Arvind Prabhakar. Summary ------- I've incorporated all Arvind's suggestions (hopefully ). This addresses bug SQOOP-331 . https://issues.apache.org/jira/browse/SQOOP-331 Diffs ----- /src/test/com/cloudera/sqoop/TestSqoopOptions.java 1176793 /src/test/com/cloudera/sqoop/TestBoundaryQuery.java PRE-CREATION /src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java 1176793 /src/java/com/cloudera/sqoop/mapreduce/db/DataDrivenDBInputFormat.java 1176793 /src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 1176793 /src/java/com/cloudera/sqoop/tool/ImportTool.java 1176793 /src/java/com/cloudera/sqoop/manager/SqlManager.java 1176793 /src/docs/man/import-args.txt 1176793 /src/docs/user/import.txt 1176793 /src/java/com/cloudera/sqoop/SqoopOptions.java 1176793 Diff: https://reviews.apache.org/r/1946/diff Testing ------- I'm still having troubles to create meaningful tests for this patch. I've came up with two different approaches, but I wasn't able to get running either of them: 1) Use boundary query for limiting import data (like "select 1, 2"). This is totally wrong usage of this parameter, but I was thinking that It might be fine for the testing purpose. Unfortunately underlying code is using this query only in case that is creating more than one map task and I was not able to forced it create more than one. Which make sense because the -m parameter is also only a hint. 2) Parse logs. Fortunately class responsible for creating splits is printing used boundary query, so there is possibility to parse those logs and look for used boundary query. But I'm not sure how this can be done in proper fashion. Any ideas will be welcomed. Jarcec Thanks, Jarek
        Hide
        Arvind Prabhakar added a comment -

        Patch committed. Thanks Jarek!

        Show
        Arvind Prabhakar added a comment - Patch committed. Thanks Jarek!
        Hide
        Hudson added a comment -

        Integrated in Sqoop-jdk-1.6 #31 (See https://builds.apache.org/job/Sqoop-jdk-1.6/31/)
        SQOOP-331. Support for boundary query.

        (Jarek Jarcec Cecho via Arvind Prabhakar)

        arvind : http://svn.apache.org/viewvc/?view=rev&rev=1176981
        Files :

        • /incubator/sqoop/trunk/src/docs/man/import-args.txt
        • /incubator/sqoop/trunk/src/docs/user/import.txt
        • /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/SqoopOptions.java
        • /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/manager/SqlManager.java
        • /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java
        • /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/mapreduce/db/DataDrivenDBInputFormat.java
        • /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java
        • /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/tool/ImportTool.java
        • /incubator/sqoop/trunk/src/test/com/cloudera/sqoop/TestBoundaryQuery.java
        • /incubator/sqoop/trunk/src/test/com/cloudera/sqoop/TestSqoopOptions.java
        Show
        Hudson added a comment - Integrated in Sqoop-jdk-1.6 #31 (See https://builds.apache.org/job/Sqoop-jdk-1.6/31/ ) SQOOP-331 . Support for boundary query. (Jarek Jarcec Cecho via Arvind Prabhakar) arvind : http://svn.apache.org/viewvc/?view=rev&rev=1176981 Files : /incubator/sqoop/trunk/src/docs/man/import-args.txt /incubator/sqoop/trunk/src/docs/user/import.txt /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/SqoopOptions.java /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/manager/SqlManager.java /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/mapreduce/db/DataDrivenDBInputFormat.java /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java /incubator/sqoop/trunk/src/java/com/cloudera/sqoop/tool/ImportTool.java /incubator/sqoop/trunk/src/test/com/cloudera/sqoop/TestBoundaryQuery.java /incubator/sqoop/trunk/src/test/com/cloudera/sqoop/TestSqoopOptions.java

          People

          • Assignee:
            Jarek Jarcec Cecho
            Reporter:
            Jarek Jarcec Cecho
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development