Sqoop (Retired) / SQOOP-1411

The number of tasks is not set properly in PGBulkloadExportManager

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.5
    • Fix Version/s: 1.4.6
    • Component/s: connectors
    • Labels: None

    Description

      The '-m' option does not work, and the number of reduce tasks is always set to 0.
      The cause seems to be the changes to configureNumTasks in ExportJobBase and JobBase.

      Attachments

        1. SQOOP-1411-0.patch
          3 kB
          Masatake Iwasaki
        2. SQOOP-1411-1.patch
          6 kB
          Masatake Iwasaki
        3. SQOOP-1411-2.patch
          4 kB
          Masatake Iwasaki

          Activity

            iwasakims Masatake Iwasaki added a comment -

            An ordinary Sqoop job has no reduce tasks, but the pg_bulkload connector needs reduce tasks for its own staging.

            iwasakims Masatake Iwasaki added a comment -

            This patch fixes configureNumTasks. It changes the property that specifies the number of reduce tasks from "mapred.reduce.tasks" to "pgbulkload.job.reduces".

            gwenshap Gwen Shapira added a comment -

            Thanks for submitting an issue and a patch iwasakims.

            Some questions I have:

            • Why rename a parameter? What was wrong with the existing name?
            • The change to configureNumTasks is very confusing. Are you trying to set both number of mappers and reducers in same function? This makes sense, but the code no longer calls setJobNumMappers and removes an important validation and warning.

            Can you resolve these issues and add a bit of documentation to your logic in configureNumTasks?

            iwasakims Masatake Iwasaki added a comment -

            "the code no longer calls setJobNumMappers and removes an important validation and warning."

            The number of map tasks is set in ExportJobBase#configureNumTasks, which is called from PGBulkloadExportJob:

                int numMaps = super.configureNumTasks(job);

            I would rather call super.configureNumTasks than copy and paste its code into configureNumTasks, so that the connector follows future changes.

            "Why rename a parameter? What was wrong with the existing name?"

            Because the value of mapred.reduce.tasks (which is deprecated in favor of mapreduce.job.reduces) is set to 0 in JobBase#configureNumTasks:

                job.setNumReduceTasks(0);
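
            The rationale for the separate property can be illustrated with a small, self-contained sketch. It is not Sqoop or Hadoop code: SketchJob, SketchBaseJob, SketchBulkloadJob, the extra requestedMaps parameter, and the default of 1 reducer are hypothetical stand-ins. The only behavior it assumes is the one described in the comment above: the base configureNumTasks forces the reduce count to 0, and that value lands in the job configuration, so a subclass that calls super first can no longer read the user's value from "mapred.reduce.tasks".

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a Hadoop job; not org.apache.hadoop.mapreduce.Job. */
class SketchJob {
    static final String REDUCES_KEY = "mapred.reduce.tasks";
    final Map<String, Integer> conf = new HashMap<>();
    int numMaps;

    // Assumption: the setter stores the value in the job configuration.
    void setNumReduceTasks(int n) { conf.put(REDUCES_KEY, n); }

    int getNumReduceTasks() { return conf.getOrDefault(REDUCES_KEY, 1); }
}

/** Stand-in for the base job: sets map tasks and forces a map-only job. */
class SketchBaseJob {
    protected int configureNumTasks(SketchJob job, int requestedMaps) {
        job.numMaps = Math.max(1, requestedMaps);
        job.setNumReduceTasks(0); // overwrites whatever the user put in REDUCES_KEY
        return job.numMaps;
    }
}

/** Stand-in for a connector job that needs reducers, as in the first patch. */
class SketchBulkloadJob extends SketchBaseJob {
    static final String CONNECTOR_REDUCES_KEY = "pgbulkload.job.reduces";

    @Override
    protected int configureNumTasks(SketchJob job, int requestedMaps) {
        int numMaps = super.configureNumTasks(job, requestedMaps);
        // REDUCES_KEY is 0 by now, so read a key the base class never touches.
        int reduces = job.conf.getOrDefault(CONNECTOR_REDUCES_KEY, 1);
        job.setNumReduceTasks(reduces);
        return numMaps;
    }
}

class RenameRationaleDemo {
    public static void main(String[] args) {
        SketchJob plain = new SketchJob();
        plain.conf.put(SketchJob.REDUCES_KEY, 3);
        new SketchBaseJob().configureNumTasks(plain, 4);
        System.out.println("base job reduces=" + plain.getNumReduceTasks());

        SketchJob bulk = new SketchJob();
        bulk.conf.put(SketchBulkloadJob.CONNECTOR_REDUCES_KEY, 3);
        int maps = new SketchBulkloadJob().configureNumTasks(bulk, 4);
        System.out.println("bulkload job maps=" + maps
                + " reduces=" + bulk.getNumReduceTasks());
    }
}
```

            In this sketch the base job ends with 0 reducers even though the user asked for 3, while the connector job keeps 3 because it reads its own key after the super call.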

            iwasakims Masatake Iwasaki added a comment -

            I think the logic of configureNumTasks is confusing because

            • an export job uses a different property from an import job to specify the number of map tasks,
            • it returns the number of map tasks, though the comment on JobBase#configureNumTasks says "Configure the number of map/reduce tasks to use in the job.".

            Should I split configureNumTasks into configureNumMapTasks and configureNumReduceTasks and override them?

            iwasakims Masatake Iwasaki added a comment -

            I updated the patch on Review Board:

            • split configureNumTasks into configureNumMapTasks and configureNumReduceTasks.
            • reverted the change that used pgbulkload.job.reduces instead of mapred.reduce.tasks.

            gwenshap Gwen Shapira added a comment -

            It looks good to me.

            Can you attach the latest version of the patch to this Jira? It will help our committers make sure the right version is committed.

            iwasakims Masatake Iwasaki added a comment -

            Thanks for your comments, gwenshap. Attaching the updated patch.

            gwenshap Gwen Shapira added a comment -

            Changing status to "patch available" so committers will see this.

            iwasakims Masatake Iwasaki added a comment -

            Attaching an updated patch that keeps JobBase#configureNumTasks for backward compatibility.
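
            The design discussed in this thread (two overridable methods, with configureNumTasks kept as a wrapper for existing callers) might look roughly like the following sketch. Only the three method names come from the thread; the classes SplitJob, SplitBaseJob, and SplitBulkloadJob, the method signatures, the constructor argument, and the warning text are assumptions made for illustration and are not taken from the committed patch.

```java
/** Hypothetical stand-in for a Hadoop job. */
class SplitJob {
    int numMaps;
    int numReduces = 1;
}

/** Stand-in for the base job after splitting configureNumTasks in two. */
class SplitBaseJob {
    // Validates the requested mapper count and applies it to the job.
    protected int configureNumMapTasks(SplitJob job, int requestedMaps) {
        if (requestedMaps < 1) {
            System.err.println("Invalid mapper count; using 1 mapper.");
            requestedMaps = 1;
        }
        job.numMaps = requestedMaps;
        return requestedMaps;
    }

    // Default behavior: a map-only job.
    protected int configureNumReduceTasks(SplitJob job) {
        job.numReduces = 0;
        return 0;
    }

    // Kept so existing callers still work; returns the map count as before.
    protected int configureNumTasks(SplitJob job, int requestedMaps) {
        int numMaps = configureNumMapTasks(job, requestedMaps);
        configureNumReduceTasks(job);
        return numMaps;
    }
}

/** Stand-in for a connector job that overrides only the reduce side. */
class SplitBulkloadJob extends SplitBaseJob {
    private final int requestedReduces;

    SplitBulkloadJob(int requestedReduces) {
        this.requestedReduces = requestedReduces;
    }

    @Override
    protected int configureNumReduceTasks(SplitJob job) {
        job.numReduces = requestedReduces;
        return requestedReduces;
    }
}

class SplitDemo {
    public static void main(String[] args) {
        SplitJob job = new SplitJob();
        int maps = new SplitBulkloadJob(2).configureNumTasks(job, 4);
        System.out.println("maps=" + maps + " reduces=" + job.numReduces);
    }
}
```

            With this shape, the connector inherits the mapper validation and warning unchanged and overrides only the reduce count, which addresses both review questions without a connector-specific property.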

            jira-bot ASF subversion and git services added a comment -

            Commit cfe503744b885defa0998462b4210bee12dec518 in sqoop's branch refs/heads/trunk from jarcec
            [ https://git-wip-us.apache.org/repos/asf?p=sqoop.git;h=cfe5037 ]

            SQOOP-1411: The number of tasks is not set properly in PGBulkloadExportManager

            (Masatake Iwasaki via Jarek Jarcec Cecho)

            hudson Hudson added a comment -

            SUCCESS: Integrated in Sqoop-ant-jdk-1.6-hadoop200 #923 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop200/923/)
            SQOOP-1411: The number of tasks is not set properly in PGBulkloadExportManager (jarcec: https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=cfe503744b885defa0998462b4210bee12dec518)

            • src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java
            • src/java/org/apache/sqoop/mapreduce/ExportJobBase.java
            • src/java/org/apache/sqoop/mapreduce/postgresql/PGBulkloadExportJob.java
            • src/java/org/apache/sqoop/mapreduce/JobBase.java

            jarcec Jarek Jarcec Cecho added a comment -

            Thank you for the contribution, iwasakims, and for the review, gwenshap!

            hudson Hudson added a comment -

            FAILURE: Integrated in Sqoop-ant-jdk-1.6-hadoop20 #917 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop20/917/)
            SQOOP-1411: The number of tasks is not set properly in PGBulkloadExportManager (jarcec: https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=cfe503744b885defa0998462b4210bee12dec518)

            • src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java
            • src/java/org/apache/sqoop/mapreduce/postgresql/PGBulkloadExportJob.java
            • src/java/org/apache/sqoop/mapreduce/JobBase.java
            • src/java/org/apache/sqoop/mapreduce/ExportJobBase.java
            hudson Hudson added a comment -

            SUCCESS: Integrated in Sqoop-ant-jdk-1.6-hadoop23 #1120 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop23/1120/)
            SQOOP-1411: The number of tasks is not set properly in PGBulkloadExportManager (jarcec: https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=cfe503744b885defa0998462b4210bee12dec518)

            • src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java
            • src/java/org/apache/sqoop/mapreduce/postgresql/PGBulkloadExportJob.java
            • src/java/org/apache/sqoop/mapreduce/ExportJobBase.java
            • src/java/org/apache/sqoop/mapreduce/JobBase.java
            hudson Hudson added a comment -

            SUCCESS: Integrated in Sqoop-ant-jdk-1.6-hadoop100 #881 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop100/881/)
            SQOOP-1411: The number of tasks is not set properly in PGBulkloadExportManager (jarcec: https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=cfe503744b885defa0998462b4210bee12dec518)

            • src/java/org/apache/sqoop/mapreduce/postgresql/PGBulkloadExportJob.java
            • src/test/com/cloudera/sqoop/manager/PGBulkloadManagerManualTest.java
            • src/java/org/apache/sqoop/mapreduce/JobBase.java
            • src/java/org/apache/sqoop/mapreduce/ExportJobBase.java

            People

              Assignee: iwasakims Masatake Iwasaki
              Reporter: iwasakims Masatake Iwasaki
              Votes: 0
              Watchers: 5
