SQOOP-350

Add support for requiring that a connector be used, otherwise the job should fail

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.4.3
    • Component/s: connectors
    • Labels:
      None

      Description

      There are situations where it is critical that a specific connector be used during a Sqoop job. For example, if you have a table that doesn't have a suitable column for partitioning, and thus you're relying on OraOop's row-based partitioning, then it's critical that OraOop be used. If the Sqoop request falls back to the generic Oracle connector, this puts a huge, unacceptable load on the database.

      The proposal is to add a -connector <class name> parameter, which will cause the job to fail unless it's handled by the connector (from sqoop.ConnFactory.getManager) with the matching class name.
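
      The check this proposal describes can be sketched as follows. This is only an illustration under stated assumptions, not Sqoop code: the class RequiredConnectorSketch, the Manager interface, the two manager classes, and the requireConnector method are all hypothetical stand-ins for whatever sqoop.ConnFactory.getManager returns and for the value passed to the proposed -connector parameter.

```java
public class RequiredConnectorSketch {
    /** Hypothetical stand-in for the connection manager chosen for a job. */
    interface Manager {}

    /** Hypothetical manager playing the role of the connector the user insists on. */
    static final class OraOopLikeManager implements Manager {}

    /** Hypothetical manager playing the role of the generic fallback. */
    static final class GenericOracleLikeManager implements Manager {}

    /**
     * Returns the chosen manager when no connector is required or when its
     * class name matches the required one; otherwise fails the job.
     */
    static Manager requireConnector(Manager chosen, String requiredClassName) {
        String actual = chosen.getClass().getName();
        if (requiredClassName != null && !actual.equals(requiredClassName)) {
            throw new IllegalStateException(
                "Job requires connector " + requiredClassName + " but would be handled by " + actual);
        }
        return chosen;
    }

    public static void main(String[] args) {
        String required = OraOopLikeManager.class.getName();

        // The required connector handles the job: the check passes.
        requireConnector(new OraOopLikeManager(), required);
        System.out.println("required connector in use");

        // The job fell back to the generic connector: the check fails the job.
        try {
            requireConnector(new GenericOracleLikeManager(), required);
        } catch (IllegalStateException e) {
            System.out.println("job failed: " + e.getMessage());
        }
    }
}
```

      The point of failing rather than warning is that the fallback itself is the harmful outcome, so the job should stop before any query reaches the database.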

          Activity

          Jarek Jarcec Cecho added a comment -

          I believe that this issue was resolved by SQOOP-529, where we moved handling of the --connection-manager parameter out of the default factory and thus enforce its usage all the time. Please do not hesitate to reopen this issue if needed.

          David Robson added a comment -

          @Ken - You should be able to set this on the command line with -Dsqoop.connection.factories=com.quest.oraoop.OraOopManagerFactory. Yes, OraOop does not work with a single mapper, as there would be no benefit, so for that particular job you can leave the command line option out.

          Ken Krugler added a comment -

          @David - it looks like this parameter is one set in sqoop-site.xml, yes? If so, then that's not a good solution for us, as we have one import we do that has to use a single mapper, in which case OraOop won't be used, right?

          David Robson added a comment -

          Actually - I found a parameter to do this I believe.
          If you add the parameter sqoop.connection.factories=com.quest.oraoop.OraOopManagerFactory then this should force the OraOop manager factory to be used.
          It is a bit confusing because OraOop ignores the connection-manager setting - should ManagerFactory plugins respect this setting? Or if a user sets this should the DefaultManagerFactory be forced to be used?

          Ken Krugler added a comment -

          As David Robson explains, current support doesn't actually do what I need - which is to force a particular connection manager factory to be used.

          David Robson added a comment -

          Actually the connection-manager does not do exactly what Ken wants here.
          The connection-manager parameter is for the DefaultManagerFactory to tell it what GenericJdbcManager to use. The Quest Data Connector for Oracle and Hadoop works by implementing its own manager factory - OraOopManagerFactory. So in the case a user specifies connection-manager - the OraOopManagerFactory could still process the job anyway as it does not look at this parameter.
          Would it be possible to have a connection-manager-factory property? This way you could force Sqoop to use the OraOopManagerFactory, or if you wanted to use the connection-manager parameter, you could specify --connection-manager-factory=DefaultManagerFactory --connection-manager=...
          Ken is currently experiencing a side effect of this parameter: if you specify an invalid class here, OraOopManagerFactory will still process the job when it can, but when it declines the job, the DefaultManagerFactory is used next and fails on the invalid class.
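
          The side effect described above can be modelled with a small sketch. Everything in it is hypothetical and only mirrors the behaviour described in this comment: FactoryChainSketch, JobOptions, ManagerFactory, PLUGIN, DEFAULT and choose are illustrative names, not Sqoop classes, and the rule that the plugin factory declines single-mapper jobs is an assumption taken from the discussion in this thread.

```java
import java.util.List;
import java.util.Optional;

public class FactoryChainSketch {
    /** Hypothetical job options: requested manager class (may be null) and mapper count. */
    static final class JobOptions {
        final String connectionManager;
        final int numMappers;

        JobOptions(String connectionManager, int numMappers) {
            this.connectionManager = connectionManager;
            this.numMappers = numMappers;
        }
    }

    /** Hypothetical factory: returns a manager name if it accepts the job, else empty. */
    interface ManagerFactory {
        Optional<String> accept(JobOptions options);
    }

    /** Models a plugin factory that never looks at connectionManager and declines single-mapper jobs. */
    static final ManagerFactory PLUGIN = options ->
        options.numMappers > 1 ? Optional.of("plugin-manager") : Optional.empty();

    /** Models a default factory that honours connectionManager and fails if the class cannot be loaded. */
    static final ManagerFactory DEFAULT = options -> {
        if (options.connectionManager == null) {
            return Optional.of("generic-manager");
        }
        try {
            Class.forName(options.connectionManager);
            return Optional.of(options.connectionManager);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Cannot load manager class " + options.connectionManager, e);
        }
    };

    /** Asks each factory in order and returns the first manager offered. */
    static String choose(List<ManagerFactory> chain, JobOptions options) {
        for (ManagerFactory factory : chain) {
            Optional<String> manager = factory.accept(options);
            if (manager.isPresent()) {
                return manager.get();
            }
        }
        throw new IllegalStateException("No factory accepted the job");
    }

    public static void main(String[] args) {
        List<ManagerFactory> chain = List.of(PLUGIN, DEFAULT);

        // Invalid class name, several mappers: the plugin factory takes the job anyway.
        System.out.println(choose(chain, new JobOptions("no.such.Manager", 4)));

        // Invalid class name, one mapper: the plugin declines and the default factory fails.
        try {
            choose(chain, new JobOptions("no.such.Manager", 1));
        } catch (IllegalArgumentException e) {
            System.out.println("job failed: " + e.getMessage());
        }
    }
}
```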

          Ken Krugler added a comment -

          This, as Arvind pointed out, is already handled by the --connection-manager parameter.

          Ken Krugler added a comment -

          Yes, thanks for pointing that out - I thought this was a way to change the Sqoop connection manager, not the pluggable connection being used.

          I tried it out, and using --connection-manager com.quest.oraoop.OraOopManagerFactory forced OraOop to be used. If OraOop tried to hand off responsibility (because it decided it couldn't handle the request) then the Sqoop job failed, which is exactly the behavior that we need.

          It would be great to call this out in the documentation, maybe with a note that the class will typically be xxx.YyyManagerFactory, since it took me a few tries to figure out that the right class name is the factory, not the actual manager.

          Arvind Prabhakar added a comment -

          Sqoop currently supports the option of explicitly specifying a connection manager (via the --connection-manager <class-name> option). Would this be sufficient to bind the job to the connector?


            People

            • Assignee: Unassigned
            • Reporter: Ken Krugler
            • Votes: 0
            • Watchers: 2
