Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Over the last several months, Sqoop has evolved to a state that is functional and has room for extensions. Developing extensions requires a stable API and documentation. I am attaching to this ticket a description of Sqoop's design and internal APIs, which include some open questions. I would like to solicit input on the design regarding these open questions and standardize the API.

      1. sqoop-reference.txt
        12 kB
        Aaron Kimball
      2. MAPREDUCE-1036.patch
        13 kB
        Aaron Kimball

        Issue Links

          Activity

          Aaron Kimball added a comment -

          Attaching a draft of the API reference. After the open questions are discussed, I will upload a final version of this document formatted as a patch which extends the existing user-facing documentation.

          Aaron Kimball added a comment -

          After ruminating on this for a while, here are the decisions I plan to implement:

          • ImportOptions will contain a reference to the Configuration used when Sqoop's main() method was invoked. This will be the preferred mechanism to send manager-specific data from the command line (i.e., via -D k=v and ToolRunner/GenericOptionsParser).
          • StreamHandlerFactory will be renamed to AsyncSink to be shorter. All child classes will be renamed as well.
          • Abstract classes will be used instead of interfaces for all externally-implemented APIs. This affects ManagerFactory, ConnManager, and StreamHandlerFactory/AsyncSink. Going forward, interfaces will only be used for objects internally created via factories whose implementations are opaque to external clients, or to indicate the presence of a single method or behavior (e.g., Closeable).
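The first decision above can be sketched as follows. This is a hypothetical illustration of the pattern, not Sqoop's actual API: `java.util.Properties` stands in for Hadoop's `Configuration`, and the class, method, and property names (`ImportOptions`, `fetchSize`, `mysql.fetch.size`) are made up for the example.

```java
import java.util.Properties;

public class OptionsSketch {
    // ImportOptions-like holder that carries the Configuration captured
    // when main() was invoked (e.g. populated by GenericOptionsParser
    // from -D key=value arguments).
    static class ImportOptions {
        private final Properties conf;
        ImportOptions(Properties conf) { this.conf = conf; }
        Properties getConf() { return conf; }
    }

    // A manager pulls manager-specific settings out of the shared
    // Configuration instead of requiring a new command-line flag
    // for every knob. The key name here is illustrative.
    static String fetchSize(ImportOptions opts) {
        return opts.getConf().getProperty("mysql.fetch.size", "1000");
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("mysql.fetch.size", "5000"); // as if set via -D
        System.out.println(fetchSize(new ImportOptions(conf)));
    }
}
```

The appeal of the pattern is that new manager-specific settings need no parser changes: they ride along in the Configuration that ToolRunner already populates.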
          Aaron Kimball added a comment -

          In the same spirit as the "Context" objects provided to Mapper.map() and Reducer.reduce(), I'm also going to modify ConnManager.importJob() to receive an ImportJobContext as its parameter.
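The Context-object idea can be sketched roughly like this. All names below (`ImportJobContext` fields, `SqlManager`) are illustrative stand-ins, not the real Sqoop classes; the point is only that bundling parameters into one context object lets `importJob()`'s inputs grow without breaking every existing `ConnManager` subclass.

```java
public class ContextSketch {
    // Bundles everything importJob() needs; new fields can be added
    // later without changing the method signature.
    static class ImportJobContext {
        final String tableName;
        final String jarFile;
        ImportJobContext(String tableName, String jarFile) {
            this.tableName = tableName;
            this.jarFile = jarFile;
        }
    }

    // Abstract base class rather than an interface, per the decision
    // above: future methods can be given default bodies here.
    static abstract class ConnManager {
        abstract String importJob(ImportJobContext ctx);
    }

    static class SqlManager extends ConnManager {
        @Override
        String importJob(ImportJobContext ctx) {
            return "importing " + ctx.tableName + " using " + ctx.jarFile;
        }
    }

    public static void main(String[] args) {
        ConnManager mgr = new SqlManager();
        System.out.println(mgr.importJob(new ImportJobContext("employees", "codegen.jar")));
    }
}
```

This mirrors how `Mapper.map()` and `Reducer.reduce()` take a `Context` in the new MapReduce API for the same evolvability reason.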

          Todd Lipcon added a comment -

          Read over the doc just now. Looks good to me, and I agree with all of your postruminatory decisions. +1

          Tom White added a comment -

          > Abstract classes will be used instead of interfaces for all externally-implemented APIs.

          +1

          The document looks fine to me (nit: "delimeters" typo). I noticed that BigDecimalSerializer seems to be out of place - should it go in Common with Writables, or at least in a different package?

          Aaron Kimball added a comment -

          BigDecimalSerializer is used from within SqoopRecord classes auto-generated by Sqoop. Therefore it'll be used by client programs. Under the conventions outlined above, that means it belongs in o.a.h.sqoop.lib.

          If it has broader applicability than Sqoop, then it may make sense to promote it elsewhere. But it's not a Writable in-and-of-itself. (SqoopRecord uses BigDecimalSerializer to de/serialize BigDecimal fields within its own readFields/write methods.) To my knowledge, Hadoop doesn't currently have a policy with regard to which package external serializers go in. If you think it'd be more generally useful, I'm happy to move it into common. Do you think it should just go in o.a.h.io?
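A plausible sketch of the kind of round trip described here: a `BigDecimal` written through `DataOutput` inside `write()` and restored inside `readFields()`, encoded as (scale, unscaled-value bytes). This is an assumption about the encoding for illustration only; the actual `BigDecimalSerializer` wire format may differ.

```java
import java.io.*;
import java.math.BigDecimal;
import java.math.BigInteger;

public class BigDecimalIO {
    // Serialize as: int scale, int byte-count, then the unscaled
    // value's two's-complement bytes.
    public static void write(BigDecimal d, DataOutput out) throws IOException {
        out.writeInt(d.scale());
        byte[] unscaled = d.unscaledValue().toByteArray();
        out.writeInt(unscaled.length);
        out.write(unscaled);
    }

    public static BigDecimal read(DataInput in) throws IOException {
        int scale = in.readInt();
        byte[] unscaled = new byte[in.readInt()];
        in.readFully(unscaled);
        return new BigDecimal(new BigInteger(unscaled), scale);
    }

    public static void main(String[] args) throws IOException {
        BigDecimal original = new BigDecimal("12345.6789");
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        write(original, new DataOutputStream(buf));
        BigDecimal copy = read(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));
        if (!original.equals(copy)) throw new AssertionError("round trip failed");
        System.out.println("round trip ok: " + copy);
    }
}
```

Because the helper only needs `DataOutput`/`DataInput`, it composes naturally with a `Writable`'s own `write`/`readFields` methods without itself being a `Writable`, which matches the point made above.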

          Aaron Kimball added a comment -

          Attaching API document formatted as a patch against Sqoop's documentation. Modified document to take into account discussion thus far.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12422756/MAPREDUCE-1036.patch
          against trunk revision 827854.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/195/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/195/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/195/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/195/console

          This message is automatically generated.

          Chris Douglas added a comment -

          I committed this. Thanks, Aaron!

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #133 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/133/)


            People

            • Assignee:
              Aaron Kimball
            • Reporter:
              Aaron Kimball
            • Votes:
              0
            • Watchers:
              4

              Dates

              • Created:
                Updated:
                Resolved:

                Development