  1. Sqoop (Retired)
  2. SQOOP-1603

Sqoop2: Explicit support for Merge in the Sqoop Job lifecycle in the MR engine

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: None
    • Labels: None

    Description

      The Destroyer API and its Javadoc:

      
      /**
       * This allows connector to define work to complete execution, for example,
       * resource cleaning.
       */
      public abstract class Destroyer<LinkConfiguration, JobConfiguration> {
      
        /**
         * Callback to clean up after job execution.
         *
         * @param context Destroyer context
         * @param linkConfiguration link configuration object
         * @param jobConfiguration job configuration object for the FROM and TO
         *        In case of the FROM initializer this will represent the FROM job configuration
         *        In case of the TO initializer this will represent the TO job configuration
         */
        public abstract void destroy(DestroyerContext context,
                                     LinkConfiguration linkConfiguration,
                                     JobConfiguration jobConfiguration);
      
      }
      
      

      This ticket was created while reviewing the Kite connector use case, in which the destroyer performs the actual merge of the temporary datasets:
      https://reviews.apache.org/r/26963/diff/# stanleyxu2005

      public void destroy(DestroyerContext context, LinkConfiguration link,
          ToJobConfiguration job) {
        LOG.info("Running Kite connector destroyer");
        // Every loader instance creates a temporary dataset. If the MR job is
        // successful, all temporary datasets should be merged into one dataset;
        // otherwise they should all be deleted.
        String[] uris = KiteDatasetExecutor.listTemporaryDatasetUris(
            job.toDataset.uri);
        if (context.isSuccess()) {
          KiteDatasetExecutor executor = new KiteDatasetExecutor(job.toDataset.uri,
              context.getSchema(), link.link.fileFormat);
          for (String uri : uris) {
            executor.mergeDataset(uri);
            LOG.info(String.format("Temporary dataset %s merged", uri));
          }
        } else {
          for (String uri : uris) {
            KiteDatasetExecutor.deleteDataset(uri);
            LOG.info(String.format("Temporary dataset %s deleted", uri));
          }
        }
      }
      

      I wonder whether such work should be its own phase rather than living in destroyers. More precisely, the destroyer's responsibility is to clean up and close the FROM/TO data sources. Should operations that modify, merge, or munge records be their own step?
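
      To make the question concrete, here is a minimal sketch of how a separate merge phase might sit next to a cleanup-only phase. Everything in it is hypothetical: Merger, Cleaner, finishJob and run are illustrative names, not part of the Sqoop connector API, and plain in-memory lists stand in for the Kite temporary datasets. It also assumes cleanup should run whether or not the job succeeded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergePhaseSketch {

  /** Hypothetical phase: folds temporary output into the final target. */
  interface Merger {
    void merge(List<String> temporaryUris, List<String> target);
  }

  /** Hypothetical cleanup-only phase, i.e. the narrower destroyer role. */
  interface Cleaner {
    void cleanup(List<String> temporaryUris);
  }

  /** Hypothetical driver: merge only on success, clean up in every case. */
  static void finishJob(boolean success, List<String> temporaryUris,
      List<String> target, Merger merger, Cleaner cleaner) {
    try {
      if (success) {
        merger.merge(temporaryUris, target);
      }
    } finally {
      // Cleanup no longer needs to know whether a merge took place.
      cleaner.cleanup(temporaryUris);
    }
  }

  /** Runs both phases against in-memory stand-ins and returns the target. */
  static List<String> run(boolean success, List<String> temporaryUris) {
    List<String> temps = new ArrayList<>(temporaryUris);
    List<String> target = new ArrayList<>();
    finishJob(success, temps, target,
        (uris, out) -> out.addAll(uris), // stand-in for merging datasets
        uris -> uris.clear());           // stand-in for deleting datasets
    return target;
  }

  public static void main(String[] args) {
    System.out.println(run(true, Arrays.asList("temp_1", "temp_2")));
    System.out.println(run(false, Arrays.asList("temp_1", "temp_2")));
  }
}
```

      With this split, a failed job skips the merge entirely and leaves the target empty, while the cleanup step stays free of any record-modifying logic.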

            People

              Assignee: Unassigned
              Reporter: Veena Basavaraj (vybs)
              Votes: 0
              Watchers: 4
