  1. Sqoop
  2. SQOOP-331

Support boundary query on the command line

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0-incubating
    • Fix Version/s: 1.4.0-incubating
    • Component/s: tools
    • Labels:
      None

      Description

      It would be nice if Sqoop had the ability to specify, from the command line, the query that fetches the minimal and maximal values used for creating splits in DataDrivenDBInputFormat.
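
To illustrate why the minimal and maximal values matter, here is a minimal sketch of how a pair of bounds can be turned into contiguous splits. It assumes an integer split column and an even division of the range; the function name `integer_splits` is hypothetical, and this is not Sqoop's actual splitter code.

```python
def integer_splits(lo, hi, num_splits):
    """Divide the closed range [lo, hi] into contiguous (start, end) ranges.

    Illustrative only: assumes an integer split column and spreads the
    values as evenly as possible across at most num_splits ranges.
    """
    if hi < lo:
        raise ValueError("hi must be >= lo")
    total = hi - lo + 1
    # Never create more splits than there are distinct values.
    num_splits = max(1, min(num_splits, total))
    base, extra = divmod(total, num_splits)
    ranges = []
    start = lo
    for i in range(num_splits):
        # The first `extra` splits take one additional value each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges
```

Whatever query supplies `lo` and `hi`, the splits depend only on those two numbers, so a cheaper query that returns the same bounds would yield the same splits.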

      Normally Sqoop generates the query that fetches the minimal and maximal values for creating splits in the following form: SELECT min($split_by_column), max($split_by_column) FROM $table WHERE $cmd_where. In my use case, I needed to import only a portion of the data, with ranges based on the split_by_column that I had already preselected and that are available in a special table holding data ranges and the corresponding primary key values. So my auto-generated query looked like this: SELECT min(id), max(id) FROM table WHERE id >= min_id AND id <= max_id. That query is obviously useless, because the bounds are already known, and it just creates unnecessary load on the database server. It would be nice to be able to supply my own boundary query that uses the extra table with data ranges.
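
The two query shapes described above can be contrasted with a small sketch. The helper names, the `import_ranges` table, and its `min_id`, `max_id`, and `batch` columns are all hypothetical and chosen only for illustration; the first function simply mirrors the template quoted in this description.

```python
def default_boundary_query(table, split_by, where=None):
    """Build a boundary query in the auto-generated form quoted above."""
    query = f"SELECT min({split_by}), max({split_by}) FROM {table}"
    if where:
        query += f" WHERE {where}"
    return query


def ranges_boundary_query(ranges_table, min_col, max_col, range_filter):
    """Build a boundary query that reads precomputed bounds from a
    separate ranges table instead of scanning the data table."""
    return f"SELECT {min_col}, {max_col} FROM {ranges_table} WHERE {range_filter}"


# Auto-generated form: aggregates over the (large) data table.
auto = default_boundary_query("table", "id", "id >= 100 AND id <= 200")

# Custom form: a single-row lookup in the hypothetical ranges table.
custom = ranges_boundary_query("import_ranges", "min_id", "max_id", "batch = 7")
```

The custom statement returns one row with two columns, which is the same result shape as the auto-generated aggregate. In released Sqoop versions, a statement of this kind is passed to the import tool with the `--boundary-query` option.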

        Attachments

        1. SQOOP-331.patch
          15 kB
          Jarek Jarcec Cecho
        2. SQOOP-331.patch
          5 kB
          Jarek Jarcec Cecho

            People

            • Assignee:
              Jarek Jarcec Cecho (jarcec)
            • Reporter:
              Jarek Jarcec Cecho (jarcec)