1. Sqoop
  2. SQOOP-1277

Import not split when using --boundary-query


    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.4.4
    • Fix Version/s: None
    • Component/s: hive-integration
    • Labels:
    • Environment:

      Amazon AWS


      I am trying to import MySQL data into a Hive table using a custom boundary query. Result: Sqoop does not split the load into multiple queries, and the import takes too long.

      My job creation command:

      sqoop job -Dsqoop.metastore.client.record.password=true \
          --create importJobName -- import \
          --connect jdbc:mysql://some_jdbc_pram \
          --username user_name \
          --password MyPassword \
          --table my_table \
          --columns "collect_id,collected_data_id,value" \
          --boundary-query "SELECT min_value, max_value FROM sqoop_boundaries WHERE key_name = 'key.name'" \
          --split-by column_name \
          --num-mappers X \
          --hive-import \
          --hive-overwrite \
          --hive-table hivedb.hibetable --as-textfile --null-string \\\\N --null-non-string \\\\N

      The following message is displayed:

      WARN db.DataDrivenDBInputFormat: Could not find $CONDITIONS token in query: SELECT min_value, max_value FROM sqoop_boundaries WHERE key_name = 'key.name'; splits may not partition data.

      I tried adding the $CONDITIONS token to the boundary query in the creation command:

      --boundary-query "SELECT min_value, max_value FROM sqoop_boundaries WHERE key_name = 'key.name' AND \$CONDITIONS" \

      But the job execution failed:

      INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT min_value, max_value FROM sqoop_boundaries WHERE key_name = 'key.name' AND $CONDITIONS
      INFO mapred.JobClient: Cleaning up the staging area hdfs://
      ERROR security.UserGroupInformation: PriviledgedActionException as:hadoop cause:java.io.IOException: com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Unknown column '$CONDITIONS' in 'where clause'


        Porati Sébastien created issue -
        Gwen Shapira made changes -
        Field      Original Value   New Value
        Assignee                    Gwen Shapira [ gwenshap ]
        Brett Medalen made changes -
        Comment [ I had a similar problem and came across this JIRA. I put the boundary query in single quotes and no longer got the $CONDITIONS error.

        Take a look at the Sqoop User Guide section on single vs. double quotes: http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_free_form_query_imports

        While the user guide is talking about the --query switch, it appears to apply to the --boundary-query switch as well. ]
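
        The quoting point in the comment above can be checked in the shell alone, without running Sqoop. The following is a minimal sketch that assumes a POSIX shell and that no CONDITIONS variable is set in the session; the variable names are made up for illustration. It only shows what string the shell would hand to the program under each quoting style and says nothing about how Sqoop then treats the token in a boundary query.

```shell
#!/bin/sh
# Illustration only: how the shell treats $CONDITIONS under each quoting style.
# CONDITIONS is deliberately unset, as it would be in a typical session.
unset CONDITIONS

# Double quotes: the shell expands $CONDITIONS (here to an empty string).
double_quoted="WHERE key_name = 'key.name' AND $CONDITIONS"

# Double quotes with an escaped dollar sign: the token is passed literally.
escaped="WHERE key_name = 'key.name' AND \$CONDITIONS"

# Single quotes: nothing is expanded, so the token is also passed literally.
# (The inner SQL string uses double quotes here only to avoid nested single quotes.)
single_quoted='WHERE key_name = "key.name" AND $CONDITIONS'

printf '%s\n' "double:  $double_quoted"
printf '%s\n' "escaped: $escaped"
printf '%s\n' "single:  $single_quoted"
```

        In this sketch the escaped and single-quoted forms both deliver the literal token, while the unescaped double-quoted form silently drops it.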


          • Assignee:
            Gwen Shapira
          • Reporter:
            Porati Sébastien
          • Votes:
            1
          • Watchers:
            3


            • Created: