  1. Sqoop
  2. SQOOP-2165

Can't use warehouse-dir with parquet

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4.5
    • Fix Version/s: None
    • Component/s: hive-integration
    • Labels:
      None

      Description

      Gwen and I were working on some data warehousing code that uses Sqoop, and we found something interesting.

      In one place, Sqoop1 claims that warehouse-dir and target-dir are incompatible:
      https://github.com/apache/sqoop/blob/trunk/src/java/org/apache/sqoop/tool/ImportTool.java#L1006

      (We should mention this in the docs, by the way.)

      But if we pass only warehouse-dir (and don't specify target-dir), it complains that target-dir needs to be specified. See here:
      https://github.com/apache/sqoop/blob/trunk/src/java/org/apache/sqoop/tool/ImportTool.java#L1019
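
A minimal sketch of how the two checks described above interact. It is a hypothetical model, not Sqoop's actual code: the class name `ImportOptionsCheck`, the method `validate`, the paraphrased error strings, and the example target path are all made up for illustration. It also assumes that the second check is triggered by a free-form `--query` import, which is our reading of the linked source and is not stated in this report.

```java
import java.util.Optional;

/** Hypothetical model of the two option checks; not Sqoop's actual code. */
final class ImportOptionsCheck {

    /**
     * Returns a (paraphrased) validation error for the given options,
     * or empty if neither check fires. A null argument means the option
     * was not given on the command line.
     */
    static Optional<String> validate(String targetDir, String warehouseDir, String sqlQuery) {
        // Check 1 (from the report): the two directory options are mutually exclusive.
        if (targetDir != null && warehouseDir != null) {
            return Optional.of("--target-dir and --warehouse-dir are incompatible");
        }
        // Check 2 (assumed trigger: a --query import): target-dir is mandatory.
        if (sqlQuery != null && targetDir == null) {
            return Optional.of("must specify destination with --target-dir");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String query = "SELECT ... WHERE $CONDITIONS";
        // warehouse-dir only: rejected by check 2.
        System.out.println(validate(null, "/etl/movielens/", query));
        // Both options (hypothetical target path): rejected by check 1.
        System.out.println(validate("/etl/movielens/user_upserts", "/etl/movielens/", query));
        // target-dir only: passes both checks.
        System.out.println(validate("/etl/movielens/user_upserts", null, query));
    }
}
```

Under these assumptions, a `--query` import can never use warehouse-dir: passing it alone fails the second check, and adding target-dir to satisfy that check fails the first one.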

      For reference, here is the command we ran:

      sqoop job --create user_upserts_import --meta-connect jdbc:hsqldb:hsql://${SQOOP_METASTORE_HOST}:16000/sqoop \
      -- import --connect jdbc:mysql://<MYSQL>:3306/oltp --username root \
      -m 8 --incremental append --check-column last_modified --split-by last_modified --as-parquetfile \
      --query 'SELECT user.id, user.age, user.gender,
      occupation.occupation, zipcode, last_modified FROM user JOIN occupation
      ON (user.occupation_id = occupation.id) WHERE $CONDITIONS' \
      --hive-import --hive-table user_upserts --warehouse-dir /etl/movielens/

      If we specify just target-dir, we get a warning about appending files to the target dir, and the data goes to the default warehouse directory (/usr/hive...), which is pretty unexpected:

      15/03/03 14:47:22 WARN util.AppendUtils: Cannot append files to target dir; no such directory: _sqoop/03144643000000600_30456_mgrover-haa2-4.vpc.cloudera.com_f8bf8ac4

      Obviously, the directory in the warning is not the target dir we specified. This looks like something internal to the Kite/Parquet code.

            People

            • Assignee: Theodore michael Malaska (ted.m)
            • Reporter: Mark Grover (mgrover)
            • Votes: 0
            • Watchers: 2
