Sqoop (Retired) / SQOOP-1293

--hive-import causes --target-dir and --warehouse-dir to not be respected, nor --delete-target-dir

Details

    Description

      Hi,

      I'm importing a table from SQL Server 2012 with --hive-import so that the Hive metadata is created automatically, but I'm finding that it causes --target-dir, --warehouse-dir and --delete-target-dir to be ignored.

      sqoop import --connect "jdbc:sqlserver://x.x.x.x:1533;database=MyDatabase" --username omitted --password omitted --driver com.microsoft.sqlserver.jdbc.SQLServerDriver --table "cube.DimCounterParty" --split-by CounterpartyKey --hive-import --target-dir /MyDatabase/CounterParty --delete-target-dir

      (FYI, I'm using --driver to work around bug SQOOP-1292.)

      So I tried --warehouse-dir, in case it needed that instead of --target-dir:

      sqoop import --connect "jdbc:sqlserver://x.x.x.x:1533;database=MyDatabase" --username omitted --password omitted --driver com.microsoft.sqlserver.jdbc.SQLServerDriver --table "cube.DimCounterParty" --split-by CounterpartyKey --hive-import --warehouse-dir /MyDatabase/CounterParty --delete-target-dir

      but in both cases it ingested the data to /apps/hive/warehouse/cube.db/dimcounterparty.

      What's also strange is that it created the directory specified for --warehouse-dir but then didn't appear to place the data in it.

      I wanted to use --delete-target-dir to replace the whole table each time for this test since the source table is only ~650,000 rows and 185MB.

      What I've found is that, on top of ingesting into /apps/hive/warehouse/cube.db/dimcounterparty, disregarding --delete-target-dir causes the table to grow cumulatively with each run, so that after a few runs

      select count(*)

      on the table now shows 5,546,661 rows instead of ~650,000.

      Here is the Hive warehouse directory on HDFS, where you can see the data accumulating:

       hadoop fs -ls /apps/hive/warehouse/cube.db/dimcounterparty/
      Found 40 items
      -rw-r--r--   3 root hdfs          0 2014-03-07 08:44 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS
      -rw-r--r--   3 root hdfs          0 2014-03-07 09:10 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_1
      -rw-r--r--   3 root hdfs          0 2014-03-07 09:33 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_2
      -rw-r--r--   3 root hdfs          0 2014-03-07 09:37 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_3
      -rw-r--r--   3 root hdfs          0 2014-03-07 09:42 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_4
      -rw-r--r--   3 root hdfs          0 2014-03-07 10:04 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_5
      -rw-r--r--   3 root hdfs          0 2014-03-07 10:14 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_6
      -rw-r--r--   3 root hdfs          0 2014-03-07 10:16 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_7
      -rw-r--r--   3 root hdfs   49044407 2014-03-07 08:44 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000
      -rw-r--r--   3 root hdfs   49045389 2014-03-07 09:10 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_1
      -rw-r--r--   3 root hdfs   49045944 2014-03-07 09:33 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_2
      -rw-r--r--   3 root hdfs   49045944 2014-03-07 09:37 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_3
      -rw-r--r--   3 root hdfs   49045944 2014-03-07 09:41 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_4
      -rw-r--r--   3 root hdfs   49045944 2014-03-07 10:04 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_5
      -rw-r--r--   3 root hdfs   49045944 2014-03-07 10:14 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_6
      -rw-r--r--   3 root hdfs   49045944 2014-03-07 10:15 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_7
      -rw-r--r--   3 root hdfs   52363518 2014-03-07 08:44 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001
      -rw-r--r--   3 root hdfs   52363912 2014-03-07 09:10 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_1
      -rw-r--r--   3 root hdfs   52364256 2014-03-07 09:33 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_2
      -rw-r--r--   3 root hdfs   52364256 2014-03-07 09:37 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_3
      -rw-r--r--   3 root hdfs   52364256 2014-03-07 09:41 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_4
      -rw-r--r--   3 root hdfs   52364256 2014-03-07 10:03 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_5
      -rw-r--r--   3 root hdfs   52364256 2014-03-07 10:14 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_6
      -rw-r--r--   3 root hdfs   52364256 2014-03-07 10:15 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_7
      -rw-r--r--   3 root hdfs   51796051 2014-03-07 08:44 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002
      -rw-r--r--   3 root hdfs   51796027 2014-03-07 09:10 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_1
      -rw-r--r--   3 root hdfs   51796623 2014-03-07 09:33 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_2
      -rw-r--r--   3 root hdfs   51796623 2014-03-07 09:37 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_3
      -rw-r--r--   3 root hdfs   51796623 2014-03-07 09:41 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_4
      -rw-r--r--   3 root hdfs   51796623 2014-03-07 10:03 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_5
      -rw-r--r--   3 root hdfs   51796623 2014-03-07 10:14 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_6
      -rw-r--r--   3 root hdfs   51796623 2014-03-07 10:15 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_7
      -rw-r--r--   3 root hdfs   45445570 2014-03-07 08:44 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003
      -rw-r--r--   3 root hdfs   45445544 2014-03-07 09:10 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_1
      -rw-r--r--   3 root hdfs   45445719 2014-03-07 09:33 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_2
      -rw-r--r--   3 root hdfs   45445719 2014-03-07 09:37 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_3
      -rw-r--r--   3 root hdfs   45445719 2014-03-07 09:42 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_4
      -rw-r--r--   3 root hdfs   45445719 2014-03-07 10:04 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_5
      -rw-r--r--   3 root hdfs   45445719 2014-03-07 10:14 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_6
      -rw-r--r--   3 root hdfs   45445719 2014-03-07 10:16 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_7
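
To make the accumulation easier to read, a listing like the one above can be grouped by the `_copy_N` suffix that appears on the files from each later run. This is a minimal sketch, not anything Sqoop provides: the function name `summarize_listing` is hypothetical, and it assumes the `hadoop fs -ls` column layout shown above (size in the fifth column, path in the last).

```shell
#!/usr/bin/env bash
# Summarise an `hadoop fs -ls` listing read from stdin.
# Files without a "_copy_N" suffix are counted as run 0; "_copy_N" as run N.
# Prints the number of part files and their total size in bytes per run.
summarize_listing() {
  awk '
    # Regular-file lines for part files only; this skips the
    # "Found N items" header and the zero-byte _SUCCESS markers.
    $1 ~ /^-/ && $NF ~ /\/part-m-[0-9]+(_copy_[0-9]+)?$/ {
      run = 0
      if (match($NF, /_copy_[0-9]+$/)) {
        # "_copy_" is 6 characters long, so the number starts at RSTART + 6.
        run = substr($NF, RSTART + 6) + 0
      }
      files[run]++
      bytes[run] += $5
      if (run > max) max = run
    }
    END {
      for (r = 0; r <= max; r++)
        if (r in files)
          printf "run=%d files=%d bytes=%d\n", r, files[r], bytes[r]
    }
  '
}

# Usage (requires a Hadoop client on the PATH):
#   hadoop fs -ls /apps/hive/warehouse/cube.db/dimcounterparty/ | summarize_listing
```

For the listing above this should report eight groups (the original import plus `_copy_1` to `_copy_7`) of four part files each, which matches a table that was appended to on every run rather than replaced.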
      

      Is this a bug that it doesn't respect --target-dir or at least --warehouse-dir?

      This highlights another issue: the behaviour should be more intuitive, and/or sqoop import --help should make it easier to see which options are (not) compatible. Alternatively, Sqoop should state in the output at job initiation time which switches will be disregarded, as it already does when a --hive-<option> switch is used without --hive-import.
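
To illustrate the kind of start-up warning meant here, the sketch below checks an argument list before it is handed to Sqoop. It is a hypothetical wrapper-side check, not part of Sqoop: the function name `warn_ignored_with_hive_import` is invented for illustration, and the list of affected options is taken only from the behaviour observed in this report.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (NOT a Sqoop feature): warn when --hive-import
# is combined with directory options that this report found to be disregarded.
# Returns 1 and prints one warning per option on stderr if a conflict is found.
warn_ignored_with_hive_import() {
  local hive_import=0 arg
  local -a suspects=()
  for arg in "$@"; do
    case "$arg" in
      --hive-import) hive_import=1 ;;
      --target-dir|--warehouse-dir|--delete-target-dir) suspects+=("$arg") ;;
    esac
  done
  if [ "$hive_import" -eq 1 ] && [ "${#suspects[@]}" -gt 0 ]; then
    # printf repeats the format string once per array element.
    printf 'WARNING: %s may be disregarded with --hive-import\n' "${suspects[@]}" >&2
    return 1
  fi
  return 0
}

# Usage: run the check on the same arguments before calling sqoop, e.g.
#   warn_ignored_with_hive_import "$@" || echo "check the options above" >&2
#   sqoop "$@"
```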

      At my previous job I recall using sqoop create-hive-table to generate the metadata after the import and then editing the table's location metadata. It would be much better to fix the behaviour of --hive-import so that such a multi-step workaround isn't needed.
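
For reference, the multi-step workaround described above might look roughly like the following. This is a sketch under assumptions, not a tested recipe: the Hive table name, the explicit `\001` field delimiter (passed to both Sqoop commands so that the imported files and the generated table definition agree) and the helper names `run` and `workaround` are my own placeholders. The script only prints the commands unless `RUN=1` is set.

```shell
#!/usr/bin/env bash
# Sketch of the three-step workaround. Prints each command; executes it only
# when RUN=1 (which requires sqoop and hive on the PATH).

# Placeholder values (assumptions; adjust to the environment).
CONNECT='jdbc:sqlserver://x.x.x.x:1533;database=MyDatabase'
DRIVER='com.microsoft.sqlserver.jdbc.SQLServerDriver'
SRC_TABLE='cube.DimCounterParty'
HIVE_TABLE='dimcounterparty'          # hypothetical Hive table name
TARGET_DIR='/MyDatabase/CounterParty'

# Print a command line, and run it only when RUN=1.
run() {
  printf '%s\n' "$*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

workaround() {
  # 1. Plain HDFS import without --hive-import, replacing the directory each run.
  run sqoop import --connect "$CONNECT" --driver "$DRIVER" \
      --username omitted --password omitted \
      --table "$SRC_TABLE" --split-by CounterpartyKey \
      --fields-terminated-by '\001' \
      --target-dir "$TARGET_DIR" --delete-target-dir
  # 2. Generate the Hive table definition from the source table's schema.
  run sqoop create-hive-table --connect "$CONNECT" --driver "$DRIVER" \
      --username omitted --password omitted \
      --table "$SRC_TABLE" --hive-table "$HIVE_TABLE" \
      --fields-terminated-by '\001'
  # 3. Point the Hive table at the imported directory. Some Hive versions may
  #    need a fully qualified URI here instead of a bare path.
  run hive -e "ALTER TABLE $HIVE_TABLE SET LOCATION '$TARGET_DIR'"
}

# Usage: call `workaround` to preview the commands, or `RUN=1 workaround` to run them.
```

Because step 1 does not use --hive-import, --target-dir and --delete-target-dir apply to it in the usual way, which is the whole point of splitting the job up.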

      Thanks

      Hari Sekhon
      http://www.linkedin.com/in/harisekhon

          People

            Assignee: Unassigned
            Reporter: Hari Sekhon (harisekhon)