Sqoop (Retired) / SQOOP-3151

Sqoop export HDFS file type auto detection can pick wrong type


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4.6
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      It appears that Sqoop export detects the input file format by reading the first three bytes of the file and choosing a reader based on that header. However, if a plain-text export file happens to begin with one of these header sequences, the wrong reader is chosen, resulting in a misleading error.
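The detection described above can be sketched as follows. This is a minimal illustration under assumptions: the class HeaderSniffer, its FileType enum, and the order of the checks are hypothetical and modeled on this description, not copied from Sqoop's source. It only shows why any text file whose data begins with SEQ, Obj, or PAR is routed to a binary reader.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of three-byte header sniffing, modeled on the
// behavior described in this issue; not Sqoop's actual source.
final class HeaderSniffer {

    enum FileType { SEQUENCE_FILE, AVRO_DATA_FILE, PARQUET_FILE, TEXT }

    private HeaderSniffer() {}

    /** Classifies a file by comparing only its first three bytes to known prefixes. */
    static FileType detect(byte[] firstBytes) {
        if (firstBytes.length < 3) {
            return FileType.TEXT; // too short to carry any prefix
        }
        byte[] header = Arrays.copyOf(firstBytes, 3);
        if (matches(header, "SEQ")) {
            return FileType.SEQUENCE_FILE;
        }
        if (matches(header, "Obj")) {
            return FileType.AVRO_DATA_FILE;
        }
        if (matches(header, "PAR")) {
            return FileType.PARQUET_FILE;
        }
        return FileType.TEXT;
    }

    private static boolean matches(byte[] header, String prefix) {
        return Arrays.equals(header, prefix.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        // A plain-text export file whose first row starts with the value "PART".
        byte[] textRow = "PART,42\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(textRow)); // prints PARQUET_FILE
    }
}
```

Because the check never looks past the third byte, the field value "PART" is indistinguishable from the start of a Parquet file.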

      For example, suppose a table is being exported and the first row of its text data file begins with the value "PART". Because Sqoop sees the letters "PAR", it assumes the file is in Parquet format and invokes the Kite SDK. This leads to a misleading error:

      ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist: hdfs://<path>.metadata
      org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist: hdfs://<path>.metadata

      This is easy to reproduce, using Hive as a real-world example:

      > create table test (val string);
      > insert into test values ('PAR');

      Then run a sqoop export against the table data:

      $ sqoop export --connect $MYCONN --username $MYUSER --password $MYPWD -m 1 --export-dir /user/hive/warehouse/test --table $MYTABLE

      Sqoop fails with the following error:
      ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist: hdfs://<path>.metadata
      org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist: hdfs://<path>.metadata

      Changing the value from "PAR" to one of the other prefixes, such as 'Obj' (Avro) or 'SEQ' (SequenceFile), results in similar errors.
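The three-byte prefixes are ambiguous because ordinary text can start with them. One possible mitigation, offered here as an assumption rather than anything Sqoop 1.4.6 does, is to compare each format's full magic bytes: Parquet files begin and end with the four bytes PAR1, Avro object container files begin with Obj followed by the byte 0x01, and Hadoop SequenceFiles begin with SEQ followed by a version byte. The class and method names and the accepted version range (1 to 6) are hypothetical. For brevity the sketch takes the whole file as a byte array, whereas a real check on HDFS would read only the head and the tail.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stricter detection sketch; one possible mitigation,
// not Sqoop's implementation.
final class StrictSniffer {

    enum FileType { SEQUENCE_FILE, AVRO_DATA_FILE, PARQUET_FILE, TEXT }

    private static final byte[] PARQUET_MAGIC = ascii("PAR1");
    // Avro object container files start with 'O', 'b', 'j', 0x01.
    private static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};
    private static final byte[] SEQ_PREFIX = ascii("SEQ");
    // Assumed upper bound; current Hadoop writes SequenceFile version 6.
    private static final int MAX_SEQ_VERSION = 6;

    private StrictSniffer() {}

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    private static boolean endsWith(byte[] data, byte[] suffix) {
        return data.length >= suffix.length
            && Arrays.equals(Arrays.copyOfRange(data, data.length - suffix.length, data.length), suffix);
    }

    /** Classifies file contents using full magic bytes instead of a 3-byte prefix. */
    static FileType detect(byte[] contents) {
        // Parquet: "PAR1" at both the start and the end of the file.
        if (contents.length >= 2 * PARQUET_MAGIC.length
                && startsWith(contents, PARQUET_MAGIC)
                && endsWith(contents, PARQUET_MAGIC)) {
            return FileType.PARQUET_FILE;
        }
        // Avro: "Obj" followed by the version byte 0x01.
        if (startsWith(contents, AVRO_MAGIC)) {
            return FileType.AVRO_DATA_FILE;
        }
        // SequenceFile: "SEQ" followed by a small version byte.
        if (contents.length > SEQ_PREFIX.length && startsWith(contents, SEQ_PREFIX)) {
            int version = contents[SEQ_PREFIX.length];
            if (version >= 1 && version <= MAX_SEQ_VERSION) {
                return FileType.SEQUENCE_FILE;
            }
        }
        return FileType.TEXT;
    }

    public static void main(String[] args) {
        // Text rows that begin with a three-byte prefix are no longer misdetected.
        for (String row : new String[] {"PART,42\n", "Obj,1\n", "SEQ,1\n"}) {
            System.out.println(row.trim() + " -> " + detect(ascii(row)));
        }
    }
}
```

This narrows the false positives rather than eliminating them: a text file could in principle still begin and end with PAR1, so a fully robust fix would also validate the rest of the header or let the user state the format explicitly.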

              People

              • Assignee: Sandish Kumar HN (sanysandish@gmail.com)
              • Reporter: Boglarka Egyed (BoglarkaEgyed)
              • Votes: 0
              • Watchers: 2

                Dates

                • Created:
                  Updated: