HIVE-4554

Failed to create a table from existing file if file path has spaces

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.12.0
    • Component/s: CLI
    • Labels:
      None

      Description

      To reproduce the problem,

      1. Create a table, say, person_age (name STRING, age INT).
      2. Create a file whose name has a space in it, say, "data set.txt".
      3. Try to load the data in the file into the table.

      The following error can be seen in the console:

      hive> LOAD DATA INPATH '/home/xzhang/temp/data set.txt' INTO TABLE person_age;
      Loading data to table default.person_age
      Failed with exception Wrong file format. Please check the file's format.
      FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

      Note: the error message is confusing.
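The description does not state the root cause, but the later comments on this issue point at how the path is encoded and decoded as a URI. As a minimal sketch, using only the JDK's java.net.URI (the class and helper names here are hypothetical and are not Hive code), this shows how a space in a file name turns into %20 once the path becomes a URI, so the raw path no longer matches the name on disk:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SpaceInPathSketch {
    // Hypothetical helper: build a file: URI from a plain path.
    // The multi-argument URI constructor percent-encodes characters
    // that are illegal in a URI, such as a space.
    static URI toFileUri(String path) {
        try {
            return new URI("file", null, path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = toFileUri("/home/xzhang/temp/data set.txt");
        System.out.println(uri);              // encoded URI string
        System.out.println(uri.getRawPath()); // path with the escape still in place
        System.out.println(uri.getPath());    // path with the escape decoded
    }
}
```

Code that treats the raw, still-encoded path as a file name would look for a file called "data%20set.txt", which does not exist.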

      1. HIVE-4554.patch
        0.7 kB
        Xuefu Zhang
      2. HIVE-4554.patch.1
        5 kB
        Xuefu Zhang
      3. HIVE-4554.patch.2
        4 kB
        Xuefu Zhang
      4. HIVE-4554.patch.3
        4 kB
        Xuefu Zhang
      5. HIVE-4554.patch.4
        8 kB
        Xuefu Zhang
      6. HIVE-4554.patch.5
        8 kB
        Xuefu Zhang

        Activity

        Ashutosh Chauhan added a comment -

        This issue has been fixed and released as part of the 0.12 release. If you find further issues, please create a new JIRA and link it to this one.

        Hudson added a comment -

        Integrated in Hive-trunk-h0.21 #2131 (See https://builds.apache.org/job/Hive-trunk-h0.21/2131/)
        HIVE-4554 : Failed to create a table from existing file if file path has spaces (Xuefu Zhang via Ashutosh Chauhan) (Revision 1490101)

        Result = FAILURE
        hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1490101
        Files :

        • /hive/trunk/build-common.xml
        • /hive/trunk/data/files/person age.txt
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java
        • /hive/trunk/ql/src/test/queries/clientpositive/load_file_with_space_in_the_name.q
        • /hive/trunk/ql/src/test/queries/clientpositive/load_hdfs_file_with_space_in_the_name.q
        • /hive/trunk/ql/src/test/results/clientpositive/load_file_with_space_in_the_name.q.out
        • /hive/trunk/ql/src/test/results/clientpositive/load_hdfs_file_with_space_in_the_name.q.out
        Hudson added a comment -

        Integrated in Hive-trunk-hadoop2 #227 (See https://builds.apache.org/job/Hive-trunk-hadoop2/227/)
        HIVE-4554 : Failed to create a table from existing file if file path has spaces (Xuefu Zhang via Ashutosh Chauhan) (Revision 1490101)

        Result = FAILURE
        hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1490101
        Files :

        • /hive/trunk/build-common.xml
        • /hive/trunk/data/files/person age.txt
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java
        • /hive/trunk/ql/src/test/queries/clientpositive/load_file_with_space_in_the_name.q
        • /hive/trunk/ql/src/test/queries/clientpositive/load_hdfs_file_with_space_in_the_name.q
        • /hive/trunk/ql/src/test/results/clientpositive/load_file_with_space_in_the_name.q.out
        • /hive/trunk/ql/src/test/results/clientpositive/load_hdfs_file_with_space_in_the_name.q.out
        Ashutosh Chauhan added a comment -

        Committed to trunk. Thanks, Xuefu!

        Xuefu Zhang added a comment -

        Patch is updated with the following change made to the test case in TestMinimrCliDriver:

        From:

        dfs -rmr hdfs:///tmp/test/load_file_with_space_in_the_name;

        to:

        dfs -rmr hdfs:///tmp/test;

        which should get rid of /tmp/test, allowing other test cases to create it again.

        Ashutosh Chauhan added a comment -

        TestMinimrCliDriver.schemeAuthority.q fails with the exception {{ mkdir: cannot create directory hdfs:///tmp/test: File exists }}. I think that modifying the last line of your test to run dfs -rmr hdfs:///tmp/test should be sufficient.

        Ashutosh Chauhan added a comment -

        Thanks, Xuefu, for testing that.
        +1. Will commit if tests pass.

        Xuefu Zhang added a comment -

        The patch is updated with a new test case for loading an HDFS file with a special character (space) in its name into a table.

        Ashutosh Chauhan added a comment -

        Comment on RB.

        Xuefu Zhang added a comment -

        HIVE-4554.patch.3 is the same as HIVE-4554.patch.2 except that it includes the data input file for the new test case, which was missing.

        All test cases passed.

        RB request is here: https://reviews.apache.org/r/11335/

        Xuefu Zhang added a comment -

        Thank you, Ashutosh.

        I have updated the patch to use URIUtil.decode() to decode the special characters in the path instead of getPath(), which also does the decoding but causes the problems you mentioned.
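To illustrate the distinction, here is a minimal sketch that uses only JDK classes. java.net.URLDecoder stands in for URIUtil.decode() and is an assumption, not the patch's actual call; note that URLDecoder also turns "+" into a space, so it is not a drop-in equivalent. The class name, helper names, and the authority "namenode:8020" are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;

public class DecodeUriSketch {
    // Hypothetical helper: build an hdfs: URI; the constructor
    // percent-encodes the space in the path.
    static URI hdfsUri(String authority, String path) {
        try {
            return new URI("hdfs", authority, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Stand-in for URIUtil.decode(): decode the percent-escapes in the
    // full URI string, so the scheme and authority are kept.
    static String decodeKeepingAllParts(URI uri) {
        try {
            return URLDecoder.decode(uri.toString(), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = hdfsUri("namenode:8020", "/tmp/test/person age.txt");
        System.out.println(uri.getPath());              // decoded, but only the path component
        System.out.println(decodeKeepingAllParts(uri)); // decoded, with scheme and authority
    }
}
```

Both calls remove the %20 escape; only the second keeps the information about which file system the file lives on.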

        Ashutosh Chauhan added a comment -

        Canceling patch since there are review comments.

        Ashutosh Chauhan added a comment -

        A few comments:

        • In EximUtil.java, changing relativeToAbsolutePath to return the path instead of the URI may not be a good idea. This function is also used by SemanticAnalyzer::createTable, which uses it to get the absolute location and then stores it in the metastore. If it is changed to return only the path component, we will lose the scheme and authority.
        • Similarly, in LoadSemanticAnalyzer at line 149, it's better to use toString instead of the path in the error message: the location could be on a remote file system, so the scheme, authority, port, etc. are useful information in the error message.
        • At line 261, the URI is not necessarily on the local file system, so the other components of the URI need to be preserved; we should use the URI here as well.

        I believe your test case will pass if you revert these changes and keep the other changes. Can you test? Also, it would be better to create a Phabricator or RB entry for easier review.
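The concern about losing the scheme and authority can be sketched with java.net.URI alone. This is an illustration under assumed values, not Hive code: the class name, helper names, host "namenode:8020", and warehouse paths are all hypothetical.

```java
import java.net.URI;

public class SchemeAuthoritySketch {
    // What would be kept if only the path component were returned.
    static String pathOnly(URI location) {
        return location.getPath();
    }

    // What is kept if the full URI string is used instead.
    static String fullLocation(URI location) {
        return location.toString();
    }

    public static void main(String[] args) {
        URI remote = URI.create("hdfs://namenode:8020/warehouse/person_age");
        URI local = URI.create("file:///warehouse/person_age");

        // Two different file systems collapse to the same string.
        System.out.println(pathOnly(remote).equals(pathOnly(local)));

        // The full form still tells them apart.
        System.out.println(fullLocation(remote));
        System.out.println(fullLocation(local));
    }
}
```

If only the path were stored or printed, a location on a remote file system would be indistinguishable from a local one with the same path.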

        Xuefu Zhang added a comment -

        Updated the patch and added a test case.

        Xuefu Zhang added a comment -

        Patch attempting to fix the issue.


          People

          • Assignee: Xuefu Zhang
          • Reporter: Xuefu Zhang
          • Votes: 0
          • Watchers: 6
