Hive / HIVE-4554

Failed to create a table from existing file if file path has spaces

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.12.0
    • Component/s: CLI
    • Labels:
      None

      Description

      To reproduce the problem,

      1. Create a table, say, person_age (name STRING, age INT).
      2. Create a file whose name has a space in it, say, "data set.txt".
      3. Try to load the data in the file into the table.

      The following error can be seen in the console:

      hive> LOAD DATA INPATH '/home/xzhang/temp/data set.txt' INTO TABLE person_age;
      Loading data to table default.person_age
      Failed with exception Wrong file format. Please check the file's format.
      FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

      Note: the error message is confusing. It reports a wrong file format, although the failure is triggered by the space in the file path.
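
The comments on this issue point to percent-encoding of special characters in the path as the relevant mechanism. The sketch below does not use any Hive code and is not taken from the patch; it only illustrates, under the assumption of a POSIX-style local filesystem, how a file name with a space behaves once it is turned into a URI. The class name EncodedPathDemo, the helper createFileWithSpace, and the temp-directory prefix are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustration only: not Hive code. Assumes a POSIX-style local filesystem.
class EncodedPathDemo {

    // Hypothetical helper: creates "data set.txt" in a fresh temporary
    // directory and returns the file's URI (the space is encoded as %20).
    public static URI createFileWithSpace() {
        try {
            Path dir = Files.createTempDirectory("hive4554");
            Path file = Files.createFile(dir.resolve("data set.txt"));
            return file.toUri();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = createFileWithSpace();

        // getRawPath() keeps the percent-encoding, so using it as a file
        // name looks for a file literally called "data%20set.txt".
        Path encoded = Paths.get(uri.getRawPath());
        System.out.println(encoded + " exists: " + Files.exists(encoded));

        // getPath() decodes %20 back to a space and names the real file.
        Path decoded = Paths.get(uri.getPath());
        System.out.println(decoded + " exists: " + Files.exists(decoded));
    }
}
```

Code that treats the encoded form as a file name will not find the file, which is one plausible way a lookup can fail for paths with spaces while succeeding for all others.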

      1. HIVE-4554.patch.5
        8 kB
        Xuefu Zhang
      2. HIVE-4554.patch.4
        8 kB
        Xuefu Zhang
      3. HIVE-4554.patch.3
        4 kB
        Xuefu Zhang
      4. HIVE-4554.patch.2
        4 kB
        Xuefu Zhang
      5. HIVE-4554.patch.1
        5 kB
        Xuefu Zhang
      6. HIVE-4554.patch
        0.7 kB
        Xuefu Zhang

        Activity

        ashutoshc Ashutosh Chauhan added a comment -

        This issue has been fixed and released as part of the 0.12 release. If you find further issues, please create a new JIRA and link it to this one.

        hudson Hudson added a comment -

        Integrated in Hive-trunk-h0.21 #2131 (See https://builds.apache.org/job/Hive-trunk-h0.21/2131/)
        HIVE-4554 : Failed to create a table from existing file if file path has spaces (Xuefu Zhang via Ashutosh Chauhan) (Revision 1490101)

        Result = FAILURE
        hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1490101
        Files :

        • /hive/trunk/build-common.xml
        • /hive/trunk/data/files/person age.txt
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java
        • /hive/trunk/ql/src/test/queries/clientpositive/load_file_with_space_in_the_name.q
        • /hive/trunk/ql/src/test/queries/clientpositive/load_hdfs_file_with_space_in_the_name.q
        • /hive/trunk/ql/src/test/results/clientpositive/load_file_with_space_in_the_name.q.out
        • /hive/trunk/ql/src/test/results/clientpositive/load_hdfs_file_with_space_in_the_name.q.out
        hudson Hudson added a comment -

        Integrated in Hive-trunk-hadoop2 #227 (See https://builds.apache.org/job/Hive-trunk-hadoop2/227/)
        HIVE-4554 : Failed to create a table from existing file if file path has spaces (Xuefu Zhang via Ashutosh Chauhan) (Revision 1490101)

        Result = FAILURE
        hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1490101
        Files :

        • /hive/trunk/build-common.xml
        • /hive/trunk/data/files/person age.txt
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java
        • /hive/trunk/ql/src/test/queries/clientpositive/load_file_with_space_in_the_name.q
        • /hive/trunk/ql/src/test/queries/clientpositive/load_hdfs_file_with_space_in_the_name.q
        • /hive/trunk/ql/src/test/results/clientpositive/load_file_with_space_in_the_name.q.out
        • /hive/trunk/ql/src/test/results/clientpositive/load_hdfs_file_with_space_in_the_name.q.out
        ashutoshc Ashutosh Chauhan added a comment -

        Committed to trunk. Thanks, Xuefu!

        xuefuz Xuefu Zhang added a comment -

        Patch is updated with the following change made to the test case in TestMinimrCliDriver:

        From:

        dfs -rmr hdfs:///tmp/test/load_file_with_space_in_the_name;

        to:

        dfs -rmr hdfs:///tmp/test;

        which should get rid of /tmp/test, allowing other test cases to create it again.

        ashutoshc Ashutosh Chauhan added a comment -

        TestMinimrCliDriver.schemeAuthority.q fails with the exception "mkdir: cannot create directory hdfs:///tmp/test: File exists". I think if you modify the last line of your test to do dfs -rmr hdfs:///tmp/test, that should be sufficient.

        ashutoshc Ashutosh Chauhan added a comment -

        Thanks, Xuefu, for testing that.
        +1. Will commit if tests pass.

        xuefuz Xuefu Zhang added a comment -

        Patch is updated with a new test case for loading an HDFS file with a special character (space) in its name into a table.

        ashutoshc Ashutosh Chauhan added a comment -

        Comment on RB.

        xuefuz Xuefu Zhang added a comment -

        HIVE-4554.patch.3 is the same as HIVE-4554.patch.2 except that it includes the data input file for the new test case, which was missing.

        All test cases passed.

        RB request is here: https://reviews.apache.org/r/11335/

        xuefuz Xuefu Zhang added a comment -

        Thank you, Ashutosh.

        I have updated the patch to use URIUtil.decode() to decode the special characters in the path instead of using getPath(), which does the decoding but causes the problems you mentioned.

        ashutoshc Ashutosh Chauhan added a comment -

        Canceling patch since there are review comments.

        ashutoshc Ashutosh Chauhan added a comment -

        A few comments:

        • In EximUtil.java, changing relativeToAbsolutePath to return the path instead of the URI may not be a good idea. This function is also used by SemanticAnalyzer::createTable, which uses it to get the absolute location and then stores it in the metastore. If it is changed to return only the path component, we will lose the scheme and authority.
        • Similarly, in LoadSemanticAnalyzer at line 149, it's better to use toString instead of the path in the error message, since the location could be on a remote fs, and hence the scheme, authority, port, etc. are useful info in the error message.
        • At line 261, the URI is not necessarily on the local fs, so the other components of the URI need to be preserved; we should use the URI here also.

        I believe your test case will pass if you revert these changes and keep the other changes. Can you test? Also, it would be better to create a Phabricator or RB entry for easier review.
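
The trade-off raised in this review can be seen with plain java.net.URI, independent of Hive. The following is a minimal sketch, not the code from the patch: the class UriPathDemo, the helpers build and decodedLocation, and the host namenode:8020 are all hypothetical and chosen only for illustration. It shows that toString() keeps the scheme and authority but leaves the space percent-encoded, while getPath() decodes the space but drops the scheme and authority.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustration only: not Hive code and not the HIVE-4554 patch.
class UriPathDemo {

    // Builds a URI from unencoded components. The multi-argument constructor
    // percent-encodes characters that are illegal in a path, such as spaces.
    public static URI build(String scheme, String authority, String path) {
        try {
            return new URI(scheme, authority, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Hypothetical helper: the decoded path with scheme and authority kept.
    public static String decodedLocation(URI uri) {
        return uri.getScheme() + "://" + uri.getAuthority() + uri.getPath();
    }

    public static void main(String[] args) {
        URI uri = build("hdfs", "namenode:8020", "/tmp/test/person age.txt");

        // Full location, but the space appears as %20.
        System.out.println(uri.toString());

        // Decoded path, but scheme and authority are gone.
        System.out.println(uri.getPath());

        // Decoded path with scheme and authority preserved.
        System.out.println(decodedLocation(uri));
    }
}
```

A value stored for later use (for example a table location) would need the third form or the full URI, since the second form can no longer say which filesystem the path belongs to.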

        xuefuz Xuefu Zhang added a comment -

        Updated the patch and added a test case.

        xuefuz Xuefu Zhang added a comment -

        Patch attempting to fix the issue.


          People

          • Assignee:
            xuefuz Xuefu Zhang
            Reporter:
            xuefuz Xuefu Zhang
          • Votes:
            0
            Watchers:
            6