Apache Drill / DRILL-3867

Store relative paths in metadata file


Details

    Description

      git.commit.id.abbrev=cf4f745
      git.commit.time=29.09.2015 @ 23\:19\:52 UTC

      The following steps reproduce the issue:

      1. Create the cache file

      0: jdbc:drill:zk=10.10.103.60:5181> refresh table metadata dfs.`/drill/testdata/metadata_caching/lineitem`;
      +-------+-------------------------------------------------------------------------------------+
      |  ok   |                                       summary                                       |
      +-------+-------------------------------------------------------------------------------------+
      | true  | Successfully updated metadata for table /drill/testdata/metadata_caching/lineitem.  |
      +-------+-------------------------------------------------------------------------------------+
      1 row selected (1.558 seconds)
      

      2. Move the directory

      hadoop fs -mv /drill/testdata/metadata_caching/lineitem /drill/
      

      3. Now run a query on top of it

      0: jdbc:drill:zk=10.10.103.60:5181> select * from dfs.`/drill/lineitem` limit 1;
      Error: SYSTEM ERROR: FileNotFoundException: Requested file maprfs:///drill/testdata/metadata_caching/lineitem/2006/1 does not exist.
      
      
      [Error Id: b456d912-57a0-4690-a44b-140d4964903e on pssc-66.qa.lab:31010] (state=,code=0)
      

      This is expected: the cache file stores absolute file paths, so after the move it still points to the old location.

      Summary description of the fix:

      In Drill 1.11 and later, the Parquet metadata file stores the paths to the Parquet files as relative paths instead of absolute paths. You can move partitioned Parquet directories from one location in the distributed file system to another without issuing the REFRESH TABLE METADATA command to rebuild the Parquet metadata files; the metadata remains valid in the new location.
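      The difference between the two storage schemes can be illustrated with a small sketch. This is not Drill's implementation; the helper names to_relative and resolve are hypothetical, and only the paths are taken from the reproduction steps in this issue. It shows that an absolute entry keeps pointing at the old location after the move, while a relative entry resolves correctly against the new table root.

```python
import posixpath


def to_relative(table_root, absolute_file):
    """Hypothetical helper: the entry a relative-path cache would store."""
    return posixpath.relpath(absolute_file, start=table_root)


def resolve(table_root, stored_path):
    """Hypothetical helper: rebuild the full path from the current table root."""
    return posixpath.normpath(posixpath.join(table_root, stored_path))


# Paths from the reproduction steps in this issue.
old_root = "/drill/testdata/metadata_caching/lineitem"
new_root = "/drill/lineitem"
data_path = old_root + "/2006/1"

# Absolute entry: unchanged by the move, so it still names the old location.
absolute_entry = data_path

# Relative entry: independent of where the table directory lives.
relative_entry = to_relative(old_root, data_path)

# After the move, only the relative entry resolves under the new root.
resolved_after_move = resolve(new_root, relative_entry)

print(absolute_entry)
print(relative_entry)
print(resolved_after_move)
```

      The relative entry is resolved at read time against whatever directory currently holds the metadata file, which is why no refresh is needed after the move.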

      Note

      Reverting from Drill 1.11 to an earlier version is not recommended, because the earlier version will incorrectly interpret the Parquet metadata files created by Drill 1.11. If you do revert, remove the Parquet metadata files and run the REFRESH TABLE METADATA command to rebuild the files in the older format.

            People

              vitalii Vitalii Diravka
              rkins Rahul Kumar Challapalli
              Paul Rogers
              Votes: 0
              Watchers: 6
