
[HADOOP-112] copyFromLocal should exclude .crc files


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.1.0
    • Component/s: None
    • Labels: None
    • Environment: DFS cluster of six 3 GHz Xeons with 2 GB RAM running CentOS 4.2 and Sun's JDK 1.5, but probably applies in any environment

    Description

      Doug Cutting says: "The problem is that when copyFromLocal
      enumerates local files it should exclude .crc files, but it does not.
      This is the listFiles() call on DistributedFileSystem:160. It should
      filter this, excluding files that are FileSystem.isChecksumFile().

      BTW, as a workaround, it is safe to first remove all of the .crc files,
      but your files will no longer be checksummed as they are read. On
      systems without ECC memory file corruption is not uncommon, but I have
      seen very little on clusters that have ECC."
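
      In code terms, the fix Doug describes is to filter the local
      enumeration before copying. Below is a minimal sketch of that idea
      using plain java.io; it is illustrative, not the actual
      DistributedFileSystem patch, and the isChecksumFile helper simply
      mirrors the ".<name>.crc" naming convention visible in the error
      message below rather than calling the real FileSystem.isChecksumFile().

      import java.io.File;
      import java.util.ArrayList;
      import java.util.List;

      public class CrcFilterSketch {

          // Hypothetical stand-in for FileSystem.isChecksumFile():
          // checksum files are hidden siblings named ".<name>.crc",
          // e.g. ".data.crc" alongside "data".
          static boolean isChecksumFile(File f) {
              String name = f.getName();
              return name.startsWith(".") && name.endsWith(".crc");
          }

          // Enumerate local files for a copyFromLocal-style upload,
          // skipping checksum files so the destination filesystem can
          // regenerate them instead of colliding with stale copies.
          static List<File> listDataFiles(File dir) {
              List<File> result = new ArrayList<File>();
              File[] entries = dir.listFiles();
              if (entries == null) {
                  return result; // not a directory, or unreadable
              }
              for (File f : entries) {
                  if (f.isDirectory()) {
                      result.addAll(listDataFiles(f));
                  } else if (!isChecksumFile(f)) {
                      result.add(f);
                  }
              }
              return result;
          }

          // Usage: java CrcFilterSketch /mylocaldir/crawl
          public static void main(String[] args) {
              for (File f : listDataFiles(new File(args[0]))) {
                  System.out.println(f.getPath());
              }
          }
      }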

      Original observations:

      Hello Team,

      I created a backup of my DFS database:

      bin/hadoop dfs -copyToLocal /user/root/crawl /mylocaldir

      I now want to restore from the backup using:

      bin/hadoop dfs -copyFromLocal /mylocaldir/crawl /user/root

      However, I'm getting the following error:

      copyFromLocal: Target /user/root/crawl/crawldb/current/part-00000/.data.crc
      already exists

      I get this message with every permutation of the command that I've tried, and
      even after totally deleting all content in the DFS directories.

      I'd be grateful for any pointers.

      Many thanks,
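
      Until the fix lands, the workaround Doug mentions above (removing
      the local .crc files before running copyFromLocal) can be scripted.
      A hedged sketch along the same lines as the filter above, keeping
      in mind his caveat that the copied files are then no longer
      verified against their checksums as they are read:

      import java.io.File;

      public class CrcCleanupSketch {

          // Recursively delete hidden ".<name>.crc" checksum files
          // under dir, leaving the data files themselves in place.
          static void deleteChecksumFiles(File dir) {
              File[] entries = dir.listFiles();
              if (entries == null) {
                  return; // not a directory, or unreadable
              }
              for (File f : entries) {
                  if (f.isDirectory()) {
                      deleteChecksumFiles(f);
                  } else if (f.getName().startsWith(".")
                          && f.getName().endsWith(".crc")) {
                      f.delete();
                  }
              }
          }

          // Usage: java CrcCleanupSketch /mylocaldir/crawl
          public static void main(String[] args) {
              deleteChecksumFiles(new File(args[0]));
          }
      }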


          People

            Assignee: Doug Cutting (cutting)
            Reporter: Monu Ogbe (monu.ogbe)
            Votes: 0
            Watchers: 0
