  1. Nutch
  2. NUTCH-2281

Support non-default FileSystem

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.12
    • Fix Version/s: 1.14
    • Component/s: None
    • Labels: None
    • Patch Info: Patch Available

    Description

      If a path (input or output) does not belong to the configured default FileSystem, various Nutch tools may raise an exception like

        Exception in ... java.lang.IllegalArgumentException: Wrong FS: s3a://..., expected: hdfs://...
      

      This is fixed by getting a reference to the FileSystem from the Path object

        FileSystem fs = path.getFileSystem(getConf());
      

      instead of

        FileSystem fs = FileSystem.get(getConf());
      

      A given path (e.g., s3a://...) may not belong to the default file system (hdfs://, or file:// in local mode), and simple checks such as fs.exists(path) on the default FileSystem will then fail. Cf. FileSystem.checkPath(path), and FileSystem.get(conf) vs. FileSystem.get(URI, conf), which is called by Path.getFileSystem(conf).
      Note that the FileSystem for input and output may be different, e.g., read from HDFS and write to S3.
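
      The failure mode described above can be illustrated with a small, self-contained sketch. It does not call Hadoop: the class WrongFsSketch, its checkPath and resolveFileSystem methods, and the URIs hdfs://namenode:8020 and s3a://bucket/crawl/segments are hypothetical stand-ins, and the comparison is a simplified approximation of what FileSystem.checkPath(path) does (default ports and other details are ignored). It only shows why a file system taken from the configuration rejects a foreign path, while one derived from the path itself accepts it.

```java
import java.net.URI;

/** Hypothetical, simplified model of the ownership check; it does not call Hadoop. */
public class WrongFsSketch {

    /** Rejects a path whose scheme or authority differs from the file system's URI. */
    public static void checkPath(URI fsUri, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            return; // no scheme: the path is resolved against this file system
        }
        String authority = path.getAuthority();
        boolean sameScheme = scheme.equalsIgnoreCase(fsUri.getScheme());
        boolean sameAuthority = authority == null
                || authority.equalsIgnoreCase(fsUri.getAuthority());
        if (!(sameScheme && sameAuthority)) {
            throw new IllegalArgumentException(
                    "Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    /** Models the idea behind Path.getFileSystem(conf): derive the file system from the path. */
    public static URI resolveFileSystem(URI defaultFs, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            return defaultFs; // unqualified paths fall back to the default
        }
        String authority = path.getAuthority();
        return URI.create(scheme + "://" + (authority == null ? "/" : authority));
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode:8020");      // assumed default file system
        URI output = URI.create("s3a://bucket/crawl/segments");  // assumed output path

        try {
            // Models FileSystem.get(conf) followed by a check on a foreign path.
            checkPath(defaultFs, output);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        // Models path.getFileSystem(conf): the owning file system matches the path.
        URI owner = resolveFileSystem(defaultFs, output);
        checkPath(owner, output);
        System.out.println("resolved: " + owner);
    }
}
```

      Because input and output may live on different file systems, the lookup in this model is done per path rather than once per tool run.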

            People

              Assignee: Sebastian Nagel
              Reporter: Sebastian Nagel
              Votes: 0
              Watchers: 5
