  Nutch / NUTCH-2281

Support non-default FileSystem

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.12
    • Fix Version/s: 1.14
    • Component/s: None
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      If a path (input or output) does not belong to the configured default FileSystem, various Nutch tools may raise an exception like:

        Exception in ... java.lang.IllegalArgumentException: Wrong FS: s3a://..., expected: hdfs://...
      

      This is fixed by getting a reference to the FileSystem from the Path object:

        FileSystem fs = path.getFileSystem(getConf());
      

      instead of

        FileSystem fs = FileSystem.get(getConf());
      

      A given path (e.g., s3a://...) may not belong to the default file system (hdfs://, or file:// in local mode), in which case even simple checks such as fs.exists(path) will fail. Cf. FileSystem.checkPath(path), and FileSystem.get(conf) vs. FileSystem.get(URI, conf), which is called by Path.getFileSystem(conf).
      Note that the FileSystems for input and output may differ, e.g., reading from HDFS and writing to S3.
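
      To illustrate why the lookup matters, here is a minimal sketch that uses only java.net.URI and no Hadoop classes. It is a simplified stand-in for the scheme/authority comparison that FileSystem.checkPath(path) performs, not Hadoop's actual code. The class and method names, the namenode address, and the bucket name are hypothetical and chosen for illustration only.

```java
import java.net.URI;

public class WrongFsSketch {

    /**
     * Simplified stand-in for the path check a file system applies:
     * a path with a scheme must match the scheme (and authority, if given)
     * of the file system it is handed to. Assumption: Hadoop's real check
     * handles more cases (default ports, canonical URIs) than this sketch.
     */
    static void checkPath(URI fsUri, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            return; // scheme-less paths are resolved against this file system
        }
        boolean sameScheme = scheme.equalsIgnoreCase(fsUri.getScheme());
        String authority = path.getAuthority();
        boolean sameAuthority =
            authority == null || authority.equalsIgnoreCase(fsUri.getAuthority());
        if (sameScheme && sameAuthority) {
            return;
        }
        throw new IllegalArgumentException(
            "Wrong FS: " + path + ", expected: " + fsUri);
    }

    /** Mimics FileSystem.get(conf): always the default file system. */
    static URI resolveDefault(URI defaultFs) {
        return defaultFs;
    }

    /** Mimics path.getFileSystem(conf): the file system the path itself names. */
    static URI resolveFromPath(URI defaultFs, URI path) {
        if (path.getScheme() == null) {
            return defaultFs; // no scheme: fall back to the default
        }
        String authority = path.getAuthority() == null ? "" : path.getAuthority();
        return URI.create(path.getScheme() + "://" + authority + "/");
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode:8020/");    // hypothetical default FS
        URI output = URI.create("s3a://bucket/crawl/segments"); // hypothetical output path

        // Resolving from the path yields a matching file system: check passes.
        checkPath(resolveFromPath(defaultFs, output), output);

        // Using the default file system for a foreign path is rejected.
        try {
            checkPath(resolveDefault(defaultFs), output);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      In the sketch, input and output paths are each resolved on their own, which is why a job can read from one file system and write to another without a mismatch.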

              People

              • Assignee:
                snagel Sebastian Nagel
                Reporter:
                snagel Sebastian Nagel
              • Votes:
                0
                Watchers:
                4
