  1. Crunch (Retired)
  2. CRUNCH-220

Crunch not working with S3

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.7.0
    • Component/s: IO
    • Labels: None
    • Environment: Cloudera Hadoop with Amazon S3

    Description

      I am trying to use Crunch to read a file from S3 and write the result back to S3. Reading works, but writing to S3 fails with the error below. I am not sure whether this is a bug or whether I am missing a Hadoop configuration setting. Reading from S3 and writing to a local file or directly to HDFS both work. The code and the error follow. I am passing the S3 key and secret as parameters.

      PCollection<String> lines = pipeline.read(From.sequenceFile(inputdir, Writables.strings()));

      PCollection<String> textline = lines.parallelDo(new DoFn<String, String>() {
        public void process(String line, Emitter<String> emitter) {
          if (headerNotWritten) {
            // emitter.emit("Writing Header");
            emitter.emit(table_header.getTable_header());
            emitter.emit(line);
            headerNotWritten = false;
          } else {
            emitter.emit(line);
          }
        }
      }, Writables.strings()); // Indicates the serialization format

      pipeline.writeTextFile(textline, outputdir);

      Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: s3n://bktname/testcsv, expected: hdfs://ip-address.compute.internal
      [ip-addresscompute.amazonaws.com] out: at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:410)
      [ip-address-82.eu-west-1.compute.amazonaws.com] out: at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:106)
      [ip-address-82.eu-west-1.compute.amazonaws.com] out: at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:162)
      [ip-address-82.eu-west-1.compute.amazonaws.com] out: at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:558)
      [ip-address-82.eu-west-1.compute.amazonaws.com] out: at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:797)
      [ip-address-82.eu-west-1.compute.amazonaws.com] out: at org.apache.crunch.io.impl.FileTargetImpl.handleExisting(FileTargetImpl.java:133)
      [ip-address-82.eu-west-1.compute.amazonaws.com] out: at org.apache.crunch.impl.mr.MRPipeline.write(MRPipeline.java:212)
      [ip-address-82.eu-west-1.compute.amazonaws.com] out: at org.apache.crunch.impl.mr.MRPipeline.write(MRPipeline.java:200)
      [ip-address-82.eu-west-1.compute.amazonaws.com] out: at org.apache.crunch.impl.mr.collect.PCollectionImpl.write(PCollectionImpl.java:132)
      [ec2-79-125-102-82.eu-west-1.compute.amazonaws.com] out: at org.apache.crunch.impl.mr.MRPipeline.writeTextFile(MRPipeline.java:356)
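
      The trace shows FileSystem.exists being invoked on a DistributedFileSystem (HDFS) instance with an s3n:// path, which is what the "Wrong FS" message reports. The sketch below is a Hadoop-free illustration of that kind of scheme check, using only java.net.URI. The class WrongFsSketch, its methods, and the host name namenode.example are hypothetical and are not Hadoop's actual implementation. In Hadoop code, the usual way to avoid this mismatch is to resolve the filesystem from the path itself with Path.getFileSystem(Configuration) rather than from the cluster default with FileSystem.get(Configuration); whether the attached patches take that approach is not shown here.

```java
import java.net.URI;

// Hypothetical, Hadoop-free illustration of a "Wrong FS" style check.
final class WrongFsSketch {

    // True when the path has no scheme (so it is relative to the filesystem)
    // or when its scheme and authority match the filesystem's own URI.
    static boolean sameFileSystem(URI fsUri, URI path) {
        if (path.getScheme() == null) {
            return true;
        }
        if (!path.getScheme().equalsIgnoreCase(fsUri.getScheme())) {
            return false;
        }
        String pathAuthority = path.getAuthority();
        return pathAuthority == null
                || pathAuthority.equalsIgnoreCase(fsUri.getAuthority());
    }

    // Rejects paths that belong to a different filesystem than fsUri.
    static void checkPath(URI fsUri, URI path) {
        if (!sameFileSystem(fsUri, path)) {
            throw new IllegalArgumentException(
                    "Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode.example"); // assumed default FS
        URI output = URI.create("s3n://bktname/testcsv");      // path from the report
        try {
            checkPath(defaultFs, output);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      Checking the output path against a filesystem derived from the path's own URI would pass this check, because scheme and authority would then agree by construction.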

      Attachments

        1. CRUNCH-220.patch
          1 kB
          Josh Wills
        2. CRUNCH-220.patch
          0.7 kB
          Deepak Subhramanian

          People

            Assignee: Josh Wills (jwills)
            Reporter: Deepak Subhramanian (deepakas)
            Votes: 0
            Watchers: 2
