Flume
FLUME-330

Collector outputting to S3 sink fails and doesn't recover when S3 returns consecutive 404 status codes

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: v0.9.1u1
    • Fix Version/s: None
    • Component/s: Node
    • Labels: None
    • Environment:

      Ubuntu 9.10 x86

      $ flume version
      Flume 0.9.1+29
      Git repository
      rev 1b753ff62e62a05ca9ac9f3ee8782b5eb8c48da7
      Compiled by bruno on Sun Oct 10 15:25:07 PDT 2010

      Description

      My collector is sourced from autoCollectorSource and sinks to S3N. The collector occasionally fails and doesn't recover until I restart the node. Looking at the agents, they all have backed-up logged messages. Once the collector node is restarted, the agents resume normally and the logged messages slowly drain. This happens at least a couple of times a week.
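
      For reference, a master configuration along these lines would look roughly as follows. This is only a minimal sketch: the logical node name, bucket, and path escapes are hypothetical and not taken from this report.

      # hypothetical collector mapping; agents feed autoCollectorSource, which writes to an S3N collectorSink
      collector01 : autoCollectorSource | collectorSink("s3n://my-bucket/logs/%Y-%m-%d/", "weblog-");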

      Below is relevant log output:
      2010-11-08 09:25:47,451 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/logs%2Fredirector%2Finsertdate%3D2010-11-08%2Finserthour%3D00%2Fweblog-domU-12-31-39-01-5D-24.compute-1.internal-log.00000018.20101108-092546510%2B0000.2386949180521782.seq.gz' - Unexpected response code 404, expected 200
      2010-11-08 09:25:47,470 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/logs%2Fredirector%2Finsertdate%3D2010-11-08%2Finserthour%3D00%2Fweblog-domU-12-31-39-01-5D-24.compute-1.internal-log.00000018.20101108-092546510%2B0000.2386949180521782.seq.gz_%24folder%24' - Unexpected response code 404, expected 200

          Activity

          Viksit Gaur added a comment -

          I can confirm that this issue still exists in Flume 1.4.0-bin

          Hadoop jar versions:
          -rw-rw-r-- 1 ubuntu ubuntu  279781 Dec 17 22:24 commons-httpclient-3.0.1.jar
          -rw-rw-r-- 1 ubuntu ubuntu   58160 Dec 17 22:24 commons-codec-1.4.jar
          -rw-rw-r-- 1 ubuntu ubuntu  321806 Dec 17 22:24 jets3t-0.6.1.jar
          -rw-rw-r-- 1 ubuntu ubuntu 4203147 Dec 17 22:24 hadoop-core-1.2.1.jar

          flume_noizwaves added a comment -

          Are there any other known fixes for this issue aside from replacing the hadoop core with emr-hadoop-core-0.20.jar?

          I am using Flume 0.9.4 with LZO compression (following the instructions at http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/) and S3 sinks. Using emr-hadoop-core-0.20.jar causes this error during node startup:

          2011-07-10 14:54:07,770 [logicalNode buntu-19] INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
          2011-07-10 14:54:07,772 [logicalNode buntu-19] WARN lzo.LzoCompressor: java.lang.NoSuchFieldError: workingMemoryBuf
          2011-07-10 14:54:07,772 [logicalNode buntu-19] ERROR lzo.LzoCodec: Failed to load/initialize native-lzo library

          When using the included CDH build of hadoop-core, I get the 404 status errors, and the files uploaded to S3 have incorrect sizes: instead of all being ~3.5 MB, every odd file is ~300 bytes and every even file is ~7 MB.

          Thanks in advance

          Disabled imported user added a comment -

          I am running a 0.9.3 build from the git master branch as of 2/5/2011. I have not yet upgraded to 0.9.3-RC0.

          As a side note, I'm not sure this is a problem other than the weird log message - the warnings occur before creating the file in S3, not at the end, as I previously believed.

          Disabled imported user added a comment -

          @Aaron, what version of flume are you using? The 404 issue could quite possibly still exist in the 0.9.1 branch.

          Disabled imported user added a comment -

          I am still encountering the 404s, despite using the EMR version of hadoop-core.

          Jonathan Hsieh added a comment -

          Is this still an issue? Based on Patrick's comment, I will mark this "not a bug" in a few days unless I hear anything suggesting that I shouldn't.

          Disabled imported user added a comment - edited

          I've switched out my deployment of flume and everything seems to be working fine now. Key things to note:

          • I'm running on alestic's Ubuntu 10.04 LTS image on Amazon EC2.
          • I've created my own flume 0.9.2 debian package based on the current one; the code for my package was pulled from the v0.9.2 tag on github.
          • I've replaced cloudera's hadoop-core.jar with the one deployed on Amazon EMR; something in the current release of cloudera's hadoop-core.jar does not work with S3. (cloudera's hadoop package installs the jars under /usr/lib/hadoop; a rough sketch of the swap is below.)
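
          A minimal sketch of that swap, assuming the /usr/lib/hadoop layout mentioned above; the EMR jar path, the exact jar file names, and the init script name are assumptions, so adjust them to your install:

          # back up cloudera's jar and drop in the EMR build (file names are assumptions)
          cd /usr/lib/hadoop
          sudo mv hadoop-core.jar hadoop-core.jar.cdh
          sudo cp /path/to/emr-hadoop-core-0.20.jar hadoop-core.jar
          # restart the flume node so it picks up the replacement (service name may differ)
          sudo /etc/init.d/flume-node restart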

          Since I made these changes, the 404 errors have disappeared. You might want to look at the comments of https://issues.cloudera.org/browse/FLUME-66 for more information.

          So to sum things up, there is no inherent issue with flume v0.9.2 sinking to S3; rather, the hadoop-core.jar that cloudera distributes is incompatible with S3.

          Disabled imported user added a comment -

          I am also experiencing this issue. I am receiving the following in my logs.

          2011-02-03 13:29:22,105 INFO com.cloudera.flume.handlers.hdfs.EscapedCustomDfsSink: Opening s3n://aarontest/user-events-log.00000031.20110203-132922104-0500.26083711009449194.seq
          2011-02-03 13:29:22,117 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/user-events-log.00000031.20110203-132922104-0500.26083711009449194.seq.gz' - Unexpected response code 404, expected 200
          2011-02-03 13:29:22,129 WARN org.jets3t.service.impl.rest.httpclient.RestS3Service: Response '/user-events-log.00000031.20110203-132922104-0500.26083711009449194.seq.gz_%24folder%24' - Unexpected response code 404, expected 200
          2011-02-03 13:29:22,149 INFO com.cloudera.flume.handlers.hdfs.CustomDfsSink: Opening HDFS file: s3n://aarontest/user-events-log.00000031.20110203-132922104-0500.26083711009449194.seq.gz

          The file is successfully created in s3, however.

          My sink is defined as collectorSink("s3n://aarontest","user-events-")

          Disabled imported user added a comment -

          I can confirm that this also occurs for our use case; S3 returns 404s very frequently, which makes the S3 sink effectively useless.


    People

    • Assignee: Unassigned
    • Reporter: Disabled imported user
    • Votes: 3
    • Watchers: 4
