Flume / FLUME-873

Collector closes thrift server without visible reason

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Won't Fix
    • Affects Version/s: 0.9.4
    • Fix Version/s: 0.9.5
    • Component/s: Sinks+Sources
    • Labels: None
    • Environment: RHEL 5

    Description

      We have:
      2 agent nodes
      1 collector
      1 master
      8 flows

      At first all of them worked fine, and a Thrift server was started for each flow:
      grep "Starting blocking thread pool server on port" flume-flume-node-server6.log.2011-11-29
      2011-11-29 13:21:24,562 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35855...
      2011-11-29 13:21:54,572 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35857...
      2011-11-29 13:22:24,581 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35853...
      2011-11-29 13:22:54,589 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35858...
      2011-11-29 13:23:24,597 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35860...
      2011-11-29 13:23:54,607 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35859...
      2011-11-29 13:24:24,615 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35854...
      2011-11-29 13:24:54,625 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35856...

      At some point after startup, two of them stopped without any visible reason:
      flume-flume-node-server6.log.2011-12-01:2011-12-01 17:27:15,523 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35858...
      flume-flume-node-server6.log.2011-12-05:2011-12-05 02:50:56,748 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35857...

      This stopped the data flow from two of the sources.
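To see which ports were opened and later closed without grepping by hand, the two kinds of log lines can be correlated. This is only a rough sketch: the function name `closed_ports` and the sample list are made up for illustration, the message formats are the ones quoted in this report, and the lines are assumed to be fed in chronological order.

```python
import re

# Matches the two ThriftEventSource messages quoted in this report.
# search() is used so that a "file:" prefix added by grep is tolerated.
EVENT = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} INFO "
    r"com\.cloudera\.flume\.handlers\.thrift\.ThriftEventSource: "
    r"(Starting blocking thread pool server|Closed server) on port (\d+)"
)


def closed_ports(lines):
    """Return {port: close timestamp} for ports whose last seen event is a close.

    Assumes the lines are in chronological order.
    """
    last_event = {}
    for line in lines:
        match = EVENT.search(line)
        if match is None:
            continue
        timestamp, action, port = match.groups()
        last_event[int(port)] = (action, timestamp)
    return {
        port: timestamp
        for port, (action, timestamp) in last_event.items()
        if action == "Closed server"
    }


# A few of the lines from this report, used as sample input.
PREFIX = "INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: "
SAMPLE_LINES = [
    "2011-11-29 13:21:54,572 " + PREFIX
    + "Starting blocking thread pool server on port 35857...",
    "2011-11-29 13:22:54,589 " + PREFIX
    + "Starting blocking thread pool server on port 35858...",
    "2011-11-29 13:24:54,625 " + PREFIX
    + "Starting blocking thread pool server on port 35856...",
    "flume-flume-node-server6.log.2011-12-01:2011-12-01 17:27:15,523 " + PREFIX
    + "Closed server on port 35858...",
    "flume-flume-node-server6.log.2011-12-05:2011-12-05 02:50:56,748 " + PREFIX
    + "Closed server on port 35857...",
]

if __name__ == "__main__":
    for port, when in sorted(closed_ports(SAMPLE_LINES).items()):
        print(f"port {port} closed at {when} and was not restarted")
```

A port that is restarted after a close drops out of the result, so only the servers that stayed down are listed.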

      Log collection was configured like this (for both aa and bb) using "flume
      shell -c server5 -s flume-aa.txt":

      cat flume-aa.txt
      exec map server3 aa-agent-http-fe-1
      exec map server3 aa-agent-http-fe-2
      exec map server3 aa-agent-https-fe-1
      exec map server3 aa-agent-https-fe-2
      exec map server3 aa-agent-http-error-fe-1
      exec map server3 aa-agent-http-error-fe-2
      exec map server3 aa-agent-https-error-fe-1
      exec map server3 aa-agent-https-error-fe-2

      exec map server6 aa-collector-http-fe
      exec map server6 aa-collector-https-fe
      exec map server6 aa-collector-http-error-fe
      exec map server6 aa-collector-https-error-fe

      # HTTP
      exec config aa-agent-http-fe-1 aa-flow-http-fe
      'tailDir("/logs/aa/httpd-fe-1/", "aa_access_log-\\d{4}-\\d{2}-\\d{2}$",
      true)' autoE2EChain
      exec config aa-agent-http-fe-2 aa-flow-http-fe
      'tailDir("/logs/aa/httpd-fe-2/", "aa_access_log-\\d{4}-\\d{2}-\\d{2}$",
      true)' autoE2EChain

      exec config aa-collector-http-fe aa-flow-http-fe autoCollectorSource
      'collectorSink("hdfs://hfds-server:8020/flume/aa/httpd-fe/%Y-%m-%d/",
      "%{host}access")'

      # HTTPS
      exec config aa-agent-https-fe-1 aa-flow-https-fe
      'tailDir("/logs/aa/httpd-fe-1/", "ssl-aa_access_log-\\d{4}-\\d{2}-\\d{2}$",
      true)' autoE2EChain
      exec config aa-agent-https-fe-2 aa-flow-https-fe
      'tailDir("/logs/aa/httpd-fe-2/", "ssl-aa_access_log-\\d{4}-\\d{2}-\\d{2}$",
      true)' autoE2EChain

      exec config aa-collector-https-fe aa-flow-https-fe autoCollectorSource
      'collectorSink("hdfs://hdfs-server:8020/flume/aa/httpd-fe/%Y-%m-%d/",
      "%{host}ssl-access")'

      # HTTP ERROR
      exec config aa-agent-http-error-fe-1 aa-flow-http-error-fe
      'tailDir("/logs/aa/httpd-fe-1/", "aa_error_log-\\d{4}-\\d{2}-\\d{2}$",
      true)' autoE2EChain
      exec config aa-agent-http-error-fe-2 aa-flow-http-error-fe
      'tailDir("/logs/aa/httpd-fe-2/", "aa_error_log-\\d{4}-\\d{2}-\\d{2}$",
      true)' autoE2EChain

      exec config aa-collector-http-error-fe aa-flow-http-error-fe
      autoCollectorSource
      'collectorSink("hdfs://hdfs-server:8020/flume/aa/httpd-fe/%Y-%m-%d/",
      "%{host}error")'

      # HTTPS ERROR
      exec config aa-agent-https-error-fe-1 aa-flow-https-error-fe
      'tailDir("/logs/aa/httpd-fe-1/", "ssl-aa_error_log-\\d{4}-\\d{2}-\\d{2}$",
      true)' autoE2EChain
      exec config aa-agent-https-error-fe-2 aa-flow-https-error-fe
      'tailDir("/logs/aa/httpd-fe-2/", "ssl-aa_error_log-\\d{4}-\\d{2}-\\d{2}$",
      true)' autoE2EChain

      exec config aa-collector-https-error-fe aa-flow-https-error-fe
      autoCollectorSource
      'collectorSink("hdfs://hdfs-server:8020/flume/aa/httpd-fe/%Y-%m-%d/",
      "%{host}ssl-error")'

      waitForNodesActive 0 aa-agent-http-fe-1 aa-agent-http-fe-2
      aa-agent-https-fe-1 aa-agent-https-fe-2 aa-agent-http-error-fe-1
      aa-agent-http-error-fe-2 aa-agent-https-error-fe-1
      aa-agent-https-error-fe-2 aa-collector-http-fe aa-collector-https-fe
      aa-collector-http-error-fe aa-collector-https-error-fe

      exec refreshAll
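The HTTP and HTTPS agents tail the same directories, so it may be worth checking which file names each tailDir pattern would pick up. The sketch below is an approximation in Python, not Flume code. It assumes the intended patterns are the four shown in `PATTERNS`, that the shell's `\\d` reaches the regex engine as `\d`, and that the file names look like `aa_access_log-2011-11-29`. Whether tailDir applies the pattern to the whole file name or only searches within it is not confirmed here, so the hypothetical helper `flows_matching` supports both.

```python
import re

# Assumed intended patterns, one per flow, after "\\d" is unescaped to "\d".
PATTERNS = {
    "http": r"aa_access_log-\d{4}-\d{2}-\d{2}$",
    "https": r"ssl-aa_access_log-\d{4}-\d{2}-\d{2}$",
    "http-error": r"aa_error_log-\d{4}-\d{2}-\d{2}$",
    "https-error": r"ssl-aa_error_log-\d{4}-\d{2}-\d{2}$",
}


def flows_matching(filename, whole_name=True):
    """Return the flows whose pattern accepts the given file name.

    whole_name=True requires the pattern to cover the entire name;
    whole_name=False accepts a match anywhere in the name.
    """
    matcher = re.fullmatch if whole_name else re.search
    return [flow for flow, pattern in PATTERNS.items() if matcher(pattern, filename)]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for name in ("aa_access_log-2011-11-29", "ssl-aa_access_log-2011-11-29"):
        print(name, flows_matching(name), flows_matching(name, whole_name=False))
```

If the pattern is only searched for rather than matched against the whole name, the HTTP pattern also accepts the `ssl-` files, so one file could end up in two flows.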

      A bit more info from email thread:
      http://mail-archives.apache.org/mod_mbox/incubator-flume-user/201111.mbox/%3CCAOwicogsR_PZ3fY4TnRGOWwLwmjATJZ5LkMLTKrAbc74Ce5PKw@mail.gmail.com%3E

People

    • Assignee: Unassigned
    • Reporter: Ossi L (lossil)