Flume
FLUME-1037

NETCAT handler threads terminate under stress test

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v1.1.0
    • Fix Version/s: v1.2.0
    • Component/s: Sinks+Sources
    • Labels:
      None
    • Environment:

      [CentOS 6.2 64-bit]

      Description

      Steps:

      1. Use a props file such as the following:
      # a = agent
      # r = source
      # c = channel
      # k = sink
      a1.sources = r1
      a1.channels = c1
      a1.sinks = k1
      # ===SOURCES===
      a1.sources.r1.type = NETCAT
      a1.sources.r1.channels = c1
      a1.sources.r1.bind = localhost
      a1.sources.r1.port = 1473
      # ===CHANNELS===
      a1.channels.c1.type = MEMORY
      # ===SINKS===
      a1.sinks.k1.type = NULL
      a1.sinks.k1.channel = c1

      2. Set the FLUME_CONF_DIR to point to your conf dir
      [will@localhost flume-1.2.0-incubating-SNAPSHOT]$ export FLUME_CONF_DIR=/home/will/git/apache/flume/flume-1.2.0-incubating-SNAPSHOT/conf

      3. Create a flume-env.sh file
      [will@localhost flume-1.2.0-incubating-SNAPSHOT]$ cp conf/flume-env.sh.template conf/flume-env.sh

      4. Adjust the memory size within flume-env.sh (this file is automatically sourced when calling bin/flume-ng, but only if you've set the FLUME_CONF_DIR env var).
      (Here I went to the extreme and set both the min and max heap to 1 GB. I also specified a YourKit profiler agent.)
      Sample contents of flume-env.sh:
      export JAVA_OPTS="-Xms1024m -Xmx1024m -agentpath:/home/will/tools/yjp-10.0.6/bin/linux-x86-64/libyjpagent.so=tracing,noj2ee"

      5. Run the flume NG agent:
      bin/flume-ng node --conf conf --conf-file conf/a1.properties --name a1

      6. Open up 10 terminal windows (on the same host) to connect to the netcat server port, and send continuous output in each terminal. I chose to use the command:
      yes | nc localhost 1473
      The Unix "yes" command continuously outputs a 'y' character followed by a newline. If you use YourKit and go into the Threads view, you'll see that after a while (you may need to wait up to 10 minutes), a netcat handler thread that has been continuously alternating between the Runnable and Blocked states (blocking due to org.apache.log4j.Category.log(..), but that's beside the point) enters a continuous wait state for exactly 1 minute, and then terminates, while its associated 'yes | nc localhost 1473' command is still running.
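For reference, the 10-terminal setup above can be approximated in a single script. The following is a hedged sketch, not part of the original report: it spawns several concurrent TCP clients that each stream "y" lines, the same traffic pattern `yes | nc` produces. The local sink server here is only a stand-in so the snippet is self-contained; against a real agent you would point the clients at the NetCat source's bind/port (localhost:1473 in the config above) instead, and let them run unbounded.

```python
import socket
import threading

NUM_CLIENTS = 10        # mirrors the 10 terminals in the report
LINES_PER_CLIENT = 1000 # bounded here; 'yes | nc' streams forever
received = []           # bytes drained per connection

def drain(conn):
    # Stand-in for a NetCat handler thread: read and discard until EOF.
    total = 0
    while True:
        data = conn.recv(4096)
        if not data:
            break
        total += len(data)
    conn.close()
    received.append(total)

def sink_server(srv):
    # Stand-in for the Flume NetCat source: accept one connection per client.
    drainers = []
    for _ in range(NUM_CLIENTS):
        conn, _addr = srv.accept()
        t = threading.Thread(target=drain, args=(conn,))
        t.start()
        drainers.append(t)
    for t in drainers:
        t.join()

def client(port):
    # Equivalent of one 'yes | nc localhost <port>' terminal, but bounded.
    with socket.create_connection(("127.0.0.1", port)) as s:
        for _ in range(LINES_PER_CLIENT):
            s.sendall(b"y\n")

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))  # ephemeral port; use the source's port for real tests
srv.listen(NUM_CLIENTS)
port = srv.getsockname()[1]

server_thread = threading.Thread(target=sink_server, args=(srv,))
server_thread.start()
clients = [threading.Thread(target=client, args=(port,)) for _ in range(NUM_CLIENTS)]
for t in clients:
    t.start()
for t in clients:
    t.join()
server_thread.join()
srv.close()

print(sum(received))  # each line is 2 bytes: NUM_CLIENTS * LINES_PER_CLIENT * 2
```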

      I haven't done further analysis. My first thought was a thread safety issue. Note that there are no property file reconfigurations done during this test – I leave the props file alone.

      I welcome your ideas/comments. I initially ran this test with the default -Xmx20m but it ran out of memory. For a future test I might lower the Xmx/Xms from 1GB to maybe 128MB.
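To illustrate how the symptom would look from the client side (a hypothetical sketch, not from the original report): if a handler thread terminates and its connection is closed while the client is still streaming, the client's writes eventually fail. This self-contained toy reproduces just that client-visible effect, using a local server that closes each connection after a short read as a stand-in for the dying handler thread.

```python
import socket
import threading

def dying_handler(srv):
    # Stand-in for a NetCat handler thread that terminates mid-stream:
    # accept a little data, then close the connection.
    conn, _addr = srv.accept()
    conn.recv(64)
    conn.close()

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))  # ephemeral port, for a self-contained demo
srv.listen(1)
port = srv.getsockname()[1]
threading.Thread(target=dying_handler, args=(srv,)).start()

stream_cut_off = False
s = socket.create_connection(("127.0.0.1", port))
try:
    for _ in range(100000):  # keep streaming 'y' lines, like 'yes | nc'
        s.sendall(b"y\n")
except OSError:
    # Typically BrokenPipeError/ConnectionResetError once the peer closes.
    stream_cut_off = True
finally:
    s.close()
    srv.close()

print(stream_cut_off)
```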

        Activity

        Will McQueen created issue -
        Will McQueen made changes -
        Comment: Note that after waiting long enough, all 10 threads suffered the same fate of a 1-minute waiting period, followed by termination.
        Mike Percy made changes -
        Assignee Mike Percy [ mpercy ]
        Mike Percy made changes -
        Attachment FLUME-1037-6.patch [ 12520350 ]
        Mike Percy made changes -
        Affects Version/s v1.2.0 [ 12320243 ]
        Fix Version/s v1.2.0 [ 12320243 ]
        Status Open [ 1 ] Patch Available [ 10002 ]
        Affects Version/s v1.1.0 [ 12319284 ]
        Arvind Prabhakar made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]

          People

          • Assignee: Mike Percy
          • Reporter: Will McQueen
          • Votes: 0
          • Watchers: 0
