ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-1043

Looped NPE at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244)

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.3
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:

      Sparc Solaris 10
      Java 6u17 64 bits
      5 nodes ensemble

      Description

      I'm sorry but I only have this log (which belongs to a "follower" node) and a previous message Unexpected NodeCreated event after a reconnection. where I describe a potential side-effect at client side.

      2011-04-04 09:31:09,608 - INFO  [Snapshot Thread:FileTxnSnapLog@208][] - Snapshotting: 1700527e36
      2011-04-04 09:31:09,653 - INFO  [SyncThread:1:FileTxnLog@197][] - Creating new log file: log.1700527e38
      2011-04-04 10:13:39,287 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@251][] - Accepted socket connection from /XXX.XXX.XXX.69:1093
      2011-04-04 10:13:39,371 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn@777][] - Client attempting to establish new session at /XXX.XXX.XXX.69:1093
      2011-04-04 10:13:39,376 - INFO  [CommitProcessor:1:NIOServerCnxn@1580][] - Established session 0x12ee79c4a720022 with negotiated timeout 20000 for client /XXX.XXX.XXX.69:1093
      2011-04-04 12:04:11,131 - INFO  [SyncThread:1:FileTxnLog@197][] - Creating new log file: log.170053bf15
      2011-04-04 12:04:11,131 - INFO  [Snapshot Thread:FileTxnSnapLog@208][] - Snapshotting: 170053bf17
      2011-04-04 12:13:10,779 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@251][] - Accepted socket connection from /XXX.XXX.XXX.63:1817
      2011-04-04 12:13:10,790 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn@777][] - Client attempting to establish new session at /XXX.XXX.XXX.63:1817
      2011-04-04 12:13:10,794 - INFO  [CommitProcessor:1:NIOServerCnxn@1580][] - Established session 0x12ee79c4a720023 with negotiated timeout 20000 for client /XXX.XXX.XXX.63:1817
      2011-04-04 12:13:10,814 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn@634][] - EndOfStreamException: Unable to read additional data from client sessionid 0x12ee79c4a720023, likely client has closed socket
      2011-04-04 12:13:10,816 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn@1435][] - Closed socket connection for client /XXX.XXX.XXX.63:1817 which had sessionid 0x12ee79c4a720023
      2011-04-04 12:13:10,839 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@251][] - Accepted socket connection from /XXX.XXX.XXX.63:1814
      2011-04-04 12:13:10,840 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@274][] - Ignoring exception
      java.net.SocketException: Invalid argument
              at sun.nio.ch.Net.setIntOption0(Native Method)
              at sun.nio.ch.Net.setIntOption(Unknown Source)
              at sun.nio.ch.SocketChannelImpl$1.setInt(Unknown Source)
              at sun.nio.ch.SocketOptsImpl.setBoolean(Unknown Source)
              at sun.nio.ch.SocketOptsImpl$IP$TCP.noDelay(Unknown Source)
              at sun.nio.ch.OptionAdaptor.setTcpNoDelay(Unknown Source)
              at sun.nio.ch.SocketAdaptor.setTcpNoDelay(Unknown Source)
              at org.apache.zookeeper.server.NIOServerCnxn.<init>(NIOServerCnxn.java:1367)
              at org.apache.zookeeper.server.NIOServerCnxn$Factory.createConnection(NIOServerCnxn.java:215)
              at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:256)
      2011-04-04 12:13:10,841 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception
      java.lang.NullPointerException
              at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244)
      2011-04-04 12:13:10,841 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception
      java.lang.NullPointerException
              at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244)
      2011-04-04 12:13:10,842 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception
      java.lang.NullPointerException
              at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244)
      ...
      ...
      ...
      2011-04-04 16:49:23,101 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception
      java.lang.NullPointerException
              at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244)
      

        Activity

        Hide
        Mahadev konar added a comment -

        Cesar,
        I see the logs have nullpointer exceptions, I did read through your link that you specified but I am still unclear on what happened. Can you please provide a detailed view of what happened on the client/server (or the logs). You mentioned that you get a node created event even though you are just using getchildren and getdata? Can you please make sure you are not using exists in your code?

        Show
        Mahadev konar added a comment - Cesar, I see the logs have nullpointer exceptions, I did read through your link that you specified but I am still unclear on what happened. Can you please provide a detailed view of what happened on the client/server (or the logs). You mentioned that you get a node created event even though you are just using getchildren and getdata? Can you please make sure you are not using exists in your code?
        Hide
        César Álvarez Núñez added a comment -

        Hi Mahadev,

        I apologize for my poor description. I will try to do it better this time.

        The bug has two parts: Server-side and Client-side.

        Few days ago (Mon, Mar 14, 2011) one of our production "Zookeeper server" was restarted because it was consuming too much CPU and flooding log files with millions of lines like the following.

        2011-04-04 12:13:10,841 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception
        java.lang.NullPointerException
                at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244)
        

        Unfortunately I was not able to recover the log content previous to the NPE.
        But it responds fine to an "ruok" and "stat".

        A few hours later, the content of one ZNode was modified resulting on the behavior described in the email.
        I've reviewed the code that makes use of the affected ZNodes and I'm sure that it does not invoke any "exists" method.

        Today the "Zookeeper server" on the same physical server than before start again to flooding log files.
        Fortunately this time I have recovered the log entries (which are included on bug description) previous to the NPE loop.
        The "Zookeeper server" was shutdown at 16:19+- and restarted at 16:23:50

        From the "zookeeper client" side:

        • Only one client shows a reconnection following by unexpected NodeCreated.
        2011-04-04 16:19:58,307 I [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:Disconnected type:None path:null)
        2011-04-04 16:19:58,864 I [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:None path:null)
        2011-04-04 16:19:58,870 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/AP/subscriptionLegalConditions)
        2011-04-04 16:19:58,871 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/Certification/fileTemp)
        2011-04-04 16:19:58,871 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/Pdf/urlFont)
        2011-04-04 16:19:58,872 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/ActiveX/classid)
        2011-04-04 16:19:58,872 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/Signature/timeOut)
        2011-04-04 16:19:58,873 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/Locale/locales)
        2011-04-04 16:19:58,873 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/pc/locales)
        2011-04-04 16:19:58,874 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/Deh/aceptarNotComLegalConditions)
        2011-04-04 16:19:58,874 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/pc/pathImagenIssuingEntity)
        2011-04-04 16:19:58,875 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/pc/context)
        2011-04-04 16:19:58,875 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/ActiveX/xadesActivex)
        2011-04-04 16:19:58,876 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/pc/maxMostReadedFaqs)
        2011-04-04 16:19:58,876 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/Deh/createUserLegalConditions)
        2011-04-04 16:19:58,877 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/pc/hostAlternative)
        2011-04-04 16:19:58,877 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/Deh/modificationUserLegalConditions)
        2011-04-04 16:19:58,878 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/Pdf/imageLogo)
        2011-04-04 16:19:58,879 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/Pdf/font)
        2011-04-04 16:19:58,879 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/pc/host)
        2011-04-04 16:19:58,880 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/Avisos/fechaFin)
        
        • The above log belongs to a client which was restarted at 13:06.
        • The "/path/Configuration/Avisos/fechaFin" has as ctime "10-nov-2010 18:00:53" and as mtime "11-mar-2011 14:07:14"
        • I've modified one ZNode which should have triggered a NodeDataChanged on the tree "zookeeper client" instances; but no one has received any WatchedEvent.
        Show
        César Álvarez Núñez added a comment - Hi Mahadev, I apologize for my poor description. I will try to do it better this time. The bug has two parts: Server-side and Client-side. Few days ago (Mon, Mar 14, 2011) one of our production "Zookeeper server" was restarted because it was consuming too much CPU and flooding log files with millions of lines like the following. 2011-04-04 12:13:10,841 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception java.lang.NullPointerException at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244) Unfortunately I was not able to recover the log content previous to the NPE. But it responds fine to an "ruok" and "stat". A few hours later, the content of one ZNode was modified resulting on the behavior described in the email. I've reviewed the code that makes use of the affected ZNodes and I'm sure that it does not invoke any "exists" method. Today the "Zookeeper server" on the same physical server than before start again to flooding log files. Fortunately this time I have recovered the log entries (which are included on bug description) previous to the NPE loop. The "Zookeeper server" was shutdown at 16:19+- and restarted at 16:23:50 From the "zookeeper client" side: Only one client shows a reconnection following by unexpected NodeCreated. 2011-04-04 16:19:58,307 I [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:Disconnected type:None path:null) 2011-04-04 16:19:58,864 I [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:None path:null) 2011-04-04 16:19:58,870 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/AP/subscriptionLegalConditions) 2011-04-04 16:19:58,871 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/Certification/fileTemp) 2011-04-04 16:19:58,871 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/Pdf/urlFont) 2011-04-04 16:19:58,872 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/ActiveX/classid) 2011-04-04 16:19:58,872 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/Signature/timeOut) 2011-04-04 16:19:58,873 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/Locale/locales) 2011-04-04 16:19:58,873 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/pc/locales) 2011-04-04 16:19:58,874 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/Deh/aceptarNotComLegalConditions) 2011-04-04 16:19:58,874 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/pc/pathImagenIssuingEntity) 2011-04-04 16:19:58,875 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/pc/context) 2011-04-04 16:19:58,875 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/ActiveX/xadesActivex) 2011-04-04 16:19:58,876 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/pc/maxMostReadedFaqs) 2011-04-04 16:19:58,876 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/Deh/createUserLegalConditions) 2011-04-04 16:19:58,877 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/pc/hostAlternative) 2011-04-04 16:19:58,877 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/Deh/modificationUserLegalConditions) 2011-04-04 16:19:58,878 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/Pdf/imageLogo) 2011-04-04 16:19:58,879 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/Pdf/font) 2011-04-04 16:19:58,879 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/pc/host) 2011-04-04 16:19:58,880 D [main-EventThread] e.z.ReadOnlyConfiguration@bc794cc - process(WatchedEvent state:SyncConnected type:NodeCreated path:/path/Configuration/Avisos/fechaFin) The above log belongs to a client which was restarted at 13:06. The "/path/Configuration/Avisos/fechaFin" has as ctime "10-nov-2010 18:00:53" and as mtime "11-mar-2011 14:07:14" I've modified one ZNode which should have triggered a NodeDataChanged on the tree "zookeeper client" instances; but no one has received any WatchedEvent.
        Hide
        César Álvarez Núñez added a comment -

        Today we have had the same problem and the origin seems to be the followin exception:

        2011-07-04 10:39:55,515 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@251][] - Accepted socket connection from /YYY.YYY.YYY.67:1500
        2011-07-04 10:39:55,581 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@274][] - Ignoring exception
        java.net.SocketException: Invalid argument
                at sun.nio.ch.Net.setIntOption0(Native Method)
                at sun.nio.ch.Net.setIntOption(Unknown Source)
                at sun.nio.ch.SocketChannelImpl$1.setInt(Unknown Source)
                at sun.nio.ch.SocketOptsImpl.setBoolean(Unknown Source)
                at sun.nio.ch.SocketOptsImpl$IP$TCP.noDelay(Unknown Source)
                at sun.nio.ch.OptionAdaptor.setTcpNoDelay(Unknown Source)
                at sun.nio.ch.SocketAdaptor.setTcpNoDelay(Unknown Source)
                at org.apache.zookeeper.server.NIOServerCnxn.<init>(NIOServerCnxn.java:1367)
                at org.apache.zookeeper.server.NIOServerCnxn$Factory.createConnection(NIOServerCnxn.java:215)
                at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:256)
        

        As you can see, it seems to exist some correlation between the SocketException and the looped NPE.

        2011-07-04 02:48:07,663 - INFO  [Snapshot Thread:FileTxnSnapLog@208][] - Snapshotting: 1801267fa2
        2011-07-04 02:48:07,674 - INFO  [SyncThread:2:FileTxnLog@197][] - Creating new log file: log.1801267fa4
        2011-07-04 08:30:43,812 - INFO  [Snapshot Thread:FileTxnSnapLog@208][] - Snapshotting: 18012774dd
        2011-07-04 08:30:43,815 - INFO  [SyncThread:2:FileTxnLog@197][] - Creating new log file: log.18012774df
        2011-07-04 09:23:40,586 - INFO  [Snapshot Thread:FileTxnSnapLog@208][] - Snapshotting: 180128bb4d
        2011-07-04 09:23:40,596 - INFO  [SyncThread:2:FileTxnLog@197][] - Creating new log file: log.180128bb4f
        2011-07-04 10:09:23,753 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@251][] - Accepted socket connection from /XXX.XXX.XXX.245:49993
        2011-07-04 10:09:23,754 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn@1237][] - Processing stat command from /XXX.XXX.XXX.245:49993
        2011-07-04 10:09:23,755 - INFO  [Thread-15:NIOServerCnxn$StatCommand@1153][] - Stat command output
        2011-07-04 10:09:23,758 - INFO  [Thread-15:NIOServerCnxn@1435][] - Closed socket connection for client /XXX.XXX.XXX.245:49993 (no session established for client)
        2011-07-04 10:20:03,391 - INFO  [Snapshot Thread:FileTxnSnapLog@208][] - Snapshotting: 180129e6b4
        2011-07-04 10:20:03,397 - INFO  [SyncThread:2:FileTxnLog@197][] - Creating new log file: log.180129e6b6
        2011-07-04 10:24:34,042 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@251][] - Accepted socket connection from /YYY.YYY.YYY.69:1093
        2011-07-04 10:24:34,044 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn@1237][] - Processing stat command from /YYY.YYY.YYY.69:1093
        2011-07-04 10:24:34,045 - INFO  [Thread-16:NIOServerCnxn$StatCommand@1153][] - Stat command output
        2011-07-04 10:24:34,048 - INFO  [Thread-16:NIOServerCnxn@1435][] - Closed socket connection for client /YYY.YYY.YYY.69:1093 (no session established for client)
        2011-07-04 10:39:55,515 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@251][] - Accepted socket connection from /YYY.YYY.YYY.67:1500
        2011-07-04 10:39:55,581 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@274][] - Ignoring exception
        java.net.SocketException: Invalid argument
                at sun.nio.ch.Net.setIntOption0(Native Method)
                at sun.nio.ch.Net.setIntOption(Unknown Source)
                at sun.nio.ch.SocketChannelImpl$1.setInt(Unknown Source)
                at sun.nio.ch.SocketOptsImpl.setBoolean(Unknown Source)
                at sun.nio.ch.SocketOptsImpl$IP$TCP.noDelay(Unknown Source)
                at sun.nio.ch.OptionAdaptor.setTcpNoDelay(Unknown Source)
                at sun.nio.ch.SocketAdaptor.setTcpNoDelay(Unknown Source)
                at org.apache.zookeeper.server.NIOServerCnxn.<init>(NIOServerCnxn.java:1367)
                at org.apache.zookeeper.server.NIOServerCnxn$Factory.createConnection(NIOServerCnxn.java:215)
                at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:256)
        2011-07-04 10:39:55,601 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception
        java.lang.NullPointerException
                at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:262)
        2011-07-04 10:39:55,602 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception
        java.lang.NullPointerException
                at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:262)
        2011-07-04 10:39:55,602 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception
        java.lang.NullPointerException
                at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244)
        2011-07-04 10:39:55,603 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception
        java.lang.NullPointerException
                at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244)
        2011-07-04 10:39:55,603 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception
        java.lang.NullPointerException
                at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244)
        2011-07-04 10:39:55,604 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception
        java.lang.NullPointerException
                at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244)
        2011-07-04 10:39:55,604 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception
        java.lang.NullPointerException
                at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244)
        2011-07-04 10:39:55,605 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception
        java.lang.NullPointerException
                at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:262)
        2011-07-04 10:39:55,605 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception
        java.lang.NullPointerException
        ...
        

        Any help,
        /César.

        Show
        César Álvarez Núñez added a comment - Today we have had the same problem and the origin seems to be the followin exception: 2011-07-04 10:39:55,515 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@251][] - Accepted socket connection from /YYY.YYY.YYY.67:1500 2011-07-04 10:39:55,581 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@274][] - Ignoring exception java.net.SocketException: Invalid argument at sun.nio.ch.Net.setIntOption0(Native Method) at sun.nio.ch.Net.setIntOption(Unknown Source) at sun.nio.ch.SocketChannelImpl$1.setInt(Unknown Source) at sun.nio.ch.SocketOptsImpl.setBoolean(Unknown Source) at sun.nio.ch.SocketOptsImpl$IP$TCP.noDelay(Unknown Source) at sun.nio.ch.OptionAdaptor.setTcpNoDelay(Unknown Source) at sun.nio.ch.SocketAdaptor.setTcpNoDelay(Unknown Source) at org.apache.zookeeper.server.NIOServerCnxn.<init>(NIOServerCnxn.java:1367) at org.apache.zookeeper.server.NIOServerCnxn$Factory.createConnection(NIOServerCnxn.java:215) at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:256) As you can see, it seems to exist some correlation between the SocketException and the looped NPE. 2011-07-04 02:48:07,663 - INFO [Snapshot Thread:FileTxnSnapLog@208][] - Snapshotting: 1801267fa2 2011-07-04 02:48:07,674 - INFO [SyncThread:2:FileTxnLog@197][] - Creating new log file: log.1801267fa4 2011-07-04 08:30:43,812 - INFO [Snapshot Thread:FileTxnSnapLog@208][] - Snapshotting: 18012774dd 2011-07-04 08:30:43,815 - INFO [SyncThread:2:FileTxnLog@197][] - Creating new log file: log.18012774df 2011-07-04 09:23:40,586 - INFO [Snapshot Thread:FileTxnSnapLog@208][] - Snapshotting: 180128bb4d 2011-07-04 09:23:40,596 - INFO [SyncThread:2:FileTxnLog@197][] - Creating new log file: log.180128bb4f 2011-07-04 10:09:23,753 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@251][] - Accepted socket connection from /XXX.XXX.XXX.245:49993 2011-07-04 10:09:23,754 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn@1237][] - Processing stat command from /XXX.XXX.XXX.245:49993 2011-07-04 10:09:23,755 - INFO [Thread-15:NIOServerCnxn$StatCommand@1153][] - Stat command output 2011-07-04 10:09:23,758 - INFO [Thread-15:NIOServerCnxn@1435][] - Closed socket connection for client /XXX.XXX.XXX.245:49993 (no session established for client) 2011-07-04 10:20:03,391 - INFO [Snapshot Thread:FileTxnSnapLog@208][] - Snapshotting: 180129e6b4 2011-07-04 10:20:03,397 - INFO [SyncThread:2:FileTxnLog@197][] - Creating new log file: log.180129e6b6 2011-07-04 10:24:34,042 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@251][] - Accepted socket connection from /YYY.YYY.YYY.69:1093 2011-07-04 10:24:34,044 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn@1237][] - Processing stat command from /YYY.YYY.YYY.69:1093 2011-07-04 10:24:34,045 - INFO [Thread-16:NIOServerCnxn$StatCommand@1153][] - Stat command output 2011-07-04 10:24:34,048 - INFO [Thread-16:NIOServerCnxn@1435][] - Closed socket connection for client /YYY.YYY.YYY.69:1093 (no session established for client) 2011-07-04 10:39:55,515 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@251][] - Accepted socket connection from /YYY.YYY.YYY.67:1500 2011-07-04 10:39:55,581 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@274][] - Ignoring exception java.net.SocketException: Invalid argument at sun.nio.ch.Net.setIntOption0(Native Method) at sun.nio.ch.Net.setIntOption(Unknown Source) at sun.nio.ch.SocketChannelImpl$1.setInt(Unknown Source) at sun.nio.ch.SocketOptsImpl.setBoolean(Unknown Source) at sun.nio.ch.SocketOptsImpl$IP$TCP.noDelay(Unknown Source) at sun.nio.ch.OptionAdaptor.setTcpNoDelay(Unknown Source) at sun.nio.ch.SocketAdaptor.setTcpNoDelay(Unknown Source) at org.apache.zookeeper.server.NIOServerCnxn.<init>(NIOServerCnxn.java:1367) at org.apache.zookeeper.server.NIOServerCnxn$Factory.createConnection(NIOServerCnxn.java:215) at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:256) 2011-07-04 10:39:55,601 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception java.lang.NullPointerException at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:262) 2011-07-04 10:39:55,602 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception java.lang.NullPointerException at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:262) 2011-07-04 10:39:55,602 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception java.lang.NullPointerException at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244) 2011-07-04 10:39:55,603 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception java.lang.NullPointerException at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244) 2011-07-04 10:39:55,603 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception java.lang.NullPointerException at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244) 2011-07-04 10:39:55,604 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception java.lang.NullPointerException at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244) 2011-07-04 10:39:55,604 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception java.lang.NullPointerException at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244) 2011-07-04 10:39:55,605 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception java.lang.NullPointerException at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:262) 2011-07-04 10:39:55,605 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception java.lang.NullPointerException ... Any help, /César.
        Hide
        César Álvarez Núñez added a comment -

        Helful link: http://java.net/jira/browse/GLASSFISH-5342

        Reviewing "org.apache.zookeeper.server.NIOServerCnxn" source code, the exception is thrown at line 256 which is captured at line 274, starting the while loop.

        Show
        César Álvarez Núñez added a comment - Helful link: http://java.net/jira/browse/GLASSFISH-5342 Reviewing "org.apache.zookeeper.server.NIOServerCnxn" source code, the exception is thrown at line 256 which is captured at line 274, starting the while loop.
        Hide
        Rakesh R added a comment -

        Hi, we have had the similar problem in very high load. Unforunately I couldn't able to re-produce it.

        2011-09-22 04:17:30,396 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:65170:NIOServerCnxn@1585] - Established session 0x1328c8973a9003e with negotiated timeout 20000 for client /10.18.52.253:44857
        2011-09-22 04:17:30,405 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:65170:NIOServerCnxn$Factory@253] - Accepted socket connection from /10.18.52.108:19351
        2011-09-22 04:17:30,412 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:65170:NIOServerCnxn$Factory@274] - Ignoring unexpected runtime exception
        java.nio.channels.CancelledKeyException
        	at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
        	at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:69)
        	at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:242)
        2011-09-22 04:17:30,470 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:65170:NIOServerCnxn$Factory@274] - Ignoring unexpected runtime exception
        java.lang.NullPointerException
        	at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:245)
        2011-09-22 04:17:30,518 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:65170:NIOServerCnxn$Factory@274] - Ignoring unexpected runtime exception
        java.lang.NullPointerException
        	at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:245)
        
        

        My observation is evenif we have the registered OP_ACCEPT 'ss.register(selector, SelectionKey.OP_ACCEPT);' there is a chance of returning null by the accept() api in very high load. (Also free swap space was very less).

        Thanks,
        Rakesh

        Show
        Rakesh R added a comment - Hi, we have had the similar problem in very high load. Unforunately I couldn't able to re-produce it. 2011-09-22 04:17:30,396 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:65170:NIOServerCnxn@1585] - Established session 0x1328c8973a9003e with negotiated timeout 20000 for client /10.18.52.253:44857 2011-09-22 04:17:30,405 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:65170:NIOServerCnxn$Factory@253] - Accepted socket connection from /10.18.52.108:19351 2011-09-22 04:17:30,412 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:65170:NIOServerCnxn$Factory@274] - Ignoring unexpected runtime exception java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55) at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:69) at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:242) 2011-09-22 04:17:30,470 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:65170:NIOServerCnxn$Factory@274] - Ignoring unexpected runtime exception java.lang.NullPointerException at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:245) 2011-09-22 04:17:30,518 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:65170:NIOServerCnxn$Factory@274] - Ignoring unexpected runtime exception java.lang.NullPointerException at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:245) My observation is evenif we have the registered OP_ACCEPT 'ss.register(selector, SelectionKey.OP_ACCEPT);' there is a chance of returning null by the accept() api in very high load. (Also free swap space was very less). Thanks, Rakesh
        Hide
        J Ch added a comment -

        you got a java.net.SocketException: Invalid argument when try to set options to a closed socket.
        http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6378870

        Show
        J Ch added a comment - you got a java.net.SocketException: Invalid argument when try to set options to a closed socket. http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6378870
        Hide
        J Ch added a comment -

        Looped NPE because of a bug in NIOServerCnxn.java::run();

        if any exception raised inside the ACCPET event handle block(for example, exception on createConnection(sc, sk), the event will not be removed from the SelectionKeySet, which resulted looped NPE.

        Show
        J Ch added a comment - Looped NPE because of a bug in NIOServerCnxn.java::run(); if any exception raised inside the ACCPET event handle block(for example, exception on createConnection(sc, sk) , the event will not be removed from the SelectionKeySet, which resulted looped NPE.
        Hide
        Sergey Maslyakov added a comment -

        I observe very similar symptoms on a 3.4.5 Zookeeper that runs on Solaris10/x86 and JRE 1.6.0_27-b07. Please see the log below. The explanation that I found is same as what "J Ch" is talking about. First, Socket.setTcpNodelay() throws an exception that is not properly handled, which results in not attaching a socket to a selector, and then a flood of log messages caused by an NPE.

        I would say it is a critical problem as I have not seen Zookeeper to leave this state on its own.

        Sample log
        2013-04-24 03:07:04,480 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.64.133.196:54055
        2013-04-24 03:07:04,481 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12181:NIOServerCnxnFactory@220] - Ignoring exception
        java.net.SocketException: Invalid argument
                at sun.nio.ch.Net.setIntOption0(Native Method)
                at sun.nio.ch.Net.setIntOption(Net.java:157)
                at sun.nio.ch.SocketChannelImpl$1.setInt(SocketChannelImpl.java:406)
                at sun.nio.ch.SocketOptsImpl.setBoolean(SocketOptsImpl.java:38)
                at sun.nio.ch.SocketOptsImpl$IP$TCP.noDelay(SocketOptsImpl.java:284)
                at sun.nio.ch.OptionAdaptor.setTcpNoDelay(OptionAdaptor.java:48)
                at sun.nio.ch.SocketAdaptor.setTcpNoDelay(SocketAdaptor.java:268)
                at org.apache.zookeeper.server.NIOServerCnxn.<init>(NIOServerCnxn.java:107)
                at org.apache.zookeeper.server.NIOServerCnxnFactory.createConnection(NIOServerCnxnFactory.java:161)
                at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:202)
                at java.lang.Thread.run(Thread.java:619)
        2013-04-24 03:07:04,482 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12181:NIOServerCnxnFactory@218] - Ignoring unexpected runtime exception
        java.lang.NullPointerException
                at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:190)
                at java.lang.Thread.run(Thread.java:619) 
        

        The fix seems to be pretty simple, however, it is very difficult to reproduce the problem for testing purposes.

        Show
        Sergey Maslyakov added a comment - I observe very similar symptoms on a 3.4.5 Zookeeper that runs on Solaris10/x86 and JRE 1.6.0_27-b07. Please see the log below. The explanation that I found is same as what "J Ch" is talking about. First, Socket.setTcpNodelay() throws an exception that is not properly handled, which results in not attaching a socket to a selector, and then a flood of log messages caused by an NPE. I would say it is a critical problem as I have not seen Zookeeper to leave this state on its own. Sample log 2013-04-24 03:07:04,480 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.64.133.196:54055 2013-04-24 03:07:04,481 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12181:NIOServerCnxnFactory@220] - Ignoring exception java.net.SocketException: Invalid argument at sun.nio.ch.Net.setIntOption0(Native Method) at sun.nio.ch.Net.setIntOption(Net.java:157) at sun.nio.ch.SocketChannelImpl$1.setInt(SocketChannelImpl.java:406) at sun.nio.ch.SocketOptsImpl.setBoolean(SocketOptsImpl.java:38) at sun.nio.ch.SocketOptsImpl$IP$TCP.noDelay(SocketOptsImpl.java:284) at sun.nio.ch.OptionAdaptor.setTcpNoDelay(OptionAdaptor.java:48) at sun.nio.ch.SocketAdaptor.setTcpNoDelay(SocketAdaptor.java:268) at org.apache.zookeeper.server.NIOServerCnxn.<init>(NIOServerCnxn.java:107) at org.apache.zookeeper.server.NIOServerCnxnFactory.createConnection(NIOServerCnxnFactory.java:161) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:202) at java.lang.Thread.run(Thread.java:619) 2013-04-24 03:07:04,482 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12181:NIOServerCnxnFactory@218] - Ignoring unexpected runtime exception java.lang.NullPointerException at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:190) at java.lang.Thread.run(Thread.java:619) The fix seems to be pretty simple, however, it is very difficult to reproduce the problem for testing purposes.

          People

          • Assignee:
            Unassigned
            Reporter:
            César Álvarez Núñez
          • Votes:
            1 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:

              Development