Uploaded image for project: 'Thrift'
  1. Thrift
  2. THRIFT-1423

Binary transport break up by overlapping of TFrame headers in scribe.AsyncClient

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.5, 0.6, 0.6.1, 0.7
    • Fix Version/s: None
    • Component/s: Java - Library
    • Labels:
    • Environment:

      Description

      I have got strange behavior of async client in libthrift. Building up described configuration on ec2, i have got a lot of moments when thrift server says that it has broken transport. I thought it is scribe problem. But digging inside of tcpdump i have found one strange thing with thrift async client (scribe.AsyncClient).

      There is no doubt that async client support for multi-threaded access. Yep, it is implemented as single threaded fst. But it has concurrent incoming pending calls queue as you know.

      So, what i have found and that is the source of connections transports breaking off:

      17:10:17.815425 IP (tos 0x0, ttl 64, id 38164, offset 0, flags [DF], proto TCP (6), length 56)
      localhost.39412 > localhost.1463: Flags [P.], cksum 0xfe2c (incorrect -> 0x1371), seq 1110:1114, ack 337, win 257, options [nop,nop,TS val 34123260 ecr 34122685], length 4
      0x0000: 4500 0038 9514 4000 4006 a7a9 7f00 0001
      0x0010: 7f00 0001 99f4 05b7 7417 1499 747b 6961
      0x0020: 8018 0101 fe2c 0000 0101 080a 0208 adfc
      0x0030: 0208 abbd 0000 003a
      17:10:17.815534 IP (tos 0x0, ttl 64, id 38165, offset 0, flags [DF], proto TCP (6), length 56)
      localhost.39412 > localhost.1463: Flags [P.], cksum 0xfe2c (incorrect -> 0x134b), seq 1114:1118, ack 337, win 257, options [nop,nop,TS val 34123260 ecr 34122685], length 4
      0x0000: 4500 0038 9515 4000 4006 a7a8 7f00 0001
      0x0010: 7f00 0001 99f4 05b7 7417 149d 747b 6961
      0x0020: 8018 0101 fe2c 0000 0101 080a 0208 adfc
      0x0030: 0208 abbd 0000 005c
      17:10:17.815648 IP (tos 0x0, ttl 64, id 60631, offset 0, flags [DF], proto TCP (6), length 52)
      localhost.1463 > localhost.39412: Flags [.], cksum 0xfe28 (incorrect -> 0x1171), seq 337, ack 1118, win 256, options [nop,nop,TS val 34123260 ecr 34123260], length 0
      0x0000: 4500 0034 ecd7 4000 4006 4fea 7f00 0001
      0x0010: 7f00 0001 05b7 99f4 747b 6961 7417 14a1
      0x0020: 8010 0100 fe28 0000 0101 080a 0208 adfc
      0x0030: 0208 adfc
      17:10:17.815658 IP (tos 0x0, ttl 64, id 38166, offset 0, flags [DF], proto TCP (6), length 144)
      localhost.39412 > localhost.1463: Flags [P.], cksum 0xfe84 (incorrect -> 0xe7af), seq 1118:1210, ack 337, win 257, options [nop,nop,TS val 34123260 ecr 34123260], length 92
      0x0000: 4500 0090 9516 4000 4006 a74f 7f00 0001
      0x0010: 7f00 0001 99f4 05b7 7417 14a1 747b 6961
      0x0020: 8018 0101 fe84 0000 0101 080a 0208 adfc
      0x0030: 0208 adfc 8001 0001 0000 0003 4c6f 6700
      0x0040: 0000 000f 0001 0c00 0000 010b 0001 0000
      0x0050: 0005 6974 656d 730b 0002 0000 0030 2863
      0x0060: 6c75 622e 7573 652c 2036 3031 322c 2030
      0x0070: 2c20 302c 2032 3830 2c20 302c 2074 7275
      0x0080: 652c 2031 3332 3035 3838 3631 3729 0000
      17:10:17.815971 IP (tos 0x0, ttl 64, id 60632, offset 0, flags [DF], proto TCP (6), length 52)
      localhost.1463 > localhost.39412: Flags [R.], cksum 0xfe28 (incorrect -> 0x1111), seq 337, ack 1210, win 256, options [nop,nop,TS val 34123260 ecr 34123260], length 0
      0x0000: 4500 0034 ecd8 4000 4006 4fe9 7f00 0001
      0x0010: 7f00 0001 05b7 99f4 747b 6961 7417 14fd
      0x0020: 8014 0100 fe28 0000 0101 080a 0208 adfc
      0x0030: 0208 adfc

      There are two sequentional tframedtransport headers/packets in the dump. then ack from scribe. then psh of Log_call. it is clearly ok that scribe said it is broken transport caused by tframes are mixed. 1 packet is TFrame data size for message = 3a, 2 packet is TFrame data size = 5c. Third is ACK. Forth is Log_call length of 5c. Fifth packed is ACK with RST from scribe cause it found error in binary stream. There is no Log_call packet of length 3a between two first packets. Two sequentional tframe headers are in separate packets from it's data only because of TCP_NODELAY and separated channel.write(TFrameSizeBuffer) and channel.write(TFrameDataBuffer).

      Why there are separated states and buffers to write BODY_SIZE and BODY_DATA in the client (TAsyncMethodCall)? Why there are separated states for READS? The only thing the BODY_SIZE is for, is emulation of TFramedTransport, but why it writes it's header asynchronously and separatelly from BODY_DATA? Sync version of client writes TFrameHeader and it's data in one single TCP packet, not as async client. Any way, this follows transport to mess it's data.

      Am i right this is bug of implementation of tframed support in async client?

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              sitano Ivan Prisyazhniy
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated: