weixin_39647977
weixin_39647977
2020-12-29 10:26

Clients randomly throw SCTP_CANT_STR_ASSOC

Found another issue. When many clients are quickly connecting to a server one after another, some of them randomly throw SCTP_CANT_STR_ASSOC after init timer expires, but the first client always connected successfully in my tests.

Note that I left debugging messages around WSARecvMsg from the previous issue:

Debug logs (Successful connection)

Before WSARecvMsg
Address set!
Client ready for connection
SCTP: add HMAC id 1 to list
Bind called port: 0
Addr: IPv6 address: 0:0:0:0:0:0:0:0:port:0 scope:0
Main hash to bind at head:0000000000DE5EE8, bound port:49616 - in tcp_pool=0
Allocate an association for peer:IPv4 address: 192.168.1.2:9500
Port:9500
Adding an address (from:1) to the peer: IPv4 address: 192.168.1.2:9500
Association 0000000000674680 now allocated
Sending INIT
Sending INIT - calls lowlevel_output
Select source addr for:IPv4 address: 192.168.1.2:9500
ifn from route:0000000000000000 ifn_index:1
No ifn emit interface?
Trying Plan B
Examine interface {8B0878A2-ABFC-4045-AC13-B242D16856CE}
Is destination preferred:IPv4 address: 192.168.1.2:0
src_loop:0 src_priv:1 src_glob:0
dest_loop:0 dest_priv:1 dest_glob:0
YES
Found ifn:0000000000000000 1 preferred source addresses
num preferred:1 on interface:0000000000DE9900 cur_addr_num:0
Is destination preferred:IPv4 address: 192.168.1.2:0
src_loop:0 src_priv:1 src_glob:0
dest_loop:0 dest_priv:1 dest_glob:0
YES
we selected 0
Source:IPv4 address: 192.168.1.2:0
Dest:IPv4 address: 192.168.1.2:9500
Calling ipv4 output routine from low level src addr:c0a80102
Destination is c0a80102
RTP route is 00000000006754B0 through
IP output returns 0
Client trying to establish a connection to 192.168.1.2
WSARecvMsg() returned 0
WSAGetLastError() returned 0 (0)
recv_function_udp: Received 388 bytes. - calling sctp_common_input_processing with off=12
Ok, Common input processing called, m:0000000000DE9E00 iphlen:0 offset:12 length:388 stcb:0000000000674680
stcb:0000000000674680 state:2
sctp_process_control: iphlen=0, offset=12, length=388 stcb:0000000000674680
sctp_process_control: processing a chunk type=2, len=376
SCTP_INIT_ACK
sctp_handle_init_ack: handling INIT-ACK
Check for unrecognized param's
Hit default param 8004
move on
SCTP: add HMAC id 1 to list
SCTP: negotiated peer HMAC id 1
moving to COOKIE-ECHOED state
Leaving handle-init-ack end
Calling ipv4 output routine from low level src addr:c0a80102
Destination is c0a80102
RTP route is 00000000006754B0 through
IP output returns 0
m-c-o put out 0
Ok, we have put out 0 chunks
Check for chunk output prw:1048576 tqe:0 tf=0
Calling chunk OUTPUT
m-c-o put out 0
Ok, we have put out 0 chunks
chunk OUTPUT returns
Before WSARecvMsg
WSARecvMsg() returned 0
WSAGetLastError() returned 0 (0)
recv_function_udp: Received 16 bytes. - calling sctp_common_input_processing with off=12
Ok, Common input processing called, m:0000000000662B50 iphlen:0 offset:12 length:16 stcb:0000000000674680
stcb:0000000000674680 state:4
sctp_process_control: iphlen=0, offset=12, length=16 stcb:0000000000674680
sctp_process_control: processing a chunk type=11, len=4
SCTP_COOKIE_ACK, stcb 0000000000674680
sctp_handle_cookie_ack: handling COOKIE-ACK
moving to OPEN state
Check for chunk output prw:1048576 tqe:1 tf=0
Send called addr:0000000000000000 send length 1
Calling ipv4 output routine from low level src addr:c0a80102
Destination is c0a80102
RTP route is 00000000006754B0 through
IP output returns 0
m-c-o put out 1
Ok, we have put out 1 chunks
USR Send complete qo:0 prw:1048300 unsent:1 tf:17 cooq:1 toqs:17 err:0
Client connected to server
Client status - SRTT: 109, RTO: 271, MTU: 1460
Before WSARecvMsg
WSARecvMsg() returned 0
WSAGetLastError() returned 0 (0)
recv_function_udp: Received 28 bytes. - calling sctp_common_input_processing with off=12
Ok, Common input processing called, m:0000000000662B50 iphlen:0 offset:12 length:28 stcb:0000000000674680
stcb:0000000000674680 state:8
sctp_process_control: iphlen=0, offset=12, length=28 stcb:0000000000674680
sctp_process_control: processing a chunk type=3, len=16
SCTP_SACK
SCTP_SACK process cum_ack:d5c45ff6 num_seg:0 a_rwnd:1048039
Check for chunk output prw:1048039 tqe:1 tf=0
Before WSARecvMsg
WSARecvMsg() returned 0
WSAGetLastError() returned 0 (0)
recv_function_udp: Received 32 bytes. - calling sctp_common_input_processing with off=12
Ok, Common input processing called, m:00000000006632C0 iphlen:0 offset:12 length:32 stcb:0000000000674680
stcb:0000000000674680 state:8
chunk_flags:0x3 look for control on queues 0000000000000000
chunk_flags: 0x3 ordered: 1 MID: 0 control: 0000000000DE9E00
Queue control: 0000000000DE9E00 for reordering MID: 0
Check for chunk output prw:1048039 tqe:0 tf=0
Calling chunk OUTPUT
Calling ipv4 output routine from low level src addr:c0a80102
Destination is c0a80102
RTP route is 00000000006754B0 through
IP output returns 0
m-c-o put out 1
Ok, we have put out 1 chunks
chunk OUTPUT returns
Before WSARecvMsg
WSARecvMsg() returned -1
WSAGetLastError() returned 10060 (274c)
Before WSARecvMsg
WSARecvMsg() returned -1
WSAGetLastError() returned 10060 (274c)
Before WSARecvMsg
WSARecvMsg() returned -1
WSAGetLastError() returned 10060 (274c)
Before WSARecvMsg
WSARecvMsg() returned -1
WSAGetLastError() returned 10060 (274c)
Before WSARecvMsg
WSARecvMsg() returned -1
WSAGetLastError() returned 10060 (274c)
Before WSARecvMsg
WSARecvMsg() returned -1
WSAGetLastError() returned 10060 (274c)
Before WSARecvMsg
WSARecvMsg() returned -1
WSAGetLastError() returned 10060 (274c)
Before WSARecvMsg
WSARecvMsg() returned -1
WSAGetLastError() returned 10060 (274c)
Before WSARecvMsg
WSARecvMsg() returned -1
WSAGetLastError() returned 10060 (274c)
Before WSARecvMsg
WSARecvMsg() returned -1
WSAGetLastError() returned 10060 (274c)
Before WSARecvMsg
WSARecvMsg() returned -1
WSAGetLastError() returned 10060 (274c)
Before WSARecvMsg
WSARecvMsg() returned 0
WSAGetLastError() returned 0 (0)
recv_function_udp: Received 56 bytes. - calling sctp_common_input_processing with off=12
Ok, Common input processing called, m:0000000000661B60 iphlen:0 offset:12 length:56 stcb:0000000000674680
Timer type 5 goes off
stcb:0000000000674680 state:8
sctp_process_control: iphlen=0, offset=12, length=56 stcb:0000000000674680
sctp_process_control: processing a chunk type=4, len=44
SCTP_HEARTBEAT
Check for chunk output prw:1048039 tqe:0 tf=0
Calling chunk OUTPUT
Calling ipv4 output routine from low level src addr:c0a80102
Destination is c0a80102
RTP route is 00000000006754B0 through
IP output returns 0
m-c-o put out 1
Ok, we have put out 1 chunks
chunk OUTPUT returns
Calling ipv4 output routine from low level src addr:c0a80102
Destination is c0a80102
RTP route is 00000000006754B0 through
Before WSARecvMsg
IP output returns 0
m-c-o put out 1
Ok, we have put out 1 chunks
Timer now complete (type = 5)
WSARecvMsg() returned 0
WSAGetLastError() returned 0 (0)
recv_function_udp: Received 56 bytes. - calling sctp_common_input_processing with off=12
Ok, Common input processing called, m:0000000000662C60 iphlen:0 offset:12 length:56 stcb:0000000000674680
stcb:0000000000674680 state:8
sctp_process_control: iphlen=0, offset=12, length=56 stcb:0000000000674680
sctp_process_control: processing a chunk type=5, len=44
SCTP_HEARTBEAT_ACK
Check for chunk output prw:1048039 tqe:1 tf=0
Before WSARecvMsg
WSARecvMsg() returned -1
WSAGetLastError() returned 10060 (274c)
Before WSARecvMsg
WSARecvMsg() returned -1
WSAGetLastError() returned 10060 (274c)
Before WSARecvMsg
WSARecvMsg() returned -1
WSAGetLastError() returned 10060 (274c)
Before WSARecvMsg
WSARecvMsg() returned -1
WSAGetLastError() returned 10060 (274c)
Debug logs (Unsuccessful connection)

Address set!
Client ready for connection
SCTP: add HMAC id 1 to list
Bind called port: 0
Addr: IPv6 address: 0:0:0:0:0:0:0:0:port:0 scope:0
Main hash to bind at head:0000000000EB7020, bound port:59598 - in tcp_pool=0
Allocate an association for peer:IPv4 address: 192.168.1.2:9500
Port:9500
Adding an address (from:1) to the peer: IPv4 address: 192.168.1.2:9500
Association 0000000000EBAE90 now allocated
Sending INIT
Sending INIT - calls lowlevel_output
Select source addr for:IPv4 address: 192.168.1.2:9500
ifn from route:0000000000000000 ifn_index:1
No ifn emit interface?
Trying Plan B
Examine interface {8B0878A2-ABFC-4045-AC13-B242D16856CE}
Is destination preferred:IPv4 address: 192.168.1.2:0
src_loop:0 src_priv:1 src_glob:0
dest_loop:0 dest_priv:1 dest_glob:0
YES
Found ifn:0000000000000000 1 preferred source addresses
num preferred:1 on interface:0000000000EB9900 cur_addr_num:0
Is destination preferred:IPv4 address: 192.168.1.2:0
src_loop:0 src_priv:1 src_glob:0
dest_loop:0 dest_priv:1 dest_glob:0
YES
we selected 0
Source:IPv4 address: 192.168.1.2:0
Dest:IPv4 address: 192.168.1.2:9500
Calling ipv4 output routine from low level src addr:c0a80102
Destination is c0a80102
RTP route is 0000000000EBBCC0 through
IP output returns 0
Client trying to establish a connection to 192.168.1.2
Timer type 2 goes off
Error count for 0000000000EBBA50 now 1 thresh:5
Overall error count for 0000000000EBAEE8 now 1 thresh:8 state:1
Sending INIT
Sending INIT - calls lowlevel_output
Calling ipv4 output routine from low level src addr:c0a80102
Destination is c0a80102
RTP route is 0000000000EBBCC0 through
IP output returns 0
Timer now complete (type = 2)
Timer type 2 goes off
Error count for 0000000000EBBA50 now 2 thresh:5
Overall error count for 0000000000EBAEE8 now 2 thresh:8 state:1
Sending INIT
Sending INIT - calls lowlevel_output
Calling ipv4 output routine from low level src addr:c0a80102
Destination is c0a80102
RTP route is 0000000000EBBCC0 through
IP output returns 0
Timer now complete (type = 2)

Captured packets: sctp_capture_debug.zip (there are first three successful connections, the fourth was failed)

该提问来源于开源项目:sctplab/usrsctp

  • 点赞
  • 写回答
  • 关注问题
  • 收藏
  • 复制链接分享
  • 邀请回答

10条回答

  • weixin_39683978 weixin_39683978 4月前

    Leave it open. I can try to see if I can improve the seeding of the random number generator. Give me a day or two...

    点赞 评论 复制链接分享
  • weixin_39647977 weixin_39647977 4月前

    Actually, I reproduced this on Ubuntu as well. All ports are unique, two of the clients was unable to connect to the server in this test run:

    test

    点赞 评论 复制链接分享
  • weixin_39647977 weixin_39647977 4月前

    This problem occurs only when clients are connecting quickly and sequentially, it never happens with a single client.

    点赞 评论 复制链接分享
  • weixin_39683978 weixin_39683978 4月前

    Can you describe how to reproduce the problem on a Linux system?

    点赞 评论 复制链接分享
  • weixin_39683978 weixin_39683978 4月前

    Another note: The intended usage is one stack per host, not one stack per application. So if your multiple clients use different UDP ports but use the same local SCTP port, that would not work. However, you should see some packets on the wire. The SCTP port number collisions could happen due to the random number generator initialisations based on the time... So can you check that not only the UDP port numbers (you control) are different on the client side and the SCTP port numbers on the client side (which you might control or not, depending on your usage of sctp_bind() calls on the client side) are different?

    点赞 评论 复制链接分享
  • weixin_39647977 weixin_39647977 4月前

    The SCTP port number collisions could happen due to the random number generator initialisations based on the time...

    It seems that this is the reason why it happens, when I add a one second delay between processes spawning, then everything works fine 64/64 is connected successfully.

    Can you describe how to reproduce the problem on a Linux system?

    I'm using a standard one to one setup with up-calls and then just spawning client processes via bash script sequentially. They don't send/receive any messages, just connecting to the server and transmit heartbeat packets with a delay of 5 seconds, all other parameters are default.

    点赞 评论 复制链接分享
  • weixin_39647977 weixin_39647977 4月前

    SCTP port numbers on the client side (which you might control or not, depending on your usage of sctp_bind() calls on the client side) are different?

    Indeed, I just checked it and found two clients with the same port numbers, both clients were spawned with a difference of less than a second:

    
    Main hash to bind at head:00000000001B6FC0, bound port:62686 - in tcp_pool=0
    Main hash to bind at head:00000000001B6FC0, bound port:62686 - in tcp_pool=0
    
    点赞 评论 复制链接分享
  • weixin_39647977 weixin_39647977 4月前

    Should I close this issue or we can increase the precision of internal port number generator?

    点赞 评论 复制链接分享
  • weixin_39647977 weixin_39647977 4月前

    Yes, I'm generating random encapsulation port for each client process, they should not collide each other, but I'll double check that...

    点赞 评论 复制链接分享
  • weixin_39683978 weixin_39683978 4月前

    You are running the multiple clients by running an application multiple times, right? If that is true, are you making sure the local UDP ports are all different? Can you try to reproduce this?

    点赞 评论 复制链接分享

相关推荐