[e2e] query on behaviour of tcp_keepalive and tcp retransmit on Linux based systems

Tue Feb 22 06:55:13 PST 2011

Hi

According to your description, the expected behavior should be as follows.
At the beginning, senders on one side can send data to the receivers on
the other side, and the receivers can receive data without any problem.
When some of the receivers go offline, the affected senders should
no longer receive positive acknowledgments and should therefore reduce
their congestion windows (i.e., sending rates). Since in your case the
receiver is off forever, those senders should further experience timeout
events. After a few timeouts, the sender should CLOSE the connection itself.

As far as I know, the whole procedure above should be invoked
automatically on the sender side. This is how a TCP sender handles
such exceptions.
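
To put a rough number on "after a few timeouts", here is a minimal sketch.
It assumes the Linux constants TCP_RTO_MIN = 200 ms and TCP_RTO_MAX = 120 s
and a retransmission timeout (RTO) that doubles after every unanswered
retransmission. The function name and its parameters are illustrative only;
`retries` stands for the net.ipv4.tcp_retries2 sysctl, whose default is 15.
The real RTO is derived from the measured round-trip time, so the result
should be read as a lower bound, not as an exact figure.

```python
def estimated_give_up_seconds(retries=15, rto_min=0.2, rto_max=120.0):
    """Lower-bound estimate of how long a sender keeps retransmitting.

    Assumption: the RTO starts at rto_min, doubles after each unanswered
    (re)transmission, and is capped at rto_max. The sender waits once after
    the original transmission and once after each of the `retries`
    retransmissions before it aborts the connection.
    """
    total = 0.0
    for attempt in range(retries + 1):
        total += min(rto_min * (2 ** attempt), rto_max)
    return total


print(round(estimated_give_up_seconds(15), 1))  # 924.6
```

With the default of 15 retries this comes to roughly 15 minutes. The
retransmission timer only runs while unacknowledged data is outstanding, so
a sender that has nothing in flight never reaches this limit.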

My suggestion is that you run a simple experiment on your side to see
whether TCP on your machine works that way. The test can be done by using
iperf to send a long-lived TCP flow and then taking the receiver offline
in the middle of the transmission. The connection is expected to be
closed very soon after the receiver goes offline.
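
If iperf is not at hand, the same experiment can be sketched in a few lines
of Python. This is only an illustration under stated assumptions: the
function names, the chunk size, and the host and port passed to
run_experiment are hypothetical, and the receiver must be taken offline by
hand (for example by unplugging it) while the sender is running.

```python
import socket
import time


def send_until_failure(sock, chunk=b"x" * 1024, max_seconds=3600.0):
    """Send `chunk` repeatedly until the kernel reports an error.

    Returns (elapsed_seconds, error). `error` is the OSError raised by the
    send call, or None if max_seconds passed without a failure. Note that
    max_seconds is only checked between sends: on a blocking socket a send
    can itself block for a long time once the send buffer is full.
    """
    start = time.monotonic()
    while time.monotonic() - start < max_seconds:
        try:
            sock.sendall(chunk)
        except OSError as exc:
            return time.monotonic() - start, exc
    return time.monotonic() - start, None


def run_experiment(host, port):
    """Connect to a receiver, stream data, and report when the send fails."""
    with socket.create_connection((host, port)) as sock:
        elapsed, error = send_until_failure(sock)
        print(f"send stopped after {elapsed:.1f} s: {error!r}")
```

Calling run_experiment with the address of a test receiver and then taking
that receiver offline shows how long the local kernel keeps the connection
before it reports an error to the application.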

Hope this helps.
Yan

On 2/22/2011 4:24 AM, Zama Ques wrote:
> We need some clarification on TCP keepalive. We are facing some
> issues on our production servers related to TCP functionality.
>
> The issue is as follows.
>
> We have some machines at one end sending data in real time to another
> group of machines at the other end. Due to hardware issues at the
> other end, some of those machines become unresponsive or crash.
> The client system that pumps the data never learns that the server
> has become unresponsive. The connection remains in the
> ESTABLISHED state and the client keeps trying to send data, thinking
> that the connection is alive, so we are seeing a backlog on the
> client side.
>
> Our understanding of how TCP will handle the connection is as follows.
>
>
> Q 1) Since the server went down, the client will try to
> retransmit the data until it times out. What is the behavior of TCP
> after the timeout? We need clarification on
> the following points.
> a) Will the kernel close the established connection after the
> timeout? It appears not in our case, as we still see the connection
> in the ESTABLISHED state after more
> than 2 hours.
> b) Are there any kernel parameters that decide when the client
> times out after retransmission fails? What is the behavior of TCP
> after the client's retransmissions time out?
>
>
> Q 2) There is something called tcp_keepalive. If it is implemented in
> the kernel (it is there by default, with a timeout of around 2 hrs 2
> minutes, I think), the client will send some TCP probes after the
> keepalive time interval, and if it cannot reach the server, the
> established connection on the client side will be closed by the
> kernel. This is my understanding. But I can see that the connection
> still remains ESTABLISHED after the tcp_keepalive time. We waited for
> around 2 hrs 30 minutes, but the connection remained in the
> ESTABLISHED state. We tried reducing the keepalive time to around 10
> minutes, but the connection still remains ESTABLISHED on the client side.
>
>
> Where did I go wrong? Please clarify the doubts raised above. What
> should we do to resolve the problem described above? Any help
> will be highly appreciated, as we are having a hard time
> resolving the issue.
>
> Thanks in Advance
>
>
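
One point that bears on Q 2: on Linux the tcp_keepalive_* sysctls only
supply defaults (7200 s idle, 75 s between probes, 9 probes), and they take
effect only on sockets for which the application has set SO_KEEPALIVE.
Keepalive probes are also sent only on an idle connection; while
unacknowledged data is outstanding, the retransmission timer governs
instead. The sketch below shows one way an application might enable
keepalive per socket. The function name and the chosen values are
illustrative assumptions, and the TCP_KEEPIDLE, TCP_KEEPINTVL and
TCP_KEEPCNT options are the Linux ones.

```python
import socket


def enable_keepalive(sock, idle=600, interval=60, count=5):
    """Turn on TCP keepalive for one socket and override the sysctl defaults.

    idle:     seconds of inactivity before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes after which the connection is dropped
    Returns the values the kernel reports back for the socket.
    """
    # Without SO_KEEPALIVE the kernel never sends probes for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return {
        "keepalive": sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE),
        "idle": sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE),
        "interval": sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL),
        "count": sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT),
    }
```

With these example values an idle connection to a dead peer would be
dropped after about 600 + 5 * 60 = 900 seconds. For a sender that always
has data in flight, the Linux TCP_USER_TIMEOUT socket option (kernel 2.6.37
and later) may be the more relevant knob, since it bounds how long
transmitted data may stay unacknowledged.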
