TCP Man
Name
tcp - TCP protocol
Synopsis
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
tcp_socket = socket(AF_INET, SOCK_STREAM, 0);
Description
This is an implementation of the TCP protocol defined in RFC 793, RFC
1122 and RFC 2001 with the NewReno and SACK extensions. It provides
a reliable, stream-oriented, full-duplex connection between two sockets
on top of ip(7), for both v4 and v6 versions. TCP guarantees that the data
arrives in order and retransmits lost packets. It generates and checks a
per-packet checksum to catch transmission errors. TCP does not
preserve record boundaries.
A newly created TCP socket has no remote or local address and is not
fully specified. To create an outgoing TCP connection use connect(2) to
establish a connection to another TCP socket. To receive new incoming
connections, first bind(2) the socket to a local address and port and then
call listen(2) to put the socket into the listening state. After that a new
socket for each incoming connection can be accepted using accept(2). A
socket which has had accept(2) or connect(2) successfully called on it is
fully specified and may transmit data. Data cannot be transmitted on
listening or not yet connected sockets.
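The connect/bind/listen/accept sequence above can be sketched in C as follows. This is a minimal illustration, not part of the tcp(7) interface: the helper name tcp_loopback_pair is hypothetical, and the sketch assumes the loopback address 127.0.0.1 and lets the kernel choose a free port by binding to port 0.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: build a connected pair of TCP sockets over
 * 127.0.0.1. Returns 0 on success, -1 on error.
 */
int tcp_loopback_pair(int *client_fd, int *server_fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int listener, client, server;

    listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);          /* port 0: the kernel picks a free port */

    /* Passive side: bind(2) a local address and port, then listen(2). */
    if (bind(listener, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listener, 1) < 0 ||
        getsockname(listener, (struct sockaddr *)&addr, &len) < 0) {
        close(listener);
        return -1;
    }

    /* Active side: connect(2) to the address the listener was given. */
    client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0) {
        close(listener);
        return -1;
    }
    if (connect(client, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(client);
        close(listener);
        return -1;
    }

    /* accept(2) hands back a new, fully specified socket for the connection. */
    server = accept(listener, NULL, NULL);
    close(listener);                   /* the listening socket never carries data */
    if (server < 0) {
        close(client);
        return -1;
    }

    *client_fd = client;
    *server_fd = server;
    return 0;
}
```

A single thread suffices here because the kernel completes the handshake on the listener's behalf and queues the connection for accept(2). A real server would normally keep the listening socket open and call accept(2) in a loop.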
Linux supports RFC 1323 TCP high performance extensions. These
include Protection Against Wrapped Sequence Numbers (PAWS),
Window Scaling and Timestamps. Window scaling allows the use of large
(> 64K) TCP windows in order to support links with high latency or
bandwidth. To make use of them, the send and receive buffer sizes must
be increased. They can be set globally with
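the /proc/sys/net/ipv4/tcp_wmem and /proc/sys/net/ipv4/tcp_rmem files,
or on individual sockets by using the SO_SNDBUF and SO_RCVBUF socket
options with the setsockopt(2) call.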
/proc interfaces
System-wide TCP parameter settings can be accessed by files in the
directory /proc/sys/net/ipv4/. In addition, most IP /proc interfaces also
apply to TCP; see ip(7). Variables described as Boolean take an integer
value, with a nonzero value ("true") meaning that the corresponding
option is enabled, and a zero value ("false") meaning that the option is
disabled.
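Because these files are plain text, a program can read them like any other file. The sketch below assumes that a setting holds one or more whitespace-separated integers (a Boolean is a single 0 or 1, while entries such as tcp_wmem hold three); the helper names parse_proc_vector and read_proc_vector are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse up to n whitespace-separated integers, e.g. "4096\t16384\t4194304\n".
 * Returns how many integers were parsed.
 */
int parse_proc_vector(const char *text, long values[], int n)
{
    int count = 0;
    char *end;

    while (count < n) {
        long v = strtol(text, &end, 10);   /* skips leading whitespace */
        if (end == text)                   /* no more digits to convert */
            break;
        values[count++] = v;
        text = end;
    }
    return count;
}

/* Read the first line of a /proc/sys file and parse it; -1 on I/O error. */
int read_proc_vector(const char *path, long values[], int n)
{
    char buf[256];
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_proc_vector(buf, values, n);
}
```

For example, read_proc_vector("/proc/sys/net/ipv4/tcp_wmem", v, 3) would fill v with the min, default, and max values, and reading /proc/sys/net/ipv4/tcp_sack with n = 1 yields a single Boolean.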
tcp_abc (Integer; default: 0; since Linux 2.6.15)
Control the Appropriate Byte Count (ABC), defined in RFC 3465.
ABC is a way of increasing the congestion window (cwnd) more
slowly in response to partial acknowledgments. Possible values
are:
0
increase cwnd once per acknowledgment (no ABC)
1
increase cwnd once per acknowledgment of a full-sized segment
2
allow cwnd to increase by two if an acknowledgment covers two
segments, to compensate for delayed acknowledgments.
tcp_abort_on_overflow (Boolean; default: disabled; since Linux 2.4)
Enable resetting connections if the listening service is too slow and
unable to keep up and accept them. When this option is disabled (the
default), a connection that overflows the accept queue because of a
burst is not reset and can recover once the service catches up. Enable
this option only if you are really sure that the listening daemon cannot
be tuned to accept connections faster. Enabling this option can
harm the clients of your server.
tcp_adv_win_scale (integer; default: 2; since Linux 2.4)
Count buffering overhead as bytes/2^tcp_adv_win_scale
if tcp_adv_win_scale is greater than 0, or as
bytes-bytes/2^(-tcp_adv_win_scale) if tcp_adv_win_scale is less than
or equal to zero.
The socket receive buffer space is shared between the application
and kernel. TCP maintains part of the buffer as the TCP window;
this is the size of the receive window advertised to the other end.
The rest of the space is used as the "application" buffer, used to
isolate the network from scheduling and application latencies.
The tcp_adv_win_scale default value of 2 implies that the space
used for the application buffer is one fourth that of the total.
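The two overhead formulas can be checked with a little arithmetic. The sketch below is an illustration written for this page, not the kernel's own code; the function names are hypothetical, and it assumes the scale stays within a small range so that the shifts are well defined. With the default scale of 2 and 87380 bytes of buffer space, 21845 bytes (one fourth) count as overhead and 65535 bytes are advertised as the window.

```c
/*
 * Split receive-buffer space per tcp_adv_win_scale:
 *   scale > 0:  overhead = space / 2^scale
 *   scale <= 0: overhead = space - space / 2^(-scale)
 * adv_window() returns the part advertised as the TCP window.
 */
long adv_window(long space, int scale)
{
    if (scale > 0)
        return space - (space >> scale);
    return space >> -scale;
}

/* The remainder of the buffer is the "application" buffer (overhead). */
long buffer_overhead(long space, int scale)
{
    return space - adv_window(space, scale);
}
```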
tcp_rmem (since Linux 2.4)
max
The maximum size of the receive buffer used by each TCP socket.
This value does not override the global net.core.rmem_max. This is
not used to limit the size of the receive buffer declared
using SO_RCVBUF on a socket. The default value is calculated
using the formula
max(87380, min(4MB, tcp_mem[1]*PAGE_SIZE/128))
(On Linux 2.4, the default is 87380*2 bytes, lowered to 87380 in
low-memory systems).
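A sketch of the default formula for the receive-buffer maximum above, assuming tcp_mem[1] is expressed in pages (as tcp_mem values are) and reading 4MB as 4 MiB (4194304 bytes). The same shape, with a floor of 65536 instead of 87380, is quoted for the tcp_wmem maximum. The function name is hypothetical.

```c
#include <stdint.h>

/*
 * max(floor_bytes, min(4MB, tcp_mem1_pages * page_size / 128))
 * floor_bytes is 87380 for the tcp_rmem maximum, 65536 for tcp_wmem.
 */
uint64_t default_buf_max(uint64_t floor_bytes, uint64_t tcp_mem1_pages,
                         uint64_t page_size)
{
    const uint64_t four_mb = 4ULL * 1024 * 1024;
    uint64_t scaled = tcp_mem1_pages * page_size / 128;
    uint64_t capped = scaled < four_mb ? scaled : four_mb;

    return capped > floor_bytes ? capped : floor_bytes;
}
```

For instance, with 4096-byte pages and tcp_mem[1] = 100000 pages, the result is 3200000 bytes; small memory configurations fall back to the floor, and large ones are capped at 4 MiB.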
tcp_sack (Boolean; default: enabled; since Linux 2.2)
Enable RFC 2018 TCP Selective Acknowledgements.
tcp_slow_start_after_idle (Boolean; default: enabled; since Linux 2.6.18)
If enabled, provide RFC 2861 behavior and time out the congestion
window after an idle period. An idle period is defined as the current
RTO (retransmission timeout). If disabled, the congestion window
will not be timed out after an idle period.
tcp_stdurg (Boolean; default: disabled; since Linux 2.2)
If this option is enabled, then use the RFC 1122 interpretation of the
TCP urgent-pointer field. According to this interpretation, the urgent
pointer points to the last byte of urgent data. If this option is
disabled, then use the BSD-compatible interpretation of the urgent
pointer: the urgent pointer points to the first byte after the urgent
data. Enabling this option may lead to interoperability problems.
tcp_syn_retries (integer; default: 5; since Linux 2.2)
The maximum number of times initial SYNs for an active TCP
connection attempt will be retransmitted. This value should not be
higher than 255. The default value is 5, which corresponds to
approximately 180 seconds.
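The 180-second figure follows from exponential backoff. The sketch below assumes an initial retransmission timeout of 3 seconds that doubles after every retransmission with no upper cap (older kernels used a 3-second initial timeout; newer ones start at 1 second, which shortens the total). Under those assumptions, the original SYN plus five retransmissions give up after 3 + 6 + 12 + 24 + 48 + 96 = 189 seconds. The function name is hypothetical.

```c
/*
 * Seconds until an active open gives up, assuming the timeout starts at
 * initial_rto and doubles after each retransmission (no cap).
 * There is one wait after the original SYN and one after each retry.
 */
unsigned long syn_give_up_seconds(unsigned initial_rto, unsigned retries)
{
    unsigned long total = 0, rto = initial_rto;

    for (unsigned i = 0; i <= retries; i++) {
        total += rto;
        rto *= 2;
    }
    return total;
}
```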
tcp_synack_retries (integer; default: 5; since Linux 2.2)
The maximum number of times a SYN/ACK segment for a passive
TCP connection will be retransmitted. This number should not be
higher than 255.
tcp_syncookies (Boolean; since Linux 2.2)
Enable TCP syncookies. The kernel must be compiled
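with CONFIG_SYN_COOKIES. Send out syncookies when the SYN
backlog queue of a socket overflows. The syncookies feature
attempts to protect a socket from a SYN flood attack. This should
be used as a last resort, if at all. This is a violation of the TCP
protocol, and conflicts with other areas of TCP such as TCP
extensions. It can cause problems for clients and relays. It is not
recommended as a tuning mechanism for heavily loaded servers to help
with overloaded or misconfigured conditions. For recommended
alternatives see tcp_max_syn_backlog, tcp_synack_retries, and
tcp_abort_on_overflow.
tcp_window_scaling (Boolean; default: enabled; since Linux 2.2)
Enable RFC 1323 TCP window scaling. This feature allows the use of
a large window (> 64K) on a TCP connection, should the other
end support it. Normally, the 16-bit window length field in the TCP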
header limits the window size to less than 64K bytes. If larger
windows are desired, applications can increase the size of their
socket buffers and the window scaling option will be employed.
If tcp_window_scaling is disabled, TCP will not negotiate the use of
window scaling with the other end during connection setup.
tcp_wmem (since Linux 2.4)
This is a vector of 3 integers: [min, default, max]. These parameters
are used by TCP to regulate send buffer sizes. TCP dynamically
adjusts the size of the send buffer from the default values listed
below, in the range of these values, depending on memory
available.
min
Minimum size of the send buffer used by each TCP socket. The
default value is the system page size. (On Linux 2.4, the default
value is 4K bytes.) This value is used to ensure that in memory
pressure mode, allocations below this size will still succeed. This is
not used to bound the size of the send buffer declared
using SO_SNDBUF on a socket.
default
The default size of the send buffer for a TCP socket. This value
overrides the initial default buffer size from the generic
global /proc/sys/net/core/wmem_default defined for all protocols.
The default value is 16K bytes. If larger send buffer sizes are
desired, this value should be increased (to affect all sockets). To
employ large TCP windows,
the /proc/sys/net/ipv4/tcp_window_scaling must be set to a nonzero
value (default).
max
The maximum size of the send buffer used by each TCP socket.
This value does not override the value
in /proc/sys/net/core/wmem_max. This is not used to limit the size of
the send buffer declared using SO_SNDBUF on a socket. The
default value is calculated using the formula
max(65536, min(4MB, tcp_mem[1]*PAGE_SIZE/128))
(On Linux 2.4, the default value is 128K bytes, lowered to 64K on
low-memory systems.)
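The per-socket counterpart of these limits is the SO_SNDBUF option. The sketch below (function name hypothetical) sets it with setsockopt(2) and reads back the value the kernel actually applied. On Linux the value read back is normally twice the request, because the kernel reserves room for bookkeeping overhead, and requests are capped by /proc/sys/net/core/wmem_max.

```c
#include <sys/socket.h>

/*
 * Request a send buffer of `bytes` and return the size the kernel
 * reports back via getsockopt(2), or -1 on error.
 */
int set_send_buffer(int fd, int bytes)
{
    int actual;
    socklen_t len = sizeof(actual);

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &actual, &len) < 0)
        return -1;
    return actual;
}
```

Setting SO_SNDBUF explicitly also stops the kernel from automatically tuning that socket's send buffer within the tcp_wmem range, so it is usually worth doing only when a fixed size is really needed.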
tcp_workaround_signed_windows (Boolean; default: disabled; since Linux 2.6.26)
Sockets API
TCP provides limited support for out-of-band data, in the form of (a single
byte of) urgent data. In Linux this means that if the other end sends newer
out-of-band data, the older urgent data is inserted as normal data into the
stream (even when SO_OOBINLINE is not set). This differs from BSD-based stacks.
Linux uses the BSD-compatible interpretation of the urgent pointer field by
default. This violates RFC 1122, but is required for interoperability with
other stacks. It can be changed via /proc/sys/net/ipv4/tcp_stdurg.
It is possible to peek at out-of-band data using
the recv(2) MSG_PEEK flag.
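The following sketch sends one byte of urgent data over a loopback connection, peeks at it with MSG_OOB | MSG_PEEK, and then consumes it with MSG_OOB. It assumes SO_OOBINLINE is not set (the default) and waits for the urgent byte with poll(2) POLLPRI, because recv(2) with MSG_OOB does not block when no urgent data has arrived yet. The function name urgent_roundtrip and the one-second timeout are illustrative choices.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send `urgent` as out-of-band data and read it back; returns it or -1. */
int urgent_roundtrip(char urgent)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);
    struct pollfd pfd;
    int lst = -1, cli = -1, srv = -1, result = -1;
    char c;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* port 0: kernel picks */

    /* Loopback connection: listener, client, accepted server side. */
    if ((lst = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
        bind(lst, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        listen(lst, 1) < 0 ||
        getsockname(lst, (struct sockaddr *)&sa, &len) < 0 ||
        (cli = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
        connect(cli, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        (srv = accept(lst, NULL, NULL)) < 0)
        goto out;

    /* Some ordinary data, then a single byte sent as urgent data. */
    if (send(cli, "data", 4, 0) != 4 || send(cli, &urgent, 1, MSG_OOB) != 1)
        goto out;

    /* Wait (up to 1 s) until the kernel reports urgent data as POLLPRI. */
    pfd.fd = srv;
    pfd.events = POLLPRI;
    if (poll(&pfd, 1, 1000) != 1 || !(pfd.revents & POLLPRI))
        goto out;

    /* MSG_PEEK returns the urgent byte but leaves it pending... */
    if (recv(srv, &c, 1, MSG_OOB | MSG_PEEK) != 1 || c != urgent)
        goto out;

    /* ...and a plain MSG_OOB read then consumes it. */
    if (recv(srv, &c, 1, MSG_OOB) == 1)
        result = (unsigned char)c;
out:
    if (srv >= 0)
        close(srv);
    if (cli >= 0)
        close(cli);
    if (lst >= 0)
        close(lst);
    return result;
}
```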
Since version 2.4, Linux supports the use of MSG_TRUNC in
the flags argument of recv(2) (and recvmsg(2)). This flag causes the
received bytes of data to be discarded, rather than passed back in a
caller-supplied buffer. Since Linux 2.4.4, MSG_TRUNC also has this effect
when used in conjunction with MSG_OOB to receive out-of-band data.
Ioctls
The following ioctl(2) calls return information in value. The correct syntax
is:
int value;
error = ioctl(tcp_socket, ioctl_type, &value);
SIOCATMARK
Returns true (i.e., value is nonzero) if the inbound data stream is at
the urgent mark. If the SO_OOBINLINE socket option is set, and
SIOCATMARK returns true, then the next read from the socket will return
the urgent data. If the SO_OOBINLINE socket option is
not set, and SIOCATMARK returns true, then the next read from
the socket will return the bytes following the urgent data (to actually
read the urgent data requires the recv(2) MSG_OOB flag).
Note that a read never reads across the urgent mark. If an
application is informed of the presence of urgent data
via select(2) (using the exceptfds argument) or through delivery of
a SIGURG signal, then it can advance up to the mark using a loop
which repeatedly tests SIOCATMARK and performs a read
(requesting any number of bytes) as long as SIOCATMARK returns
false.
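The loop described above can be sketched as follows. The helper names are hypothetical; the sketch assumes a blocking, connected socket without SO_OOBINLINE on which urgent data has already been signalled, and it relies on the stated property that a read never crosses the urgent mark, so the buffer size does not matter.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the read position is at the urgent mark, 0 if not, -1 on error. */
int at_urgent_mark(int fd)
{
    int value;

    if (ioctl(fd, SIOCATMARK, &value) < 0)
        return -1;
    return value != 0;
}

/* Read and discard ordinary data until SIOCATMARK reports the mark.
 * Returns 0 at the mark, -1 on error or if the peer closes first. */
int skip_to_urgent_mark(int fd)
{
    char buf[4096];

    for (;;) {
        int mark = at_urgent_mark(fd);
        if (mark != 0)
            return mark > 0 ? 0 : -1;   /* at the mark, or ioctl failed */

        ssize_t r = read(fd, buf, sizeof(buf));
        if (r == 0)
            return -1;                  /* end of stream before the mark */
        if (r < 0 && errno != EINTR)
            return -1;
    }
}
```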
SIOCOUTQ
Returns the amount of unsent data in the socket send queue. The
socket must not be in LISTEN state, otherwise an error (EINVAL) is
returned. SIOCOUTQ is defined in <linux/sockios.h>. Alternatively,
you can use the synonymous TIOCOUTQ, defined in <sys/ioctl.h>.
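A minimal SIOCOUTQ query might look like this, assuming a TCP socket that is not listening (the helper name send_queue_bytes is hypothetical). On a freshly created socket nothing has been queued, so the result is 0.

```c
#define _GNU_SOURCE
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/sockios.h>

/* Bytes still queued for sending, or -1 on error (e.g. EINVAL if listening). */
int send_queue_bytes(int fd)
{
    int value;

    if (ioctl(fd, SIOCOUTQ, &value) < 0)
        return -1;
    return value;
}
```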
Error handling
When a network error occurs, TCP tries to resend the packet. If it doesn't
succeed after some time, either ETIMEDOUT or the last received error on
this connection is reported.
Some applications require a quicker error notification. This can be
enabled with the IPPROTO_IP level IP_RECVERR socket option. When
this option is enabled, all incoming errors are immediately passed to the
user program. Use this option with care; it makes TCP less tolerant to
routing changes and other normal network conditions.
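Enabling the option is a single setsockopt(2) call at the IPPROTO_IP level. The sketch below (helper names hypothetical) sets it and reads it back with getsockopt(2).

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <sys/socket.h>

/* Turn on immediate error reporting; 0 on success, -1 on error. */
int enable_recverr(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on));
}

/* Returns 1 if IP_RECVERR is enabled, 0 if not, -1 on error. */
int recverr_enabled(int fd)
{
    int value;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, IPPROTO_IP, IP_RECVERR, &value, &len) < 0)
        return -1;
    return value != 0;
}
```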
Errors
EAFNOSUPPORT
Passed socket address type in sin_family was not AF_INET.
EPIPE
The other end closed the socket unexpectedly or a write is executed
on a socket that has been shut down for writing.
ETIMEDOUT
The other end didn't acknowledge retransmitted data after some
time.
Any errors defined for ip(7) or the generic socket layer may also be
returned for TCP.
Versions
Support for Explicit Congestion Notification, zero-copy sendfile(2),
reordering support and some SACK extensions (DSACK) were introduced
in 2.4. Support for forward acknowledgement (FACK), TIME_WAIT
recycling, and per-connection keepalive socket options were introduced in
2.3.
Bugs
Not all errors are documented.
IPv6 is not described.
See Also
accept(2), bind(2), connect(2), getsockopt(2), listen(2), recvmsg(2),
sendfile(2), sendmsg(2), socket(2), ip(7), socket(7)