Too many TCP connections cause disconnects

Question

I have a game server that runs over TCP connections. The server randomly disconnects users. I think it is related to the server's TCP settings.

In the local development environment (on localhost), the code handles more than 8000 concurrent users without any disconnects or errors.

But on the actual deployed CentOS 5 64-bit server, these disconnects occur regardless of the number of concurrent TCP connections.

The server seems unable to handle the throughput.

netstat -s -t
IcmpMsg:
    InType0: 31
    InType3: 87717
    InType4: 699
    InType5: 2
    InType8: 1023781
    InType11: 7211
    OutType0: 1023781
    OutType3: 603
Tcp:
    8612766 active connections openings
    14255236 passive connection openings
    12174 failed connection attempts
    319225 connection resets received
    723 connections established
    6351090913 segments received
    6180297746 segments send out
    45791634 segments retransmited
    0 bad segments received.
    1664280 resets sent
TcpExt:
    46244 invalid SYN cookies received
    3745 resets received for embryonic SYN_RECV sockets
    327 ICMP packets dropped because they were out-of-window
    1 ICMP packets dropped because socket was locked
    11475281 TCP sockets finished time wait in fast timer
    140 time wait sockets recycled by time stamp
    1569 packets rejects in established connections because of timestamp
    103783714 delayed acks sent
    6929 delayed acks further delayed because of locked socket
    Quick ack mode was activated 6210096 times
    1806 times the listen queue of a socket overflowed
    1806 SYNs to LISTEN sockets ignored
    1080380601 packets directly queued to recvmsg prequeue.
    31441059 packets directly received from backlog
    5272599307 packets directly received from prequeue
    324498008 packets header predicted
    1143146 packets header predicted and directly queued to user
    3217838883 acknowledgments not containing data received
    1027969883 predicted acknowledgments
    395 times recovered from packet loss due to fast retransmit
    257420 times recovered from packet loss due to SACK data
    5843 bad SACKs received
    Detected reordering 29 times using FACK
    Detected reordering 12 times using SACK
    Detected reordering 1 times using reno fast retransmit
    Detected reordering 809 times using time stamp
    1602 congestion windows fully recovered
    1917 congestion windows partially recovered using Hoe heuristic
    TCPDSACKUndo: 8196226
    7850525 congestion windows recovered after partial ack
    139681 TCP data loss events
    TCPLostRetransmit: 26
    10139 timeouts after reno fast retransmit
    2802678 timeouts after SACK recovery
    86212 timeouts in loss state
    273698 fast retransmits
    19494 forward retransmits
    2637236 retransmits in slow start
    33381883 other TCP timeouts
    TCPRenoRecoveryFail: 92
    19488 sack retransmits failed
    7 times receiver scheduled too late for direct processing
    6354641 DSACKs sent for old packets
    333 DSACKs sent for out of order packets
    20615579 DSACKs received
    2724 DSACKs for out of order packets received
    123034 connections reset due to unexpected data
    91876 connections reset due to early user close
    169244 connections aborted due to timeout
    28736 times unabled to send RST due to no memory
IpExt:
    InMcastPkts: 2

What worries me is that these counters look quite problematic:

123034 connections reset due to unexpected data
91876 connections reset due to early user close
28736 times unabled to send RST due to no memory

How can I fix these errors? Do I need to do some TCP tuning?

Edit: Some sysctl information:

sysctl -A | grep net | grep mem
net.ipv4.udp_wmem_min = 4096
net.ipv4.udp_rmem_min = 4096
net.ipv4.udp_mem = 772704 1030272 1545408
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_mem = 196608 262144 393216
net.ipv4.igmp_max_memberships = 20
net.core.optmem_max = 20480
net.core.rmem_default = 129024
net.core.wmem_default = 129024
net.core.rmem_max = 131071
net.core.wmem_max = 131071
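For reference when reading these values: `net.ipv4.tcp_mem` is expressed in memory pages (low, pressure, high), whereas `tcp_rmem` and `tcp_wmem` are per-socket sizes in bytes. The following is a minimal sketch that converts the `tcp_mem` triple to bytes. The 4096-byte page size is an assumption (typical for x86_64, but not confirmed for this server), and the function name `tcp_mem_bytes` is purely illustrative.

```python
def tcp_mem_bytes(value, page_size=4096):
    """Convert a tcp_mem triple (in pages) to bytes.

    page_size=4096 is an assumption; check `getconf PAGESIZE` on the host.
    """
    low, pressure, high = (int(field) * page_size for field in value.split())
    return {"low": low, "pressure": pressure, "high": high}


if __name__ == "__main__":
    # Value taken from the sysctl output in the question.
    for name, size in tcp_mem_bytes("196608 262144 393216").items():
        print(f"{name}: {size / 2**20:.0f} MiB")
```

Under that page-size assumption, the three thresholds come out to 768 MiB, 1024 MiB and 1536 MiB of memory for all TCP sockets combined.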

Edit: ethtool information for the two detected Ethernet cards:

Settings for eth0:
    Supported ports: [ TP ]
    Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: g
    Wake-on: d
    Link detected: yes
Settings for eth1:
    Supported ports: [ TP ]
    Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full
    Advertised auto-negotiation: Yes
    Speed: Unknown!
    Duplex: Half
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: g
    Wake-on: d
    Link detected: no

POLLOX · Accepted Answer

Have you increased the FD (file descriptor) limit? You can find some information here: http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/
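Since every TCP connection consumes one file descriptor, a quick way to see whether the limit could be a factor is to compare the process's open-file limit with the expected number of connections. This is a minimal sketch using Python's standard `resource` module. The function name `fd_headroom` and the `reserved` allowance for non-socket descriptors (logs, listening sockets and so on) are assumptions made for illustration, and it reports the limit of the process that runs it, not of an already running game server.

```python
import resource


def fd_headroom(expected_connections, reserved=64):
    """Return (soft, hard, enough) for this process's open-file limit.

    `reserved` is a rough, assumed allowance for descriptors that are
    not client sockets; adjust it for the real server.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = expected_connections + reserved
    enough = soft == resource.RLIM_INFINITY or soft >= needed
    return soft, hard, enough


if __name__ == "__main__":
    # 8000 is the concurrent-user figure mentioned in the question.
    soft, hard, enough = fd_headroom(8000)
    print(f"soft limit: {soft}, hard limit: {hard}")
    print("enough for 8000 connections" if enough else "limit too low")
```

If the soft limit is lower than the number of connections the server must hold, `accept()` starts failing once it is reached, so it is worth checking this from the same user and startup path the server uses.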

Trevor Benson · Answer

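If by "the server randomly disconnects users" you mean that clients are disconnected without the expected FIN, ACK or RST exchange, then I would resolve the half-duplex interface first, especially if both NICs are full duplex in your development environment. An eth1 interface at half duplex while Auto-negotiation is on is usually caused by one of the following: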

1. Auto-negotiation between the switch and the server failed.
2. The switch has auto-negotiation disabled and the port's speed and duplex set explicitly.

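I see situation 2 more often, though that is probably because it has been more than ten years since I last knowingly found an auto-negotiation failure. When one side is set to auto and the other is hard-coded (or fails to respond), Ethernet auto-negotiation makes the auto side drop to half-duplex mode.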

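Put simply, with eth1 at half duplex the server can only send or receive data over the interface at any given moment, not both at once. The hard-coded side is still in full-duplex mode and will try to send data to the server while it is receiving data from the server. The server treats this as a collision, because half duplex assumes a shared collision domain, whereas full duplex eliminates the collision domain. The server then uses a backoff algorithm to schedule the retransmission, and if it keeps experiencing what it believes are collisions, it keeps increasing the time it waits before retransmitting the data.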
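The growing wait described above can be illustrated with the truncated binary exponential backoff used by half-duplex Ethernet (CSMA/CD). This is a rough sketch, not a model of any particular NIC: after the n-th consecutive collision the sender waits a random number of slot times between 0 and 2^min(n, 10) - 1, and the frame is dropped after 16 failed attempts. The constants follow the classic IEEE 802.3 scheme, and the function names are illustrative.

```python
import random

MAX_ATTEMPTS = 16   # the frame is discarded after 16 failed attempts
BACKOFF_LIMIT = 10  # the exponent stops growing after the 10th collision


def max_backoff_slots(collision_count):
    """Largest possible wait, in slot times, after the n-th collision."""
    return 2 ** min(collision_count, BACKOFF_LIMIT) - 1


def backoff_slots(collision_count, rng=random):
    """Pick the actual wait: uniform over 0..max_backoff_slots(n)."""
    return rng.randint(0, max_backoff_slots(collision_count))


if __name__ == "__main__":
    for n in range(1, MAX_ATTEMPTS + 1):
        print(f"collision {n:2d}: wait up to {max_backoff_slots(n)} slot times")
```

The upper bound doubles with each perceived collision until it reaches 1023 slot times, which is why a persistent duplex mismatch tends to show up as stalls and latency spikes rather than as a clean failure.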

So a half-duplex interface paired with a full-duplex partner makes a range of problems likely, including client disconnects, throughput or performance issues, and latency spikes.
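To check the interfaces in a repeatable way, the `ethtool` output can be parsed and the duplex, auto-negotiation and link state of each interface listed side by side. The sketch below assumes the usual one-field-per-line `ethtool` layout. The helper names are hypothetical, and the sample text is an excerpt of the output posted in the question.

```python
import re


def parse_ethtool(text):
    """Map interface name -> {field: value} from concatenated ethtool output."""
    interfaces = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = re.match(r"Settings for (\S+):$", line)
        if header:
            current = interfaces.setdefault(header.group(1), {})
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return interfaces


def duplex_report(text):
    """Summarise duplex, auto-negotiation and link state per interface."""
    return {
        name: {
            "duplex": fields.get("Duplex"),
            "autoneg": fields.get("Auto-negotiation"),
            "link": fields.get("Link detected"),
        }
        for name, fields in parse_ethtool(text).items()
    }


SAMPLE = """\
Settings for eth0:
    Speed: 1000Mb/s
    Duplex: Full
    Auto-negotiation: on
    Link detected: yes
Settings for eth1:
    Speed: Unknown!
    Duplex: Half
    Auto-negotiation: on
    Link detected: no
"""

if __name__ == "__main__":
    for name, info in duplex_report(SAMPLE).items():
        print(name, info)
```

Reading the link state next to the duplex value helps confirm which interface is actually carrying the game traffic before changing any switch or NIC settings.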