Recommended consultants for TCP message loss issue?

Rich K rk5devmail at gmail.com
Wed Apr 13 13:15:53 UTC 2022


We have a strange data loss issue whose root cause we have not yet been
able to find.

The main point of this question is to get recommendations on consultants we
can hire to help us troubleshoot. We're willing to bring in paid experts
here, but we're not sure which company to even ask.

Problem details in case anyone has any suggestions:

We have an application that sends JSON messages over TCP to an Ubuntu 18.04
box that aggregates logs. We use syslog-ng on the system to collect
the logs and then forward them downstream to another system. We've deployed
the same setup at multiple customer sites, but at this particular site we
are missing messages after they enter the server. How do we know?

- Each message has a monotonically increasing sequence number. As a simple
example, we can see that we receive messages 1,2,3, then miss 4,5,6, and
then get 7,8,9. Average message size is 1 KB, occasionally 10 KB, but that
size is rare.

- Using tcpdump we can also see that ALL the messages come into the
server, which we believe rules out packet loss (the retransmission rate is
less than 0.1%). As best we can tell, 100% of the traffic is entering the NIC.

- Thinking this was an issue with syslog-ng, we replaced it with fluentd
but still experienced message loss. (We also tried a bunch of additional
syslog-ng tuning: flow control, etc.)

- The TCP buffer is set to 40 gigabits, which should be more than enough for
our traffic volume.

- We see no obvious errors, but we're happy to check other logs we
may have missed if anyone has recommendations.

- Resource usage seems normal and the server does not appear stressed.

- This is a VM running on ESXi version ESXi6.5u2-10719125_20190308.
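
For anyone who wants to reproduce the sequence-number check described above, here is a minimal sketch of how the gaps could be found. The function name find_gaps and the sample list are made up for illustration. It assumes the sequence numbers have already been pulled out of the received messages as integers, and it is not the tooling we actually use.

```python
def find_gaps(seqs):
    """Return a list of (first_missing, last_missing) ranges.

    seqs is any iterable of integer sequence numbers that were
    actually received; duplicates and out-of-order arrival are tolerated.
    """
    ordered = sorted(set(seqs))
    gaps = []
    # Compare each received number with the next one; a jump larger
    # than 1 means everything in between never arrived.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


# The simple example from the list above: 1,2,3 arrive, 4,5,6 are lost.
received = [1, 2, 3, 7, 8, 9]
print(find_gaps(received))  # [(4, 6)]
```

Running the same check on the sequence numbers seen in the tcpdump capture and on those seen downstream should show at which stage the gaps first appear.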

netstat -s output:

Ip:
    Forwarding: 1
    4593164861 total packets received
    4142886767 forwarded
    0 incoming packets discarded
    300256068 incoming packets delivered
    4592170310 requests sent out
    6968 dropped because of missing route
    157 fragments failed
Icmp:
    14625192 ICMP messages received
    14392207 input ICMP message failed
        echo replies: 93936
        timestamp request: 18
        address mask request: 9
    14500776 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 14401103
        echo requests: 99529
        echo replies: 126
        timestamp replies: 18
IcmpMsg:
        InType0: 93936
        InType3: 14411285
        InType8: 126
        InType11: 119818
        InType13: 18
        InType17: 9
        OutType0: 126
        OutType3: 14401103
        OutType8: 99529
        OutType14: 18
Tcp:
    49655 active connection openings
    922 passive connection openings
    25581 failed connection attempts
    24 connection resets received
    4 connections established
    255130475 segments received
    418782284 segments sent out
    297424 segments retransmitted
    84 bad segments received
    44527 resets sent
Udp:
    9568375 packets received
    771038 packets to unknown port received
    0 packet receive errors
    30315599 packets sent
    0 receive buffer errors
    0 send buffer errors
    IgnoredMulti: 20160522
UdpLite:
TcpExt:
    160 resets received for embryonic SYN_RECV sockets
    43 ICMP packets dropped because they were out-of-window
    160 TCP sockets finished time wait in fast timer
    83 packetes rejected in established connections because of timestamp
    10465104 delayed acks sent
    14675 delayed acks further delayed because of locked socket
    Quick ack mode was activated 7761 times
    30653113 packet headers predicted
    1 congestion windows fully recovered without slow start
    1 congestion windows partially recovered using Hoe heuristic
    TCPDSACKUndo: 1053
    16215 congestion windows recovered without slow start after partial ack
    TCPLostRetransmit: 218
    TCPSackFailures: 2562
    62 timeouts in loss state
    12712 fast retransmits
    805 retransmits in slow start
    TCPTimeouts: 259506
    TCPLossProbes: 51861
    TCPLossProbeRecovery: 175
    TCPSackRecoveryFail: 72
    TCPDSACKOldSent: 7762
    TCPDSACKOfoSent: 150
    TCPDSACKRecv: 67553
    TCPDSACKOfoRecv: 122
    20 connections reset due to unexpected data
    10 connections reset due to early user close
    25203 connections aborted due to timeout
    TCPDSACKIgnoredOld: 875
    TCPDSACKIgnoredNoUndo: 59255
    TCPSpuriousRTOs: 180
    TCPSackShifted: 2918
    TCPSackMerged: 28194
    TCPSackShiftFallback: 20170
    IPReversePathFilter: 28269
    TCPRcvCoalesce: 919433
    TCPOFOQueue: 94994
    TCPOFOMerge: 150
    TCPChallengeACK: 81
    TCPSYNChallenge: 85
    TCPSpuriousRtxHostQueues: 23
    TCPAutoCorking: 19246152
    TCPFromZeroWindowAdv: 64
    TCPToZeroWindowAdv: 64
    TCPWantZeroWindowAdv: 155
    TCPSynRetrans: 214009
    TCPOrigDataSent: 393676699
    TCPHystartTrainDetect: 4
    TCPHystartTrainCwnd: 107
    TCPHystartDelayDetect: 1450
    TCPHystartDelayCwnd: 29424
    TCPACKSkippedPAWS: 59
    TCPACKSkippedSeq: 73
    TCPACKSkippedChallenge: 4
    TCPWinProbe: 437
    TCPKeepAlive: 474
IpExt:
    InMcastPkts: 343
    OutMcastPkts: 19138
    InBcastPkts: 20160522
    InOctets: 2691133820485
    OutOctets: 5716198198735
    InMcastOctets: 12348
    OutMcastOctets: 2334836
    InBcastOctets: 1672743338
    InNoECTPkts: 7175709883
    InECT0Pkts: 90382743
Sctp:
    0 Current Associations
    0 Active Associations
    0 Passive Associations
    0 Number of Aborteds
    0 Number of Graceful Terminations
    0 Number of Out of Blue packets
    0 Number of Packets with invalid Checksum
    0 Number of control chunks sent
    0 Number of ordered chunks sent
    0 Number of Unordered chunks sent
    0 Number of control chunks received
    0 Number of ordered chunks received
    0 Number of Unordered chunks received
    0 Number of messages fragmented
    0 Number of messages reassembled
    0 Number of SCTP packets sent
    0 Number of SCTP packets received
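
As a rough cross-check of the retransmission figure quoted earlier, the ratio can be computed from the two Tcp counters in the output above. This is only a sketch: the function name retransmit_rate is made up, the counters are host-wide totals since boot, and they count segments this host sent, so the result is not a per-connection or inbound measurement.

```python
import re


def retransmit_rate(netstat_text):
    """Return retransmitted / sent segments from `netstat -s` text."""
    sent = int(re.search(r"(\d+) segments sent out", netstat_text).group(1))
    retrans = int(
        re.search(r"(\d+) segments retransmitted", netstat_text).group(1)
    )
    return retrans / sent


# The two relevant lines copied from the Tcp: section above.
sample = """
    418782284 segments sent out
    297424 segments retransmitted
"""

rate = retransmit_rate(sample)
print(f"{rate:.4%}")  # about 0.07%, i.e. below 0.1%
```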