Monday 2018-09-03

Stateless UDP is easy to anycast; the problems arise when you try to anycast TCP. Witness this ECMP + TTL mangling issue:

Starting with our core router, I confirmed that its ECMP hashing was consistent such that Fastly-bound traffic always went to border router 1 or border router 2. Then I looked at the ECMP hashing scheme on our border routers and noticed something unique - by default Arista also uses TTL:

IPv4 hash fields: Source IPv4 Address is ON, Protocol is ON, Time-To-Live is ON, Destination IPv4 Address is ON

Since the source and destination IPs and protocol weren't changing, perhaps the TTL was not consistent? I opened the first packet trace in Wireshark and jackpot - the TTL value was 128 on SYN but 127 on the TLS/SSL Client Hello. I adjusted the Arista load-balancing profile not to use TTL and immediately my MTR in the background changed and all the sites on the lab machine that couldn't load before were now loading.
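To make the failure mode concrete, here is a minimal Python sketch of why hashing on TTL can split a single TCP flow across two next hops. It is only an illustration: the SHA-256 hash, the next-hop names, and the example addresses are assumptions of mine, not Arista's actual hashing algorithm. Only the set of hash fields (source IP, destination IP, protocol, TTL) comes from the output quoted above.

```python
import hashlib
import ipaddress

# Hypothetical names for the two equal-cost next hops behind the router.
NEXT_HOPS = ["next-hop-1", "next-hop-2"]


def pick_next_hop(src, dst, proto, ttl, use_ttl):
    """Pick a next hop from the hashed header fields.

    SHA-256 is a stand-in for the router's real (unknown to me) hash.
    """
    fields = [
        ipaddress.ip_address(src).packed,
        ipaddress.ip_address(dst).packed,
        bytes([proto]),
    ]
    if use_ttl:
        fields.append(bytes([ttl]))
    digest = hashlib.sha256(b"".join(fields)).digest()
    return NEXT_HOPS[int.from_bytes(digest[:4], "big") % len(NEXT_HOPS)]


def count_split_flows(use_ttl, n_clients=200):
    """Count clients whose SYN (TTL 128) and Client Hello (TTL 127)
    would be sent to different next hops."""
    dst = "203.0.113.10"  # documentation address standing in for the anycast IP
    base = ipaddress.ip_address("198.51.100.0")  # documentation range for clients
    split = 0
    for i in range(n_clients):
        src = str(base + i)
        syn_hop = pick_next_hop(src, dst, 6, 128, use_ttl)    # protocol 6 = TCP
        hello_hop = pick_next_hop(src, dst, 6, 127, use_ttl)
        if syn_hop != hello_hop:
            split += 1
    return split


if __name__ == "__main__":
    print("flows split with TTL in the hash:   ", count_split_flows(True))
    print("flows split without TTL in the hash:", count_split_flows(False))
```

With TTL left out of the hash input, both packets of a flow always hash identically, so the count is zero by construction. With TTL included, a flow whose TTL changes mid-connection can land on a different next hop, and with anycast that may mean a different server that has never seen the SYN.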


Funny that CDNs have pejoratively labelled this situation "spraying flows", even though they are supposed to handle it transparently. Perhaps their edge solutions are not as complete as expected.

Bill Herrin suggests sequence number ranges.