We have deployed Coral Talk on AWS Fargate. An Application Load Balancer (ALB) sits in front of it and serves Talk content to the world. We have been seeing bursts of 502s at random times throughout the day, with no visible pattern. We suspect the cause is an incorrect timeout configuration. The keep_alive timeout is 30 seconds (the default). The idle timeout on the ALB is 60 seconds. Stickiness on the ALB is enabled and set to 29 seconds, one second less than the keep-alive timeout. The ALB could be configured to use WebSockets rather than HTTP, but it is currently using HTTP. A CDN sits in front of the ALB. We bypass WebSockets at the CDN using the Coral Talk configuration option that does so.
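To make the numbers concrete, here is a minimal sketch of the relationship we are unsure about. It assumes the 30-second keep_alive value behaves like an HTTP keep-alive timeout on the container side, which we have not confirmed. The names `TimeoutConfig` and `backendMayCloseFirst` are made up for illustration and are not part of Talk or AWS. The check reflects the general AWS guidance that a target's idle or keep-alive timeout should be longer than the load balancer's idle timeout, since otherwise the target can close a connection that the ALB is still about to reuse.

```typescript
// Hypothetical helper, not part of Coral Talk or any AWS SDK.
// All values are in seconds.
interface TimeoutConfig {
  // Assumption: this is the HTTP keep-alive timeout of the container.
  backendKeepAliveSec: number;
  // Idle timeout configured on the ALB.
  albIdleTimeoutSec: number;
}

// Returns true when the backend could close an idle connection
// before the ALB considers that connection expired.
function backendMayCloseFirst(cfg: TimeoutConfig): boolean {
  return cfg.backendKeepAliveSec <= cfg.albIdleTimeoutSec;
}

// Our current settings as described above.
const current: TimeoutConfig = { backendKeepAliveSec: 30, albIdleTimeoutSec: 60 };

// An illustrative alternative (65 is an arbitrary value above 60).
const candidate: TimeoutConfig = { backendKeepAliveSec: 65, albIdleTimeoutSec: 60 };

console.log(backendMayCloseFirst(current));   // true
console.log(backendMayCloseFirst(candidate)); // false
```

If that assumption holds, our current 30/60 combination falls on the risky side of this check, which is part of why we suspect the timeouts.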
Has anyone run into this on AWS, or does anyone have advice on timeout settings that might reduce our error rate?