12/04/2026 18:16

EP.123 Load Balancing & Sticky Sessions for WebSocket (Production-Ready Guide)

#Go

#Golang

#Real-time

#Kubernetes

#WebSocket

When your WebSocket server starts handling a large number of concurrent users, simply adding more servers or pods is no longer enough.

 

In real-world systems, common problems often appear:

  • Clients get disconnected unexpectedly
  • Messages are delayed or never delivered
  • Clients keep reconnecting in loops, eventually overloading the system

 

In most cases, the root cause is the same: 👉 a load balancer that does not properly understand WebSocket connections and lacks sticky session support.

 

This article walks you through everything you need to know, from core concepts to practical implementation to production-grade best practices.

 

🎯 Goals of This Article

 

After reading this article, you will understand:

  • Why WebSocket requires sticky sessions
  • Which types of load balancers truly support WebSocket
  • How to scale your system without dropping connections
  • How to design a WebSocket server that is ready for production

 

🧠 Why WebSocket Is Not HTTP

 

HTTP (Stateless)

  • Request → Response → Done
  • Load balancers can freely distribute requests
  • Each request may hit a different server without affecting the user

 

WebSocket (Stateful)

  • Connect once → stay connected for a long time
  • The connection is bound to a single server
  • If traffic is routed to another server → ❌ the connection breaks immediately

 

🔑 A WebSocket connection must stay on the same server for its entire lifetime.

 

🍪 What Is a Sticky Session?

 

A sticky session tells the load balancer:

“This client must always be routed to the same backend server.”

 

Common sticky session strategies used in practice:

  • Cookie-based affinity
  • IP hash
  • Header-based routing

 

👉 For WebSocket, sticky sessions are mandatory, not optional.

 

❌ What Happens Without Sticky Sessions?

 

A very common real-world scenario:

  1. A client connects to WebSocket via a load balancer
  2. The load balancer routes it to Pod A
  3. The system scales and adds Pod B
  4. The next packet is routed to Pod B
  5. ❌ The WebSocket connection breaks instantly

 

What users experience:

  • Random disconnects
  • Endless reconnect loops
  • Lost messages

 

⚖️ Load Balancer Types That Work with WebSocket

 

1. Layer 4 Load Balancer (TCP)

The best option for WebSocket

 

Examples:

  • AWS Network Load Balancer (NLB)
  • GCP TCP Load Balancer
  • HAProxy (TCP mode)

 

Advantages:

  • No protocol inspection
  • Stable, long-lived connections
  • Sticky behavior by nature

 

2. Layer 7 Load Balancer (HTTP)

Can be used, but only with proper configuration

 

Examples:

  • Nginx
  • AWS Application Load Balancer (ALB)
  • Traefik

 

Required settings:

  • Support Upgrade: websocket
  • Sticky sessions enabled
  • Long idle timeout (at least 1–2 hours)

 

⚙️ Sticky Session Example (Nginx)

 

upstream websocket_backend {
    ip_hash;
    server ws1:8080;
    server ws2:8080;
}

server {
    location /ws {
        proxy_pass http://websocket_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_read_timeout 1h;
    }
}

 

ip_hash provides a simple and effective form of sticky sessions that works well in many scenarios. One caveat: all clients behind the same NAT or corporate proxy share one IP, so they will all land on the same backend.

 

📦 Sticky Sessions on Kubernetes

 

1. Service with Session Affinity

spec:
  sessionAffinity: ClientIP
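In context, the affinity setting lives on the Service object. A minimal sketch (the name, selector, and port are placeholders for your own deployment):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: websocket-svc
spec:
  selector:
    app: websocket
  ports:
    - port: 8080
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 7200   # keep affinity for 2 hours of inactivity
</parameter>```

Note that `sessionAffinity: ClientIP` works at the Service level, so it only helps when traffic actually passes through the Service (e.g. via a Layer 4 path); traffic terminated at an Ingress controller needs affinity configured there instead.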

 

2. Ingress (NGINX Ingress)

nginx.ingress.kubernetes.io/affinity: "cookie"
nginx.ingress.kubernetes.io/session-cookie-name: "ws-session"

 

🔧 Example: Production-Friendly WebSocket Server in Go

 

package main

import (
	"log"
	"net/http"

	"github.com/gorilla/websocket"
)

var upgrader = websocket.Upgrader{
	// Allow all origins for demo purposes only.
	// In production, validate r.Header.Get("Origin") against an allowlist.
	CheckOrigin: func(r *http.Request) bool {
		return true
	},
}

func wsHandler(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		log.Println("upgrade error:", err)
		return
	}
	defer conn.Close()

	for {
		msgType, msg, err := conn.ReadMessage()
		if err != nil {
			log.Println("read error:", err)
			break
		}

		if err := conn.WriteMessage(msgType, msg); err != nil {
			log.Println("write error:", err)
			break
		}
	}
}

func main() {
	http.HandleFunc("/ws", wsHandler)
	log.Println("WebSocket server started on :8080")
	log.Fatal(http.ListenAndServe(":8080", nil))
}

 

⚠️ Important:
When a load balancer sits in front of this server, sticky sessions must be enabled; otherwise reconnecting clients may be routed to a different server and lose any in-memory state.

 

📈 Scaling Without Dropping Connections

 

❌ What Not to Do

  • Kill pods immediately
  • Perform rolling updates without considering active connections

 

✅ Correct Approach

  • Mark pods as draining
  • Stop accepting new connections
  • Wait for clients to disconnect naturally
  • Then terminate the pod

 

🧠 Key Design Principle: Stateless WebSocket Server

 

To scale effectively:

  • Do not store critical state only in memory
  • Use Redis or an external data store
  • Allow clients to reconnect and recover session state

 

🔁 Safe Reconnection Strategy

 

Production systems should support:

  • Client-side reconnect logic
  • Session resumption
  • Token-based authentication
  • Idempotent message handling

 

🧪 What to Test Before Production

 

  • Scale pods while users are actively connected
  • Mass reconnect scenarios
  • Chaos testing (random pod termination)
  • Message loss and duplication detection

 

🚀 Challenge: Try It Yourself

 

  • Enable sticky sessions on your load balancer
  • Scale your WebSocket servers during real usage
  • Observe whether users get disconnected
  • Tune timeouts and draining behavior

 

If users don’t notice anything at all,
your system has truly passed the production test ✅

 


 

🔮 Coming Next: EP.124 Advanced Security & Authentication for WebSocket

 

Next episode, we’ll dive into:

  • JWT and token strategies
  • Preventing WebSocket hijacking
  • Enterprise-grade secure handshakes