Setup CockroachDB cluster with HAProxy load balancing

Each CockroachDB node is an equally suitable SQL gateway to your cluster, but to ensure client performance and reliability, it’s important to use load balancing.

  • Performance: Load balancers spread client traffic across nodes. This prevents any one node from being overwhelmed by requests and improves overall cluster performance (queries per second).

  • Reliability: Load balancers decouple client health from the health of a single CockroachDB node. In cases where a node fails, the load balancer redirects client traffic to available nodes.

HAProxy is one of the most popular open-source TCP load balancers, and CockroachDB includes a built-in command for generating a configuration file that is preset to work with your running cluster.

With a single load balancer, client connections are resilient to node failure, but the load balancer itself is a point of failure. It’s therefore best to make load balancing resilient as well by using multiple load balancing instances, with a mechanism like floating IPs or DNS to select load balancers for clients.

For performance and availability reasons, it is not recommended to run the HAproxy on the same node as the cockroachDB.

This post shows a detailed steps how to deploy a HAProxy load balancer for CockroachDB cluster.

Deploy the CockroachDB cluster

[root@host1 ~]# ps -ef | grep cockroach | grep -v grep
root      4783     1 99 18:59 ?        2-17:35:59 cockroach start --log-dir=/var/log/cockroachdb_logs/1 --store=/mnt/cockroachdb_mnt1 --insecure --listen-addr=host1:26257 --http-addr=host1:8080 --join=host1:26257,host1:26258,host1:26259,host1:26260,host1:26261,host1:26262,host1:26263,host1:26264,host2:26257,host2:26258,host2:26259,host2:26260,host2:26261,host2:26262,host2:26263,host2:26264,host3:26257,host3:26258,host3:26259,host3:26260,host3:26261,host3:26262,host3:26263,host3:26264 --cache=2GiB --max-sql-memory=2GiB
root      4876     1 95 18:59 ?        03:33:42 cockroach start --log-dir=/var/log/cockroachdb_logs/2 --store=/mnt/cockroachdb_mnt2 --insecure --listen-addr=host1:26258 --http-addr=host1:8081 --join=host1:26257,host1:26258,host1:26259,host1:26260,host1:26261,host1:26262,host1:26263,host1:26264,host2:26257,host2:26258,host2:26259,host2:26260,host2:26261,host2:26262,host2:26263,host2:26264,host3:26257,host3:26258,host3:26259,host3:26260,host3:26261,host3:26262,host3:26263,host3:26264 --cache=2GiB --max-sql-memory=2GiB
root      4932     1 77 18:59 ?        02:53:13 cockroach start --log-dir=/var/log/cockroachdb_logs/3 --store=/mnt/cockroachdb_mnt3 --insecure --listen-addr=host1:26259 --http-addr=host1:8082 --join=host1:26257,host1:26258,host1:26259,host1:26260,host1:26261,host1:26262,host1:26263,host1:26264,host2:26257,host2:26258,host2:26259,host2:26260,host2:26261,host2:26262,host2:26263,host2:26264,host3:26257,host3:26258,host3:26259,host3:26260,host3:26261,host3:26262,host3:26263,host3:26264 --cache=2GiB --max-sql-memory=2GiB
root      5049     1 99 18:59 ?        06:02:17 cockroach start --log-dir=/var/log/cockroachdb_logs/4 --store=/mnt/cockroachdb_mnt4 --insecure --listen-addr=host1:26260 --http-addr=host1:8083 --join=host1:26257,host1:26258,host1:26259,host1:26260,host1:26261,host1:26262,host1:26263,host1:26264,host2:26257,host2:26258,host2:26259,host2:26260,host2:26261,host2:26262,host2:26263,host2:26264,host3:26257,host3:26258,host3:26259,host3:26260,host3:26261,host3:26262,host3:26263,host3:26264 --cache=2GiB --max-sql-memory=2GiB
root      5131     1 94 18:59 ?        03:31:51 cockroach start --log-dir=/var/log/cockroachdb_logs/5 --store=/mnt/cockroachdb_mnt5 --insecure --listen-addr=host1:26261 --http-addr=host1:8084 --join=host1:26257,host1:26258,host1:26259,host1:26260,host1:26261,host1:26262,host1:26263,host1:26264,host2:26257,host2:26258,host2:26259,host2:26260,host2:26261,host2:26262,host2:26263,host2:26264,host3:26257,host3:26258,host3:26259,host3:26260,host3:26261,host3:26262,host3:26263,host3:26264 --cache=2GiB --max-sql-memory=2GiB
root      5220     1 72 18:59 ?        02:42:39 cockroach start --log-dir=/var/log/cockroachdb_logs/6 --store=/mnt/cockroachdb_mnt6 --insecure --listen-addr=host1:26262 --http-addr=host1:8085 --join=host1:26257,host1:26258,host1:26259,host1:26260,host1:26261,host1:26262,host1:26263,host1:26264,host2:26257,host2:26258,host2:26259,host2:26260,host2:26261,host2:26262,host2:26263,host2:26264,host3:26257,host3:26258,host3:26259,host3:26260,host3:26261,host3:26262,host3:26263,host3:26264 --cache=2GiB --max-sql-memory=2GiB
root      5309     1 94 18:59 ?        03:30:22 cockroach start --log-dir=/var/log/cockroachdb_logs/7 --store=/mnt/cockroachdb_mnt7 --insecure --listen-addr=host1:26263 --http-addr=host1:8086 --join=host1:26257,host1:26258,host1:26259,host1:26260,host1:26261,host1:26262,host1:26263,host1:26264,host2:26257,host2:26258,host2:26259,host2:26260,host2:26261,host2:26262,host2:26263,host2:26264,host3:26257,host3:26258,host3:26259,host3:26260,host3:26261,host3:26262,host3:26263,host3:26264 --cache=2GiB --max-sql-memory=2GiB
root      5358     1 58 18:59 ?        02:11:20 cockroach start --log-dir=/var/log/cockroachdb_logs/8 --store=/mnt/cockroachdb_mnt8 --insecure --listen-addr=host1:26264 --http-addr=host1:8087 --join=host1:26257,host1:26258,host1:26259,host1:26260,host1:26261,host1:26262,host1:26263,host1:26264,host2:26257,host2:26258,host2:26259,host2:26260,host2:26261,host2:26262,host2:26263,host2:26264,host3:26257,host3:26258,host3:26259,host3:26260,host3:26261,host3:26262,host3:26263,host3:26264 --cache=2GiB --max-sql-memory=2GiB

[root@host1 ~]# cockroach node status --insecure --host=host1:26257
  id |       address       |     sql_address     |  build  |         started_at         |         updated_at         | locality | is_available | is_live
-----+---------------------+---------------------+---------+----------------------------+----------------------------+----------+--------------+----------
   1 | host1:26257 | host1:26257 | v22.1.6 | 2022-09-17 18:59:37.989384 | 2022-09-17 22:43:39.507057 |          | true         | true
   2 | host1:26258 | host1:26258 | v22.1.6 | 2022-09-17 18:59:39.128398 | 2022-09-17 22:43:40.640431 |          | true         | true
   3 | host1:26259 | host1:26259 | v22.1.6 | 2022-09-17 18:59:39.374127 | 2022-09-17 22:43:40.885678 |          | true         | true
   4 | host1:26260 | host1:26260 | v22.1.6 | 2022-09-17 18:59:39.726667 | 2022-09-17 22:43:36.73931  |          | true         | true
   5 | host1:26261 | host1:26261 | v22.1.6 | 2022-09-17 18:59:39.971549 | 2022-09-17 22:43:36.983538 |          | true         | true
   6 | host1:26262 | host1:26262 | v22.1.6 | 2022-09-17 18:59:40.225358 | 2022-09-17 22:43:37.235768 |          | true         | true
   7 | host1:26263 | host1:26263 | v22.1.6 | 2022-09-17 18:59:40.447688 | 2022-09-17 22:43:37.459962 |          | true         | true
   8 | host1:26264 | host1:26264 | v22.1.6 | 2022-09-17 18:59:40.677217 | 2022-09-17 22:43:37.690883 |          | true         | true
   9 | host2:26257 | host2:26257 | v22.1.6 | 2022-09-17 18:59:41.604348 | 2022-09-17 22:43:38.613662 |          | true         | true
  10 | host2:26258 | host2:26258 | v22.1.6 | 2022-09-17 18:59:41.904071 | 2022-09-17 22:43:38.912773 |          | true         | true
  11 | host2:26259 | host2:26259 | v22.1.6 | 2022-09-17 18:59:42.194446 | 2022-09-17 22:43:39.202688 |          | true         | true
  12 | host2:26260 | host2:26260 | v22.1.6 | 2022-09-17 18:59:42.512201 | 2022-09-17 22:43:39.519727 |          | true         | true
  13 | host2:26261 | host2:26261 | v22.1.6 | 2022-09-17 18:59:42.850974 | 2022-09-17 22:43:39.860063 |          | true         | true
  14 | host2:26262 | host2:26262 | v22.1.6 | 2022-09-17 18:59:43.177162 | 2022-09-17 22:43:40.191082 |          | true         | true
  15 | host2:26263 | host2:26263 | v22.1.6 | 2022-09-17 18:59:43.543973 | 2022-09-17 22:43:40.553146 |          | true         | true
  16 | host2:26264 | host2:26264 | v22.1.6 | 2022-09-17 18:59:43.858993 | 2022-09-17 22:43:40.86705  |          | true         | true
  17 | host3:26257 | host3:26257 | v22.1.6 | 2022-09-17 18:59:44.807641 | 2022-09-17 22:43:37.315927 |          | true         | true
  18 | host3:26258 | host3:26258 | v22.1.6 | 2022-09-17 18:59:45.119533 | 2022-09-17 22:43:37.628497 |          | true         | true
  19 | host3:26259 | host3:26259 | v22.1.6 | 2022-09-17 18:59:45.508535 | 2022-09-17 22:43:38.015714 |          | true         | true
  20 | host3:26260 | host3:26260 | v22.1.6 | 2022-09-17 18:59:45.901631 | 2022-09-17 22:43:38.409443 |          | true         | true
  21 | host3:26261 | host3:26261 | v22.1.6 | 2022-09-17 18:59:46.191724 | 2022-09-17 22:43:38.700531 |          | true         | true
  22 | host3:26262 | host3:26262 | v22.1.6 | 2022-09-17 18:59:46.540504 | 2022-09-17 22:43:39.048942 |          | true         | true
  23 | host3:26263 | host3:26263 | v22.1.6 | 2022-09-17 18:59:46.944353 | 2022-09-17 22:43:39.453175 |          | true         | true
  24 | host3:26264 | host3:26264 | v22.1.6 | 2022-09-17 18:59:47.274458 | 2022-09-17 22:43:39.782378 |          | true         | true
(24 rows)

Generate HAProxy configuration file

Install HAProxy on one of the CockroachDB nodes:

[root@host1 ~]# yum install haproxy
[root@host1 ~]# haproxy -v
HA-Proxy version 1.5.18 2016/05/10
Copyright 2000-2016 Willy Tarreau <willy@haproxy.org>

Generate the HAProxy configuration file:

[root@host1 ~]# cockroach gen haproxy --host=host1:26257 --insecure

[root@host1 ~]# cat haproxy.cfg

global
  maxconn 4096

defaults
    mode                tcp

    # Timeout values should be configured for your specific use.
    # See: https://cbonte.github.io/haproxy-dconv/1.8/configuration.html#4-timeout%20connect

    # With the timeout connect 5 secs,
    # if the backend server is not responding, haproxy will make a total
    # of 3 connection attempts waiting 5s each time before giving up on the server,
    # for a total of 15 seconds.
    retries             2
    timeout connect     5s

    # timeout client and server govern the maximum amount of time of TCP inactivity.
    # The server node may idle on a TCP connection either because it takes time to
    # execute a query before the first result set record is emitted, or in case of
    # some trouble on the server. So these timeout settings should be larger than the
    # time to execute the longest (most complex, under substantial concurrent workload)
    # query, yet not too large so truly failed connections are lingering too long
    # (resources associated with failed connections should be freed reasonably promptly).
    timeout client      10m
    timeout server      10m

    # TCP keep-alive on client side. Server already enables them.
    option              clitcpka

listen psql
    bind :26257
    mode tcp
    balance roundrobin
    option httpchk GET /health?ready=1
    server cockroach1 host1:26257 check port 8080
    server cockroach2 host1:26258 check port 8081
    server cockroach3 host1:26259 check port 8082
    server cockroach4 host1:26260 check port 8083
    server cockroach5 host1:26261 check port 8084
    server cockroach6 host1:26262 check port 8085
    server cockroach7 host1:26263 check port 8086
    server cockroach8 host1:26264 check port 8087
    server cockroach9 host2:26257 check port 8080
    server cockroach10 host2:26258 check port 8081
    server cockroach11 host2:26259 check port 8082
    server cockroach12 host2:26260 check port 8083
    server cockroach13 host2:26261 check port 8084
    server cockroach14 host2:26262 check port 8085
    server cockroach15 host2:26263 check port 8086
    server cockroach16 host2:26264 check port 8087
    server cockroach17 host3:26257 check port 8080
    server cockroach18 host3:26258 check port 8081
    server cockroach19 host3:26259 check port 8082
    server cockroach20 host3:26260 check port 8083
    server cockroach21 host3:26261 check port 8084
    server cockroach22 host3:26262 check port 8085
    server cockroach23 host3:26263 check port 8086
    server cockroach24 host3:26264 check port 8087

Start HAProxy

It’s not recommended to deploy HAProxy on the same node as CockroachDB for performance and availability reasons. Thus, we are going to start the HAProxy on a dedicated node.

Connect to the dedicated HAProxy node and install the proxy on it:

[root@host4 ~]# yum install haproxy

Start HAproxy with the generated configuration file:

[root@host4 ~]# scp host1:/root/haproxy.cfg ./
[root@host4 ~]# haproxy -f haproxy.cfg

Verify the HAProxy

The Load Balancer distributes the requests sent to the cluster equally between the different nodes.

From primary node, we can check which node is being connected as below:

[root@host1 ~]# cockroach sql --host=host4 --insecure
root@host4:26257/defaultdb> SHOW node_id;
  node_id
-----------
  1

root@host4:26257/defaultdb> exit
[root@host1 ~]# cockroach sql --host=host4 --insecure
root@host4:26257/defaultdb> SHOW node_id;
  node_id
-----------
  2

[root@host1 ~]# cockroach sql --host=host4 --insecure
root@host4:26257/defaultdb> SHOW node_id;
  node_id
-----------
  3

Run TPCC through the HAProxy load balancer

[root@host4 cockroachdb]# cockroach workload run tpcc --warehouses=5000 --split --scatter --ramp=30s --duration=5m 'postgresql://root@host4:26257/tpcc?sslmode=disable'

Reference