# High Availability

This page covers strategies for running ilagent in a highly available setup. The right approach depends on your consumer type.

## Kafka

Kafka has built-in consumer group support, making HA straightforward.

Run multiple ilagent instances with the **same `--kafka_group_id`**. Kafka automatically distributes partitions across consumers in the group. If one instance dies, its partitions are reassigned to the remaining consumers.

```sh
# Instance 1
ilagent daemon --kafka_brokers kafka:9092 --kafka_group_id ilagent -e 'events'

# Instance 2
ilagent daemon --kafka_brokers kafka:9092 --kafka_group_id ilagent -e 'events'
```

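**Requirements:**

* The topic must have at least as many partitions as there are instances. Kafka assigns each partition to exactly one consumer in the group, so surplus instances sit idle until a rebalance gives them work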
* All instances must use the same `--kafka_group_id`

No additional configuration is needed — Kafka handles rebalancing, offset tracking, and failover automatically.
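
The partition requirement can be checked before scaling out. The following is a minimal sketch, assuming the Apache Kafka CLI tools (`kafka-topics.sh`) are installed; the helper names `partition_count` and `enough_partitions` are hypothetical, and the topic name and instance count are placeholders.

```sh
# Reads `kafka-topics.sh --describe` output on stdin and prints the PartitionCount value.
partition_count() {
  sed -n 's/.*PartitionCount:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeeds when PARTITIONS ($1) >= INSTANCES ($2).
enough_partitions() {
  [ "$1" -ge "$2" ]
}

# Usage against a live cluster (not executed here):
#   count=$(kafka-topics.sh --bootstrap-server kafka:9092 --describe --topic events | partition_count)
#   enough_partitions "$count" 2 || echo "add partitions or run fewer instances" >&2
```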

## HTTP proxy

Run multiple ilagent instances behind a load balancer. Each instance maintains its own SQLite retry queue.

```
clients ── load balancer ── ilagent-1 (own SQLite)
                        └── ilagent-2 (own SQLite)
```

ilert deduplicates events via `alertKey`, so if the same event reaches ilert more than once (for example, a client retry that lands on a different instance), only a single alert is created. Make sure your events include a consistent `alertKey` for deduplication to work correctly.
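
One way to keep `alertKey` consistent is to derive it from fields that identify the underlying problem rather than from timestamps or random IDs. The sketch below is illustrative only: the `host:check` scheme and the `alert_key` helper are assumptions, not an ilert convention.

```sh
# Derives a stable, case-insensitive alertKey from a host name ($1) and a check name ($2).
alert_key() {
  printf '%s:%s\n' "$1" "$2" | tr '[:upper:]' '[:lower:]'
}

alert_key DB-01 disk_full   # prints db-01:disk_full
```

Because the key depends only on the host and check, a retried event produces the same `alertKey` no matter which instance forwards it.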

## MQTT

MQTT is the most nuanced case because the protocol does not have a native consumer group concept. There are several strategies, each with different trade-offs.

### Option 1: Shared subscriptions (recommended)

MQTT v5 introduced [shared subscriptions](https://docs.oasis-open.org/mqtt/mqtt/v5.0/os/mqtt-v5.0-os.html#_Toc3901250), which distribute messages across subscribers in a named group — similar to Kafka consumer groups. The broker delivers each message to only one subscriber in the group.

ilagent supports this via the `--mqtt_shared_group` flag:

```sh
# Instance 1
ilagent daemon -m broker:1883 -e 'ilert/events' --mqtt_shared_group ilagent --mqtt_qos 1

# Instance 2
ilagent daemon -m broker:1883 -e 'ilert/events' --mqtt_shared_group ilagent --mqtt_qos 1
```

Under the hood, ilagent subscribes to `$share/ilagent/ilert/events` instead of `ilert/events`. The broker handles load balancing: each message is delivered to one instance in the group rather than to all of them.

**Requirements:**

* Your MQTT broker must support MQTT v5 shared subscriptions (for example Mosquitto 2.x, HiveMQ, EMQX, and VerneMQ; most modern brokers do)
* All instances must use the same `--mqtt_shared_group` value
* Combine with `--mqtt_qos 1` to ensure at-least-once delivery
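
The shared-subscription topic filter follows the pattern `$share/<group>/<topic>`. The sketch below shows how the filter is composed; the `shared_filter` helper is hypothetical and only mirrors the mapping described above. It also shows, as a comment, how you might test your broker's support independently of ilagent, assuming the Mosquitto command-line client is installed.

```sh
# Builds the MQTT v5 shared-subscription filter from a group name ($1) and a topic ($2).
shared_filter() {
  printf '$share/%s/%s\n' "$1" "$2"
}

shared_filter ilagent ilert/events   # prints $share/ilagent/ilert/events

# Optional broker check with the Mosquitto client (not executed here):
#   mosquitto_sub -h broker -p 1883 -V mqttv5 -q 1 -t "$(shared_filter ilagent ilert/events)"
```

If two such subscribers run at the same time and each published message appears in only one of them, the broker is distributing messages across the group.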

### Option 2: Active-passive failover

Run two instances, but only one actively subscribes. Use ilert heartbeat monitoring (`-b il1hbt123...`) on the active instance — if the heartbeat stops, your orchestration layer (Kubernetes, systemd, etc.) switches to the standby.

```sh
# Active instance
ilagent daemon -m broker:1883 -e 'ilert/events' -b il1hbt123... --mqtt_qos 1 --mqtt_buffer

# Standby instance (started by orchestrator on failover)
```

This approach is simple and works with any MQTT broker, but has a failover gap while the switch happens. Using `--mqtt_buffer` ensures that events received before a crash are persisted in SQLite and retried on restart.

### Option 3: Active-active with idempotency

Run multiple instances, all subscribing to the same topics. Every instance receives and processes every message.

```sh
# Instance 1
ilagent daemon -m broker:1883 -e 'ilert/events' --mqtt_qos 1

# Instance 2
ilagent daemon -m broker:1883 -e 'ilert/events' --mqtt_qos 1
```

This works because:

* **Events** — ilert deduplicates on `alertKey`, so duplicate submissions are harmless
* **Escalation policy updates** — the PUT calls are idempotent (setting the same user on the same level twice has no side effect)

The trade-off is that API traffic and processing load scale with the number of instances (doubled for two). This approach requires no broker-side support and works with any MQTT version.
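
The effect of deduplication can be pictured with a toy model. This is only an illustration of the idea, not ilert's actual implementation: each line stands for the `alertKey` of one submission, and the hypothetical `distinct_alert_keys` helper counts how many distinct keys remain.

```sh
# Counts the distinct alertKeys read from stdin (one key per line).
distinct_alert_keys() {
  sort -u | wc -l | tr -d '[:space:]'
}

events='disk-full:db-01
cpu-high:web-02'

# Two instances forward the same two events: four submissions in total.
printf '%s\n%s\n' "$events" "$events" | distinct_alert_keys   # prints 2
```

Four submissions reach the API, but only two distinct keys remain, which is why the extra traffic does not translate into extra alerts.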

### Option 4: Topic partitioning

Split your messages across different topics by site, zone, or function. Each ilagent instance handles a dedicated subset:

```sh
# Instance 1 — handles site A
ilagent daemon -m broker:1883 -e 'site-a/events'

# Instance 2 — handles site B
ilagent daemon -m broker:1883 -e 'site-b/events'
```

This requires cooperation from the publisher side but gives you precise control over load distribution. Failure of one instance only affects its assigned topics.

## Recommendations

| Setup                    | Strategy                       | Broker requirement |
| ------------------------ | ------------------------------ | ------------------ |
| **Kafka**                | Consumer groups (built-in)     | —                  |
| **HTTP**                 | Load balancer + alertKey dedup | —                  |
| **MQTT** (modern broker) | Shared subscriptions           | MQTT v5            |
| **MQTT** (any broker)    | Active-active with idempotency | None               |
| **MQTT** (simple)        | Active-passive with heartbeat  | None               |

For most MQTT deployments, we recommend **shared subscriptions** (`--mqtt_shared_group`) combined with `--mqtt_qos 1` and `--mqtt_buffer` for the strongest delivery guarantees.
