# High Availability

This page covers strategies for running ilagent in a highly available setup. The right approach depends on your consumer type: Kafka, HTTP proxy, or MQTT.

## Kafka

Kafka has built-in consumer group support, making HA straightforward.

Run multiple ilagent instances with the **same `--kafka_group_id`**. Kafka automatically distributes partitions across consumers in the group. If one instance dies, its partitions are reassigned to the remaining consumers.

```sh
# Instance 1
ilagent daemon --kafka_brokers kafka:9092 --kafka_group_id ilagent -e 'events'

# Instance 2
ilagent daemon --kafka_brokers kafka:9092 --kafka_group_id ilagent -e 'events'
```

**Requirements:**

* The topic should have at least as many partitions as there are instances; instances beyond the partition count receive no partitions and only act as idle standbys
* All instances must use the same `--kafka_group_id`

No additional configuration is needed — Kafka handles rebalancing, offset tracking, and failover automatically.
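
The partition requirement can be checked with simple arithmetic before scaling out. The sketch below uses a hypothetical `check_capacity` helper that is not part of ilagent; the partition count itself has to come from your cluster (for example from `kafka-topics.sh --describe`).

```sh
# check_capacity PARTITIONS INSTANCES
# Kafka assigns each partition to at most one consumer in a group, so
# instances beyond the partition count get no work.
check_capacity() {
  partitions=$1
  instances=$2
  if [ "$instances" -gt "$partitions" ]; then
    echo "active=$partitions idle=$((instances - partitions))"
  else
    echo "active=$instances idle=0"
  fi
}

check_capacity 6 2   # prints: active=2 idle=0
check_capacity 2 3   # prints: active=2 idle=1
```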

## HTTP proxy

Run multiple ilagent instances behind a load balancer. Each instance maintains its own SQLite retry queue.

```
clients ── load balancer ── ilagent-1 (own SQLite)
                        └── ilagent-2 (own SQLite)
```

ilert deduplicates events via `alertKey`, so if the same event reaches ilert more than once (for example, when a client retries a request and the load balancer routes it to a different instance), only a single alert is created. Make sure your events include a consistent `alertKey` for deduplication to work correctly.
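
One way to keep `alertKey` consistent is to derive it only from fields that identify the underlying problem. The sketch below assumes a hypothetical `host/check` naming scheme; the `alert_key` helper is illustrative and not part of ilagent.

```sh
# alert_key HOST CHECK
# Builds a stable key from fields that identify the problem itself.
# It deliberately leaves out timestamps, request IDs, and anything else
# that would differ between two submissions of the same event.
alert_key() {
  printf '%s/%s\n' "$1" "$2"
}

alert_key db-01 disk-usage   # prints: db-01/disk-usage
alert_key db-01 disk-usage   # same key again, so both events can be matched
```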

## MQTT

MQTT is the most nuanced case because the protocol does not have a native consumer group concept. There are several strategies, each with different trade-offs.

### Option 1: Shared subscriptions (recommended)

MQTT v5 introduced [shared subscriptions](https://docs.oasis-open.org/mqtt/mqtt/v5.0/os/mqtt-v5.0-os.html#_Toc3901250), which distribute messages across subscribers in a named group — similar to Kafka consumer groups. The broker delivers each message to only one subscriber in the group.

ilagent supports this via the `--mqtt_shared_group` flag:

```sh
# Instance 1
ilagent daemon -m broker:1883 -e 'ilert/events' --mqtt_shared_group ilagent --mqtt_qos 1

# Instance 2
ilagent daemon -m broker:1883 -e 'ilert/events' --mqtt_shared_group ilagent --mqtt_qos 1
```

Under the hood, ilagent subscribes to `$share/ilagent/ilert/events` instead of `ilert/events`. The broker handles load balancing: it spreads messages across the instances and delivers each message to only one of them.
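
The topic rewrite described above follows the MQTT v5 `$share/<group>/<topic filter>` form. A minimal sketch of that mapping, using a hypothetical `shared_topic` helper (ilagent builds the topic itself; the helper is only useful for illustration, for example when inspecting the same subscription with another MQTT client):

```sh
# shared_topic GROUP TOPIC_FILTER
# Prints the shared-subscription topic for a group and a topic filter.
shared_topic() {
  # Single quotes keep "$share" literal instead of expanding a variable.
  printf '$share/%s/%s\n' "$1" "$2"
}

shared_topic ilagent ilert/events   # prints: $share/ilagent/ilert/events
```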

**Requirements:**

* Your MQTT broker must support MQTT v5 shared subscriptions (for example Mosquitto 2.x, HiveMQ, EMQX, or VerneMQ)
* All instances must use the same `--mqtt_shared_group` value
* Combine with `--mqtt_qos 1` to ensure at-least-once delivery

### Option 2: Active-passive failover

Run two instances, but only one actively subscribes. Use ilert heartbeat monitoring (`-b il1hbt123...`) on the active instance — if the heartbeat stops, your orchestration layer (Kubernetes, systemd, etc.) switches to the standby.

```sh
# Active instance
ilagent daemon -m broker:1883 -e 'ilert/events' -b il1hbt123... --mqtt_qos 1 --mqtt_buffer

# Standby instance (started by orchestrator on failover)
```

This approach is simple and works with any MQTT broker, but has a failover gap while the switch happens. Using `--mqtt_buffer` ensures that events received before a crash are persisted in SQLite and retried on restart.
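
If no orchestrator is available, the switch can be approximated on a single host with a small wrapper. This is only a sketch under stated assumptions: `run_with_standby` is a hypothetical helper, the standby is assumed to use the same flags as the active instance, and the wrapper covers a crashed process only, not a failed host.

```sh
# run_with_standby ACTIVE_CMD STANDBY_CMD
# Runs the active command in the foreground. A daemon is not expected to
# exit, so any exit (whatever the status) triggers the standby command.
run_with_standby() {
  status=0
  sh -c "$1" || status=$?
  echo "active instance exited with status $status; starting standby" >&2
  sh -c "$2"
}

# Example (not executed here):
# run_with_standby \
#   "ilagent daemon -m broker:1883 -e 'ilert/events' -b il1hbt123... --mqtt_qos 1 --mqtt_buffer" \
#   "ilagent daemon -m broker:1883 -e 'ilert/events' -b il1hbt123... --mqtt_qos 1 --mqtt_buffer"
```

In Kubernetes or systemd, the same effect is usually achieved with the platform's own restart and scheduling rules rather than a wrapper script.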

### Option 3: Active-active with idempotency

Run multiple instances, all subscribing to the same topics. Every instance receives and processes every message.

```sh
# Instance 1
ilagent daemon -m broker:1883 -e 'ilert/events' --mqtt_qos 1

# Instance 2
ilagent daemon -m broker:1883 -e 'ilert/events' --mqtt_qos 1
```

This works because:

* **Events** — ilert deduplicates on `alertKey`, so duplicate submissions are harmless
* **Escalation policy updates** — the PUT calls are idempotent (setting the same user on the same level twice has no side effect)

The trade-off is that API traffic and processing load grow with the number of instances (doubled for two instances). This approach requires no broker-side support and works with any MQTT version.
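
The effect of key-based deduplication can be illustrated with a toy simulation. This is not ilert's implementation, only a sketch of the idea: the `dedupe_by_key` helper and the sample events are made up, and the first field of each line stands in for `alertKey`.

```sh
# dedupe_by_key: read "alertKey summary" lines on stdin and keep only the
# first line seen for each key.
dedupe_by_key() {
  awk '!seen[$1]++'
}

# Two instances each forward the same two events, so four lines arrive.
forwarded='db-01/disk-usage disk almost full
web-02/http-5xx error rate high
db-01/disk-usage disk almost full
web-02/http-5xx error rate high'

# Only two distinct keys remain after deduplication.
printf '%s\n' "$forwarded" | dedupe_by_key
```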

### Option 4: Topic partitioning

Split your messages across different topics by site, zone, or function. Each ilagent instance handles a dedicated subset:

```sh
# Instance 1 — handles site A
ilagent daemon -m broker:1883 -e 'site-a/events'

# Instance 2 — handles site B
ilagent daemon -m broker:1883 -e 'site-b/events'
```

This requires cooperation from the publisher side but gives you precise control over load distribution. Failure of one instance only affects its assigned topics.
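
With more than a handful of sites, the per-instance commands can be generated instead of written by hand. A minimal sketch, assuming the `<site>/events` topic layout used above; `print_instance_cmds` is a hypothetical helper that only prints the commands and does not start anything.

```sh
# print_instance_cmds BROKER SITE...
# Prints one ilagent command per site, each bound to that site's topic.
print_instance_cmds() {
  broker=$1
  shift
  for site in "$@"; do
    printf "ilagent daemon -m %s -e '%s/events'\n" "$broker" "$site"
  done
}

print_instance_cmds broker:1883 site-a site-b
# prints:
# ilagent daemon -m broker:1883 -e 'site-a/events'
# ilagent daemon -m broker:1883 -e 'site-b/events'
```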

## Recommendations

| Setup                    | Strategy                       | Broker requirement |
| ------------------------ | ------------------------------ | ------------------ |
| **Kafka**                | Consumer groups (built-in)     | —                  |
| **HTTP**                 | Load balancer + alertKey dedup | —                  |
| **MQTT** (modern broker) | Shared subscriptions           | MQTT v5            |
| **MQTT** (any broker)    | Active-active with idempotency | None               |
| **MQTT** (simple)        | Active-passive with heartbeat  | None               |

For most MQTT deployments, we recommend **shared subscriptions** (`--mqtt_shared_group`) combined with `--mqtt_qos 1` and `--mqtt_buffer` for the strongest delivery assurance.
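
Put together, the recommended flags could be wrapped in a small launch script that every instance runs unchanged. This is a sketch, not an official script: the `start_ilagent` function and the `DRY_RUN` variable are assumptions introduced here, while the flags are the ones described on this page.

```sh
# start_ilagent BROKER TOPIC GROUP
# Builds the recommended command line. By default it only prints the
# command (DRY_RUN=1); set DRY_RUN=0 to replace the shell with ilagent.
start_ilagent() {
  set -- ilagent daemon -m "$1" -e "$2" --mqtt_shared_group "$3" --mqtt_qos 1 --mqtt_buffer
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    exec "$@"
  fi
}

start_ilagent broker:1883 ilert/events ilagent
```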

