CollectorCtrl Overview & Core Philosophy
Welcome to the CollectorCtrl documentation hub. CollectorCtrl is an enterprise-grade, on-premises control plane for centrally managing, dynamically configuring, and observing large OpenTelemetry (OTel) Collector fleets at global scale.
CollectorCtrl moves telemetry pipeline administration away from manual, error-prone file edits on individual nodes. Instead, a centralized, policy-driven control plane keeps configurations consistent, prevents drift, and applies unified governance.
Core Value Proposition
Commercial observability platforms often charge significant ingestion fees, which pushes organizations to optimize their telemetry pipelines. However, managing OTel Collectors across thousands of virtual machines, bare-metal servers, and Kubernetes clusters quickly becomes a significant operational burden.
CollectorCtrl solves this by providing:
- Zero-Drift Enforced Governance: Edge configurations are locked down and reconciled instantly if modified locally.
- Instant Dynamic Reconfiguration: Telemetry pipelines are updated in real time via secure network channels without requiring rolling service restarts.
- Vendor-Agnostic Routing: Route telemetry streams to any combination of Datadog, Coralogix, Dynatrace, Elastic, or local ClickHouse clusters from a single panel.
- Centrally Enforced Cost Optimization: Reduce observability spend directly at the collection layer. Define fleet-wide sampling rates, drop-filter policies, and attribute-scrubbing rules as reusable templates, which the control plane pushes to every collector. A single policy change can eliminate redundant metric cardinality, strip high-volume debug traces, or enforce tail-sampling rules across your entire fleet without manually touching a single agent.
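The sketch below shows how a cost policy template might expand into collector settings. It is a minimal example that assumes a hypothetical `CostPolicy` template, which is not a CollectorCtrl API. The template renders into processor settings for the OpenTelemetry Collector Contrib distribution. The processor types `probabilistic_sampler`, `filter`, and `attributes` and their keys come from that distribution. The template fields, the `/drop_debug` and `/scrub` instance names, and `render_processors` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CostPolicy:
    """Hypothetical fleet-wide cost policy; field names are illustrative only."""
    sampling_percentage: float            # share of traces to keep (0-100)
    drop_log_below_info: bool = True      # drop log records below INFO severity
    scrub_attributes: list[str] = field(default_factory=list)


def render_processors(policy: CostPolicy) -> dict[str, Any]:
    """Expand a policy into the `processors` section of a Collector config."""
    if not 0 <= policy.sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be between 0 and 100")

    processors: dict[str, Any] = {
        # probabilistic_sampler keeps roughly this percentage of traces
        "probabilistic_sampler": {"sampling_percentage": policy.sampling_percentage},
    }
    if policy.drop_log_below_info:
        # filter processor: log records matching the OTTL condition are dropped
        processors["filter/drop_debug"] = {
            "error_mode": "ignore",
            "logs": {"log_record": ["severity_number < SEVERITY_NUMBER_INFO"]},
        }
    if policy.scrub_attributes:
        # attributes processor: delete each listed attribute key
        processors["attributes/scrub"] = {
            "actions": [{"key": k, "action": "delete"} for k in policy.scrub_attributes],
        }
    return processors
```

The returned dictionary covers only the `processors` section. A complete Collector configuration must also reference these processors in its `service.pipelines` entries before they take effect. The example uses probabilistic (head) sampling. The tail-sampling rules mentioned above would use the separate `tail_sampling` processor instead.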
Core Architecture Terminology
To understand how CollectorCtrl governs your telemetry data planes, familiarize yourself with its core components:
1. Management Server (Main Control Plane)
The Management Server is the heart of the platform. It runs on-premises or within your private cloud and exposes:
- Admin UI Console: A modern, secure, web-based dashboard for administration, policy design, and telemetry stream visualization.
- REST API & SDKs: Full programmatic access to manage configurations, query logs, trigger rollouts, and sync package assets.
- OpAMP Gateway: High-performance WebSocket endpoint implementing the Open Agent Management Protocol (OpAMP).
- Metadata Database: Standard PostgreSQL engine (with SQLite support for single-node developer instances) tracking fleet state, users, roles, audit trails, and versioned policy history.
2. Supervisor Agent
The Supervisor is an extremely lightweight, OS-native daemon (run as a Windows Service or Linux systemd unit) deployed alongside the OTel Collector on your target nodes. Its primary responsibilities include:
- Process Lifecycle Management: Spawns, monitors, and restarts the OTel Collector process if it crashes or stalls.
- OpAMP Client Connection: Maintains a secure, bidirectional WebSocket channel to the Management Server.
- Active Reconciler: Pulls assigned configurations from the control plane, writes them to a local scratch path, and signals the Collector to hot-reload.
- Local Drift Guardian: Continuously monitors the on-disk configuration. The Supervisor immediately overwrites any manual edit with the server's authorized snapshot.
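The reconcile-and-restore loop described above can be sketched in a few lines. This is a minimal example under stated assumptions, not the Supervisor's actual implementation. The sketch compares a SHA-256 digest of the on-disk file with the authorized snapshot. On a mismatch or a missing file, it writes the snapshot through a temporary file plus `os.replace` so the Collector never reads a half-written config. It then calls a caller-supplied `reload` hook. The function names and the `reload` callback are hypothetical.

```python
import contextlib
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Callable, Optional


def sha256_hex(data: bytes) -> str:
    """Digest used to compare the on-disk file with the authorized snapshot."""
    return hashlib.sha256(data).hexdigest()


def write_atomically(path: Path, data: bytes) -> None:
    """Write to a temp file beside `path`, then rename it over the target."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # os.replace swaps in the new file with a single rename
        # (atomic on POSIX when both paths are on the same filesystem).
        os.replace(tmp, path)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise


def reconcile(path: Path, authorized: bytes, reload: Callable[[], None]) -> bool:
    """Restore the authorized snapshot if the local file is missing or edited.

    Returns True when drift was corrected and `reload` was invoked.
    """
    current: Optional[bytes]
    try:
        current = path.read_bytes()
    except FileNotFoundError:
        current = None
    if current is not None and sha256_hex(current) == sha256_hex(authorized):
        return False  # no drift detected
    write_atomically(path, authorized)
    reload()  # hypothetical hook that asks the Collector to hot-reload
    return True
```

A real Supervisor would call a function like `reconcile` from a file-watch event or a polling loop, and again whenever the control plane delivers a new snapshot.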
3. Collector (Sidecar / Managed Process)
The Collector is the OpenTelemetry Collector process itself. It can run the upstream OTel Core distribution, OTel Contrib, a vendor-supported distribution (such as the Coralogix or Dynatrace agent), or a custom-compiled binary created with the Custom Builder. The Supervisor runs and manages this process as a child process.
Scalability & Production Datastores
By default, developer trials of CollectorCtrl run out of the box on an embedded SQLite database. SQLite is well suited to local testing and low-footprint single-node setups, but production deployments must use PostgreSQL.
PostgreSQL Sizing Guidelines
Enterprise production fleets require a highly available PostgreSQL cluster. The database handles metadata transactions, active OpAMP agent heartbeats, configuration templates, and audit trails.
| Fleet Size (Active Agents) | Recommended CPU (vCPUs) | Recommended RAM (GB) | Storage IOPS (Volume Type) |
|---|---|---|---|
| Developer / Sandbox (< 50) | 2 | 4 | 500 (General SSD) |
| Mid-Scale Enterprise (50 - 1,000) | 4 | 8 | 3,000 (Provisioned) |
| Global Infrastructure (1,000 - 10,000+) | 8 - 16 | 16 - 32 | 10,000+ (High Performance) |
For environments exceeding 10,000 concurrent Supervisors, configure read replicas to offload API queries and reporting workloads.
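The sizing table and the read-replica rule can be encoded as a simple lookup for capacity-planning scripts. This is a minimal sketch, and the `PostgresSizing` type and `recommend_postgres` function are hypothetical. The table's ranges overlap at exactly 1,000 agents, so the sketch assumes that a fleet of 1,000 falls in the Mid-Scale tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostgresSizing:
    tier: str
    vcpus: tuple[int, int]     # (min, max) recommended vCPUs
    ram_gb: tuple[int, int]    # (min, max) recommended RAM in GB
    min_iops: int              # storage IOPS floor for the tier
    read_replicas: bool        # whether read replicas are recommended


def recommend_postgres(active_agents: int) -> PostgresSizing:
    """Map a fleet size to a row of the sizing table.

    Assumption: exactly 1,000 agents is treated as Mid-Scale Enterprise,
    because the table's ranges overlap at that value.
    """
    if active_agents < 0:
        raise ValueError("active_agents must be non-negative")
    if active_agents < 50:
        return PostgresSizing("Developer / Sandbox", (2, 2), (4, 4), 500, False)
    if active_agents <= 1_000:
        return PostgresSizing("Mid-Scale Enterprise", (4, 4), (8, 8), 3_000, False)
    # Beyond 10,000 concurrent supervisors, add read replicas to offload
    # API queries and reporting workloads.
    return PostgresSizing(
        "Global Infrastructure", (8, 16), (16, 32), 10_000, active_agents > 10_000
    )
```

For example, `recommend_postgres(2500)` returns the Global Infrastructure tier with 8 to 16 vCPUs and no read replicas. Read replicas are only flagged once the fleet exceeds 10,000 agents.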