MCP IoT Gateway — Test Report

MQTT pipeline correctness, cloud broker performance, and resilience under load

Generated 2026-02-08 • Node v25.2.1 • Darwin aarch64 • HiveMQ Cloud (us-west-2) + Local HiveMQ CE
230
Tests Passed
1
Timed Out
6
Test Suites
100%
Delivery ≤ 2k/s
50k
Max Burst (Local)
2k/s
Cloud Sustained Ceiling
This report tests an IoT messaging gateway — middleware that ingests device telemetry over MQTT and routes it to storage backends (PostgreSQL, InfluxDB, S3). We tested against a local broker (HiveMQ CE, zero network overhead) and a managed cloud broker (HiveMQ Cloud on AWS us-west-2, ~100ms RTT). Load was escalated from baseline throughput up through deliberate overload to map the full performance envelope.

Contents

  1. Test Strategy
  2. System Architecture
  3. Test Execution
    1. Unit Tests (86)
    2. Integration Tests (62)
    3. Topology Tests (43)
    4. Cloud Tests (10)
    5. Benchmark — Burst & Sustained (10)
    6. Stress to Crash (20)
  4. Telemetry & Instrumentation
  5. Metrics & Visualizations
  6. Findings
  7. Recommendations
  8. Opportunities
  9. Raw Logs

1. Test Strategy

Validate the MCP IoT Gateway's MQTT pipeline across correctness, integration, cloud readiness, performance, and resilience:

The test suite is structured as a pyramid: fast mocked unit tests at the base, integration tests with real databases in the middle, and expensive performance/stress tests at the top. Each tier validates a different failure mode — logic errors, integration mismatches, and capacity limits respectively. The local broker provides a network-free baseline; the cloud broker introduces TLS overhead, ~100ms round-trip latency, and real-world broker queuing behavior.

Test Pyramid

231 total tests
TierTestsDurationDependenciesPurpose
Unit860.6sNone (mocks)Component-level correctness
Integration627.3sDocker (HiveMQ, PG, MinIO)Real adapter behavior
Topology430.9sMock MCP serversMulti-node tool discovery
Cloud1018sHiveMQ Cloud (TLS)Auth, security, pipeline routing
Benchmark104.2minLocal + Cloud MQTTBaseline performance profile
Stress207.4minLocal + Cloud MQTTCeiling detection & breaking points

Why MQTT QoS 0 Bottlenecks

MQTT QoS 0 is fire-and-forget — no delivery acknowledgment at the protocol level. Under load, bottlenecks emerge at predictable layers:

QoS 0 is chosen deliberately for high-throughput telemetry — sensor readings where occasional loss is acceptable but volume and speed matter. The tradeoff is that the protocol provides no retry mechanism; any loss is silent. These tests quantify exactly how much loss occurs (spoiler: none, up to 2k/s sustained) and where latency begins to degrade.

2. System Architecture

The gateway is a modular monorepo where protocol adapters (MQTT, OPC UA, Modbus, S7) ingest device data and persistence adapters (PostgreSQL, InfluxDB, S3) store it. The pipeline runtime between them applies topic-based routing rules, batches writes for efficiency, and provides disk-backed store-and-forward when a persistence target is unavailable.
  Protocol Adapters              Persistence Adapters
  ┌──────────────┐                ┌───────────────┐
  │ MQTT         │                │ PostgreSQL    │
  │ OPC UA       │                │ InfluxDB      │
  │ Modbus TCP   │  onMessage()   │ S3 / MinIO    │
  │ Siemens S7   │───────┐       └──────┬────────┘
  └──────────────┘       │              │ write(records)
                         ▼              │
                  PipelineManager────────┘
                    │         │
              TopicTrie   BatchQueue
                    │         │
                    │    flush() ──▶ persistence
                    │
              (on failure)
                    ▼
              DiskBuffer (NDJSON) ──▶ replay on retry

Test Infrastructure

4 services
ServiceImage / HostPortRole
HiveMQ CEhivemq/hivemq-ce:latest1883Local MQTT broker (no TLS, no auth)
PostgreSQL 17postgres:175432Persistence adapter target
MinIOminio/minio9000S3-compatible object storage
HiveMQ Cloud*.usw2.aws.hivemq.cloud8883Managed cloud MQTT (TLS, us-west-2)

3. Test Execution

3.1 Unit Tests — 86/86 PASS

Isolated tests with all external dependencies mocked. Validates component logic without I/O.

Unit Test Suites

86 tests • 0.6s
SuiteTestsDurationCoverage
TopicTrie259msMQTT 3.1.1 wildcard matching (+, #), edge cases (empty levels, Unicode, 10-level depth)
PipelineManager2469msMessage routing, batch accumulation, store-and-forward fallback, rule lifecycle
PipelineRegistry1916msCRUD operations, persistence round-trips, config merge semantics
DiskBuffer1899msNDJSON append/drain, segment rotation, corruption handling, concurrent access

3.2 Integration Tests — 62/62 PASS

End-to-end validation against real Docker services. Data flows through actual network connections and storage engines.

Integration Test Suites

62 tests • 7.3s
SuiteTestsDurationCoverage
Pipeline E2E75.7sMQTT → PipelineManager → PostgreSQL/S3 — verifies data arrives in target tables/objects
Pipeline MCP Tools168msAll 9 management tools: add/remove/update/list/enable/disable/stats/flush/configure
PostgreSQL Adapter18168msWrites, queries, JSONB round-trips, SQL injection parameterization, schema browsing
S3 Adapter (MinIO)15466msDate-partitioned key generation, metadata sanitization, batch writes, prefix listing
Store-and-Forward61.5sAdapter failure → disk buffer → adapter recovery → automatic replay

3.3 Topology Tests — 43/43 PASS

Multi-gateway network scenarios — validates tool discovery and cross-node communication across various network shapes.

Topology Test Suites

43 tests • 0.9s
SuiteTestsNetwork Shapes
Tools Inspector19Minimal (3-node), Small Factory (6-node)
Complex Scenarios6Multi-Site (9-node), Mesh-4, Generated (7–11 node)
Network Configuration10Validation, generation, stats computation
Network Harness5Start/stop, role filtering, connection mapping, failure injection
Topology Scenarios3Ring-6, Small Factory, Generated (11-node)

3.4 Cloud Tests — 10/10 PASS

Live HiveMQ Cloud cluster with TLS mutual authentication (mqtts://, port 8883).

Pipeline Validation (4 tests)

TLS connection with credentialsPASS
Pub/sub round-trip latencyPASS
Cloud MQTT → local PostgreSQL (100/100)PASS
Local vs Cloud throughput comparisonPASS

Security Validation (6 tests)

Valid credentials acceptedPASS
Invalid password rejectedPASS
Empty credentials rejectedPASS
Default CA bundle TLS validationPASS
SQL injection in topic → parameterizedPASS
Metadata injection → stored as JSONBPASS
Security tests verify that malicious payloads in MQTT topics and message metadata are safely parameterized by the PostgreSQL adapter — no raw SQL execution occurs. Authentication tests confirm the broker correctly rejects invalid and empty credentials over the TLS connection.

3.5 Benchmark — 10/10 PASS

Benchmarks establish a performance baseline at moderate loads. "Burst" fires all messages as fast as possible (queue-depth stress). "Sustained" sends at a fixed rate for 15 seconds (throughput stability). Each tier runs local and cloud back-to-back for direct comparison.

Burst Benchmark

100% delivery at all tiers
CountTargetSentRecvDlvr%Send msg/sAvg Latency
50local5050100%16,6675 ms
50cloud5050100%50,00084 ms
250local250250100%50,00021 ms
250cloud250250100%50,000153 ms
1,000local1,0001,000100%62,50022 ms
1,000cloud1,0001,000100%55,556246 ms
Delivery % = messages received / messages sent. Avg Latency = mean round-trip time (publish to receive-back). The local broker delivers in single-digit ms; cloud adds ~100-250ms due to network RTT and TLS encryption overhead. All tiers achieve 100% delivery.

Sustained Benchmark (15s per tier)

all tiers healthy
Rate/sTargetSentRecvDlvr%p50p95p99DriftVerdict
10local150150100%2 ms6 ms18 ms+2 msOK
10cloud150150100%104 ms166 ms171 ms+85 msOK
100local1,5001,500100%2 ms3 ms4 ms+2 msOK
100cloud1,5001,500100%132 ms187 ms199 ms+86 msOK
200local3,0003,000100%2 ms3 ms3 ms+1 msOK
200cloud3,0003,000100%125 ms192 ms205 ms+97 msOK
p50 / p95 / p99 are latency percentiles — p50 is the median, p99 captures the slowest 1% of messages. Drift compares p95 in the last 20% of the test window vs the first 20%; positive drift indicates latency is increasing over time. Low drift values here confirm the broker handles these rates without queue accumulation.

3.6 Stress to Crash — 19/20 PASS 1 TIMEOUT

Stress tests deliberately overload the system to find where degradation begins. Burst sizes escalate from 1k to 50k messages; sustained rates from 500/s to 5,000/s. The single timeout (cloud at 5k/s sustained) is an expected finding — it identifies the throughput ceiling for a single TLS connection to HiveMQ Cloud.

Burst Escalation (with telemetry)

zero message loss at all tiers
CountTargetSentRecvDlvr%msg/sp50p95p99HeapRSSGCsGC msVerdict
1klocal1,0001,000100%43k16212154M165M14.7OK
1kcloud1,0001,000100%45k40146546648M180M317.6OK
5klocal5,0005,000100%40k39515143M207M449.8OK
5kcloud5,0005,000100%31k42750250364M219M232.6OK
10klocal10,00010,000100%36k28727492M237M325.3OK
10kcloud10,00010,000100%51k562787795120M307M223.8OK
25klocal25,00025,000100%83k162736134M377M523.2OK
25kcloud25,00025,000100%110k9791,9492,083101M352M730.4SLOW
50klocal50,00050,000100%86k153538118M377M1144.6OK
50kcloud50,00050,000100%201k2,3234,2574,54679M365M1355.5SLOW
Heap is actively allocated V8 JavaScript memory. RSS (Resident Set Size) is total process memory including the V8 engine, OS-level buffers, and TLS state. GCs = garbage collection cycles; GC ms = cumulative pause time. Notably, zero messages were lost at any tier — HiveMQ Cloud queues rather than drops. The latency degradation at 25k+ is purely broker-side queue depth.

Cloud Burst p99 Latency Escalation

relative to 50k peak (4,546 ms)
1k msgs
466 ms
466 ms
5k msgs
503 ms
503 ms
10k msgs
795 ms
795 ms
25k msgs
2,083 ms
2,083 ms
50k msgs
4,546 ms
4,546 ms
< 1,000 ms
1–3s
> 3s
The p99 curve is roughly exponential — doubling burst size at the high end more than doubles tail latency. Up to 10k messages, the broker dispatches within ~800ms. Beyond that, the ingress queue saturates and messages wait in line. The inflection point is between 10k and 25k for HiveMQ Cloud's Starter tier.

Sustained Escalation

cloud ceiling at 2k/s
Rate/sTargetSentRecvDlvr%Act/sp50p99DriftHeapRSSEL p99GCsVerdict
500local5,0005,000100%50013+179M364M21ms3OK
500cloud5,0005,000100%500101193+9362M365M21ms3OK
1klocal10,00010,000100%1,00013+263M373M21ms5OK
1kcloud10,00010,000100%1,00093181+8486M365M21ms4OK
2klocal20,00020,000100%2,00013+2115M391M22ms5OK
2kcloud20,00020,000100%2,00088180+83168M416M21ms5OK
5klocal50,00050,000100%5,00014+1123M388M25ms14OK
5kcloudTIMEOUT — test exceeded 50s limit. TLS socket backpressure prevents sustaining 5k msg/s.
EL p99 = Node.js event loop delay at the 99th percentile (measured via monitorEventLoopDelay). Values around 21ms reflect the timer resolution floor, not actual blocking. Drift compares early vs late latency; positive but stable drift indicates a fixed RTT offset rather than compounding delay. The cloud sustained ceiling is 2,000 msg/s with 100% delivery and stable latency. At 5k/s, the TLS send buffer fills faster than the network can drain it.

4. Telemetry & Instrumentation

Beyond message delivery metrics, four additional monitoring channels were instrumented to attribute performance characteristics to specific subsystems: V8 garbage collection, event loop responsiveness, memory allocation, and per-second time-series snapshots. This distinguishes between application-level bottlenecks (GC pauses, event loop blocking) and infrastructure-level limits (network, broker queuing).

Instrumentation Channels

4 channels
ChannelWhat It MeasuresWhy It Matters
Garbage Collection GC pause count, total pause duration, longest single pause (via PerformanceObserver) GC pauses are stop-the-world — the entire process freezes. Long pauses during send loops create throughput gaps and latency spikes.
Event Loop Delay p50 and p99 delay (via monitorEventLoopDelay, 20ms resolution) Node.js is single-threaded. If the event loop is blocked by synchronous work, incoming messages queue up and latency compounds.
Memory (Heap + RSS) V8 heap used (live JS objects) and RSS (total OS-allocated process memory) Heap tracks active allocations; RSS includes V8 engine, TLS buffers, and OS pages. Steady growth without GC reclamation indicates a memory leak.
Per-Second Snapshots Cumulative sent/received, heap, RSS, and event loop lag every 1 second Reveals temporal behavior — whether in-flight message backlog grows or stabilizes, and whether memory follows a healthy sawtooth (GC reclaims periodically).

GC Impact Analysis

GC is not a bottleneck
TierTargetGC CountTotal PauseMax PausePer 10k msgsAssessment
1k burstlocal14.7 ms4.7 ms10Negligible
1k burstcloud317.6 ms12.8 ms30Low
10k burstcloud223.8 ms13.1 ms2Low
50k burstlocal1144.6 ms5.0 ms2.2Healthy
50k burstcloud1355.5 ms7.0 ms2.6Healthy
5k/s sust.local14117.5 ms13.9 ms2.8Healthy
2k/s sust.cloud543.2 ms12.5 ms2.5Healthy
GC is not a bottleneck. The longest single pause was 13.9ms (during 5k/s local sustained). The rate stabilizes at 2–3 collections per 10k messages regardless of throughput, indicating well-behaved memory allocation patterns.

Event Loop Health

event loop never blocked
TierTargetEL p50EL p99Assessment
1k burstlocal21.1 ms26.4 msTimer resolution baseline
1k burstcloud21.1 ms33.2 msSlight TLS context-switch overhead
50k burstlocal21.1 ms24.5 msStable under high throughput
50k burstcloud21.1 ms23.1 msNo blocking
5k/s sust.local20.4 ms25.2 msHealthy at max local throughput
2k/s sust.cloud20.4 ms21.5 msWell within bounds
Event loop stays at timer-resolution floor (~21ms) across all tiers. Even at 50k burst or 5k/s sustained, worst-case EL delay is 25–33ms. The Node.js runtime is never the bottleneck — all latency degradation is attributable to the network and broker.

Per-Second Snapshots: Cloud 2k/s Sustained

10s test window + 1s drain
SecSentRecvIn-FlightHeapRSSEL Lag
11,9991,83716298 MB395 MB21 ms
23,9993,823176125 MB396 MB21 ms
35,9995,825174108 MB400 MB21 ms
47,9997,825174136 MB401 MB21 ms
510,0019,813188118 MB404 MB21 ms
612,00311,827176146 MB406 MB21 ms
714,00313,641362128 MB409 MB22 ms
816,00315,739264156 MB411 MB22 ms
918,00117,679322139 MB415 MB22 ms
1020,00019,773227167 MB416 MB22 ms
1120,00020,0000168 MB416 MB22 ms
In-Flight = sent − received at each snapshot — the number of messages currently in the network/broker pipeline. This oscillates between 162 and 362, never growing unbounded, confirming the broker dispatches at roughly the send rate. The heap oscillates in a sawtooth pattern (rises, drops on GC) with no upward drift — no memory leak. All 20,000 messages arrive by second 11.

5. Metrics & Visualizations

Cloud Latency Profile (Sustained p50/p95/p99)

latency flat from 10/s to 2k/s
Ratep50p95p99
10/s104 ms166 ms171 ms
25/s134 ms194 ms205 ms
100/s132 ms187 ms199 ms
500/s101 ms181 ms193 ms
1k/s93 ms169 ms181 ms
2k/s88 ms170 ms180 ms

Cloud p50 Latency by Rate

p50 decreases at higher throughput
10/s
104 ms
104 ms
25/s
134 ms
134 ms
100/s
132 ms
132 ms
500/s
101 ms
101 ms
1k/s
93 ms
93 ms
2k/s
88 ms
88 ms
p50 actually decreases at higher rates (134ms at 25/s → 88ms at 2k/s). This is a TCP optimization effect: at higher throughput, the connection is "warm," Nagle's algorithm batches more efficiently, and TLS session caching reduces per-packet overhead. The broker handles every tier without queue buildup.

Memory Under Sustained Load

no leak detected
RateLocal HeapLocal RSSCloud HeapCloud RSS
500/s79 MB364 MB62 MB365 MB
1k/s63 MB373 MB86 MB365 MB
2k/s115 MB391 MB168 MB416 MB
5k/s123 MB388 MBTIMEOUT

Local Broker Memory

Heap range63–123 MB
RSS range364–391 MB
Growth patternSawtooth (GC reclaims)

Cloud Broker Memory

Heap range62–168 MB
RSS range365–416 MB
Growth patternSawtooth (GC reclaims)
Cloud heap at 2k/s (168 MB) is elevated compared to local (115 MB) because in-flight messages accumulate in the receive buffer while waiting for network delivery. RSS is consistently higher than heap because it includes V8 engine overhead, TLS state, and OS-level socket buffers. Both follow a sawtooth pattern (allocate → GC reclaims) with no upward trend, confirming no memory leak.

6. Findings

F1: Zero message loss at all tested rates (up to 50k burst, 2k/s sustained).
Neither local nor cloud dropped a single message when the connection remained healthy. HiveMQ Cloud (Starter tier) queues messages rather than dropping them, providing reliable delivery even under heavy load.
F2: Cloud sustained latency is flat from 10/s to 2k/s.
p50 stays between 88–134ms across the entire range. The broker is not the bottleneck — network round-trip time dominates. Latency is essentially a function of geography (client → us-west-2 → client), not load.
F3: Local HiveMQ CE adds negligible overhead.
p99 never exceeded 76ms even at 50k burst or 5k/s sustained. The pipeline runtime (TopicTrie matching, batch accumulation, disk buffer fallback) contributes sub-millisecond processing time.
F4: Cloud burst latency degrades above 10k messages.
At 25k burst: p99 = 2.1s. At 50k burst: p99 = 4.5s. The broker's ingress queue grows faster than its dispatch rate at these volumes. For typical IoT telemetry (periodic sensor readings), bursts of this size are unlikely — but this defines the upper bound.
F5: Cloud sustained ceiling is 2k msg/s (5k/s times out).
At 5k/s, the TLS socket cannot flush publishes fast enough. Each await adapter.write() blocks on the TCP ACK, and the send loop cannot maintain the target rate. This is a hard limit for a single TLS connection to HiveMQ Cloud.
F6: GC and event loop are never the bottleneck.
Max GC pause: 13.9ms. Event loop p99: 25–33ms across all tiers. The Node.js runtime processes messages far faster than the network can carry them. All observed latency degradation is attributable to network/broker queuing.
F7: Security surface is clean.
SQL injection payloads in topics and metadata are safely parameterized by the PostgreSQL adapter. Invalid/empty credentials are rejected by HiveMQ Cloud. TLS works with the default CA bundle.
F8: Store-and-forward resilience verified.
When a persistence adapter fails, records buffer to disk (NDJSON segments). Records survive process restart and replay automatically when the adapter recovers. No data loss during simulated outages.

7. Recommendations

Prioritized Actions

6 recommendations
PriorityRecommendationRationale
HIGH Add receive buffer backpressure At 2k/s cloud, heap reaches 168MB from in-flight message accumulation. Without a cap, a long-running pipeline at high throughput could exhaust available memory. Implement a bounded buffer with a configurable high-water mark.
HIGH Implement QoS 1 for critical pipelines All tests used QoS 0 (no delivery acknowledgment). For alarm conditions or safety-critical telemetry, QoS 1 provides broker-level delivery confirmation with automatic retry on failure.
MED Add publish-side rate limiting At 50k burst, cloud p99 hits 4.5s. A configurable rate limiter (e.g., 2k/s cap) would keep tail latency under 200ms and prevent broker queue saturation.
MED Monitor heap + RSS in production 168MB at 2k/s is acceptable, but memory scales roughly linearly with throughput. Set alerts at 256MB to provide early warning before reaching container limits.
MED Add reconnection benchmarks Current tests use stable connections. Real deployments face intermittent network loss. Benchmark reconnection time, message loss during reconnect, and store-and-forward handoff latency.
LOW Non-blocking publishes for QoS 0 Currently each publish awaits the TCP write. For QoS 0 (no acknowledgment needed), fire-and-forget without await would remove the per-message RTT bottleneck and enable 100k+ msg/s.

8. Opportunities

These are not bugs or urgent fixes — they're architectural improvements and expanded test coverage that would strengthen the system for production deployment at scale.

Performance

Observability

Testing