OpenClaw Security Analysis → B4M CLI Design Principles

What this is. A teardown of OpenClaw's architectural security failures and the design principles we're committing to instead for B4M CLI. This is foundational reasoning that should shape every agent-runtime decision we make.

Related. Connects directly to the B4M V3 Blueprint work — the same trust-boundary thinking shows up in how V3 packages and blueprints separate stable infrastructure (security must be identical everywhere) from product-specific business logic.

The fundamental architecture problem

OpenClaw runs as a single long-lived Node.js process (the “Gateway”) that handles everything: channel connections, session state, the agent loop, model calls, tool execution, and memory persistence. This is a monolith with god-mode permissions. Every integration, every skill, every message flows through one trust boundary.

The result: anyone who can influence the agent's input effectively inherits the agent's permissions. Sophos coined this the “lethal trifecta”:

Access to private data
Ability to communicate externally
Ability to ingest untrusted content

When all three exist in the same execution context, you are a single prompt injection away from full compromise.

The five critical security failures

1. The web server / gateway exposure problem

OpenClaw's Gateway binds to localhost and treats 127.0.0.1 connections as trusted (no auth required). But when deployed behind a reverse proxy (as most people do for remote access), all external requests get forwarded as localhost traffic, bypassing authentication entirely.

CVE-2026-25253 (CVSS 8.8): The Control UI accepted a gatewayUrl query parameter from the URL without validation, auto-initiating a WebSocket connection and transmitting the auth token. Three-stage attack chain completes in milliseconds: token exfil → authenticated connection → full RCE.
40,000+ instances exposed on the internet (SecurityScorecard), 63% vulnerable, 12,812 exploitable via RCE.
API keys and credentials leaked in plaintext via control panels.

This is exactly the right instinct: why does an autonomous agent need a web server? The answer is: it doesn't. The web server exists for the Control UI and webhook callbacks from messaging platforms. Both can be handled without binding a general-purpose HTTP server.

2. Prompt injection as a structural, unsolved problem

OpenClaw's “guardrails” are prompt instructions — soft guidance that any sufficiently crafted input can override. This is not a bug; it's a category error.

Demonstrated attacks: an email containing a prompt injection caused the agent to exfiltrate private keys from the host machine.
A user sent themselves an email that caused the bot to forward victim emails to the attacker — no confirmation, no approval.
The agent can be steered by malicious content in anything it reads: web pages, emails, documents, images, metadata.
Microsoft's assessment: “Assume the runtime can be influenced by untrusted input, its state can be modified, and the host system can be exposed.”

The fix isn't better prompts. It's architectural: separate the reasoning layer from the execution layer with hard boundaries.

3. Skills supply chain poisoning (ClawHavoc campaign)

OpenClaw's skill system (community-contributed SKILL.md files with YAML + natural language) has become a malware distribution vector.

800+ malicious skills discovered (~20% of the ClawHub registry).
Top downloaded community skill: literal malware.
Skills can execute arbitrary shell commands, exfiltrate data silently via curl to external servers, and inject prompts that bypass safety guidelines.
No code signing, no sandboxing, no review process.
The “What Would Elon Do?” skill: functionally malware delivering Atomic macOS Stealer (AMOS).

4. No trust boundary isolation

Single-process architecture means channel adapters, session management, tool execution, and memory all share one trust domain.
No per-user isolation — cross-session data leakage documented.
sessionKey is routing/context selection, not per-user auth.
Memory persists as plaintext Markdown files on disk, accumulating sensitive data over time.
If you mix personal and company identities on one runtime, all separation collapses.

5. Excessive default permissions

Full shell access, file read/write, browser control, email access, calendar management — all enabled by default or trivially enabled.
Exec approvals are opt-in prompt-based guardrails, not hard architectural boundaries.
Sandbox mode is opt-in; if off, commands execute directly on the gateway host.
The agent can write its own “vibe code” for tasks it doesn't know — meaning it can self-extend its capabilities at runtime.

What B4M CLI does differently

The six design principles we are committing to.

Principle 1: No web server for agent coordination

Replace the HTTP Gateway pattern with:

Unix domain sockets or named pipes for local IPC — no network-bindable surface.
Outbound-only connections to messaging platforms (polling or long-poll, not webhook callbacks requiring an open port).
If a UI is needed, use a Unix socket ↔ browser bridge that cannot be proxied externally.
Zero listening ports by default. Nothing binds to any interface.

Principle 2: Hard execution boundaries (not prompt guardrails)

Separate processes for reasoning (LLM interaction) and execution (tool use).
The reasoning process produces a structured action request (JSON/YAML intent).
The execution process validates against a compiled allowlist (not prompt instructions).
High-risk actions (shell exec, file write, email send) require cryptographic approval tokens, not chat-based “reply Y.”
Consider capability-based security: the agent only has access to explicitly granted capabilities, not ambient authority.

Principle 3: Untrusted content quarantine

All external content (emails, web pages, documents) processed by a read-only, tool-disabled summarizer agent first.
Summaries passed to the main agent — never raw untrusted content.
This creates an air gap against indirect prompt injection.
Content provenance tracking: the agent knows what came from trusted vs. untrusted sources.

Principle 4: No ambient credentials

No API keys in environment variables inherited by child processes.
Use short-lived, scoped tokens minted per-action.
Credentials stored in an OS-level keychain or hardware security module, not plaintext config.
The agent process itself never holds persistent credentials.

Principle 5: Deterministic skill / extension model

Skills are code-reviewed, signed, and sandboxed — not community YAML that runs arbitrary commands.
Skills execute in isolated containers/processes with explicitly declared capabilities.
No self-extending: the agent cannot write and execute new code at runtime without human approval through an out-of-band channel.
Skill registry with cryptographic attestation, not a wiki anyone can edit.

Principle 6: Decentralized by design

Each user runs their own agent instance with their own trust boundary.
No shared runtime, no shared memory, no shared credentials.
Coordination between agents happens via message passing with verified identities, not shared process space.
This is the mycelial model: autonomous nodes communicating through defined interfaces, not a centralized gateway.

The core insight

OpenClaw proved that people desperately want autonomous AI agents. The demand is real. But its architecture treats security as a configuration problem (“just set the right flags”) rather than a structural one.

Bike4Mind CLI's opportunity: deliver the same autonomy with security as an architectural property, not a configuration option.

The web server isn't just unnecessary — it's the symptom of a design philosophy that treats the agent as a service to be exposed rather than a capability to be contained. An agent that coordinates via local IPC, connects outbound-only, quarantines untrusted input, and enforces hard execution boundaries can be just as powerful without being a sitting duck.

The fundamental architecture problem​

The five critical security failures​

1. The web server / gateway exposure problem​

2. Prompt injection as a structural, unsolved problem​

3. Skills supply chain poisoning (ClawHavoc campaign)​

4. No trust boundary isolation​

5. Excessive default permissions​

What B4M CLI does differently​

Principle 1: No web server for agent coordination​

Principle 2: Hard execution boundaries (not prompt guardrails)​

Principle 3: Untrusted content quarantine​

Principle 4: No ambient credentials​

Principle 5: Deterministic skill / extension model​

Principle 6: Decentralized by design​

The core insight​