OpenClaw Security Analysis → B4M CLI Design Principles
What this is. A teardown of OpenClaw's architectural security failures and the design principles we're committing to instead for B4M CLI. This is foundational reasoning that should shape every agent-runtime decision we make.
Related. Connects directly to the B4M V3 Blueprint work — the same trust-boundary thinking shows up in how V3 packages and blueprints separate stable infrastructure (security must be identical everywhere) from product-specific business logic.
The fundamental architecture problem
OpenClaw runs as a single long-lived Node.js process (the “Gateway”) that handles everything: channel connections, session state, the agent loop, model calls, tool execution, and memory persistence. This is a monolith with god-mode permissions. Every integration, every skill, every message flows through one trust boundary.
The result: anyone who can influence the agent's input effectively inherits the agent's permissions. Sophos describes this as the “lethal trifecta” (a term coined by Simon Willison):
- Access to private data
- Ability to communicate externally
- Ability to ingest untrusted content
When all three exist in the same execution context, you are a single prompt injection away from full compromise.
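The trifecta can be treated as a checkable property of an execution context. The following is a minimal sketch, not a B4M CLI API: the `ContextCapabilities` type and its field names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextCapabilities:
    """Capabilities granted to one execution context (hypothetical model)."""
    reads_private_data: bool
    communicates_externally: bool
    ingests_untrusted_content: bool


def has_lethal_trifecta(ctx: ContextCapabilities) -> bool:
    """True when all three risk factors coexist in the same context."""
    return (
        ctx.reads_private_data
        and ctx.communicates_externally
        and ctx.ingests_untrusted_content
    )


# A monolithic gateway holds every capability at once.
monolith = ContextCapabilities(True, True, True)

# A quarantined reader ingests untrusted content but can do nothing else.
quarantined_reader = ContextCapabilities(False, False, True)
```

A check like this could run at startup or in CI to flag any process whose declared capabilities combine all three factors.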
The five critical security failures
1. The web server / gateway exposure problem
OpenClaw's Gateway binds to localhost and treats 127.0.0.1 connections as trusted (no auth required). But when it is deployed behind a reverse proxy (as most people do for remote access), every external request reaches the Gateway from the proxy's own loopback connection, so it is treated as localhost traffic and bypasses authentication entirely.
- CVE-2026-25253 (CVSS 8.8): The Control UI accepted a gatewayUrl query parameter from the URL without validation, auto-initiating a WebSocket connection and transmitting the auth token. Three-stage attack chain completes in milliseconds: token exfil → authenticated connection → full RCE.
- 40,000+ instances exposed on the internet (SecurityScorecard), 63% vulnerable, 12,812 exploitable via RCE.
- API keys and credentials leaked in plaintext via control panels.
This raises the obvious question: why does an autonomous agent need a web server? It doesn't. The web server exists for the Control UI and for webhook callbacks from messaging platforms, and both can be handled without binding a general-purpose HTTP server.
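The reverse-proxy bypass described in this section can be modeled in a few lines. This is an illustrative sketch only: the function names are hypothetical and this is not OpenClaw's code. It assumes the trust decision is based purely on the TCP peer address, which is the behavior described above.

```python
from ipaddress import ip_address


def is_trusted_by_source_ip(peer_addr: str) -> bool:
    """Flawed check: trust any connection whose TCP peer is loopback."""
    return ip_address(peer_addr).is_loopback


def peer_seen_by_gateway(client_addr: str, via_local_proxy: bool) -> str:
    """Return the address the gateway observes for a request.

    A reverse proxy on the same host opens its own connection to the
    gateway, so the gateway sees the proxy's loopback address rather
    than the real client address.
    """
    return "127.0.0.1" if via_local_proxy else client_addr


external_client = "203.0.113.7"  # documentation-range address (RFC 5737)

direct = is_trusted_by_source_ip(peer_seen_by_gateway(external_client, False))
proxied = is_trusted_by_source_ip(peer_seen_by_gateway(external_client, True))
```

Here `direct` is false and `proxied` is true: the same external client becomes "trusted" the moment a local proxy sits in front of the gateway.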
2. Prompt injection as a structural, unsolved problem
OpenClaw's “guardrails” are prompt instructions — soft guidance that any sufficiently crafted input can override. This is not a bug; it's a category error.
- Demonstrated attacks: an email containing a prompt injection caused the agent to exfiltrate private keys from the host machine.
- A user sent themselves an email that caused the bot to forward victim emails to the attacker — no confirmation, no approval.
- The agent can be steered by malicious content in anything it reads: web pages, emails, documents, images, metadata.
- Microsoft's assessment: “Assume the runtime can be influenced by untrusted input, its state can be modified, and the host system can be exposed.”
The fix isn't better prompts. It's architectural: separate the reasoning layer from the execution layer with hard boundaries.
3. Skills supply chain poisoning (ClawHavoc campaign)
OpenClaw's skill system (community-contributed SKILL.md files with YAML + natural language) has become a malware distribution vector.
- 800+ malicious skills discovered (~20% of the ClawHub registry).
- Top downloaded community skill: literal malware.
- Skills can execute arbitrary shell commands, exfiltrate data silently via curl to external servers, and inject prompts that bypass safety guidelines.
- No code signing, no sandboxing, no review process.
- The “What Would Elon Do?” skill: functionally malware delivering Atomic macOS Stealer (AMOS).
4. No trust boundary isolation
- Single-process architecture means channel adapters, session management, tool execution, and memory all share one trust domain.
- No per-user isolation — cross-session data leakage documented.
- sessionKey is routing/context selection, not per-user auth.
- Memory persists as plaintext Markdown files on disk, accumulating sensitive data over time.
- If you mix personal and company identities on one runtime, all separation collapses.
5. Excessive default permissions
- Full shell access, file read/write, browser control, email access, calendar management — all enabled by default or trivially enabled.
- Exec approvals are opt-in prompt-based guardrails, not hard architectural boundaries.
- Sandbox mode is opt-in; if off, commands execute directly on the gateway host.
- The agent can write its own “vibe code” for tasks it doesn't know how to perform, meaning it can self-extend its capabilities at runtime.
What B4M CLI does differently
These are the six design principles we are committing to.
Principle 1: No web server for agent coordination
Replace the HTTP Gateway pattern with:
- Unix domain sockets or named pipes for local IPC — no network-bindable surface.
- Outbound-only connections to messaging platforms (polling or long-poll, not webhook callbacks requiring an open port).
- If a UI is needed, use a Unix socket ↔ browser bridge that cannot be proxied externally.
- Zero listening ports by default. Nothing binds to any interface.
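As a rough illustration of local IPC with no network-bindable surface, the sketch below uses a Unix domain socket from Python's standard library on a POSIX system. The socket path, the JSON message shapes, and the single-process layout are assumptions made to keep the example runnable; a real deployment would put the server and client in separate processes.

```python
import os
import socket
import tempfile

# Hypothetical socket location. mkdtemp creates the directory with mode
# 0o700, so other local users cannot reach the socket file inside it.
runtime_dir = tempfile.mkdtemp(prefix="b4m-")
sock_path = os.path.join(runtime_dir, "agent.sock")

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(sock_path)        # binds to a filesystem path, not an IP and port
os.chmod(sock_path, 0o600)    # owner-only access to the socket file
server.listen(1)

# A local client (for example, a CLI front end) connects by path.
client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(sock_path)
conn, _ = server.accept()

client.sendall(b'{"type": "ping"}')
request = conn.recv(1024)
conn.sendall(b'{"type": "pong"}')
reply = client.recv(1024)

# Clean up sockets and the temporary directory.
for s in (conn, client, server):
    s.close()
os.unlink(sock_path)
os.rmdir(runtime_dir)
```

Because the endpoint is a file rather than a port, access control falls to filesystem permissions, and there is no address for a reverse proxy or port scanner to target.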
Principle 2: Hard execution boundaries (not prompt guardrails)
- Separate processes for reasoning (LLM interaction) and execution (tool use).
- The reasoning process produces a structured action request (JSON/YAML intent).
- The execution process validates against a compiled allowlist (not prompt instructions).
- High-risk actions (shell exec, file write, email send) require cryptographic approval tokens, not chat-based “reply Y.”
- Consider capability-based security: the agent only has access to explicitly granted capabilities, not ambient authority.
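The executor-side check in this principle might look roughly like the following. This is a sketch under stated assumptions: the tool names, the `ALLOWLIST` structure, and the use of an HMAC as the approval token are all hypothetical choices for illustration, and the demo key stands in for one that would live only with the executor and the out-of-band approver.

```python
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical compiled policy: tool name -> whether approval is required.
ALLOWLIST = {
    "read_file": {"needs_approval": False},
    "send_email": {"needs_approval": True},
}

# Assumption: this key is never available to the reasoning process.
APPROVAL_KEY = b"demo-key-not-for-production"


def canonical(action: dict) -> bytes:
    """Stable byte encoding so an approval binds to this exact action."""
    return json.dumps(action, sort_keys=True, separators=(",", ":")).encode()


def sign_approval(action: dict) -> str:
    """Run out-of-band by the human approver, not by the agent."""
    return hmac.new(APPROVAL_KEY, canonical(action), hashlib.sha256).hexdigest()


def validate(action: dict, approval: Optional[str] = None) -> bool:
    """Executor-side check; the model's text has no say in the outcome."""
    tool = action.get("tool")
    if not isinstance(tool, str) or tool not in ALLOWLIST:
        return False                       # not on the allowlist: deny
    if not ALLOWLIST[tool]["needs_approval"]:
        return True
    if approval is None:
        return False                       # high-risk action with no token
    expected = sign_approval(action)
    return hmac.compare_digest(expected.encode(), approval.encode())
```

Since the token is computed over the canonical action, an approval for one email cannot be replayed against a different recipient or a different tool.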
Principle 3: Untrusted content quarantine
- All external content (emails, web pages, documents) processed by a read-only, tool-disabled summarizer agent first.
- Summaries passed to the main agent — never raw untrusted content.
- This creates an air gap between raw untrusted content and the tool-capable agent, which is the main defense against indirect prompt injection.
- Content provenance tracking: the agent knows what came from trusted vs. untrusted sources.
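One way to make the quarantine rule enforceable is to encode it in types, so the main agent's input gate rejects raw untrusted content outright. The sketch below is hypothetical throughout: the class names are invented, and `quarantine_summarize` truncates text as a stand-in for a model call made with no tools attached.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    TRUSTED = "trusted"        # e.g. typed by the authenticated user
    UNTRUSTED = "untrusted"    # e.g. email bodies, web pages, documents


@dataclass(frozen=True)
class RawContent:
    text: str
    provenance: Provenance


@dataclass(frozen=True)
class Summary:
    text: str
    provenance: Provenance     # carried forward so the main agent still knows


def quarantine_summarize(raw: RawContent, max_chars: int = 200) -> Summary:
    """Stand-in for the read-only, tool-disabled summarizer agent.

    A real implementation would call a model with no tools attached; this
    version truncates so the example runs without any external service.
    """
    return Summary(text=raw.text[:max_chars], provenance=raw.provenance)


def main_agent_input(item: object) -> Summary:
    """Gate in front of the main agent: raw untrusted content is rejected."""
    if isinstance(item, Summary):
        return item
    if isinstance(item, RawContent) and item.provenance is Provenance.TRUSTED:
        return Summary(text=item.text, provenance=item.provenance)
    raise PermissionError("raw untrusted content must be summarized first")
```

A summary can still carry attacker-chosen wording, so the provenance label travels with it and lets downstream policy treat the text as data rather than instructions.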
Principle 4: No ambient credentials
- No API keys in environment variables inherited by child processes.
- Use short-lived, scoped tokens minted per-action.
- Credentials stored in an OS-level keychain or hardware security module, not plaintext config.
- The agent process itself never holds persistent credentials.
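A short-lived, scoped token can be sketched with the standard library alone. This assumes a separate broker process that holds the signing key (backed by the OS keychain or an HSM). The `BROKER_KEY` constant, the scope strings, and the token format are hypothetical and appear here only so the example runs.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Assumption: in practice this key never leaves the credential broker.
BROKER_KEY = b"demo-broker-key"


def mint_token(scope: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Broker side: issue a token valid for one scope and a short lifetime."""
    issued = time.time() if now is None else now
    claims = {"scope": scope, "exp": issued + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(BROKER_KEY, body, hashlib.sha256).hexdigest().encode()
    return (body + b"." + sig).decode()


def check_token(token: str, required_scope: str, now: Optional[float] = None) -> bool:
    """Verifier side: reject bad signatures, wrong scopes, and expired tokens."""
    try:
        body, sig = token.encode().rsplit(b".", 1)
    except ValueError:
        return False                       # no signature separator present
    expected = hmac.new(BROKER_KEY, body, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(expected, sig):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    return claims["scope"] == required_scope and current < claims["exp"]
```

The agent process only ever sees a token that expires in seconds and works for one action class, so a leaked token is worth far less than a leaked API key.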
Principle 5: Deterministic skill / extension model
- Skills are code-reviewed, signed, and sandboxed — not community YAML that runs arbitrary commands.
- Skills execute in isolated containers/processes with explicitly declared capabilities.
- No self-extending: the agent cannot write and execute new code at runtime without human approval through an out-of-band channel.
- Skill registry with cryptographic attestation, not a wiki anyone can edit.
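A load-time check for attested skills with declared capabilities might be sketched as follows. A real registry would most likely use asymmetric signatures (for example Ed25519) so that verifiers hold no secret. HMAC is swapped in here purely to keep the example within the standard library, and the capability strings and function names are hypothetical.

```python
import hashlib
import hmac
from typing import FrozenSet

# Assumption: stands in for the registry's signing key.
REGISTRY_KEY = b"demo-registry-key"


def attest(skill_code: bytes, capabilities: FrozenSet[str]) -> str:
    """Registry side: sign the code digest together with declared capabilities."""
    digest = hashlib.sha256(skill_code).hexdigest()
    manifest = digest + "|" + ",".join(sorted(capabilities))
    return hmac.new(REGISTRY_KEY, manifest.encode(), hashlib.sha256).hexdigest()


def may_load(
    skill_code: bytes,
    capabilities: FrozenSet[str],
    attestation: str,
    granted: FrozenSet[str],
) -> bool:
    """Runtime side: load only attested code whose capabilities were granted."""
    expected = attest(skill_code, capabilities)
    if not hmac.compare_digest(expected.encode(), attestation.encode()):
        return False                       # code or manifest changed after review
    return capabilities <= granted         # every declared capability was granted
```

Binding the capability list into the attestation means a skill cannot be reviewed with one set of permissions and shipped with another.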
Principle 6: Decentralized by design
- Each user runs their own agent instance with their own trust boundary.
- No shared runtime, no shared memory, no shared credentials.
- Coordination between agents happens via message passing with verified identities, not shared process space.
- This is the mycelial model: autonomous nodes communicating through defined interfaces, not a centralized gateway.
The core insight
OpenClaw proved that people desperately want autonomous AI agents. The demand is real. But its architecture treats security as a configuration problem (“just set the right flags”) rather than a structural one.
Bike4Mind CLI's opportunity: deliver the same autonomy with security as an architectural property, not a configuration option.
The web server isn't just unnecessary — it's the symptom of a design philosophy that treats the agent as a service to be exposed rather than a capability to be contained. An agent that coordinates via local IPC, connects outbound-only, quarantines untrusted input, and enforces hard execution boundaries can be just as powerful without being a sitting duck.