WorldVision

Editor's note. This page is groomed from the 6-file WorldVision suite in lumina5/docs-site/docs/roadmap/world-vision/ (overview, lean MVP, legacy MVP plan, full implementation plan, post-MVP roadmap). It distills vision and ambition first, with implementation specifics treated as supporting evidence — not the centerpiece. Source documents are kept verbatim in lumina5 for engineers; this page is for strategy and shared understanding.

What it is

WorldVision turns the mobile camera into an input device for Bike4Mind. Point a phone at a chess board, a Tarot spread, a D&D battle map, a cluttered workshop, a robot's field of view — and the AI on the other end understands what it's looking at and acts on it. Capture, reason, respond. The cognitive workshop graduates from text-and-files to text-and-files-and-the-world-around-you.

This is also the bridge from the screen to the room. WorldVision is the perceptual layer that makes the Physical AI roadmap — Magic Mirrors, Mobile Mirrors, Full Robotics — real.

The capabilities

Capability	What it does
Visual understanding	GPT-4V (primary) + Gemini (fallback) over images captured on the phone. The AI reads the scene, not just the pixels.
Game-aware modes	Chess (board recognition, FEN notation), Tarot (card identification, interpretation), D&D (dice, miniatures, eventually the whole table). Each mode is a system prompt + a function-call kit.
Robotics API	External REST API any robot can call. Phone-on-rover, Raspberry Pi, Jetson, ESP32-CAM, USB webcam — same endpoint, same auth, same response shape.
Continuous capture	Timer-based or motion-triggered capture for autonomous systems. Webhook out when something interesting changes.
Multimodal session	Vision analyses link back to the chat session that asked for them. The conversation remembers what it saw.
Privacy-first	GDPR right to deletion, CCPA data export, soft-delete + S3 lifecycle expiry. The user owns their images and can pull the plug.
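
The "motion-triggered" half of continuous capture can be illustrated with simple frame differencing. The source suite does not say how motion is detected, so this is a minimal sketch under assumptions: frames arrive as grayscale byte arrays, and the names `changedFraction` and `shouldCapture` and both thresholds are hypothetical.

```typescript
// Hypothetical sketch: decide whether a new frame differs enough from the
// previous one to be worth sending for analysis. Thresholds are illustrative.

/** A grayscale frame: one byte (0-255) per pixel. */
type GrayFrame = Uint8Array;

/** Fraction of pixels whose brightness changed by more than `pixelDelta`. */
function changedFraction(prev: GrayFrame, next: GrayFrame, pixelDelta = 25): number {
  if (prev.length !== next.length) {
    throw new Error("Frames must have the same dimensions");
  }
  if (prev.length === 0) return 0;
  let changed = 0;
  for (let i = 0; i < prev.length; i++) {
    if (Math.abs(prev[i] - next[i]) > pixelDelta) changed++;
  }
  return changed / prev.length;
}

/** True when enough of the scene changed to trigger a capture. */
function shouldCapture(prev: GrayFrame | null, next: GrayFrame, minFraction = 0.05): boolean {
  if (prev === null) return true; // always capture the very first frame
  return changedFraction(prev, next) >= minFraction;
}
```

A timer-based loop would skip the check and capture on every tick. A motion-triggered loop would call `shouldCapture` on each frame and only upload when it returns true, which keeps token spend proportional to how much the scene actually changes.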

The architecture leans on infrastructure Bike4Mind already runs: UserApiKey (with a new VISION_ANALYZE scope), apiKeyAuth, rateLimit, fabFileService (S3 + presigned URLs), and the standard service pattern. WorldVision is mostly new prompts and one new table on top of the existing B4M substrate — not a parallel stack.
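
To make the "new scope on an existing key" idea concrete, here is a small sketch of a scope check. Only the name `ApiKeyScope.VISION_ANALYZE` comes from the plan. The string values, the `FILES_WRITE` member, the `ApiKeyLike` shape, and `canAnalyze` are assumptions for illustration, and the real `UserApiKey` model and `apiKeyAuth` middleware in B4M may look quite different.

```typescript
// Sketch only: the real ApiKeyScope and UserApiKey live in the B4M codebase.
// A const object is used here in place of an enum; the string values are assumed.
const ApiKeyScope = {
  FILES_WRITE: "files:write",       // hypothetical pre-existing scope
  VISION_ANALYZE: "vision:analyze", // the new scope named in the plan
} as const;
type ApiKeyScope = (typeof ApiKeyScope)[keyof typeof ApiKeyScope];

/** Minimal stand-in for the fields of an API key that a scope check needs. */
interface ApiKeyLike {
  scopes: ApiKeyScope[];
  revokedAt: Date | null;
}

/** True when the key is still active and carries the vision scope. */
function canAnalyze(key: ApiKeyLike): boolean {
  return key.revokedAt === null && key.scopes.includes(ApiKeyScope.VISION_ANALYZE);
}
```

The point of the design is that a robot's key and a mobile user's key go through the same check, so no separate robot-key table is needed.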

The MVP

The lean MVP ships Robotics + Chess with the minimum new surface area:

One new table — vision_analyses (user, session, image reference, mode, prompt, response, tokens, soft-delete timestamp).
One new API scope — ApiKeyScope.VISION_ANALYZE.
One new service — visionAnalysisService, following the existing service pattern.
Two new endpoints — /api/vision/analyze and /api/vision/history. The upload endpoint is the existing /api/files/generate-presigned-url; no duplication.
One new UI component — VisionAnalyzer, using Capacitor's native camera on iOS/Android.
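
The table and the analyze endpoint listed above might be typed roughly as follows. The concepts in `VisionAnalysisRow` (user, session, image reference, mode, prompt, response, tokens, soft-delete timestamp) come from the plan; every field name, the `AnalyzeRequest` shape, and the helpers `parseAnalyzeRequest` and `toRow` are assumptions made for this sketch.

```typescript
// Sketch of the vision_analyses row and the /api/vision/analyze request body.
// Field names are assumed; only the list of concepts is from the plan.
type VisionMode = "robotics" | "chess"; // the two MVP modes

interface VisionAnalysisRow {
  userId: string;
  sessionId: string | null; // chat session that asked for the analysis, if any
  imageKey: string;         // reference to the uploaded image
  mode: VisionMode;
  prompt: string;
  response: string;
  tokens: number;
  createdAt: Date;
  deletedAt: Date | null;   // soft-delete timestamp
}

interface AnalyzeRequest {
  imageKey: string;
  mode: VisionMode;
  prompt: string;
  sessionId?: string;
}

function isVisionMode(value: unknown): value is VisionMode {
  return value === "robotics" || value === "chess";
}

/** Validate an untrusted JSON body before it reaches the service. */
function parseAnalyzeRequest(body: unknown): AnalyzeRequest {
  if (typeof body !== "object" || body === null) {
    throw new Error("Body must be a JSON object");
  }
  const { imageKey, mode, prompt, sessionId } = body as Record<string, unknown>;
  if (typeof imageKey !== "string" || imageKey.length === 0) {
    throw new Error("imageKey is required");
  }
  if (!isVisionMode(mode)) throw new Error("mode must be 'robotics' or 'chess'");
  if (typeof prompt !== "string" || prompt.length === 0) {
    throw new Error("prompt is required");
  }
  if (sessionId !== undefined && typeof sessionId !== "string") {
    throw new Error("sessionId must be a string when present");
  }
  return {
    imageKey,
    mode,
    prompt,
    sessionId: typeof sessionId === "string" ? sessionId : undefined,
  };
}

/** Build the row to persist once the model has answered. */
function toRow(
  userId: string,
  req: AnalyzeRequest,
  result: { text: string; tokens: number },
  now: Date,
): VisionAnalysisRow {
  return {
    userId,
    sessionId: req.sessionId ?? null,
    imageKey: req.imageKey,
    mode: req.mode,
    prompt: req.prompt,
    response: result.text,
    tokens: result.tokens,
    createdAt: now,
    deletedAt: null,
  };
}
```

In this sketch the client uploads through the existing presigned-URL endpoint first and then sends only the resulting image reference to the analyze endpoint, which matches the "no duplication" note above.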

Target users for the MVP:

Hobbyist roboticists who want a one-line vision call from a phone-mounted rover, a Raspberry Pi, an ESP32-CAM, or a Jetson. The pitch is “pip install b4m-vision, point your robot at the world, ask the AI what to do.”
Chess players who want to photograph any board (real, on a screen, in a magazine), get the FEN, get a position read.
B4M users who want any object, any scene, any document held up to the camera to be readable inside their conversation.
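
For the chess use case, a vision model's FEN output is worth sanity-checking before it is shown to a player or handed to an engine. The plan does not describe such a check, so the following is a hedged sketch: `isValidPiecePlacement` and `isPlausibleFen` are hypothetical helpers that test structure only (eight ranks of eight squares, one king per side, six fields, a valid side to move), not full legality.

```typescript
// Hypothetical sanity check for model-produced FEN strings.
// It validates structure only; it does not prove the position is legal.

const PIECES = "pnbrqkPNBRQK";

/** Check the first FEN field: 8 ranks, 8 squares each, one king per side. */
function isValidPiecePlacement(placement: string): boolean {
  const ranks = placement.split("/");
  if (ranks.length !== 8) return false;
  let whiteKings = 0;
  let blackKings = 0;
  for (const rank of ranks) {
    let squares = 0;
    let prevWasDigit = false;
    for (const ch of rank.split("")) {
      if (ch >= "1" && ch <= "8") {
        if (prevWasDigit) return false; // adjacent digits must be merged, e.g. "44" is "8"
        squares += Number(ch);
        prevWasDigit = true;
      } else if (PIECES.includes(ch)) {
        squares += 1;
        prevWasDigit = false;
        if (ch === "K") whiteKings++;
        if (ch === "k") blackKings++;
      } else {
        return false; // unknown character
      }
    }
    if (squares !== 8) return false;
  }
  return whiteKings === 1 && blackKings === 1;
}

/** Check field count, piece placement, and side to move of a full FEN string. */
function isPlausibleFen(fen: string): boolean {
  const fields = fen.trim().split(/\s+/);
  if (fields.length !== 6) return false;
  const [placement, sideToMove] = fields;
  return isValidPiecePlacement(placement) && (sideToMove === "w" || sideToMove === "b");
}
```

A failed check is a useful signal to ask the user for a clearer photo or to retry with the fallback model rather than passing a malformed position downstream.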

Tarot and D&D follow on the same primitive: they are different system prompts and different toolkits against the same pipeline. The 22 Major Arcana ship after Chess proves the pattern; D&D in its full form is the flagship below.
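
One way to picture "different system prompts and different toolkits against the same pipeline" is a registry in which each mode is plain data. This is an illustrative sketch, not the planned implementation: `GameMode`, `ToolSpec`, `registerMode`, `getMode`, `buildModelRequest`, the prompt text, and all tool names are hypothetical, and the request shape is deliberately provider-agnostic.

```typescript
// Hypothetical sketch: a mode is data (prompt + tool list), not a new service.

interface ToolSpec {
  name: string;
  description: string;
}

interface GameMode {
  id: string;
  systemPrompt: string;
  tools: ToolSpec[];
}

const modes = new Map<string, GameMode>();

function registerMode(mode: GameMode): void {
  if (modes.has(mode.id)) throw new Error(`Mode already registered: ${mode.id}`);
  modes.set(mode.id, mode);
}

function getMode(id: string): GameMode {
  const mode = modes.get(id);
  if (!mode) throw new Error(`Unknown mode: ${id}`);
  return mode;
}

/** The shared pipeline step: every mode produces the same request shape. */
function buildModelRequest(modeId: string, userPrompt: string, imageUrl: string) {
  const mode = getMode(modeId);
  return {
    system: mode.systemPrompt,
    toolNames: mode.tools.map((tool) => tool.name),
    user: userPrompt,
    imageUrl,
  };
}

// Prompts and tool names below are invented for illustration.
registerMode({
  id: "chess",
  systemPrompt: "Identify the chess position in the image and report it as FEN.",
  tools: [{ name: "report_fen", description: "Return the recognised position as FEN" }],
});

registerMode({
  id: "tarot",
  systemPrompt: "Identify each Tarot card in the spread and its orientation.",
  tools: [{ name: "report_cards", description: "Return the cards found in the spread" }],
});
```

Adding Tarot here is a second `registerMode` call and nothing else, which is the property that lets later modes ride on the pipeline that Chess proves out.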

Quest chain

Milestone	What ships	Depends on
M1 — Backend Foundation	`vision_analyses` table, `VISION_ANALYZE` scope, `visionAnalysisService`, `/api/vision/analyze` + `/api/vision/history`	—
M2 — Mobile UI + Testing	`VisionAnalyzer` component, Capacitor camera wired, Chess + Robotics modes, on-device testing	M1
M3 — GDPR + Polish + Launch	Delete + export endpoints, audit logging, staging, beta, production	M2

Target window: Q3 2026 for the MVP, with the post-MVP roadmap (Tarot, D&D, Game Template Studio, full ROS2, visual SLAM) sequenced as separate quest chains afterward.

The D&D Room — flagship use case

This is the demonstration that proves what WorldVision plus the Deep Agents stack can deliver.

The scene. A dedicated room set up to wrap a real Dungeons & Dragons session in physical-AR. The players sit at a smart table. 8K monitors are arrayed around the room — the world wraps around the party. Tablets and mobile phones in players' hands serve as character sheets, spell books, inventory, party chat. The smart table itself is the battle map — terrain, minis, line-of-sight, fog-of-war, all live. Cameras throughout the room watch the table and the players.

The physical mechanics. The game isn't just on screens. In-world actions are wired to physical mechanics in the room:

Beanbag toss for a thrown-weapon attack roll
Axe throwing for a barbarian's swing
Darts for a ranged attack or a critical-hit confirmation
— and a long, growing list of room-scale skill challenges that feel like the thing the character is doing

WorldVision sees the throw land, scores it, feeds the result into the rules engine, and the AI Dungeon Master narrates the consequence.

The AI behind the curtain. The DM is an extremely strong agentic AI running on B4M Deep Agents: DM voice, NPC voices, world simulation, narrative continuity, encounter management, balance, surprise. It draws on thousands of bespoke functions and tools: the rules engine, the bestiary, spell effects, item effects, the campaign world state, NPC memory, faction politics, weather, lighting cues for the room, monitor scene management, music selection, sound effects. The room is the body; the agent is the mind.

Why this matters. The D&D Room is not a side project — it is the proof-of-experience for an entire B4M thesis: that the right combination of WorldVision (eyes) + Deep Agents (mind) + bespoke tool kits (hands) produces experiences that are simply unavailable from any other AI stack. Generic chat is a commodity. Running a D&D game across an 8K-monitor physical-AR room with axe-throwing as the attack roll is not. The room becomes a demo no one else can put on, and it's all stitched together from Bike4Mind primitives.

Privacy and policy posture

WorldVision processes images of people, places, and things. The posture is privacy-first from day one, not bolted on later.

Right to deletion (GDPR). One-click delete per analysis or full account purge. Hard-deletes the S3 object, soft-deletes the database row, writes an audit log entry.
Right to export (CCPA / GDPR portability). User can pull a ZIP of every analysis they've ever run — images, prompts, responses, timestamps — in machine-readable form.
Data retention. S3 lifecycle rules expire images on a 30-day default; metadata can outlive the image but is itself purgeable on request.
CCPA ADMT disclosure. Automated Decision-Making Technology notice covers the AI-vision processing path, third-party model providers (OpenAI, Google), and opt-out mechanics.
Content moderation. Automated flagging for inappropriate content + a user reporting mechanism + a human review queue for edge cases.
Encryption. TLS 1.2+ in flight, AES-256 at rest, bcrypt for API key hashes.
Rate limiting and abuse controls. Reuse of the existing Redis-backed rateLimit middleware; tier-aware limits at the user, organization, and API-key level; AWS Shield / WAF in front for DDoS.
WCAG 2.2 Level AA accessibility for all user-facing surfaces.
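
The deletion and retention rules above can be sketched as a small pure planner that lists the steps a delete request should trigger, in the order the policy gives them. This is an assumption-laden illustration: `planDeletion`, `imageExpiresAt`, the step names, and the `AnalysisRef` shape are hypothetical, and the date helper only approximates what a user might be shown, since the actual expiry would be enforced by the S3 lifecycle rule itself.

```typescript
// Hypothetical sketch of the delete and retention rules as pure functions.

const DAY_MS = 24 * 60 * 60 * 1000;

/** Approximate expiry shown to users; S3 lifecycle rules do the real work. */
function imageExpiresAt(createdAt: Date, retentionDays = 30): Date {
  return new Date(createdAt.getTime() + retentionDays * DAY_MS);
}

interface AnalysisRef {
  id: string;
  userId: string;
  imageKey: string;
  deletedAt: Date | null;
}

type DeletionStep =
  | { kind: "s3.hardDelete"; imageKey: string }
  | { kind: "db.softDelete"; analysisId: string; deletedAt: Date }
  | { kind: "audit.write"; analysisId: string; userId: string; at: Date };

/** List the steps for one delete request: image, then row, then audit entry. */
function planDeletion(analysis: AnalysisRef, requestedBy: string, now: Date): DeletionStep[] {
  if (analysis.userId !== requestedBy) {
    throw new Error("Users may only delete their own analyses");
  }
  if (analysis.deletedAt !== null) return []; // already deleted, nothing to do
  return [
    { kind: "s3.hardDelete", imageKey: analysis.imageKey },
    { kind: "db.softDelete", analysisId: analysis.id, deletedAt: now },
    { kind: "audit.write", analysisId: analysis.id, userId: requestedBy, at: now },
  ];
}
```

Keeping the plan separate from its execution makes the policy easy to unit test, and a full account purge could reuse it by planning each of the user's analyses in turn.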

The privacy and compliance work isn't a separate phase — it's wired into Milestone 3 of the MVP itself. The feature is not “GA” until deletion, export, retention, audit log, and ADMT notice are all live.

Place in the broader roadmap

WorldVision is the perception layer for everything physical Bike4Mind is building. The progression:

Stage	Form factor	What it adds	Target
WorldVision	Phone (you already own it)	Eyes. The cognitive workshop sees.	Q3 2026 MVP
Magic Mirrors (Phantasia)	4K display + Jetson + depth camera + mic array + speakers	Stationary in-room presence. Kitchen, reception, retail kiosk, healthcare check-in.	Q3 2026
Mobile Mirrors	Mirror + mecanum wheel base + LiDAR	Mobility. “Rosie the Robot, Jetsons-style” — follows you, comes when called.	Q4 2026
Full Robotics	+ xArm 6 + gripper + safety	Manipulation. The brain can now do things in the world.	2027

WorldVision is the common substrate. The mirror is just a phone glued to a wall; the mobile mirror is a phone with wheels; the robot is a phone with wheels and arms. Same perceptual API, same Deep Agents mind, same B4M tool chest — the body keeps growing. See the 2026 roadmap strawman for the full physical-AI sequencing.

The D&D Room is a special case in this lineage: it's a physical-AR room where the room itself is the body. Not a single mirror or a single robot — an entire instrumented space.

Lean approach

The WorldVision suite went through one important reset: the original MVP plan duplicated infrastructure that already exists in Bike4Mind (custom robot-API-key table, custom rate-limiting, custom S3 integration, custom auth middleware). The lean MVP plan replaced all of that with one new database table, one enum value, one service, two endpoints, one UI component. Roughly 500 LOC instead of 2,000.

The principle is the same one that runs through the rest of the catalog: don't reinvent B4M primitives — extend them. The platform already has hardened API key management with bcrypt, Redis-backed rate limiting, presigned S3 uploads via fabFileService, a service pattern with adapter injection for testability, and CloudWatch monitoring. WorldVision becomes a thin feature on a thick substrate, which means:

Less code to write, less code to maintain.
Same auth, same rate limits, same monitoring as every other B4M surface.
Future features (D&D mode, Tarot mode, video, edge processing, ROS2) extend the same pipeline rather than spinning up parallel ones.
Other products on the B4M platform get camera-input as a feature for free.

The same lean discipline applies to game modes: each mode is a system prompt + a function-call kit, not a new microservice. The Game Template Studio in the post-MVP roadmap takes this even further — community-authored game modes as data, not code.

Related material

Product Finder — WorldVision card — the at-a-glance summary in the catalog.
D&D Room card — the Lab-tier entry for the flagship use case.
Magic Mirrors / Mobile Mirrors / Full Robotics — the physical-AI progression WorldVision feeds.
2026 Roadmap — Appendix C: Physical AI — the broader sequencing.
Mission, Values, Principles — including “Built with Bike4Mind”: WorldVision is itself a B4M feature that gets dogfooded on every other B4M product.
Source suite in lumina5: docs-site/docs/roadmap/world-vision/ — overview, MVP (lean), MVP (legacy), full plan, post-MVP roadmap.
