P2POS — Sovereign Family Photo Vault (MVP Architecture & Execution Plan)¶
Status: Phases A–G are implemented in-repo.
- Phase E: session binding + TTL; album members + wrapped AES keys (client-side X25519 wrap); minimal blob read policy.
- Phase F: two-node docker-compose, scripts/demo-two-nodes.sh, README runbook.
- Phase G: p2pos-net with webrtc-rs negotiated data-channel blob push (≤256 KiB per message, same HMAC as HTTP), p2pos-signal WebSocket relay, per-peer webrtc_peer_id in SQLite, HTTP fallback for large blobs or ICE failures, and a node ingress task writing verified ciphertext to BlobStore.
Stack: Rust (core + node runtime), React + TypeScript (web UX).
Shape: modular monolith, filesystem-backed encrypted blobs, explicit trusted-node replication.
1. Why the sovereign family photo vault is the best first stone¶
- Emotional + clear: Everyone understands “our photos, our keys, our houses”—no blockchain or currency narrative required.
- Exercises the right primitives: Identity (who is “family”), trust groups (which nodes), encrypted blobs (photos), replication (two houses), policies (who can read/write where), visibility (where copies live)—all map 1:1 to the long-term substrate.
- Honest sovereignty story: Correctness does not depend on a central SaaS; a central host is optional for convenience only.
- Small team–friendly: No LLM, no consensus research, no mobile store—ship a web UI + two node processes a technical family can run.
- Wedge without painting into a corner: Albums/photos stay in an app layer; the core stays generic (objects, capabilities, nodes, policies).
2. Strict MVP¶
In scope
- One family (single trust group) with 2–3 trusted nodes (e.g. home NAS + VPS + laptop).
- Albums and photo objects (metadata + encrypted blob reference).
- Upload from browser: client encrypts file; server/node stores ciphertext only.
- Download/view in browser: decrypt with keys held by authorized members (see trust model).
- Replication: user-configured targets; push-style sync between nodes (no global consensus).
- Visibility UI: per-object or per-album “where is this stored?” and replication status (pending / ok / failed).
- Enrollment: technical flow (token + URL, or paste public key); no consumer onboarding polish. Product direction: "install + QR" with no home NAT/VPN setup, via WebRTC only (outbound to public signaling + public STUN/TURN); see §12. The same story applies to browser UX and future Android nodes: they may live entirely behind NAT, and reachability does not require router config. Not required for the first technical demo.
- Single modular monolith binary for the “node” plus a separate vault web app (could be static + API to any node).
Explicit MVP success: A demo where Alice uploads at Node A, sees replication complete to Node B, opens the UI pointed at Node B, and sees the same album/photo without any third-party cloud being on the critical path.
2.5 Product personas and roadmap milestones¶
Two roles frame who runs what. Milestone 1 is the first shippable slice for technical families; Milestone 2 extends device coverage and UX without changing the operator’s core obligation (signaling + STUN/TURN).
Personas¶
| Persona | Responsibility |
|---|---|
| Operator | Hosts shared infrastructure: signaling / seeding (peer rendezvous, SDP and ICE exchange) and STUN/TURN (typically coturn or equivalent on a VPS or small cloud instance with a stable public address). The operator does not need access to family keys or photo plaintext; they provide availability of setup and relay paths only (see §12.5). |
| User | Installs and runs nodes on their own equipment—PCs, home servers, and (in later milestones) phones / Android. Uses applications built on Sover (e.g. the sovereign family vault web app today; future apps on the same substrate) to encrypt, upload, browse photos, and manage replication between trusted nodes. |
Milestone 1 — Operator-hosted signaling + STUN/TURN; user runs two nodes and the web app¶
Operator goals
- Run a reachable signaling / seeder service on the public internet.
- Run STUN and TURN so clients behind NAT can complete ICE; TURN may see ciphertext in motion only when the app uses WebRTC tunnels (application payload remains encrypted as in §9–§10).
User goals
- Deploy two nodes (e.g. `p2pos-node` in Docker, on one PC or two machines) and configure them as trusted peers.
- Use the vault web application to access albums and upload photos (client-side encryption unchanged).
Connectivity goals (how traffic should flow)
- ICE chooses the path; the user does not pick IPs. For any leg that uses WebRTC (browser ↔ node vault tunnel, node ↔ node replication), STUN and TURN feed candidates into ICE, which runs connectivity checks and nominates the winning pair (host, server-reflexive, or relay). The family member never has to decide “use LAN IP” vs “use public IP” vs “use TURN”—the stack selects a working path automatically. The outcome we want is: LAN when ICE can use host/local candidates, direct through NAT when reflexive pairs work, operator TURN when only relay works—all without user-supplied IP lists.
- Bootstrap is one entry, not IP hunting. The user should get online through a single stable entry (e.g. one URL from the operator or from QR / enrollment), not by manually pointing the app at each node's LAN or WAN address. Today's dev setups may still expose a raw `http://host:port` for the static UI; the product shape for Milestone 1 is "open this link / scan this code," after which signaling + ICE handle peer attachment. Remaining work is mostly UX and packaging around that entry (and optional HTTP bootstrap to the same entry for auth), not asking users to maintain IP cheat sheets.
- HTTP vs WebRTC for vault API: Where the app still uses HTTP to a node (e.g. auth bootstrap), that request goes to whatever origin the user loaded the app from or the single configured base URL—again, not a separate manual IP choice per network path. Prefer aligning one origin with the primary node or gateway the family is guided to at install time.
Node-to-node replication uses the same ICE semantics: nodes gather candidates (including via operator STUN/TURN), ICE selects direct or relay automatically. HTTP fallback for large blobs or failed WebRTC setup remains valid (Phase G) and uses the peer’s configured base_url from trust setup (set once at enrollment, not per session by the end user).
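Real ICE nomination runs live connectivity checks (RFC 8445); as a rough illustration only, the preference order described above (LAN host candidates over STUN-discovered reflexive pairs over operator TURN relay) can be sketched as an ordering over candidate types. The enum and `nominate` helper here are illustrative assumptions, not part of the repo:

```rust
// Illustration only: real ICE runs live connectivity checks and pair pricing;
// this sketch just encodes the documented preference order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum CandidateType {
    Relay,           // operator TURN: always works, highest cost
    ServerReflexive, // public mapping discovered via STUN
    Host,            // direct LAN address: preferred when reachable
}

/// Pick the most-preferred candidate type among those whose checks succeeded.
fn nominate(checked_ok: &[CandidateType]) -> Option<CandidateType> {
    checked_ok.iter().copied().max() // Ord: Host > ServerReflexive > Relay
}

fn main() {
    // Same WiFi: the host pair succeeded, so LAN wins even though TURN also works.
    let lan = nominate(&[CandidateType::Relay, CandidateType::Host]);
    assert_eq!(lan, Some(CandidateType::Host));
    // Symmetric NAT: only the relay check passed, so TURN is used.
    let relayed = nominate(&[CandidateType::Relay]);
    assert_eq!(relayed, Some(CandidateType::Relay));
    println!("ok");
}
```

The point of the sketch is the product behavior: the family member never ranks paths by hand; the stack does.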
Redundancy and catch-up (two-node setup)
With two trusted nodes and replication enabled, stopping one node (e.g. docker stop on one container) does not take down the whole vault for the family: the other node still serves data that was already replicated there. The product goal is that the user keeps using the same bookmark / QR / operator URL where possible, with routing or discovery sending them to a healthy node—not asking them to discover and type the other machine’s IP. (Until that layer exists, a technical user may still repoint the UI origin; Milestone 1 implementation closes that gap over time.) When the stopped node starts again, the replication worker retries pending jobs and syncs blobs and metadata that changed while it was offline—eventual consistency across nodes is restored without manual re-copy. (If only one node ever held a blob and that node stays down, that copy is unavailable until it returns or another source exists—true redundancy requires at least one live peer that already received the replicate.)
Repo alignment: docker/e2e stack, Phase G (p2pos-net, signaling, STUN/TURN), optional WebRTC vault transport from the browser, P2POS_REP_FAILED_RETRY_AFTER_SECS (default 120) automatic requeue of failed replication jobs, split Compose files and Milestone 1 runbook for operator vs family deploys.
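The automatic requeue of failed replication jobs mentioned above (`P2POS_REP_FAILED_RETRY_AFTER_SECS`, default 120) can be sketched as a periodic sweep that moves failed jobs back to pending once the retry window elapses. Type and field names here are illustrative, not the repo's actual structures:

```rust
use std::time::{Duration, Instant};

// Mirrors the documented default: P2POS_REP_FAILED_RETRY_AFTER_SECS = 120.
const FAILED_RETRY_AFTER: Duration = Duration::from_secs(120);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JobState { Pending, Ok, Failed { at: Instant } }

struct ReplicationJob { blob_id: String, state: JobState }

/// Move Failed jobs back to Pending once the retry window has elapsed,
/// so a peer that was offline catches up without manual re-copy.
fn requeue_failed(jobs: &mut [ReplicationJob], now: Instant, retry_after: Duration) -> usize {
    let mut requeued = 0;
    for job in jobs.iter_mut() {
        if let JobState::Failed { at } = job.state {
            if now.duration_since(at) >= retry_after {
                job.state = JobState::Pending;
                requeued += 1;
            }
        }
    }
    requeued
}

fn main() {
    let failed_at = Instant::now();
    let mut jobs = vec![ReplicationJob { blob_id: "b1".into(), state: JobState::Failed { at: failed_at } }];
    // Two minutes later the worker sweeps and requeues the job.
    let later = failed_at + FAILED_RETRY_AFTER;
    assert_eq!(requeue_failed(&mut jobs, later, FAILED_RETRY_AFTER), 1);
    assert_eq!(jobs[0].state, JobState::Pending);
    println!("ok");
}
```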
Milestone 2 — Same operator model; users and apps on more devices¶
Operator: Unchanged from Milestone 1—still signaling / seeder + STUN/TURN on VPS-class hosting (scale and HA as the product matures).
User and product goals (extension of Milestone 1)
- Nodes on additional device classes, notably Android (and other mobile or embedded targets) as first-class mesh peers, using the same outbound-only story: dial public signaling and STUN/TURN; no user-driven home NAT configuration (§12, Phase M2).
- Simpler enrollment toward install + QR (time-limited tokens, small QR payload, paste-key fallback for power users).
- Multiple applications on top of Sover: the family vault is the first; future apps reuse identity, nodes, blobs, replication, and the same ICE semantics—LAN when possible, STUN-assisted direct when possible, operator TURN when not.
Connectivity goals: Same LAN → direct LAN, STUN-capable NAT → direct, else TURN ladder as in Milestone 1, applied consistently to browser ↔ node, Android ↔ node, and node ↔ node replication.
3. Explicitly out of scope (MVP)¶
- Currency / value / tokens.
- Local or hosted LLMs; arbitrary remote execution.
- Android node runtime, browser SDK package (beyond ad-hoc fetch in the web app). (Architecture §12 still covers future Android peers behind NAT—same WebRTC + public signaling/STUN/TURN model as the browser.)
- Consumer phone onboarding, push notifications, app store.
- Community / multi-tenant “cloud” product.
- Byzantine consensus, leader election across untrusted parties, complex CRDT graphs.
- Microservices mesh, Kafka, etc.
- Fine-grained photo editing, EXIF stripping policy, large-scale search—unless trivial.
4. Overall architecture (three surfaces, one repo)¶
┌─────────────────────────────────────────────────────────────┐
│ Layer 3 — UX: vault-web (React + TS) │
│ Talks to one node’s HTTP API; keys in memory; often NAT’d │
│ (mesh/P2P: WebRTC to public signaling + STUN/TURN — §12) │
└────────────────────────────┬────────────────────────────────┘
│ HTTPS + JSON (+ optional SSE)
┌────────────────────────────▼────────────────────────────────┐
│ Layer 2 — App: family-vault │
│ Albums, photos, family membership projections, app policies │
└────────────────────────────┬────────────────────────────────┘
│ in-process calls
┌────────────────────────────▼────────────────────────────────┐
│ Layer 1 — P2POS core (substrate) │
│ Identity, nodes, trust groups, blob store, replication, │
│ policy engine (embryo), attestations (signed manifests) │
└─────────────────────────────────────────────────────────────┘
Deployment MVP: Each node runs the same p2pos-node process (Rust monolith). The web app is built as static assets; during demo you either proxy to one node or configure API base URL per deployment.
Critical rule: The family vault crate depends on substrate traits/types; substrate must not import vault types.
Security audit (implemented milestones A–D)¶
This section records a design-time security review of the code and configuration as shipped through Phase D. It is not a formal penetration test, compliance audit, or cryptographic proof. Use it to decide what is safe to expose and what must change before broader deployment.
Milestone coverage vs security goals¶
| Goal (from architecture) | A–D status | Notes |
|---|---|---|
| Server stores ciphertext only for file payloads | Met for blob bytes | Node persists opaque octets under blobs/; no server-side decrypt. |
| Client-side encryption for photos | Met (web) | AES-GCM in browser; IV prepended to ciphertext (see vault-web crypto module). |
| Identity tied to keys | Partial | Ed25519 proves possession of a key at login only. The session token is not bound to that public key on later requests (see gaps). |
| Trusted-node replication | Partial | HMAC with shared P2POS_REPLICATE_PSK proves knowledge of the family secret, not per-node identity. |
| Policy-based read (who may fetch which blob) | Not implemented | Any valid session can GET /v1/blobs/{id} if the blob exists on that node (Phase E target). |
| No mandatory central cloud | Met for architecture | Operators self-host nodes; security then depends on how they expose them. |
Controls that exist today¶
- Blob confidentiality from the node (application layer): File content is encrypted before upload; the node stores and replicates ciphertext only. Compromise of the DB does not yield photo plaintext without client keys.
- Login proof-of-key (Ed25519): `/v1/auth/challenge` + `/v1/auth/verify`: the client signs a fresh nonce. Invalid signatures are rejected.
- Opaque session token: After verify, API routes (except `/health` and internal replicate) require `Authorization: Bearer <token>`; the token is a random UUID-derived hex string, not a JWT.
- Inter-node replicate authenticity (symmetric): `POST /internal/v1/replicate/{blob_id}` requires header `X-P2POS-Replicate-Signature`: HMAC-SHA256 over `blob_id (UTF-8) || body`. Verification uses a constant-time hex compare. Only parties with `P2POS_REPLICATE_PSK` should be able to ingest blobs this way.
- Transport for node→node client: `HttpPeerTransport` uses rustls for HTTPS when the peer `base_url` is `https://…`.
- SQLite integrity: Albums/photos use foreign keys; replication jobs reference peers.
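Two details of the replicate verification above are worth making concrete: the signed message layout (`blob_id` bytes followed by the raw body) and the constant-time comparison of hex digests. The sketch below shows both; the actual HMAC-SHA256 computation is elided (the node uses a crypto crate for it), and function names are illustrative:

```rust
// Build the message that the HMAC covers: blob_id (UTF-8) || body.
fn replicate_message(blob_id: &str, body: &[u8]) -> Vec<u8> {
    let mut msg = Vec::with_capacity(blob_id.len() + body.len());
    msg.extend_from_slice(blob_id.as_bytes()); // blob_id as UTF-8 bytes
    msg.extend_from_slice(body);               // then the ciphertext body
    msg
}

/// Compare two hex digests without short-circuiting on the first mismatch,
/// so timing does not leak how many leading characters matched.
/// (Production code should use a vetted constant-time primitive; compilers
/// can optimize naive loops.)
fn constant_time_eq(a: &str, b: &str) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.bytes().zip(b.bytes()) {
        diff |= x ^ y;
    }
    diff == 0
}

fn main() {
    let msg = replicate_message("blob-123", b"ciphertext");
    assert_eq!(&msg[..8], b"blob-123");
    assert!(constant_time_eq("deadbeef", "deadbeef"));
    assert!(!constant_time_eq("deadbeef", "deadbeee"));
    println!("ok");
}
```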
Gaps and risks (prioritized for follow-up)¶
Critical / high
- Session token is not bound to identity. After `verify`, the server only checks membership in a session set. It does not store "this token belongs to public key X". Any party who steals a bearer token (XSS, localStorage scrape, log leak) has the same API access as the user until restart or manual code change. Phase E+ should attach capabilities or a key id to the session.
- No session expiry or rotation. Tokens live until the process ends (in-memory set). Stolen tokens do not age out.
- `GET /v1/blobs/{id}` is not authorization-scoped. There is no check that the caller "owns" or is a member for that blob; album/photo linkage is not enforced on read. Metadata in SQLite is readable with a valid session, but blob fetch is effectively "any blob id".
- Shared replication PSK. All trusted peers share one secret. Leak of `P2POS_REPLICATE_PSK` allows forging replicates to any peer that trusts it. There is no per-peer or per-blob capability in the HMAC. Compromise of one node's config compromises the replication trust model for the whole mesh.
- Internal replicate endpoint is unauthenticated apart from the HMAC. There is no rate limit; a holder of the PSK can fill disk (DoS). There is no TLS requirement for `http://` peer URLs—ciphertext could be observed or modified on the wire unless operators use HTTPS and trust the path.
Medium
- CORS is `allow_origin(Any)` on the node. Any script that learns the bearer token (e.g. XSS on the vault origin, or a token pasted into a malicious page) can call the API from another origin and read responses, because the server reflects `Access-Control-Allow-Origin: *`. For production-shaped deployments, use an origin allowlist, httpOnly + Secure cookies (or similar), and short-lived tokens.
- Default `P2POS_REPLICATE_PSK` and demo keys must be changed for any real deployment; they are documented defaults.
- Album/photo metadata is cleartext in SQLite (titles, captions, blob ids). This is consistent with "metadata server" but is not "full secrecy" of everything about the vault.
- `/health` is unauthenticated (by design, for probes); ensure it reveals nothing sensitive (currently OK).
Lower / operational
- No rate limiting, request size caps (beyond Axum defaults), or audit logging of security events.
- Browser key storage: Ed25519 and AES key material in localStorage is convenient for demos and vulnerable to XSS and physical access.
API surface (security-relevant)¶
| Surface | AuthN | AuthZ / notes |
|---|---|---|
| `/v1/auth/challenge`, `/v1/auth/verify` | N/A (verify proves key once) | Nonces are single-use when consumed. |
| `/v1/blobs`, `/v1/albums`, `/v1/photos`, `/v1/nodes`, `/v1/replication/status` | Bearer session | Weak authz on blob read; peers writable by any session. |
| `/internal/v1/replicate/{id}` | HMAC header | Not browser-facing; protect at network layer + strong PSK. |
| `/health` | None | OK for liveness. |
Relation to WebRTC (future)¶
Phases A–D use HTTP + TLS (optional) + HMAC for replication. WebRTC-only mesh transport (§12, Phase G / p2pos-net)—public seeding/signaling plus public STUN/TURN, no libp2p—is planned so Rust nodes, browser, and future Android peers can all sit behind NAT. It improves reachability and eventually peer identity at the transport layer (e.g. DTLS fingerprints aligned with substrate keys); it does not replace application-layer policies (who may read which blob) unless explicitly designed that way.
Suggested next security milestones¶
- Phase E: Session bound to `IdentityId`, expiry, and blob read policy (e.g. only if the blob is referenced by a photo in an album the identity may access—or the wrapped-key model).
- Replication: Per-peer secrets or signed push tokens; TLS-only peer URLs in strict mode.
- Web: httpOnly sessions, strict CORS, CSP to reduce XSS impact.
- Ops: Threat model doc per deployment (LAN-only vs internet-facing).
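The Phase E direction above (session bound to `IdentityId`, plus expiry) amounts to replacing the bare token set with a record per token. A minimal sketch, with illustrative names (`Session`, `SessionStore`) that are not the repo's actual types:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// A session carries the authenticated identity and an expiry,
// instead of being an anonymous member of a token set.
struct Session { identity_id: String, expires_at: Instant }

#[derive(Default)]
struct SessionStore { sessions: HashMap<String, Session> }

impl SessionStore {
    fn insert(&mut self, token: String, identity_id: String, ttl: Duration) {
        self.sessions.insert(token, Session { identity_id, expires_at: Instant::now() + ttl });
    }

    /// Resolve a bearer token to the identity it was issued to,
    /// rejecting expired sessions so stolen tokens age out.
    fn identity_for(&self, token: &str) -> Option<&str> {
        let s = self.sessions.get(token)?;
        if Instant::now() < s.expires_at { Some(s.identity_id.as_str()) } else { None }
    }
}

fn main() {
    let mut store = SessionStore::default();
    store.insert("tok1".into(), "ed25519:alice".into(), Duration::from_secs(3600));
    assert_eq!(store.identity_for("tok1"), Some("ed25519:alice"));
    assert_eq!(store.identity_for("stolen-or-unknown"), None);
    println!("ok");
}
```

With the identity on the session, per-route authorization checks (e.g. the blob read policy) have a principal to evaluate against.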
5. Monorepo / repository structure¶
/
├── Cargo.toml # workspace root
├── crates/
│ ├── p2pos-core/ # identity, trust, policy types, crypto helpers
│ ├── p2pos-storage/ # encrypted blob store (filesystem impl)
│ ├── p2pos-replication/ # PeerTransport + HMAC (HTTP push; WebRTC later in p2pos-net)
│ ├── p2pos-net/ # (post–first-demo) WebRTC only: public signaling client, ICE, STUN/TURN, data channels
│ ├── p2pos-node/ # HTTP API, wiring, config (binary)
│ └── family-vault/ # domain: albums, photos, vault-specific policies
├── apps/
│ └── vault-web/ # Vite + React + TypeScript
├── docs/
│ └── P2POS_SOVEREIGN_FAMILY_VAULT_ARCHITECTURE.md
└── scripts/ # demo: docker-compose or two local dirs
Optional later (not MVP): packages/p2pos-client-ts for the browser SDK surface—do not block the MVP on extracting it.
6. Rust workspace / crates / modules¶
| Crate | Responsibility |
|---|---|
| p2pos-core | IdentityId, NodeId, TrustGroupId, Capability / Grant, Policy AST (minimal), SignedEnvelope, serialization contracts, error types. |
| p2pos-storage | BlobStore trait: put, get, delete, list; FsEncryptedBlobStore using per-blob AEAD + wrapped DEK in sidecar or manifest; content addressing optional (hash as id). |
| p2pos-replication | ReplicationTarget, ReplicationJob, retry/backoff, “last known state” per peer; transport trait PeerTransport (HTTP impl in node for dev; WebRTC impl in p2pos-net for production reachability—see §12). |
| p2pos-net | (After initial demo.) WebRTC-only stack (e.g. webrtc crate or equivalent): outbound client to public seeding/signaling (WebSocket/HTTPS), public STUN/TURN, ICE, SCTP data channels; replication stream protocol over data channels. Android: UniFFI in Phase M2 (§12.4). No libp2p. |
| family-vault | Album, Photo, membership, mapping photos → blob ids, vault-level “default replication targets”. |
| p2pos-node | Axum (or Actix) server, routes, auth middleware, SQLite/Redb for indexes and replication queue (blobs stay on disk), startup config; wires PeerTransport. |
Internal module boundaries inside p2pos-node: api/, auth/, config/, app_vault/ (handlers that call family-vault), substrate/ (re-exports wiring). Keep handlers thin.
7. Frontend structure (apps/vault-web)¶
apps/vault-web/
├── src/
│ ├── main.tsx
│ ├── App.tsx
│ ├── api/ # fetch client, types generated or hand-written
│ ├── crypto/ # encrypt/decrypt in Web Crypto (wrap in small module)
│ ├── pages/
│ │ ├── Dashboard.tsx
│ │ ├── Albums.tsx
│ │ ├── AlbumDetail.tsx
│ │ ├── Nodes.tsx
│ │ └── Settings.tsx
│ ├── components/
│ └── hooks/
└── vite.config.ts
Rule: No vault business logic hidden in components—use small hooks/services so a future p2pos-client-ts can lift the same patterns.
8. Core domain model¶
Substrate (generic)
- Identity: `IdentityId` (Ed25519 public key or hash of it).
- Node: `NodeId`, base URL, human label, public key for attestations.
- TrustGroup: set of `IdentityId` + policy defaults (e.g. "members may read all blobs in group X").
- BlobRef: opaque id, size, content hash (of ciphertext), encryption scheme id, wrapped key material pointer.
- Manifest / attestation: signed statement "this `BlobRef` is stored on `NodeId` at time T" (MVP: simple JSON + Ed25519).
Family vault (app)
- Family ≈ one `TrustGroupId` for MVP (multi-family is a generalization).
- Album: id, title, created_at, owner identity, ordered list of `PhotoId`.
- Photo: id, album_id, blob_ref, thumbnail_blob_ref (optional second blob), caption, created_at.
- Membership: which identities belong to the family (admin vs member optional flag).
9. Minimal trust / policy model (MVP)¶
Enrollment
- Node generates node keypair; operator adds trusted peers by pasting peer URL + peer public key (TOFU).
- User login MVP: sign a nonce with Ed25519 key in browser (import key from file or generate and download backup)—no passwords required for demo purity.
Policies (embryo)
- Storage policy: "Blob class `photo` must exist on at least `k` of `targets = [A,B]`."
- Read policy: "Only identities in `TrustGroup` may fetch wrapped keys for blobs in album Z."
- Execution policy: stub interface only (`ExecutionPolicy` enum with `NoRemoteExecution` default).
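The read policy above reduces to one question per request: is this blob referenced by a photo in an album the caller may access? A minimal sketch, with simplified stand-in types (strings instead of the substrate's `IdentityId`/`BlobRef`):

```rust
use std::collections::{HashMap, HashSet};

// Simplified projection of vault state for the read-policy check.
struct Vault {
    album_members: HashMap<String, HashSet<String>>, // album_id -> member identities
    photo_blobs: HashMap<String, String>,            // blob_id -> owning album_id
}

impl Vault {
    /// A blob is readable only if some photo in an album the identity
    /// belongs to references it; unreferenced blob ids are never readable.
    fn may_read_blob(&self, identity: &str, blob_id: &str) -> bool {
        match self.photo_blobs.get(blob_id) {
            Some(album_id) => self
                .album_members
                .get(album_id)
                .map_or(false, |members| members.contains(identity)),
            None => false,
        }
    }
}

fn main() {
    let mut album_members = HashMap::new();
    album_members.insert(
        "summer-2026".to_string(),
        ["alice", "bob"].iter().map(|s| s.to_string()).collect(),
    );
    let mut photo_blobs = HashMap::new();
    photo_blobs.insert("blob-1".to_string(), "summer-2026".to_string());
    let vault = Vault { album_members, photo_blobs };
    assert!(vault.may_read_blob("alice", "blob-1"));
    assert!(!vault.may_read_blob("mallory", "blob-1")); // not a member
    assert!(!vault.may_read_blob("alice", "blob-999")); // not referenced
    println!("ok");
}
```

This closes the "any blob id" gap from the security audit without changing the storage layer.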
Wrapped keys
- Per-blob DEK encrypted for each authorized member (NaCl crypto_box or HPKE-style). MVP: encrypt the DEK for each `IdentityId` public key listed on the album.
What you are not solving yet: revocation rotation drama, group key agreement at scale, hardware attestation—document as follow-ups.
10. Storage model¶
- Filesystem layout (per node):
  - `data/blobs/<hex-prefix>/<blob_id>` (ciphertext)
  - `data/meta/<blob_id>.json` (encryption metadata + wrapped DEKs, or inline in DB)
- Small index DB (SQLite): blob presence, album/photo rows, replication queue.
- Client: never sends plaintext to node; sends ciphertext + metadata. Node stores and replicates ciphertext + metadata only.
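The layout above can be sketched as two pure path functions. The two-character hex prefix as the shard key is an assumption for illustration (any short prefix that keeps directories small works):

```rust
use std::path::PathBuf;

/// Ciphertext path: data/blobs/<hex-prefix>/<blob_id>.
/// Sharding by the first two hex chars keeps any one directory small.
fn blob_path(data_dir: &str, blob_id: &str) -> PathBuf {
    let prefix = &blob_id[..2.min(blob_id.len())];
    [data_dir, "blobs", prefix, blob_id].iter().collect()
}

/// Sidecar metadata path: data/meta/<blob_id>.json.
fn meta_path(data_dir: &str, blob_id: &str) -> PathBuf {
    let file = format!("{}.json", blob_id);
    [data_dir, "meta", file.as_str()].iter().collect()
}

fn main() {
    assert_eq!(blob_path("data", "ab12cd34"), PathBuf::from("data/blobs/ab/ab12cd34"));
    assert_eq!(meta_path("data", "ab12cd34"), PathBuf::from("data/meta/ab12cd34.json"));
    println!("ok");
}
```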
11. Replication model¶
Semantics: eventual consistency, single-writer per photo object for MVP (last write wins if misused—acceptable for demo if UI avoids concurrent edits).
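The last-write-wins rule above is a one-line merge when records carry a write timestamp. A sketch with illustrative field names (not the repo's schema):

```rust
// Last-write-wins merge for photo metadata: the newer timestamp wins.
#[derive(Debug, Clone, PartialEq)]
struct PhotoMeta { caption: String, updated_at: u64 } // unix seconds

fn merge_lww(local: PhotoMeta, remote: PhotoMeta) -> PhotoMeta {
    // Ties keep the local copy; with single-writer-per-photo (the MVP
    // assumption) concurrent conflicting edits should not occur.
    if remote.updated_at > local.updated_at { remote } else { local }
}

fn main() {
    let local = PhotoMeta { caption: "beach".into(), updated_at: 100 };
    let remote = PhotoMeta { caption: "beach day".into(), updated_at: 200 };
    let merged = merge_lww(local, remote.clone());
    assert_eq!(merged, remote); // newer remote edit wins
    println!("ok");
}
```

Note the known caveat: LWW silently drops the older concurrent edit, which is the documented "acceptable for demo" tradeoff.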
Mechanism
- After `put_blob` local success, enqueue `ReplicationJob { blob_id, targets[] }`.
- Worker pulls from the queue; `POST /internal/replicate` (mTLS or HMAC with pre-shared node secret for MVP) to the peer.
- Peer validates trust + optional quota, stores the blob, acks.
- Originator marks target replicated; UI aggregates node attestations or simple presence map.
Shortcuts OK: periodic full reconcile (list blobs, diff); no Merkle sync required for MVP.
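The "list blobs, diff" reconcile above is a plain set difference: ask the peer for its blob ids, subtract them from the local set, push the remainder. A std-only sketch (function name is illustrative):

```rust
use std::collections::HashSet;

/// Blobs present locally but missing on the peer; these get pushed.
/// No Merkle sync: O(n) set difference is enough at family scale.
fn blobs_to_push(local: &HashSet<String>, peer: &HashSet<String>) -> Vec<String> {
    let mut missing: Vec<String> = local.difference(peer).cloned().collect();
    missing.sort(); // stable order for logging and tests
    missing
}

fn main() {
    let local: HashSet<String> = ["b1", "b2", "b3"].iter().map(|s| s.to_string()).collect();
    let peer: HashSet<String> = ["b2"].iter().map(|s| s.to_string()).collect();
    assert_eq!(blobs_to_push(&local, &peer), vec!["b1".to_string(), "b3".to_string()]);
    println!("ok");
}
```

Running this periodically in both directions restores eventual consistency after a node was offline, matching the catch-up story in §2.5.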
Dangerous shortcut to avoid: requiring S3 or one global database for correctness.
Reachability (product goal, post–first-demo): Plain inbound HTTP between two home NATs forces port forwarding, VPN, or tunneling—bad for “install app + scan QR.” The intended evolution is PeerTransport backed by WebRTC only (see §12): peers dial outbound to public signaling and public STUN/TURN; ICE + TURN when direct paths fail; Rust nodes, browser, and Android (Phase M2) may all be behind NAT—no user router configuration.
12. Reachability and P2P transport (WebRTC-only, QR, no home NAT setup)¶
Problem: Two houses behind typical NAT cannot accept arbitrary inbound connections from the public internet without port forwarding, VPN, or equivalent. Requiring users to configure routers violates the super-simple onboarding goal (install + QR).
Principle: Every participant dials out only. Rust nodes, future Android nodes, and browser clients may all sit behind NAT with no inbound holes on the home router. Complexity stays in WebRTC (ICE + STUN/TURN) plus a small public coordination plane—not in user networking skill.
12.0 Default topology (documented assumption)¶
This architecture assumes two public, operator-run surfaces (same org or self-hosted—see §12.5):
- Seeding / signaling server — reachable on the public internet (HTTPS and/or secure WebSocket). All vault peers connect outbound to it to exchange SDP and ICE candidates and to discover or rendezvous with trusted peers. It is not on the ciphertext critical path for correctness if replication framing stays E2EE; it is required for session setup and optional metadata (who is online, pairing tokens).
- STUN + TURN — public ICE servers (often one vendor or self-hosted coturn). STUN discovers reflexive addresses; TURN relays media/datagrams when direct peer paths fail (symmetric NAT, strict firewalls). TURN may see ciphertext in motion only; payloads remain blob ciphertext + app-layer HMAC as in Phase D.
With this default, browser vault UX (typically behind NAT), Android nodes (carrier/WiFi NAT), and home Rust nodes (behind home NAT) all share the same reachability story: outbound to signaling + STUN/TURN, then SCTP data channels (or equivalent) between peers when ICE succeeds.
For empirical bands on direct vs TURN-mediated paths (especially cellular mobile ↔ home NAT), source comparison, relay capacity planning, and getStats instrumentation guidance, see the documentation annex WebRTC direct P2P connection rates without TURN (mobile ↔ home NAT).
12.1 WebRTC as the only P2P stack (no libp2p layer)¶
Use standards-based WebRTC only: browser RTCPeerConnection for vault-web when mesh/P2P paths are wired; on servers and Android (Phase M2), a native stack (e.g. Rust webrtc-rs/webrtc or platform WebRTC) with the same ICE/signaling contract.
Scope simplification: Replication rides SCTP data channels (or an equivalent negotiated subprotocol) over DTLS. There is no separate libp2p (or similar) framing layer—signaling is out of band to the public server in §12.0, not part of the encrypted data-channel payload path.
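Pushing a blob over a data channel has to respect the per-message limit mentioned in the repo status (≤256 KiB per message). A sketch of splitting ciphertext into bounded, sequenced chunks; the framing fields (`seq`, `last`) are illustrative assumptions, not the repo's wire format:

```rust
// Maximum data-channel message size assumed here, matching the documented
// ≤256 KiB-per-message push limit.
const MAX_MSG: usize = 256 * 1024;

struct Chunk<'a> { seq: u32, last: bool, payload: &'a [u8] }

/// Split ciphertext into at-most-MAX_MSG chunks with sequence numbers
/// and a final-chunk marker so the receiver knows when to reassemble.
fn chunk_blob(ciphertext: &[u8]) -> Vec<Chunk<'_>> {
    let n = ciphertext.chunks(MAX_MSG).count().max(1);
    ciphertext
        .chunks(MAX_MSG)
        .enumerate()
        .map(|(i, payload)| Chunk { seq: i as u32, last: i + 1 == n, payload })
        .collect()
}

fn main() {
    let blob = vec![0u8; MAX_MSG + 1]; // one byte over the limit -> two messages
    let chunks = chunk_blob(&blob);
    assert_eq!(chunks.len(), 2);
    assert_eq!(chunks[0].payload.len(), MAX_MSG);
    assert_eq!(chunks[1].payload.len(), 1);
    assert!(chunks[1].last && !chunks[0].last);
    println!("ok");
}
```

When setup fails or a blob is too large for comfortable chunking, the documented HTTP fallback (Phase G) takes over.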
Stack summary:
| Layer | Choice | Rationale |
|---|---|---|
| Transport | ICE + UDP/TCP candidates | Works for browser, Rust, and Android; ubiquitous NAT traversal story. |
| Security | DTLS (WebRTC norm) | Session encryption; compose with app-layer HMAC/signing for replicate ingest. |
| NAT / reachability | Public STUN + public TURN | All NAT’d clients obtain candidates and fallback relay without router config. |
| Direct path | ICE connectivity checks | Prefer host/srflx; use TURN only when needed (latency/cost tradeoff). |
| LAN | Host candidates (optional mDNS where supported) | Same WiFi: direct local candidates when both sides share a LAN. |
| Session setup | Public signaling / seeding (WSS/HTTPS) | Single stable hostname in QR (DNS); no embedding long SDP in the QR. |
Reliability: If TURN or signaling is down, P2P mesh sync stalls; local HTTP API to a node the browser can still reach (e.g. LAN or tunnel) remains a separate concern (M1-style). Design for high availability of signaling/TURN only where the product promises cross-NAT mesh.
12.2 Roles: Rust node, Android node, browser¶
| Role | NAT | Expectations | WebRTC posture |
|---|---|---|---|
| Rust node (home / laptop / VPS) | Often yes (home) or public VPS | Always-on where possible; disk for blobs | Outbound to signaling + STUN/TURN; data channels to other trusted nodes (and later mobile/browser peers). |
| Android node (Phase M2) | Yes (mobile networks) | Intermittent, background limits | Same as Rust: UniFFI-wrapped WebRTC or system stack; relay-first friendly. |
| Browser (vault-web) | Yes (typical home/office) | Talks HTTPS to one node's API for CRUD today | For P2P replication or mesh (Phase G+): RTCPeerConnection, same signaling URL + ICE servers; no inbound listen port on the user's PC. |
12.3 QR code payload (keep the QR small)¶
Do not embed long ICE candidate lists or SDP blobs in the QR. Prefer:
- Short-lived join / pairing token (app-layer, signed or redeemable once).
- Family or trust-group id (opaque).
- Signaling / seeding URL (stable public HTTPS/WSS, e.g. `wss://signal.example.org`) — DNS-backed so QRs do not go stale when IPs move.
- Optional: STUN/TURN URIs or time-limited TURN credentials (if not provisioned after login).
The app uses the token + public signaling endpoint to register, exchange SDP/ICE with trusted peers, and open data channels—exact wire format is implementation detail; UX stays scan → joined.
12.4 Two-phase mobile / browser plan (reduce schedule risk)¶
- Phase M1 — Asymmetric (faster): Mobile or browser uses HTTPS (and optional WebSocket) to a home node that is reachable (LAN, tunnel, or public URL); Rust nodes use WebRTC mesh node-to-node via §12.0. Delivers “both surfaces” quickly; mobile/browser not yet a full mesh peer.
- Phase M2 — Symmetric: `p2pos-net` (WebRTC only) behind a thin API; Android via UniFFI (or system WebRTC); browser keeps `RTCPeerConnection`. Android and browser become first-class mesh peers—still no user NAT config, still public signaling + STUN/TURN by default.
Crate suggestion: add crates/p2pos-net/ (signaling client to the public seeding server, peer connection lifecycle, ICE server config, replication stream protocol over data channels) implementing PeerTransport for Rust nodes. Keep HTTP transport as optional for dev/tests.
12.5 Sovereignty wording (stays coherent)¶
- No user VPN / port forwarding is a product requirement for the mesh.
- Public signaling + public STUN/TURN are the default documented deployment for the “simple WebRTC” story; families or operators who want zero third-party infra can self-host the same three pieces (signaling service + coturn + DNS) on a VPS they control—same protocol, different operator.
- Correctness of “our keys, our ciphertext” does not require trusting signaling/TURN content—only availability of setup and relay paths (relay sees ciphertext in motion; application payload stays encrypted as in §9–§10).
13. Backend API (minimal)¶
Auth: Authorization: Bearer <session> where session is established after signature on /v1/auth/challenge.
Vault
- `GET /v1/albums` — list
- `POST /v1/albums` — create
- `GET /v1/albums/:id` — detail + photos
- `POST /v1/photos` — register metadata + `BlobRef` (after upload)
- `POST /v1/blobs` — upload ciphertext (multipart or base64 JSON for small demo)
- `GET /v1/blobs/:id` — download ciphertext
Substrate / ops
- `GET /v1/nodes` — this node + known peers
- `POST /v1/nodes/peers` — add trusted peer
- `GET /v1/replication/status` — aggregate queue + per-blob status
Internal (peer-to-peer)
- `POST /internal/v1/replicate` — ingest blob from trusted peer
All routes versioned under /v1 to preserve a stable path for a future browser SDK.
14. Frontend pages & major flows¶
| Page | Purpose |
|---|---|
| Dashboard | Family name, health: this node, peer count, replication backlog. |
| Albums | List/create albums. |
| Album detail | Grid of thumbnails; upload flow (encrypt → upload blob → register photo). |
| Nodes | Trusted nodes list, add peer, see last sync, storage location summary. |
| Settings | Key backup/download, API base URL, dark mode (optional). |
Flows: Create album → upload photos → open Nodes page → see two green checkmarks for replication → switch API URL to second node → album still visible.
15. Compelling demo scenario (script)¶
Cast: “River family”—two houses (Node Oak and Node Pine) and optional Cedar (cheap VPS) for off-site ciphertext.
- On Oak, create album “Summer 2026”, upload 5 photos; UI shows blobs only on Oak.
- Add Pine as trusted peer; replication jobs run; UI shows Oak ✓ Pine ✓.
- Disconnect Oak from network (or stop process); point browser at Pine; family opens same album; photos decrypt and display.
- Show Settings or Nodes copy: “No account on our servers—only keys and nodes you chose.”
16. Phased implementation plan¶
- Phase A — Skeleton: workspace, `p2pos-node` "hello", `vault-web` shell, health endpoint.
- Phase B — Crypto path: browser encrypt/decrypt; node blob put/get; SQLite indexes.
- Phase C — Vault domain: albums/photos CRUD backed by DB + blob refs.
- Phase D — Replication: peer add, job queue, internal replicate endpoint (HTTP `PeerTransport` for dev), status UI.
- Phase E — Trust/policy embryo: signed challenges, wrapped DEKs per member, minimal policy checks.
- Phase F — Demo hardening: docker-compose, scripted reset, README runbook.
- Phase G — WebRTC-only transport (post-MVP wedge toward QR onboarding): `p2pos-net` with public seeding/signaling, ICE, public STUN/TURN, and data channels (no libp2p); Phase M1 mobile/browser (HTTP to a reachable home node) optional ahead of Phase M2 (UniFFI + Android + browser as full mesh peers, all NAT-friendly).
17. Decisions that should stay stable (long-term)¶
- Layering: substrate crates vs app crate vs UX app.
- Blob abstraction: `BlobStore` trait + content ids.
- Transport boundary: `PeerTransport` with multiple implementations (HTTP for dev/tests; WebRTC for production reachability); replication logic must not assume HTTP-only.
- API versioning: `/v1/...` for public HTTP.
- Identity as keys: capabilities tied to cryptographic identity, not emails; align WebRTC peer identity / DTLS fingerprints with substrate `IdentityId` (explicit mapping layer).
- Local-first truth: each node authoritative for what it has stored; sync merges via explicit protocols.
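The `PeerTransport` boundary above is the decision that keeps HTTP and WebRTC interchangeable. A sketch of the shape, with an in-memory test double; the method signature is an assumption for illustration, not the repo's actual trait:

```rust
use std::cell::RefCell;

// Replication logic talks to this trait only; HTTP, WebRTC, or a mock
// are interchangeable implementations.
trait PeerTransport {
    fn push_blob(&self, peer: &str, blob_id: &str, ciphertext: &[u8]) -> Result<(), String>;
}

/// Test double: records pushes instead of touching the network.
struct MockTransport {
    pushed: RefCell<Vec<(String, String)>>, // (peer, blob_id)
}

impl PeerTransport for MockTransport {
    fn push_blob(&self, peer: &str, blob_id: &str, _ciphertext: &[u8]) -> Result<(), String> {
        self.pushed.borrow_mut().push((peer.to_string(), blob_id.to_string()));
        Ok(())
    }
}

/// Replication only sees the trait, so swapping HTTP for WebRTC needs
/// no changes here. Returns the number of successful pushes.
fn replicate(transport: &dyn PeerTransport, peers: &[&str], blob_id: &str, ct: &[u8]) -> usize {
    peers.iter().filter(|p| transport.push_blob(p, blob_id, ct).is_ok()).count()
}

fn main() {
    let mock = MockTransport { pushed: RefCell::new(Vec::new()) };
    let ok = replicate(&mock, &["oak", "pine"], "blob-1", b"ciphertext");
    assert_eq!(ok, 2);
    assert_eq!(mock.pushed.borrow().len(), 2);
    println!("ok");
}
```

This is also why the replication worker must never format HTTP requests itself: any transport-specific framing belongs behind the trait.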
18. Acceptable MVP shortcuts¶
- TOFU peer enrollment; PSK or single shared secret between nodes.
- Single global family per deployment.
- SQLite only; no HA database.
- Last-write-wins on metadata.
- Full blob re-upload on conflict detection.
- Session tokens stored in `localStorage` (demo only; document the `httpOnly` cookie path for production).
19. Dangerous shortcuts (undermine the vision)¶
- Putting album/photo tables or concepts inside p2pos-core (contaminates substrate).
- Central object store as the only source of truth (kills sovereignty story).
- Server-side decryption for convenience (kills privacy-by-default).
- Hard-coding one cloud provider in core storage.
- Monolithic frontend that bakes node URLs and vault API without a thin client boundary.
- Implicit trust (any peer can pull any blob) without policy hooks—prevents future multi-app substrate.
Recommended repository tree (copy-paste target)¶
p2pos/
├── Cargo.toml
├── README.md
├── docs/
│ └── P2POS_SOVEREIGN_FAMILY_VAULT_ARCHITECTURE.md
├── crates/
│ ├── p2pos-core/
│ ├── p2pos-storage/
│ ├── p2pos-replication/
│ ├── p2pos-net/
│ ├── family-vault/
│ └── p2pos-node/
├── apps/
│ └── vault-web/
│ ├── package.json
│ ├── vite.config.ts
│ └── src/
└── scripts/
└── demo-two-nodes.sh
8-week milestone plan¶
| Week | Milestone |
|---|---|
| 1 | Rust workspace + p2pos-node binary + health + config; Vite/React app with API client stub. |
| 2 | BlobStore + filesystem impl; upload/download ciphertext; minimal auth challenge. |
| 3 | SQLite schema for albums/photos; vault CRUD API; list UI. |
| 4 | Browser encrypt/decrypt + wrapped DEK for one user; wire upload pipeline. |
| 5 | Peer registry + replication job + internal endpoint; second node in docker-compose. |
| 6 | Replication status in API + Nodes page; basic backoff/retries. |
| 7 | Trust group embryo: multiple identities, DEK wrap for each; policy check on download. |
| 8 | Demo script, polish, failure modes (offline node), documentation; freeze MVP scope. |
After week 8 (product track): Phase G / M1–M2—p2pos-net (WebRTC only; public signaling + public STUN/TURN by default), QR enrollment; browser/mobile HTTP-to-home (M1) where needed, then UniFFI + RTCPeerConnection full mesh (M2) with Android and browser behind NAT. See §12.
Exact next Cursor prompt (follow-up)¶
Phases A–G are in the repo. Next hardening targets: per-peer replicate secrets, strict HTTPS-only peer URLs, httpOnly sessions / CORS allowlist, chunked WebRTC frames for large blobs, and authenticated signaling.
End of architecture document.