
P2POS — Sovereign Family Photo Vault (MVP Architecture & Execution Plan)

Status: Phases A–G are implemented in-repo. Phase E: session binding + TTL, album members + wrapped AES keys (client-side X25519 wrap), minimal blob read policy. Phase F: two-node docker-compose, scripts/demo-two-nodes.sh, README runbook. Phase G: p2pos-net with webrtc-rs negotiated data-channel blob push (≤256 KiB per message, same HMAC as HTTP), p2pos-signal WebSocket relay, per-peer webrtc_peer_id in SQLite, HTTP fallback for large blobs or ICE failures, and a node ingress task writing verified ciphertext to BlobStore.
Stack: Rust (core + node runtime), React + TypeScript (web UX).
Shape: modular monolith, filesystem-backed encrypted blobs, explicit trusted-node replication.


1. Why the sovereign family photo vault is the best first stone

  • Emotional + clear: Everyone understands “our photos, our keys, our houses”—no blockchain or currency narrative required.
  • Exercises the right primitives: Identity (who is “family”), trust groups (which nodes), encrypted blobs (photos), replication (two houses), policies (who can read/write where), visibility (where copies live)—all map 1:1 to the long-term substrate.
  • Honest sovereignty story: Correctness does not depend on a central SaaS; a central host is optional for convenience only.
  • Small team–friendly: No LLM, no consensus research, no mobile store—ship a web UI + two node processes a technical family can run.
  • Wedge without painting into a corner: Albums/photos stay in an app layer; the core stays generic (objects, capabilities, nodes, policies).

2. Strict MVP

In scope

  • One family (single trust group) with 2–3 trusted nodes (e.g. home NAS + VPS + laptop).
  • Albums and photo objects (metadata + encrypted blob reference).
  • Upload from browser: client encrypts file; server/node stores ciphertext only.
  • Download/view in browser: decrypt with keys held by authorized members (see trust model).
  • Replication: user-configured targets; push-style sync between nodes (no global consensus).
  • Visibility UI: per-object or per-album “where is this stored?” and replication status (pending / ok / failed).
  • Enrollment: technical flow (token + URL, or paste public key)—no consumer onboarding polish. Product direction: “install + QR” with no home NAT/VPN setup, achieved via WebRTC only (outbound to public signaling + public STUN/TURN)—see §12. The same story applies to browser UX and future Android nodes: they may live entirely behind NAT; reachability does not require router config. None of this is required for the first technical demo.
  • Single modular monolith binary for the “node” plus a separate vault web app (could be static + API to any node).

Explicit MVP success: A demo where Alice uploads at Node A, sees replication complete to Node B, opens the UI pointed at Node B, and sees the same album/photo without any third-party cloud being on the critical path.


2.5 Product personas and roadmap milestones

Two roles frame who runs what. Milestone 1 is the first shippable slice for technical families; Milestone 2 extends device coverage and UX without changing the operator’s core obligation (signaling + STUN/TURN).

Personas

| Persona | Responsibility |
|---|---|
| Operator | Hosts shared infrastructure: signaling / seeding (peer rendezvous, SDP and ICE exchange) and STUN/TURN (typically coturn or equivalent on a VPS or small cloud instance with a stable public address). The operator does not need access to family keys or photo plaintext; they provide availability of setup and relay paths only (see §12.5). |
| User | Installs and runs nodes on their own equipment—PCs, home servers, and (in later milestones) phones / Android. Uses applications built on Sover (e.g. the sovereign family vault web app today; future apps on the same substrate) to encrypt, upload, browse photos, and manage replication between trusted nodes. |

Milestone 1 — Operator-hosted signaling + STUN/TURN; user runs two nodes and the web app

Operator goals

  • Run a reachable signaling / seeder service on the public internet.
  • Run STUN and TURN so clients behind NAT can complete ICE; TURN may see ciphertext in motion only when the app uses WebRTC tunnels (application payload remains encrypted as in §9–§10).

User goals

  • Deploy two nodes (e.g. p2pos-node in Docker on one PC or two machines) and configure them as trusted peers.
  • Use the vault web application to access albums and upload photos (client-side encryption unchanged).

Connectivity goals (how traffic should flow)

  • ICE chooses the path; the user does not pick IPs. For any leg that uses WebRTC (browser ↔ node vault tunnel, node ↔ node replication), STUN and TURN feed candidates into ICE, which runs connectivity checks and nominates the winning pair (host, server-reflexive, or relay). The family member never has to decide “use LAN IP” vs “use public IP” vs “use TURN”—the stack selects a working path automatically. The outcome we want is: LAN when ICE can use host/local candidates, direct through NAT when reflexive pairs work, operator TURN when only relay works—all without user-supplied IP lists.
  • Bootstrap is one entry, not IP hunting. The user should get online through a single stable entry (e.g. one URL from the operator or from QR / enrollment), not by manually pointing the app at each node’s LAN or WAN address. Today’s dev setups may still expose a raw http://host:port for the static UI; the product shape for Milestone 1 is “open this link / scan this code,” after which signaling + ICE handle peer attachment. Remaining work is mostly UX and packaging around that entry (and optional HTTP bootstrap to the same entry for auth) — not asking users to maintain IP cheat sheets.
  • HTTP vs WebRTC for vault API: Where the app still uses HTTP to a node (e.g. auth bootstrap), that request goes to whatever origin the user loaded the app from or the single configured base URL—again, not a separate manual IP choice per network path. Prefer aligning one origin with the primary node or gateway the family is guided to at install time.

Node-to-node replication uses the same ICE semantics: nodes gather candidates (including via operator STUN/TURN), ICE selects direct or relay automatically. HTTP fallback for large blobs or failed WebRTC setup remains valid (Phase G) and uses the peer’s configured base_url from trust setup (set once at enrollment, not per session by the end user).
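
A minimal sketch of the connectivity-relevant part of a node's configuration under this model; field names and file shape are illustrative assumptions, not the repo's actual config schema:

```rust
use serde::Deserialize;

/// Hypothetical connectivity section of a node config file.
/// Every field name here is illustrative, not the repo's schema.
#[derive(Debug, Deserialize)]
pub struct ConnectivityConfig {
    /// Single stable entry handed out by the operator (QR / enrollment).
    pub signaling_url: String,              // e.g. "wss://signal.example.org"
    /// ICE servers fed to WebRTC; ICE then picks host, reflexive, or relay pairs.
    pub stun_urls: Vec<String>,             // e.g. ["stun:stun.example.org:3478"]
    pub turn_urls: Vec<String>,             // e.g. ["turn:turn.example.org:3478"]
    /// Optional HTTP fallback base_url per trusted peer, set once at enrollment.
    pub peer_http_fallback: Option<String>, // e.g. "https://oak.family.example"
}
```

The point is that the family member supplies (or scans) one stable entry; the rest is operator-provided ICE material plus enrollment-time peer fallback, never per-session IP choices.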

Redundancy and catch-up (two-node setup)
With two trusted nodes and replication enabled, stopping one node (e.g. docker stop on one container) does not take down the whole vault for the family: the other node still serves data that was already replicated there. The product goal is that the user keeps using the same bookmark / QR / operator URL where possible, with routing or discovery sending them to a healthy node—not asking them to discover and type the other machine’s IP. (Until that layer exists, a technical user may still repoint the UI origin; Milestone 1 implementation closes that gap over time.)

When the stopped node starts again, the replication worker retries pending jobs and syncs blobs and metadata that changed while it was offline—eventual consistency across nodes is restored without manual re-copy. (If only one node ever held a blob and that node stays down, that copy is unavailable until it returns or another source exists—true redundancy requires at least one live peer that already received the replicate.)

Repo alignment: docker/e2e stack, Phase G (p2pos-net, signaling, STUN/TURN), optional WebRTC vault transport from the browser, P2POS_REP_FAILED_RETRY_AFTER_SECS (default 120) for automatic requeue of failed replication jobs, and split Compose files plus a Milestone 1 runbook for operator vs family deploys.

Milestone 2 — Same operator model; users and apps on more devices

Operator: Unchanged from Milestone 1—still signaling / seeder + STUN/TURN on VPS-class hosting (scale and HA as the product matures).

User and product goals (extension of Milestone 1)

  • Nodes on additional device classes, notably Android (and other mobile or embedded targets) as first-class mesh peers, using the same outbound-only story: dial public signaling and STUN/TURN; no user-driven home NAT configuration (§12, Phase M2).
  • Simpler enrollment toward install + QR (time-limited tokens, small QR payload, paste-key fallback for power users).
  • Multiple applications on top of Sover: the family vault is the first; future apps reuse identity, nodes, blobs, replication, and the same ICE semantics—LAN when possible, STUN-assisted direct when possible, operator TURN when not.

Connectivity goals: Same LAN → direct LAN, STUN-capable NAT → direct, else TURN ladder as in Milestone 1, applied consistently to browser ↔ node, Android ↔ node, and node ↔ node replication.


3. Explicitly out of scope (MVP)

  • Currency / value / tokens.
  • Local or hosted LLMs; arbitrary remote execution.
  • Android node runtime, browser SDK package (beyond ad-hoc fetch in the web app).
    (Architecture §12 still covers future Android peers behind NAT—same WebRTC + public signaling/STUN/TURN model as the browser.)
  • Consumer phone onboarding, push notifications, app store.
  • Community / multi-tenant “cloud” product.
  • Byzantine consensus, leader election across untrusted parties, complex CRDT graphs.
  • Microservices mesh, Kafka, etc.
  • Fine-grained photo editing, EXIF stripping policy, large-scale search—unless trivial.

4. Overall architecture (three surfaces, one repo)

┌─────────────────────────────────────────────────────────────┐
│  Layer 3 — UX: vault-web (React + TS)                       │
│  Talks to one node’s HTTP API; keys in memory; often NAT’d │
│  (mesh/P2P: WebRTC to public signaling + STUN/TURN — §12)  │
└────────────────────────────┬────────────────────────────────┘
                             │ HTTPS + JSON (+ optional SSE)
┌────────────────────────────▼────────────────────────────────┐
│  Layer 2 — App: family-vault                                  │
│  Albums, photos, family membership projections, app policies  │
└────────────────────────────┬────────────────────────────────┘
                             │ in-process calls
┌────────────────────────────▼────────────────────────────────┐
│  Layer 1 — P2POS core (substrate)                             │
│  Identity, nodes, trust groups, blob store, replication,      │
│  policy engine (embryo), attestations (signed manifests)      │
└─────────────────────────────────────────────────────────────┘

Deployment MVP: Each node runs the same p2pos-node process (Rust monolith). The web app is built as static assets; during demo you either proxy to one node or configure API base URL per deployment.

Critical rule: The family vault crate depends on substrate traits/types; substrate must not import vault types.


Security audit (implemented milestones A–D)

This section records a design-time security review of the code and configuration as shipped through Phase D. It is not a formal penetration test, compliance audit, or cryptographic proof. Use it to decide what is safe to expose and what must change before broader deployment.

Milestone coverage vs security goals

| Goal (from architecture) | A–D status | Notes |
|---|---|---|
| Server stores ciphertext only for file payloads | Met for blob bytes | Node persists opaque octets under blobs/; no server-side decrypt. |
| Client-side encryption for photos | Met (web) | AES-GCM in browser; IV prepended to ciphertext (see vault-web crypto module). |
| Identity tied to keys | Partial | Ed25519 proves possession of a key at login only. The session token is not bound to that public key on later requests (see gaps). |
| Trusted-node replication | Partial | HMAC with shared P2POS_REPLICATE_PSK proves knowledge of the family secret, not per-node identity. |
| Policy-based read (who may fetch which blob) | Not implemented | Any valid session can GET /v1/blobs/{id} if the blob exists on that node (Phase E target). |
| No mandatory central cloud | Met for architecture | Operators self-host nodes; security then depends on how they expose them. |

Controls that exist today

  1. Blob confidentiality from the node (application layer)
    File content is encrypted before upload; the node stores and replicates ciphertext only. Compromise of the DB does not yield photo plaintext without client keys.

  2. Login proof-of-key (Ed25519)
    /v1/auth/challenge + /v1/auth/verify: client signs a fresh nonce. Invalid signatures are rejected.

  3. Opaque session token
    After verify, API routes (except /health and internal replicate) require Authorization: Bearer <token>; token is a random UUID-derived hex string, not a JWT.

  4. Inter-node replicate authenticity (symmetric)
    POST /internal/v1/replicate/{blob_id} requires header X-P2POS-Replicate-Signature: HMAC-SHA256 over blob_id (UTF-8) || body. Verification uses constant-time hex compare. Only parties with P2POS_REPLICATE_PSK should be able to ingest blobs this way. A verification sketch follows this list.

  5. Transport for node→node client
    HttpPeerTransport uses rustls for HTTPS when the peer base_url is https://….

  6. SQLite integrity
    Albums/photos use foreign keys; replication jobs reference peers.
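
Referring to control 4 above (inter-node replicate authenticity), a minimal sketch of that check, assuming the hmac, sha2, hex, and subtle crates; the repo's actual helper names may differ:

```rust
use hmac::{Hmac, Mac};
use sha2::Sha256;
use subtle::ConstantTimeEq;

type HmacSha256 = Hmac<Sha256>;

/// Compute the replicate signature as hex: HMAC-SHA256(psk, blob_id_utf8 || body).
fn replicate_signature(psk: &[u8], blob_id: &str, body: &[u8]) -> String {
    let mut mac = HmacSha256::new_from_slice(psk).expect("HMAC accepts any key length");
    mac.update(blob_id.as_bytes());
    mac.update(body);
    hex::encode(mac.finalize().into_bytes())
}

/// Verify the X-P2POS-Replicate-Signature header with a constant-time compare.
fn verify_replicate(psk: &[u8], blob_id: &str, body: &[u8], header_hex: &str) -> bool {
    let expected = replicate_signature(psk, blob_id, body);
    expected.as_bytes().ct_eq(header_hex.as_bytes()).into()
}
```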

Gaps and risks (prioritized for follow-up)

Critical / high

  • Session token is not bound to identity. After verify, the server only checks membership in a session set. It does not store “this token belongs to public key X”. Any party who steals a bearer token (XSS, localStorage scrape, log leak) has the same API access as the user until restart or manual code change. Phase E+ should attach capabilities or key id to the session.

  • No session expiry or rotation. Tokens live until the process ends (in-memory set). Stolen tokens do not age out.

  • GET /v1/blobs/{id} is not authorization-scoped. There is no check that the caller “owns” or is a member for that blob; album/photo linkage is not enforced on read. Metadata in SQLite is readable with a valid session but blob fetch is effectively “any blob id”.

  • Shared replication PSK. All trusted peers share one secret. Leak of P2POS_REPLICATE_PSK allows forging replicates to any peer that trusts it. There is no per-peer or per-blob capability in the HMAC. Compromise of one node’s config compromises the replication trust model for the whole mesh.

  • Internal replicate endpoint is unauthenticated except HMAC. There is no rate limit; a holder of the PSK can fill disk (DoS). No TLS requirement for http:// peer URLs—ciphertext could be observed or modified on the wire unless operators use HTTPS and trust the path.

Medium

  • CORS is allow_origin(Any) on the node. Any script that learns the bearer token (e.g. XSS on the vault origin, or token pasted into a malicious page) can call the API from another origin and read responses, because the server reflects Access-Control-Allow-Origin: *. For production-shaped deployments, use an origin allowlist, httpOnly + Secure cookies (or similar), and short-lived tokens.

  • Default P2POS_REPLICATE_PSK and demo keys must be changed for any real deployment; they are documented defaults.

  • Album/photo metadata is cleartext in SQLite (titles, captions, blob ids). This is consistent with “metadata server” but is not “full secrecy” of everything about the vault.

  • /health is unauthenticated (by design for probes); ensure it reveals nothing sensitive (currently OK).

Lower / operational

  • No rate limiting, request size caps (beyond Axum defaults), or audit logging of security events.

  • Browser key storage: Ed25519 and AES key material in localStorage is convenient for demos and vulnerable to XSS and physical access.

API surface (security-relevant)

| Surface | AuthN | AuthZ / notes |
|---|---|---|
| /v1/auth/challenge, /v1/auth/verify | N/A (verify proves key once) | Nonces are single-use when consumed. |
| /v1/blobs, /v1/albums, /v1/photos, /v1/nodes, /v1/replication/status | Bearer session | Weak authz on blob read; peers writable by any session. |
| /internal/v1/replicate/{id} | HMAC header | Not browser-facing; protect at network layer + strong PSK. |
| /health | None | OK for liveness. |

Relation to WebRTC (future)

Phases A–D use HTTP + TLS (optional) + HMAC for replication. WebRTC-only mesh transport (§12, Phase G / p2pos-net)—public seeding/signaling plus public STUN/TURN, no libp2p—is planned so Rust nodes, browser, and future Android peers can all sit behind NAT. It improves reachability and eventually peer identity at the transport layer (e.g. DTLS fingerprints aligned with substrate keys); it does not replace application-layer policies (who may read which blob) unless explicitly designed that way.

Suggested next security milestones

  1. Phase E: Session bound to IdentityId, expiry, and blob read policy (e.g. only if blob referenced by a photo in an album the identity may access—or wrapped-key model). See the session sketch after this list.
  2. Replication: Per-peer secrets or signed push tokens; TLS-only peer URLs in strict mode.
  3. Web: httpOnly sessions, strict CORS, CSP to reduce XSS impact.
  4. Ops: Threat model doc per deployment (LAN-only vs internet-facing).
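
A minimal sketch of the session shape milestone 1 points at (the status line notes Phase E session binding + TTL as implemented in-repo); type and method names are illustrative:

```rust
use std::time::{Duration, Instant};

/// Illustrative identity id: an Ed25519 public key (or its hash).
#[derive(Clone, PartialEq, Eq)]
struct IdentityId(pub [u8; 32]);

/// A bearer session tied to the identity that proved key possession, with a TTL.
struct Session {
    token: String,
    identity: IdentityId,
    expires_at: Instant,
}

impl Session {
    fn new(token: String, identity: IdentityId, ttl: Duration) -> Self {
        Self { token, identity, expires_at: Instant::now() + ttl }
    }

    /// Accept a blob read only if the token matches, has not expired, and the
    /// read policy admits this identity (e.g. album membership).
    fn authorizes_blob_read(&self, presented: &str, allowed: &[IdentityId]) -> bool {
        self.token == presented
            && Instant::now() < self.expires_at
            && allowed.contains(&self.identity)
    }
}
```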

5. Monorepo / repository structure

/
├── Cargo.toml                 # workspace root
├── crates/
│   ├── p2pos-core/            # identity, trust, policy types, crypto helpers
│   ├── p2pos-storage/         # encrypted blob store (filesystem impl)
│   ├── p2pos-replication/     # PeerTransport + HMAC (HTTP push; WebRTC later in p2pos-net)
│   ├── p2pos-net/             # (post–first-demo) WebRTC only: public signaling client, ICE, STUN/TURN, data channels
│   ├── p2pos-node/            # HTTP API, wiring, config (binary)
│   └── family-vault/          # domain: albums, photos, vault-specific policies
├── apps/
│   └── vault-web/             # Vite + React + TypeScript
├── docs/
│   └── P2POS_SOVEREIGN_FAMILY_VAULT_ARCHITECTURE.md
└── scripts/                   # demo: docker-compose or two local dirs

Optional later (not MVP): packages/p2pos-client-ts for the browser SDK surface—do not block the MVP on extracting it.


6. Rust workspace / crates / modules

| Crate | Responsibility |
|---|---|
| p2pos-core | IdentityId, NodeId, TrustGroupId, Capability / Grant, Policy AST (minimal), SignedEnvelope, serialization contracts, error types. |
| p2pos-storage | BlobStore trait: put, get, delete, list; FsEncryptedBlobStore using per-blob AEAD + wrapped DEK in sidecar or manifest; content addressing optional (hash as id). |
| p2pos-replication | ReplicationTarget, ReplicationJob, retry/backoff, “last known state” per peer; transport trait PeerTransport (HTTP impl in node for dev; WebRTC impl in p2pos-net for production reachability—see §12). |
| p2pos-net | (After initial demo.) WebRTC-only stack (e.g. webrtc crate or equivalent): outbound client to public seeding/signaling (WebSocket/HTTPS), public STUN/TURN, ICE, SCTP data channels; replication stream protocol over data channels. Android: UniFFI in Phase M2 (§12.4). No libp2p. |
| family-vault | Album, Photo, membership, mapping photos → blob ids, vault-level “default replication targets”. |
| p2pos-node | Axum (or Actix) server, routes, auth middleware, SQLite/Redb for indexes and replication queue (blobs stay on disk), startup config; wires PeerTransport. |
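
A sketch of the p2pos-storage boundary named in the table; trait method signatures and the path-sharding helper are assumptions based on §6 and §10, not the crate's actual API:

```rust
use std::path::{Path, PathBuf};

/// Sketch of the blob storage boundary; method names follow the
/// put/get/delete/list list above but are not the crate's real signatures.
pub trait BlobStore {
    fn put(&self, blob_id: &str, ciphertext: &[u8]) -> std::io::Result<()>;
    fn get(&self, blob_id: &str) -> std::io::Result<Option<Vec<u8>>>;
    fn delete(&self, blob_id: &str) -> std::io::Result<()>;
    fn list(&self) -> std::io::Result<Vec<String>>;
}

/// Filesystem layout from §10: data/blobs/<hex-prefix>/<blob_id>.
/// The two-character shard prefix is an assumption.
fn blob_path(root: &Path, blob_id: &str) -> PathBuf {
    let prefix = &blob_id[..2.min(blob_id.len())];
    root.join("blobs").join(prefix).join(blob_id)
}
```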

Internal module boundaries inside p2pos-node: api/, auth/, config/, app_vault/ (handlers that call family-vault), substrate/ (re-exports wiring). Keep handlers thin.


7. Frontend structure (apps/vault-web)

apps/vault-web/
├── src/
│   ├── main.tsx
│   ├── App.tsx
│   ├── api/              # fetch client, types generated or hand-written
│   ├── crypto/           # encrypt/decrypt in Web Crypto (wrap in small module)
│   ├── pages/
│   │   ├── Dashboard.tsx
│   │   ├── Albums.tsx
│   │   ├── AlbumDetail.tsx
│   │   ├── Nodes.tsx
│   │   └── Settings.tsx
│   ├── components/
│   └── hooks/
└── vite.config.ts

Rule: No vault business logic hidden in components—use small hooks/services so a future p2pos-client-ts can lift the same patterns.


8. Core domain model

Substrate (generic)

  • Identity: IdentityId (Ed25519 public key or hash of it).
  • Node: NodeId, base URL, human label, public key for attestations.
  • TrustGroup: set of IdentityId + policy defaults (e.g. “members may read all blobs in group X”).
  • BlobRef: opaque id, size, content hash (of ciphertext), encryption scheme id, wrapped key material pointer.
  • Manifest / attestation: signed statement “this BlobRef is stored on NodeId at time T” (MVP: simple JSON + Ed25519).

Family vault (app)

  • Family ≈ one TrustGroupId for MVP (multi-family is a generalization).
  • Album: id, title, created_at, owner identity, ordered list of PhotoId.
  • Photo: id, album_id, blob_ref, thumbnail_blob_ref (optional second blob), caption, created_at.
  • Membership: which identities belong to the family (admin vs member optional flag).
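
Illustrative serde shapes for the model above; field names follow these bullets but are not guaranteed to match the repo's structs:

```rust
use serde::{Deserialize, Serialize};

/// Opaque reference to an encrypted blob (ciphertext only on nodes).
#[derive(Serialize, Deserialize)]
struct BlobRef {
    id: String,
    size: u64,
    ciphertext_hash: String,   // content hash of the ciphertext
    encryption_scheme: String, // e.g. "aes-256-gcm"
    wrapped_key_ref: String,   // pointer to wrapped DEK material
}

#[derive(Serialize, Deserialize)]
struct Photo {
    id: String,
    album_id: String,
    blob_ref: BlobRef,
    thumbnail_blob_ref: Option<BlobRef>,
    caption: Option<String>,
    created_at: i64, // unix seconds
}

#[derive(Serialize, Deserialize)]
struct Album {
    id: String,
    title: String,
    created_at: i64,
    owner: String,       // IdentityId of the owner
    photos: Vec<String>, // ordered PhotoId list
}
```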

9. Minimal trust / policy model (MVP)

Enrollment

  • Node generates node keypair; operator adds trusted peers by pasting peer URL + peer public key (TOFU).
  • User login MVP: sign a nonce with Ed25519 key in browser (import key from file or generate and download backup)—no passwords required for demo purity.

Policies (embryo)

  • Storage policy: “Blob class photo must exist on at least k of targets = [A,B].”
  • Read policy: “Only identities in TrustGroup may fetch wrapped keys for blobs in album Z.”
  • Execution policy: stub interface only (ExecutionPolicy enum with NoRemoteExecution default).
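
A sketch of how these embryonic policies could be expressed as types; variant and field names are assumptions, not the p2pos-core Policy AST:

```rust
/// "Blob class X must exist on at least k of the listed target nodes."
enum StoragePolicy {
    MinCopies { blob_class: String, k: usize, targets: Vec<String> },
}

/// "Only identities in the trust group may fetch wrapped keys for album Z."
enum ReadPolicy {
    TrustGroupMembersOnly { trust_group: String, album: String },
}

/// MVP default: remote execution is a stub and always refused.
enum ExecutionPolicy {
    NoRemoteExecution,
}
```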

Wrapped keys

  • Per-blob DEK encrypted for each authorized member (NaCl crypto_box or HPKE-style). MVP: encrypt DEK for each IdentityId public key listed on the album.
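
A minimal sketch of the wrap itself (the repo performs the equivalent client-side in the browser); the HPKE-style construction below assumes the x25519-dalek, hkdf, sha2, and aes-gcm crates and an illustrative KDF label:

```rust
use aes_gcm::{aead::{Aead, KeyInit}, Aes256Gcm, Key, Nonce};
use hkdf::Hkdf;
use rand_core::OsRng;
use sha2::Sha256;
use x25519_dalek::{EphemeralSecret, PublicKey};

/// Wrap a per-blob DEK for one album member's X25519 public key.
/// Returns the ephemeral public key plus the AES-GCM-wrapped DEK.
fn wrap_dek_for_member(member_pk: &PublicKey, dek: &[u8; 32]) -> (PublicKey, Vec<u8>) {
    // Fresh ephemeral key per wrap; the member unwraps with their secret key.
    let eph_secret = EphemeralSecret::random_from_rng(OsRng);
    let eph_public = PublicKey::from(&eph_secret);
    let shared = eph_secret.diffie_hellman(member_pk);

    // Derive a single-use key-encryption key and nonce from the shared secret.
    let hk = Hkdf::<Sha256>::new(None, shared.as_bytes());
    let mut okm = [0u8; 44]; // 32-byte AES key + 12-byte nonce
    hk.expand(b"p2pos-dek-wrap", &mut okm).expect("44 bytes is a valid HKDF length");
    let (kek, nonce) = okm.split_at(32);

    let cipher = Aes256Gcm::new(Key::<Aes256Gcm>::from_slice(kek));
    let wrapped = cipher
        .encrypt(Nonce::from_slice(nonce), dek.as_slice())
        .expect("AES-GCM wrap of a 32-byte DEK");

    // Store (eph_public, wrapped) alongside the blob metadata for this member.
    (eph_public, wrapped)
}
```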

What you are not solving yet: revocation rotation drama, group key agreement at scale, hardware attestation—document as follow-ups.


10. Storage model

  • Filesystem layout (per node):
    data/blobs/<hex-prefix>/<blob_id> ciphertext
    data/meta/<blob_id>.json encryption metadata + wrapped DEKs (or inline in DB)
  • Small index DB (SQLite): blob presence, album/photo rows, replication queue.
  • Client: never sends plaintext to node; sends ciphertext + metadata. Node stores and replicates ciphertext + metadata only.
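
A Rust sketch mirroring what the browser does before upload (the web app uses Web Crypto): AES-256-GCM with a random IV prepended to the ciphertext, as noted in the security audit. Crate choices and the helper name are assumptions:

```rust
use aes_gcm::{aead::{Aead, KeyInit}, Aes256Gcm, Key, Nonce};
use rand_core::{OsRng, RngCore};

/// Encrypt a file with AES-256-GCM under the per-blob DEK and prepend the
/// 12-byte IV; only the returned buffer is ever sent to a node.
fn encrypt_for_upload(dek: &[u8; 32], plaintext: &[u8]) -> Vec<u8> {
    let mut iv = [0u8; 12];
    OsRng.fill_bytes(&mut iv);

    let cipher = Aes256Gcm::new(Key::<Aes256Gcm>::from_slice(dek));
    let ciphertext = cipher
        .encrypt(Nonce::from_slice(&iv), plaintext)
        .expect("AES-GCM encryption");

    // Wire format: iv || ciphertext (GCM tag is included by the cipher).
    let mut out = Vec::with_capacity(12 + ciphertext.len());
    out.extend_from_slice(&iv);
    out.extend_from_slice(&ciphertext);
    out
}
```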

11. Replication model

Semantics: eventual consistency, single-writer per photo object for MVP (last write wins if misused—acceptable for demo if UI avoids concurrent edits).

Mechanism

  1. After put_blob local success, enqueue ReplicationJob { blob_id, targets[] }.
  2. Worker pulls from queue; POST /internal/v1/replicate (mTLS or HMAC with pre-shared node secret for MVP) to peer.
  3. Peer validates trust + optional quota, stores blob, acks.
  4. Originator marks target replicated; UI aggregates node attestations or simple presence map.
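
A minimal sketch of the queue entry, transport boundary, and one worker pass described above; names and the backoff formula are assumptions, not p2pos-replication's actual API:

```rust
use std::time::{Duration, Instant};

/// One pending replication job for a blob that still needs copies elsewhere.
struct ReplicationJob {
    blob_id: String,
    targets: Vec<String>, // peer node ids still missing this blob
    attempts: u32,
    next_attempt: Instant, // simple retry/backoff
}

/// Transport boundary: HTTP + HMAC today, WebRTC data channel in p2pos-net.
trait PeerTransport {
    /// Push ciphertext to one trusted peer; Ok(()) once the peer acks storage.
    fn replicate(&self, peer: &str, blob_id: &str, ciphertext: &[u8]) -> Result<(), String>;
}

/// One worker pass: try each pending target; keep the ones that failed and back off.
fn run_once(job: &mut ReplicationJob, transport: &dyn PeerTransport, ciphertext: &[u8]) {
    if Instant::now() < job.next_attempt {
        return;
    }
    job.targets
        .retain(|peer| transport.replicate(peer, &job.blob_id, ciphertext).is_err());
    if !job.targets.is_empty() {
        job.attempts += 1;
        job.next_attempt = Instant::now() + Duration::from_secs(30 * u64::from(job.attempts));
    }
}
```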

Shortcuts OK: periodic full reconcile (list blobs, diff); no Merkle sync required for MVP.

Dangerous shortcut to avoid: requiring S3 or one global database for correctness.

Reachability (product goal, post–first-demo): Plain inbound HTTP between two home NATs forces port forwarding, VPN, or tunneling—bad for “install app + scan QR.” The intended evolution is PeerTransport backed by WebRTC only (see §12): peers dial outbound to public signaling and public STUN/TURN; ICE + TURN when direct paths fail; Rust nodes, browser, and Android (Phase M2) may all be behind NAT—no user router configuration.


12. Reachability and P2P transport (WebRTC-only, QR, no home NAT setup)

Problem: Two houses behind typical NAT cannot accept arbitrary inbound connections from the public internet without port forwarding, VPN, or equivalent. Requiring users to configure routers violates the super-simple onboarding goal (install + QR).

Principle: Every participant dials out only. Rust nodes, future Android nodes, and browser clients may all sit behind NAT with no inbound holes on the home router. Complexity stays in WebRTC (ICE + STUN/TURN) plus a small public coordination plane—not in user networking skill.

12.0 Default topology (documented assumption)

This architecture assumes two public, operator-run surfaces (same org or self-hosted—see §12.5):

  1. Seeding / signaling server — reachable on the public internet (HTTPS and/or secure WebSocket). All vault peers connect outbound to it to exchange SDP and ICE candidates and to discover or rendezvous with trusted peers. It is not on the ciphertext critical path for correctness if replication framing stays E2EE; it is required for session setup and optional metadata (who is online, pairing tokens).

  2. STUN + TURN — public ICE servers (often one vendor or self-hosted coturn). STUN discovers reflexive addresses; TURN relays media/datagrams when direct peer paths fail (symmetric NAT, strict firewalls). TURN may see ciphertext in motion only; payloads remain blob ciphertext + app-layer HMAC as in Phase D.

With this default, browser vault UX (typically behind NAT), Android nodes (carrier/WiFi NAT), and home Rust nodes (behind home NAT) all share the same reachability story: outbound to signaling + STUN/TURN, then SCTP data channels (or equivalent) between peers when ICE succeeds.

For empirical bands on direct vs TURN-mediated paths (especially cellular mobile ↔ home NAT), source comparison, relay capacity planning, and getStats instrumentation guidance, see the documentation annex WebRTC direct P2P connection rates without TURN (mobile ↔ home NAT).

12.1 WebRTC as the only P2P stack (no libp2p layer)

Use standards-based WebRTC only: browser RTCPeerConnection for vault-web when mesh/P2P paths are wired; on servers and Android (Phase M2), a native stack (e.g. Rust webrtc-rs/webrtc or platform WebRTC) with the same ICE/signaling contract.

Scope simplification: Replication rides SCTP data channels (or an equivalent negotiated subprotocol) over DTLS. There is no separate libp2p (or similar) framing layer—signaling is out of band to the public server in §12.0, not part of the encrypted data-channel payload path.

Stack summary:

| Layer | Choice | Rationale |
|---|---|---|
| Transport | ICE + UDP/TCP candidates | Works for browser, Rust, and Android; ubiquitous NAT traversal story. |
| Security | DTLS (WebRTC norm) | Session encryption; compose with app-layer HMAC/signing for replicate ingest. |
| NAT / reachability | Public STUN + public TURN | All NAT’d clients obtain candidates and fallback relay without router config. |
| Direct path | ICE connectivity checks | Prefer host/srflx; use TURN only when needed (latency/cost tradeoff). |
| LAN | Host candidates (optional mDNS where supported) | Same WiFi: direct local candidates when both sides share a LAN. |
| Session setup | Public signaling / seeding (WSS/HTTPS) | Single stable hostname in QR (DNS); no embedding long SDP in the QR. |

Reliability: If TURN or signaling is down, P2P mesh sync stalls; the local HTTP API of a node the browser can still reach (e.g. LAN or tunnel) remains a separate, M1-style fallback. Design for high availability of signaling/TURN only where the product promises cross-NAT mesh.
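
A hypothetical frame layout for the replication subprotocol over the data channels described in this subsection, respecting the Phase G constraints (≤256 KiB per message, same app-layer HMAC as the HTTP replicate path); variant and field names are illustrative only:

```rust
use serde::{Deserialize, Serialize};

/// Hypothetical framing for blob push over a negotiated data channel.
#[derive(Serialize, Deserialize)]
enum ReplicateFrame {
    /// Announces a blob push; the receiver verifies the HMAC before writing.
    Begin { blob_id: String, total_len: u64, hmac_hex: String },
    /// One ciphertext chunk, kept under the 256 KiB data-channel message cap.
    Chunk { blob_id: String, offset: u64, data: Vec<u8> },
    /// Sender is done; the receiver acks or falls back to HTTP on failure.
    End { blob_id: String },
}
```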

12.2 Roles: Rust node, Android node, browser

| Role | NAT | Expectations | WebRTC posture |
|---|---|---|---|
| Rust node (home / laptop / VPS) | Often yes (home) or public VPS | Always-on where possible; disk for blobs | Outbound to signaling + STUN/TURN; data channels to other trusted nodes (and later mobile/browser peers). |
| Android node (Phase M2) | Yes (mobile networks) | Intermittent, background limits | Same as Rust: UniFFI-wrapped WebRTC or system stack; relay-first friendly. |
| Browser (vault-web) | Yes (typical home/office) | Talks HTTPS to one node’s API for CRUD today | For P2P replication or mesh (Phase G+): RTCPeerConnection, same signaling URL + ICE servers; no inbound listen port on the user’s PC. |

12.3 QR code payload (keep the QR small)

Do not embed long ICE candidate lists or SDP blobs in the QR. Prefer:

  • Short-lived join / pairing token (app-layer, signed or redeemable once).
  • Family or trust-group id (opaque).
  • Signaling / seeding URL (stable public HTTPS/WSS, e.g. wss://signal.example.org) — DNS-backed so QRs do not go stale when IPs move.
  • Optional: STUN/TURN URIs or time-limited TURN credentials (if not provisioned after login).

The app uses the token + public signaling endpoint to register, exchange SDP/ICE with trusted peers, and open data channels—exact wire format is implementation detail; UX stays scan → joined.
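
A hypothetical shape for that payload; field names are illustrative, and the exact wire format remains an implementation detail as stated above:

```rust
use serde::{Deserialize, Serialize};

/// Hypothetical QR / enrollment payload per §12.3: small, DNS-backed entry,
/// no SDP or ICE candidate lists embedded.
#[derive(Serialize, Deserialize)]
struct JoinQrPayload {
    /// Short-lived pairing token (app-layer, signed or redeemable once).
    join_token: String,
    /// Opaque family / trust-group id.
    trust_group: String,
    /// Stable public signaling endpoint, e.g. "wss://signal.example.org".
    signaling_url: String,
    /// Optional ICE hints if not provisioned after login.
    stun_turn_uris: Option<Vec<String>>,
}
```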

12.4 Two-phase mobile / browser plan (reduce schedule risk)

  1. Phase M1 — Asymmetric (faster): Mobile or browser uses HTTPS (and optional WebSocket) to a home node that is reachable (LAN, tunnel, or public URL); Rust nodes use WebRTC mesh node-to-node via §12.0. Delivers “both surfaces” quickly; mobile/browser not yet a full mesh peer.
  2. Phase M2 — Symmetric: p2pos-net (WebRTC only) behind a thin API; Android via UniFFI (or system WebRTC); browser keeps RTCPeerConnection. Android and browser become first-class mesh peers—still no user NAT config, still public signaling + STUN/TURN by default.

Crate suggestion: add crates/p2pos-net/ (signaling client to the public seeding server, peer connection lifecycle, ICE server config, replication stream protocol over data channels) implementing PeerTransport for Rust nodes. Keep HTTP transport as optional for dev/tests.

12.5 Sovereignty wording (stays coherent)

  • No user VPN / port forwarding is a product requirement for the mesh.
  • Public signaling + public STUN/TURN are the default documented deployment for the “simple WebRTC” story; families or operators who want zero third-party infra can self-host the same three pieces (signaling service + coturn + DNS) on a VPS they control—same protocol, different operator.
  • Correctness of “our keys, our ciphertext” does not require trusting signaling/TURN content—only availability of setup and relay paths (relay sees ciphertext in motion; application payload stays encrypted as in §9–§10).

13. Backend API (minimal)

Auth: Authorization: Bearer <session> where session is established after signature on /v1/auth/challenge.
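
A minimal sketch of the verify step, assuming the ed25519-dalek crate; function and parameter names are illustrative:

```rust
use ed25519_dalek::{Signature, Verifier, VerifyingKey};

/// Sketch of the /v1/auth/verify check: the client signed the nonce issued by
/// /v1/auth/challenge with its Ed25519 key; on success the node mints a bearer
/// session. The caller must also mark the nonce consumed (single use).
fn verify_challenge(pubkey_bytes: &[u8; 32], nonce: &[u8], signature_bytes: &[u8; 64]) -> bool {
    let Ok(key) = VerifyingKey::from_bytes(pubkey_bytes) else {
        return false;
    };
    let sig = Signature::from_bytes(signature_bytes);
    key.verify(nonce, &sig).is_ok()
}
```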

Vault

  • GET /v1/albums — list
  • POST /v1/albums — create
  • GET /v1/albums/:id — detail + photos
  • POST /v1/photos — register metadata + BlobRef (after upload)
  • POST /v1/blobs — upload ciphertext (multipart or base64 JSON for small demo)
  • GET /v1/blobs/:id — download ciphertext

Substrate / ops

  • GET /v1/nodes — this node + known peers
  • POST /v1/nodes/peers — add trusted peer
  • GET /v1/replication/status — aggregate queue + per-blob status

Internal (peer-to-peer)

  • POST /internal/v1/replicate — ingest blob from trusted peer

All routes versioned under /v1 to preserve a stable path for a future browser SDK.


14. Frontend pages & major flows

| Page | Purpose |
|---|---|
| Dashboard | Family name, health: this node, peer count, replication backlog. |
| Albums | List/create albums. |
| Album detail | Grid of thumbnails; upload flow (encrypt → upload blob → register photo). |
| Nodes | Trusted nodes list, add peer, see last sync, storage location summary. |
| Settings | Key backup/download, API base URL, dark mode (optional). |

Flows: Create album → upload photos → open Nodes page → see two green checkmarks for replication → switch API URL to second node → album still visible.


15. Compelling demo scenario (script)

Cast: “River family”—two houses (Node Oak and Node Pine) and optional Cedar (cheap VPS) for off-site ciphertext.

  1. On Oak, create album “Summer 2026”, upload 5 photos; UI shows blobs only on Oak.
  2. Add Pine as trusted peer; replication jobs run; UI shows Oak ✓ Pine ✓.
  3. Disconnect Oak from network (or stop process); point browser at Pine; family opens same album; photos decrypt and display.
  4. Show Settings or Nodes copy: “No account on our servers—only keys and nodes you chose.”

16. Phased implementation plan

  1. Phase A — Skeleton: workspace, p2pos-node “hello”, vault-web shell, health endpoint.
  2. Phase B — Crypto path: browser encrypt/decrypt; node blob put/get; SQLite indexes.
  3. Phase C — Vault domain: albums/photos CRUD backed by DB + blob refs.
  4. Phase D — Replication: peer add, job queue, internal replicate endpoint (HTTP PeerTransport for dev), status UI.
  5. Phase E — Trust/policy embryo: signed challenges, wrapped DEKs per member, minimal policy checks.
  6. Phase F — Demo hardening: docker-compose, scripted reset, README runbook.
  7. Phase G — WebRTC-only transport (post-MVP wedge toward QR onboarding): p2pos-net with public seeding/signaling, ICE, public STUN/TURN, and data channels (no libp2p); Phase M1 mobile/browser (HTTP to reachable home node) optional ahead of Phase M2 (UniFFI + Android + browser as full mesh peers, all NAT-friendly).

17. Decisions that should stay stable (long-term)

  • Layering: substrate crates vs app crate vs UX app.
  • Blob abstraction: BlobStore trait + content ids.
  • Transport boundary: PeerTransport with multiple implementations (HTTP for dev/tests; WebRTC for production reachability); replication logic must not assume HTTP-only.
  • API versioning: /v1/... for public HTTP.
  • Identity as keys: capabilities tied to cryptographic identity, not emails; align WebRTC peer identity / DTLS fingerprints with substrate IdentityId (explicit mapping layer).
  • Local-first truth: each node authoritative for what it has stored; sync merges via explicit protocols.

18. Acceptable MVP shortcuts

  • TOFU peer enrollment; PSK or single shared secret between nodes.
  • Single global family per deployment.
  • SQLite only; no HA database.
  • Last-write-wins on metadata.
  • Full blob re-upload on conflict detection.
  • Session tokens stored in localStorage (demo only; document httpOnly cookie path for production).

19. Dangerous shortcuts (undermine the vision)

  • Putting album/photo tables or concepts inside p2pos-core (contaminates substrate).
  • Central object store as the only source of truth (kills sovereignty story).
  • Server-side decryption for convenience (kills privacy-by-default).
  • Hard-coding one cloud provider in core storage.
  • Monolithic frontend that bakes node URLs and vault API without a thin client boundary.
  • Implicit trust (any peer can pull any blob) without policy hooks—prevents future multi-app substrate.

p2pos/
├── Cargo.toml
├── README.md
├── docs/
│   └── P2POS_SOVEREIGN_FAMILY_VAULT_ARCHITECTURE.md
├── crates/
│   ├── p2pos-core/
│   ├── p2pos-storage/
│   ├── p2pos-replication/
│   ├── p2pos-net/
│   ├── family-vault/
│   └── p2pos-node/
├── apps/
│   └── vault-web/
│       ├── package.json
│       ├── vite.config.ts
│       └── src/
└── scripts/
    └── demo-two-nodes.sh

8-week milestone plan

| Week | Milestone |
|---|---|
| 1 | Rust workspace + p2pos-node binary + health + config; Vite/React app with API client stub. |
| 2 | BlobStore + filesystem impl; upload/download ciphertext; minimal auth challenge. |
| 3 | SQLite schema for albums/photos; vault CRUD API; list UI. |
| 4 | Browser encrypt/decrypt + wrapped DEK for one user; wire upload pipeline. |
| 5 | Peer registry + replication job + internal endpoint; second node in docker-compose. |
| 6 | Replication status in API + Nodes page; basic backoff/retries. |
| 7 | Trust group embryo: multiple identities, DEK wrap for each; policy check on download. |
| 8 | Demo script, polish, failure modes (offline node), documentation; freeze MVP scope. |

After week 8 (product track): Phase G / M1–M2 — p2pos-net (WebRTC only; public signaling + public STUN/TURN by default), QR enrollment; browser/mobile HTTP-to-home (M1) where needed, then UniFFI + RTCPeerConnection full mesh (M2) with Android and browser behind NAT. See §12.


Exact next Cursor prompt (follow-up)

Phases A–G are in the repo. Next hardening targets: per-peer replicate secrets, strict HTTPS-only peer URLs, httpOnly sessions / CORS allowlist, chunked WebRTC frames for large blobs, and authenticated signaling.


End of architecture document.