# Acequia Distributed Sync & Swarm Transfer – Requirements Document

## 1. Goal

Design and implement a **browser- and Node.js–native, rsync-like synchronization system** written entirely in JavaScript, suitable for Acequia nodes running in:

* Node.js servers
* Browsers (desktop & mobile)
* PWAs using Origin Private File System (OPFS), IndexedDB, or File System Access API

The system must support **bidirectional sync**, **resumable large-file transfer**, and **multi-source swarm distribution**, while abstracting over multiple transport layers.

No external system binaries (e.g. `rsync`) are assumed to exist on either endpoint.

---

## 2. Core Design Principles

* **Transport-agnostic core**
* **Content-addressed & chunk-based**
* **Resumable and interruption-tolerant**
* **Scales from 1:1 sync to N:M swarm distribution**
* **Runs symmetrically on browser ↔ browser, browser ↔ server, server ↔ server**

---

## 3. Transport Abstraction Layer

### 3.1 Transport Priority Order

The system MUST attempt transports in the following order:

1. **WebTransport (preferred)**
2. **WebSocket**
3. **WebRTC DataChannels**

Transport selection is dynamic and negotiated at runtime.

---

### 3.2 Transport Capabilities Matrix

| Capability                 | WebTransport | WebSocket | WebRTC |
| -------------------------- | ------------ | --------- | ------ |
| Reliable streams           | Yes          | Yes       | Yes    |
| Unreliable / unordered     | Yes          | No        | Yes    |
| Backpressure control       | Yes (native) | Limited   | Medium |
| Multiplexed streams        | Yes          | No        | Yes    |
| NAT traversal              | No           | No        | Yes    |
| Memory footprint           | Low          | Very Low  | High   |
| CDN / tunnel compatibility | Yes (HTTP/3) | Yes       | No     |

---

### 3.3 Transport Adapter Interface

Each transport MUST implement a common interface:

```
connect(peerDescriptor)
sendChunk(chunkId, data, metadata)
requestChunk(chunkId)
ack(chunkId)
pause()
resume()
close()
```

The sync engine MUST NOT depend on transport-specific semantics.

---

## 4. Storage Targets

### 4.1 Supported Storage Backends

* Node.js filesystem
* Browser OPFS (Origin Private File System)
* File System Access API
* IndexedDB (fallback / metadata only)

Each backend exposes a uniform virtual filesystem abstraction:

```
stat(path)
readRange(path, offset, length)
writeRange(path, offset, data)
listDirectory(path)
hashRange(path, offset, length)
```

---

## 5. Rsync-Like Functionality (Reimplemented)

### 5.1 File Segmentation

* Files are split into **fixed or adaptive chunks** (e.g. 1–8 MB)
* Each chunk has:
  * Content hash
  * Offset
  * Length

### 5.2 Change Detection

* Metadata comparison (size, mtime)
* Rolling or fixed hashes (implementation choice)
* Chunk-level diffing instead of whole-file transfer

### 5.3 Resume & Interruption Handling

* Chunk-level acknowledgements
* Transfer state persisted locally
* Resume from last confirmed chunk after:
  * Network drop
  * Process restart
  * Device sleep / wake

---

## 6. Large File Handling (1–10+ GB)

* Chunked streaming only (no full-file buffering)
* Explicit backpressure handling
* Out-of-order chunk reception allowed
* Final reassembly verification via full-file hash

---

## 7. Swarm / Multi-Source Transfer (CDN Pattern)

### 7.1 Pattern Description

Implement **managed swarm downloading**:

* A coordinator (logical, not centralized) tracks:
  * Which peers have which chunks
  * Peer upload capacity
* Receiver requests **different chunks from multiple peers in parallel**
* Inspired by BitTorrent-style swarms, but:
  * Origin-scoped
  * Authenticated
  * Application-controlled

This explicitly follows the **pattern used by CDN tunnel systems (e.g. Cloudflare-style architectures)** **without using Cloudflare tunnels themselves**. Acequia implements this pattern directly at the application layer.

### 7.2 Use Cases

* Large media distribution (e.g. 2 GB video)
* Many low-bandwidth peers → one high-bandwidth sink
* University / data-center node as aggregation point

---

## 8. Coordination & Metadata Exchange

* Lightweight manifest exchange:
  * File list
  * Chunk maps
  * Availability vectors
* Transported over:
  * WebTransport control stream (preferred)
  * WebSocket control channel
  * WebRTC data channel (fallback)

---

## 9. Deployment & Tunnel Pattern (Without Cloudflare)

### 9.1 Tunnel Pattern Clarification

Acequia **does NOT depend on Cloudflare tunnels**.

Instead, it **reimplements the same architectural pattern**:

* Multiple endpoints bound to a shared logical origin
* Each endpoint capable of serving content
* Clients dynamically select optimal peers
* Origin identity decoupled from physical network topology

This pattern is implemented using Acequia’s own transport abstraction and peer discovery mechanisms.

### 9.2 Direct Node ↔ Node

* Public IP or LAN:
  * WebTransport works directly
  * No third-party tunnel required
* Same protocol stack as browser ↔ server

---

## 10. Security & Trust (Out of Scope for v1, but Required Hooks)

* Pluggable authentication
* Optional end-to-end encryption at chunk layer
* Trust graph / peer reputation (future)

---

## 11. Non-Goals (v1)

* POSIX filesystem semantics
* Block-device mirroring
* Kernel-level integration
* rsync protocol compatibility

---

## 12. Summary

This system is:

* A **JavaScript-native rsync analogue**
* Transport-agnostic
* Optimized for modern web transports
* Capable of **sync**, **backup**, and **swarm CDN-style distribution**
* Explicitly inspired by **Cloudflare-like tunnel/CDN patterns**, but **implemented entirely within Acequia**

It treats **files as streams, networks as variable, and peers as collaborators**, not clients.
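
As a rough illustration of the chunk model in sections 5.1 to 5.3, the sketch below builds a fixed-size chunk manifest (hash, offset, length), diffs two manifests at matching offsets, and filters out chunks that were already acknowledged. All names here (`CHUNK_SIZE`, `buildManifest`, `diffManifests`, `pendingChunks`) are hypothetical and not part of any Acequia API. It assumes Node.js and SHA-256 via `node:crypto`; a browser build would presumably hash with `crypto.subtle.digest` instead. It also takes an in-memory buffer for brevity, whereas section 6 requires streaming, and it uses plain fixed-offset comparison rather than a rolling checksum.

```javascript
// Illustrative sketch only: every name below is an assumption made for this example.
const { createHash } = require('node:crypto');

// Tiny chunk size so the demo is readable; the requirements suggest 1-8 MB.
const CHUNK_SIZE = 4;

// Section 5.1: split data into fixed-size chunks, recording hash, offset, length.
function buildManifest(data, chunkSize = CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    const slice = data.subarray(offset, offset + chunkSize);
    chunks.push({
      hash: createHash('sha256').update(slice).digest('hex'),
      offset,
      length: slice.length,
    });
  }
  return chunks;
}

// Section 5.2: chunk-level diff. Returns the remote chunks whose hash differs
// from the local chunk at the same offset, or that the local copy lacks.
function diffManifests(localChunks, remoteChunks) {
  const localByOffset = new Map(localChunks.map((c) => [c.offset, c.hash]));
  return remoteChunks.filter((c) => localByOffset.get(c.offset) !== c.hash);
}

// Section 5.3: on resume, skip chunks whose offsets were already acknowledged.
function pendingChunks(needed, ackedOffsets) {
  return needed.filter((c) => !ackedOffsets.has(c.offset));
}

const local = Buffer.from('aaaabbbbcccc');
const remote = Buffer.from('aaaaXXXXccccdd');

const needed = diffManifests(buildManifest(local), buildManifest(remote));
// Pretend the chunk at offset 4 was confirmed before an interruption.
const pending = pendingChunks(needed, new Set([4]));

console.log(needed.map((c) => c.offset)); // [ 4, 12 ]
console.log(pending.map((c) => c.offset)); // [ 12 ]
```

In a real implementation the set of acknowledged offsets would be loaded from the locally persisted transfer state, and the manifest would be produced by calling the storage backend's `hashRange` over each chunk instead of hashing an in-memory buffer.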