# Federated Query: Data Stays, Queries Travel

## The Default Assumption

Most systems assume you collect data first, then query it. Download the video. Ingest the GIS layer. ETL the records into a warehouse. Build a copy of the world inside your perimeter, then ask questions of the copy.

This works when data is small and sources are few. It breaks when data is large, sources are many, and freshness matters.

## The Inversion

In the acequia architecture, data stays where it was created. Queries travel to the data. Each node on the network holds its own content (video, imagery, sensor feeds, documents, GIS layers) and answers questions about that content on demand.

No central database. No staging area. No import step. The query goes out; the answer comes back.

This is called **federated query**, **compute-to-data**, or **in-situ processing**, depending on the field. The core insight is the same: data is heavy, queries are light. Move the light thing.

## How It Works in Practice

### Video: Zoom Recordings

A Zoom recording of a research meeting lives on Zoom's servers. The conventional approach: download the MP4, upload it to your own storage, transcode it, index the transcript, build a search interface.

The acequia approach: query the recording where it already lives. A local proxy handles Zoom's session cookies and NWS API. The response includes the video stream URL, transcript with timestamps, participant list, and topic metadata. No file is downloaded. The page renders the video via the proxy and displays the transcript synchronized to playback. Selecting a match in a transcript search seeks the stream directly to the moment that phrase was spoken.

The recording never leaves Zoom's infrastructure. Your node queries it, serves the relevant slice, and caches only what was requested.

### Imagery: Cloud-Optimized GeoTIFF

A 2 GB satellite image sits on an S3 bucket or a WebDAV node. The conventional approach: download the entire file, load it into a GIS application, navigate to your area of interest.

The cloud-native approach: read the file's header (a few kilobytes at the front of the file), determine which tiles overlap your viewport, issue HTTP Range requests for those tiles only. A 2 GB file can yield a response on the order of 200 KB for a single viewport. Formats such as COG, FlatGeobuf, and Cloud-Optimized Point Cloud are designed so the index lives at the front and the data is chunked for partial retrieval. Zarr reaches the same result differently: its metadata and chunks are stored as separate objects, so a reader fetches only the chunks it needs.
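
The tile selection step can be sketched as follows. This is a minimal illustration, not a COG reader: `TileIndex` is a hypothetical, simplified stand-in for what a real reader would parse from the TIFF header (the `TileOffsets` and `TileByteCounts` tags), and the example grid and byte sizes are made up.

```python
from dataclasses import dataclass


@dataclass
class TileIndex:
    """Hypothetical, simplified view of the index a COG header provides."""
    tile_size: int          # tile edge length in pixels
    tiles_across: int       # number of tile columns in the image
    offsets: list[int]      # byte offset of each tile, row-major
    byte_counts: list[int]  # byte length of each tile, row-major


def tiles_for_viewport(index: TileIndex, x0: int, y0: int, x1: int, y1: int) -> list[int]:
    """Row-major tile numbers overlapping the pixel window [x0, x1) x [y0, y1)."""
    first_col, last_col = x0 // index.tile_size, (x1 - 1) // index.tile_size
    first_row, last_row = y0 // index.tile_size, (y1 - 1) // index.tile_size
    return [row * index.tiles_across + col
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


def range_headers(index: TileIndex, tile_ids: list[int]) -> list[str]:
    """One HTTP Range header value per tile (byte ranges are inclusive)."""
    return [f"bytes={index.offsets[t]}-{index.offsets[t] + index.byte_counts[t] - 1}"
            for t in tile_ids]


# Made-up example: a 4x4 grid of 256 px tiles, 1000 bytes each, data starting at byte 4096.
index = TileIndex(256, 4, [4096 + i * 1000 for i in range(16)], [1000] * 16)
tiles = tiles_for_viewport(index, 200, 200, 300, 300)
headers = range_headers(index, tiles)
```

A production client would usually merge adjacent ranges into fewer requests; the sketch keeps one range per tile to make the mapping visible.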

### Video: Space-Time Tiles

A faststart-encoded MP4 is structurally analogous to a Cloud-Optimized GeoTIFF. The moov atom at the front of the file maps timestamps to byte offsets, just as a COG header maps spatial tiles to byte offsets. An HTTP Range request retrieves the frames for a specific time window without transferring the rest of the file.

This is how browser video seeking already works. When you scrub to minute 47 of a video, the browser reads the moov atom, calculates the byte range for minute 47, and fetches those bytes. The full file never downloads.
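
A rough sketch of the timestamp-to-byte lookup, under simplifying assumptions: the keyframe index is a plain sorted list of `(seconds, byte_offset)` pairs, and the media data is laid out in time order. A real player derives this from the sample tables inside the moov atom; the function name and the sample numbers here are illustrative only.

```python
from bisect import bisect_left, bisect_right


def byte_range_for_window(keyframes, file_size, start_s, end_s):
    """Return the inclusive (first_byte, last_byte) covering [start_s, end_s].

    keyframes: sorted list of (time_seconds, byte_offset), one entry per keyframe.
    Decoding has to begin at a keyframe, so the range starts at the last
    keyframe at or before start_s and runs to the first keyframe at or after end_s.
    """
    times = [t for t, _ in keyframes]
    i = max(bisect_right(times, start_s) - 1, 0)
    j = bisect_left(times, end_s)
    first_byte = keyframes[i][1]
    next_offset = keyframes[j][1] if j < len(keyframes) else file_size
    return first_byte, next_offset - 1


# Made-up index: a keyframe every 2 seconds in a 17,000-byte file.
keyframes = [(0.0, 1000), (2.0, 5000), (4.0, 9000), (6.0, 13000)]
first, last = byte_range_for_window(keyframes, 17000, 2.5, 3.5)
range_header = f"bytes={first}-{last}"
```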

Acequia generalizes this across many sources indexed by space, time, and meaning. A viewpoint query (a frustum through space-time at a given resolution) fans out to every node whose content intersects that frustum. Each node serves its slice. The requesting node composites.
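
The fan-out decision can be illustrated with a simple intersection test. This sketch assumes each node advertises an axis-aligned extent in longitude, latitude, and time, and it approximates the frustum with a box of the same kind. `Extent`, `nodes_to_query`, and the catalog entries are hypothetical names for illustration, not part of any acequia API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Axis-aligned box in space and time (a simplification of a frustum)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float
    t_start: float  # seconds since epoch
    t_end: float


def intersects(a: Extent, b: Extent) -> bool:
    """True when the two boxes overlap on every axis."""
    return (a.min_lon <= b.max_lon and b.min_lon <= a.max_lon
            and a.min_lat <= b.max_lat and b.min_lat <= a.max_lat
            and a.t_start <= b.t_end and b.t_start <= a.t_end)


def nodes_to_query(catalog: dict[str, Extent], query: Extent) -> list[str]:
    """Names of the nodes whose advertised extent intersects the query."""
    return sorted(name for name, extent in catalog.items() if intersects(extent, query))


catalog = {
    "santa-fe-cam": Extent(-106.0, 35.6, -105.9, 35.7, 0, 1000),
    "abq-lidar": Extent(-106.7, 35.0, -106.5, 35.2, 0, 1000),
    "santa-fe-archive": Extent(-106.0, 35.6, -105.9, 35.7, 2000, 3000),
}
query = Extent(-106.1, 35.5, -105.8, 35.8, 500, 600)
targets = nodes_to_query(catalog, query)
```

Only `santa-fe-cam` overlaps the query in both space and time, so it is the only node that would receive the request.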

## The Agent Layer

In the acequia system, each node runs a local agent (the mayordomo) that handles incoming queries. The agent knows what content the node holds, how it is indexed, and what transformations it can perform locally.

When a query arrives:

1. The agent evaluates whether local content is relevant
2. If yes, it extracts the requested slice (spatial tile, temporal range, transcript segment)
3. It serves the response using standard HTTP (Range headers, WebDAV verbs)
4. It may cache the derived result for future requests

The agent is the query processor. It replaces the centralized database engine with distributed, local intelligence. Each node answers for itself.
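
The four steps above might look like this in miniature. The `Agent` class, its in-memory `holdings`, and the dictionary-shaped response are all assumptions made for the sketch; it illustrates the flow, not the mayordomo's actual interface.

```python
class Agent:
    """Toy node agent: holds byte content and answers byte-range queries."""

    def __init__(self, holdings: dict[str, bytes]):
        self.holdings = holdings  # content id -> bytes stored on this node
        self.cache: dict[tuple[str, int, int], bytes] = {}

    def handle(self, content_id: str, start: int, end: int):
        # Step 1: is local content relevant to this query?
        if content_id not in self.holdings:
            return None
        key = (content_id, start, end)
        # Step 4: reuse a previously derived slice when one exists.
        if key not in self.cache:
            # Step 2: extract only the requested slice (end is inclusive).
            self.cache[key] = self.holdings[content_id][start:end + 1]
        body = self.cache[key]
        total = len(self.holdings[content_id])
        # Step 3: answer in the shape of an HTTP 206 Partial Content response.
        return {
            "status": 206,
            "headers": {"Content-Range": f"bytes {start}-{start + len(body) - 1}/{total}"},
            "body": body,
        }


agent = Agent({"clip": b"0123456789"})
response = agent.handle("clip", 2, 5)
```

A real agent would also validate the range and decide relevance from an index rather than a dictionary lookup.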

## What This Makes Possible

### Realtime access without replication

A camera feed in Santa Fe, a Zoom recording at Harvard, a LiDAR scan in Albuquerque. All queryable from a single interface without copying any of them to a central server. The index knows where things are. The query goes to the source.

### Resolution on demand

A zoomed-out globe view needs low-resolution thumbnails from thousands of cameras. A zoomed-in street view needs high-resolution frames from two or three cameras. The tile pyramid serves the right resolution for the viewport. No source video is transcoded to multiple resolutions in advance; the node extracts what is needed at query time.
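
One way to pick the resolution at query time is to compare the resolution the viewport asks for with the source's native resolution and choose a power-of-two decimation level. The function below is an assumed, simplified rule for illustration; the names and the clamping policy are not taken from the acequia system.

```python
import math


def pyramid_level(native_res: float, requested_res: float, max_level: int) -> int:
    """Pick a power-of-two decimation level for a viewport.

    native_res and requested_res are ground units per pixel (for example
    meters per pixel). Level 0 is full resolution; each level halves it.
    The result never downsamples past what the viewport asked for.
    """
    if requested_res <= native_res:
        return 0
    level = math.floor(math.log2(requested_res / native_res))
    return min(level, max_level)


street_view = pyramid_level(native_res=0.5, requested_res=0.25, max_level=6)
city_view = pyramid_level(native_res=0.5, requested_res=5.0, max_level=6)
globe_view = pyramid_level(native_res=0.5, requested_res=1000.0, max_level=6)
```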

### Temporal slicing

"Show me what this intersection looked like at 3:47 PM on Tuesday." The query carries a timestamp. Each relevant node looks up the byte range for that moment and serves those frames. No one downloads hours of video to find one minute.

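Before a node can look up a byte range, the wall-clock timestamp in the query has to become an offset into a specific recording. A minimal sketch, assuming each node keeps a list of `(recording_id, start, duration)` entries; the identifiers and dates are invented for the example.

```python
from datetime import datetime


def locate(recordings, when: datetime):
    """Find the recording covering `when` and the offset into it, in seconds.

    recordings: list of (recording_id, start datetime, duration in seconds).
    Returns (recording_id, offset_seconds), or None if nothing covers `when`.
    """
    for recording_id, start, duration_s in recordings:
        offset = (when - start).total_seconds()
        if 0 <= offset < duration_s:
            return recording_id, offset
    return None


recordings = [
    ("intersection-cam-1400", datetime(2024, 1, 2, 14, 0), 3600),
    ("intersection-cam-1500", datetime(2024, 1, 2, 15, 0), 3600),
]
hit = locate(recordings, datetime(2024, 1, 2, 15, 47))
```

The resulting offset (2820 seconds into the second recording) is what the keyframe lookup then turns into a byte range.
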
### Semantic query across distributed transcripts

"Find every mention of 'evacuation route' across all recorded meetings this semester." The query fans out to every node hosting a meeting recording. Each node searches its local transcript and returns matching segments with timestamps. The requesting node merges results. No central transcript database exists. Each recording's transcript lives with the recording.

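A small sketch of the per-node search and the merge step. It uses plain case-insensitive substring matching rather than true semantic search, parses only the simplest WebVTT cue layout, and stands in for the network fan-out with a loop over a dictionary. The node names and transcript text are made up.

```python
def search_vtt(vtt_text: str, phrase: str) -> list[tuple[str, str]]:
    """Return (cue_start, cue_text) for every cue containing the phrase."""
    hits = []
    for block in vtt_text.strip().split("\n\n"):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            if "-->" in line:  # timing line, e.g. 00:00:01.000 --> 00:00:04.000
                start = line.split("-->")[0].strip()
                text = " ".join(lines[i + 1:])
                if phrase.lower() in text.lower():
                    hits.append((start, text))
                break
    return hits


def federated_search(transcripts: dict[str, str], phrase: str) -> list[tuple[str, str, str]]:
    """Run the search on each node's transcript and merge the results."""
    merged = []
    for node, vtt_text in transcripts.items():
        merged.extend((node, start, text) for start, text in search_vtt(vtt_text, phrase))
    return sorted(merged)


transcripts = {
    "node-a": "WEBVTT\n\n00:00:01.000 --> 00:00:04.000\nThe evacuation route runs north.\n\n"
              "00:00:05.000 --> 00:00:08.000\nNext agenda item.",
    "node-b": "WEBVTT\n\n00:10:00.000 --> 00:10:03.000\nWe should map each Evacuation Route.",
}
results = federated_search(transcripts, "evacuation route")
```

In the federated setting each call to `search_vtt` would run on the node that holds the transcript, and only the matching cues would cross the network.
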
## Why This Is Invisible to Most Users

The centralized model is so pervasive that people plan around it without realizing there is an alternative. The workflow "download it, put it in our system, then query it" is assumed to be the only workflow.

This assumption drives real costs:

- **Storage duplication.** The same video exists on Zoom, on Google Drive, on a department server, on a laptop. Four copies, none authoritative.
- **Staleness.** The warehouse copy is always behind the source. ETL pipelines introduce lag. Someone is always looking at yesterday's data.
- **Bandwidth waste.** Downloading a 2 GB file to look at one tile. Transferring an hour of video to watch two minutes.
- **Permissioning complexity.** Every copy needs its own access controls. Copies drift out of sync with source permissions.

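The federated model removes these costs at the root. There is one authoritative copy, at the source, queried in place. Permissions are enforced at the source. Results are as fresh as the source because nothing is replicated ahead of time; any cached slice is derived from the source and can be refetched.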

## The Technical Requirements

For federated query to work, content must be structured for partial retrieval:

| Content Type | Requirement | Why |
|---|---|---|
| Video | `-movflags +faststart` | Moov atom at front enables timestamp-to-byte lookup |
| Raster imagery | Cloud-Optimized GeoTIFF | Tiled layout with header index for spatial Range requests |
| Point clouds | COPC (Cloud-Optimized Point Cloud) | Octree index at front, chunked data |
| Vector geo | FlatGeobuf | Spatial index enables partial reads |
| Tabular data | Parquet / Arrow | Column-chunked, with row-group metadata in the file footer for selective reads |
| Transcripts | VTT/SRT with timestamps | Enables temporal seek to text matches |

The common pattern: **put the index at a known location (usually the front of the file) and chunk the data so partial reads are meaningful.** The file itself becomes a queryable service.
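
Whether a video meets the faststart requirement can be checked by walking the MP4's top-level boxes and confirming that `moov` appears before `mdat`. The sketch below follows the standard box header layout (a 4-byte big-endian size followed by a 4-byte type, with size 1 signalling a 64-bit size and size 0 meaning "to end of file"). The function names and the `make_box` helper are illustrative, and the sample bytes are synthetic rather than a playable file.

```python
import struct


def top_level_boxes(data: bytes) -> list[str]:
    """List the types of the top-level boxes in an MP4 byte string."""
    boxes, pos = [], 0
    while pos + 8 <= len(data):
        size, kind = struct.unpack(">I4s", data[pos:pos + 8])
        if size == 1:  # a 64-bit size follows the type field
            if pos + 16 > len(data):
                break
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
        elif size == 0:  # box extends to the end of the file
            size = len(data) - pos
        if size < 8:  # malformed header; stop rather than loop forever
            break
        boxes.append(kind.decode("ascii", "replace"))
        pos += size
    return boxes


def is_faststart(data: bytes) -> bool:
    """True when the moov box precedes the mdat box."""
    boxes = top_level_boxes(data)
    return "moov" in boxes and "mdat" in boxes and boxes.index("moov") < boxes.index("mdat")


def make_box(kind: bytes, payload: bytes) -> bytes:
    """Build a synthetic box for the example (header plus payload)."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload


fast = make_box(b"ftyp", b"isom") + make_box(b"moov", b"x" * 10) + make_box(b"mdat", b"y" * 20)
slow = make_box(b"ftyp", b"isom") + make_box(b"mdat", b"y" * 20) + make_box(b"moov", b"x" * 10)
```

For a remote file, the same check only needs the box headers, which can themselves be fetched with Range requests instead of downloading the whole file.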

## Relationship to Acequia Architecture

The acequia is a network of autonomous nodes (parcelas), each holding its own content and running its own agent. There is no central server above and no dumb device below. Content flows laterally between peers.

Federated query is how this network answers questions. A query enters the network at any node. The node's agent either answers locally or routes the query to nodes that can. Responses flow back. The querying node assembles the result.
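
The answer-or-forward behavior can be sketched as a breadth-first walk over peers. This assumes the simplest possible case, with no shared index: each node either holds the content and answers, or forwards the query to peers that have not yet seen it. The `route` function and the four-node network are hypothetical.

```python
from collections import deque


def route(network: dict[str, list[str]], holdings: dict[str, set[str]],
          entry: str, content_id: str) -> list[str]:
    """Return the nodes that answer a query entering the network at `entry`.

    network: node -> list of peer nodes; holdings: node -> content ids it holds.
    A node that holds the content answers locally and does not forward further.
    """
    seen, queue, answers = {entry}, deque([entry]), []
    while queue:
        node = queue.popleft()
        if content_id in holdings.get(node, set()):
            answers.append(node)
            continue
        for peer in network.get(node, []):
            if peer not in seen:  # avoid forwarding the same query twice
                seen.add(peer)
                queue.append(peer)
    return answers


network = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
holdings = {"c": {"lidar-1"}, "d": {"lidar-1"}}
answering_nodes = route(network, holdings, "a", "lidar-1")
```

With an index that already knows where content lives, the walk collapses to a direct lookup followed by requests to the listed nodes.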

This is the same topology as a traditional acequia irrigation system: water flows laterally through channels, governed locally at each gate. No central pump station. No reservoir hierarchy. Each parciante manages their own flow.

Data, like water, stays in the channel until someone needs it. The query is the request to open the gate.

## See Also

- [Viewpoint Rendering](https://realtime.earth/docs/viewpoint-rendering.md): space-time tile queries for video
- [Agent Skills](https://acequia.org/docs/agent-skills.md): URI-in, URI-out transformation pattern
- [Browser Hosting](https://acequia.io/.ai/browser-hosting.md): browsers as CDN endpoints in the network

---

Version: 1.0