Component 5: The Grothendieck Topology

Security as a topological structure on the schema.

Security in a data platform is typically a pile of ad-hoc rules: role checks, row filters, column masks, marking-based restrictions. Each mechanism is coded separately, and their interactions are handled case by case or, more often, not at all. The categorical formulation replaces all of this with one structure: a Grothendieck topology J on OntSch. The topology says which data is visible; a single adjunction, sheafification, enforces the policy, and everything else is a special case.

1. Sieves

Definition 1.1 : Sieve

A sieve on O ∈ Ob(OntSch) is a set S of morphisms with codomain O that is closed under precomposition: if f ∈ S and g is any morphism with codomain dom(f), then f ∘ g ∈ S. Think of it as a "right ideal" of arrows into O.

Two extremes: the maximal sieve (all morphisms into O, "full access") and the empty sieve ("no access"). Sieves form a complete lattice under inclusion; the meet of two sieves is their intersection.

Concretely, a sieve on entity type O is a set of access paths. If you can reach O through some chain of link traversals, those chains form a sieve. Closure under precomposition means: if a path reaches O, any path obtained by prepending further links at its start reaches O too.

Example 1.2 : Access Paths

Take O = Dataset. A sieve might include the path Project → Dataset (direct access) and, by closure, Workspace → Project → Dataset (extended access). The maximal sieve includes everything; the empty sieve permits nothing. Intersecting a "viewer" sieve with a "project P" sieve gives paths satisfying both constraints.
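The closure rule can be made concrete with a small Python sketch. It assumes a hypothetical three-object schema, Workspace --w--> Project --p--> Dataset, and treats morphisms as link paths (the free category on that graph); the names `all_paths`, `compose`, `generated_sieve` and `all_sieves` are illustrative, not an existing API.

```python
from itertools import combinations

# Hypothetical toy schema, not the real OntSch: Workspace --w--> Project --p--> Dataset.
# A morphism is (source, target, links in traversal order); identities have no links.
OBJECTS = ["Workspace", "Project", "Dataset"]
EDGES = {"w": ("Workspace", "Project"), "p": ("Project", "Dataset")}


def all_paths():
    """All morphisms of the free category on EDGES (the link graph must be acyclic)."""
    out = [(o, o, ()) for o in OBJECTS]
    frontier = list(out)
    while frontier:
        # Extend every path in the frontier by one more link at its target end.
        frontier = [(s, b, links + (name,))
                    for (s, t, links) in frontier
                    for name, (a, b) in EDGES.items() if a == t]
        out += frontier
    return out


ARROWS = all_paths()


def compose(f, g):
    """f ∘ g: follow g first, then f."""
    assert g[1] == f[0], "arrows are not composable"
    return (g[0], f[1], g[2] + f[2])


def arrows_into(o):
    return [m for m in ARROWS if m[1] == o]


def is_sieve(s, o):
    """Every arrow ends at o, and f in S implies f ∘ g in S for every composable g."""
    return all(f[1] == o for f in s) and all(
        compose(f, g) in s for f in s for g in arrows_into(f[0]))


def generated_sieve(gens):
    """Smallest sieve containing gens: close them under precomposition."""
    return frozenset(compose(f, g) for f in gens for g in arrows_into(f[0]))


def all_sieves(o):
    into = arrows_into(o)
    subsets = (frozenset(c) for r in range(len(into) + 1)
               for c in combinations(into, r))
    return [s for s in subsets if is_sieve(s, o)]


project_to_dataset = ("Project", "Dataset", ("p",))
direct = generated_sieve([project_to_dataset])
print(sorted(m[2] for m in direct))          # [('p',), ('w', 'p')]
maximal = frozenset(arrows_into("Dataset"))  # "full access"
empty = frozenset()                          # "no access"
print(len(all_sieves("Dataset")))            # 4: empty, {w;p}, {p, w;p}, maximal
```

Intersection is plain `&` on the frozensets, and the result is again a sieve. In this toy the four sieves on Dataset happen to form a chain; schemas with branching links give lattices with incomparable sieves.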

2. The Grothendieck Topology

Definition 2.1 : Grothendieck Topology

A Grothendieck topology J on OntSch assigns to each object O a collection J(O) of covering sieves, satisfying:

Maximality: The maximal sieve always covers.
Stability: If S ∈ J(O) and f : O′ → O, then f*(S) = {g | f ∘ g ∈ S} ∈ J(O′).
Transitivity: If S ∈ J(O) and R is a sieve on O such that f*(R) ∈ J(dom(f)) for every f ∈ S, then R ∈ J(O).
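For a finite category the three axioms can be checked by brute force. The sketch below assumes a hypothetical toy schema (Workspace --w--> Project --p--> Dataset, morphisms as link paths) and represents J as a dict from objects to sets of covering sieves; `J_proj` and `J_bad` are made-up policies, and the transitivity check enumerates every sieve, so it only scales to tiny examples.

```python
from itertools import combinations

# Hypothetical toy site (an assumption): Workspace --w--> Project --p--> Dataset.
# These are the six morphisms of its free category: (source, target, links).
OBJECTS = ["Workspace", "Project", "Dataset"]
ID_W, ID_P, ID_D = [(o, o, ()) for o in OBJECTS]
W = ("Workspace", "Project", ("w",))
P = ("Project", "Dataset", ("p",))
WP = ("Workspace", "Dataset", ("w", "p"))
ARROWS = [ID_W, ID_P, ID_D, W, P, WP]


def compose(f, g):
    """f ∘ g: follow g first, then f."""
    assert g[1] == f[0], "arrows are not composable"
    return (g[0], f[1], g[2] + f[2])


def arrows_into(o):
    return [m for m in ARROWS if m[1] == o]


def all_sieves(o):
    into = arrows_into(o)
    subsets = (frozenset(c) for r in range(len(into) + 1)
               for c in combinations(into, r))
    return [s for s in subsets
            if all(compose(f, g) in s for f in s for g in arrows_into(f[0]))]


def pullback(f, s):
    """f*(S) = {g | f ∘ g ∈ S}, a sieve on dom(f)."""
    return frozenset(g for g in arrows_into(f[0]) if compose(f, g) in s)


def is_topology(J):
    """J maps each object to the set of its covering sieves."""
    for o in OBJECTS:
        if frozenset(arrows_into(o)) not in J[o]:
            return False                                   # maximality
        for s in J[o]:
            if any(pullback(f, s) not in J[f[0]] for f in arrows_into(o)):
                return False                               # stability
            for r in all_sieves(o):
                if r not in J[o] and all(pullback(f, r) in J[f[0]] for f in s):
                    return False                           # transitivity
    return True


def maximal(o):
    return frozenset(arrows_into(o))


trivial = {o: {maximal(o)} for o in OBJECTS}
# Hypothetical policy: a Dataset is covered by the access paths through its Project.
via_project = frozenset({P, WP})
J_proj = {**trivial, "Dataset": {maximal("Dataset"), via_project}}
# Broken policy: the empty sieve covers Dataset, but nothing covers Project that way.
J_bad = {**trivial, "Dataset": {maximal("Dataset"), frozenset()}}

print(is_topology(trivial), is_topology(J_proj), is_topology(J_bad))  # True True False
```

`J_bad` fails stability: the empty sieve covers Dataset, but its pullback along p, the empty sieve on Project, does not cover.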

Read these as security axioms:

Remark 2.2 : Security Reading

Maximality: a superuser with all permissions passes every check. Unremarkable but necessary.
Stability: if a sieve S authorizes access to O and f : O′ → O is a link into O, then the pulled-back sieve f*(S) authorizes access to O′. Your authorization propagates along links, and you cannot bypass a restriction by navigating around it.
Transitivity: if a set of access paths is authorized, and each extends to authorized paths at deeper levels, then the deep access is itself authorized. Authorizations compose.

Remark 2.3 : Why "topology"?

On a topological space, open covers satisfy analogous axioms. A Grothendieck topology generalizes this to arbitrary categories: open sets become objects, and open covers become covering sieves. OntSch has no points or open sets in the classical sense; the topology is given purely in terms of morphisms.

3. Sheaves

Definition 3.1 : Sheaf

A presheaf Φ : OntSch^op → Set is a sheaf for J if for every covering sieve S ∈ J(O) and every compatible family {x_f ∈ Φ(dom(f))}_{f ∈ S}, there exists a unique x ∈ Φ(O) restricting to each x_f.

Compatibility means: for all f ∈ S and composable g, Φ(g)(x_f) = x_{f ∘ g}. In the security reading: a sheaf is a data state consistent with the access policy (the data seen through any authorized path agrees with the data seen through any other authorized path to the same object). No contradictions between views. A presheaf that violates this would show different data along different access paths; sheafification eliminates such states.

Definition 3.2 : Sheaf Topos

Ont_sec = Sh(OntSch, J), the full subcategory of sheaves. It is itself a topos, so it has the same topos structure as Ont (limits, colimits, exponentials, a subobject classifier), but its objects are restricted to policy-consistent states.

Example 3.3

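A Dataset is reachable via two paths: Project → Dataset and, by closure, Workspace → Project → Dataset. Compatibility requires that the data seen along the longer path be the restriction, along Workspace → Project, of the data seen along the shorter one. Local views that assign different "versions" to the two paths are not compatible and never glue to a Dataset state; Dataset states that look identical along every authorized path violate uniqueness. Sheafification collapses both kinds of inconsistency, keeping only what agrees across all authorized paths.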
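Example 3.3 can be checked directly. This minimal sketch assumes a hypothetical presheaf of row ids on Workspace --w--> Project --p--> Dataset and tests the sheaf condition only for the covering sieve on Dataset generated by Project → Dataset (for a maximal sieve the condition holds automatically). `is_sheaf_for`, `presheaf` and the row names are illustrative.

```python
from itertools import product

# Hypothetical toy site (an assumption): Workspace --w--> Project --p--> Dataset,
# with the covering sieve on Dataset generated by the Project path.
ID_W, ID_P = ("Workspace", "Workspace", ()), ("Project", "Project", ())
W = ("Workspace", "Project", ("w",))
P = ("Project", "Dataset", ("p",))
WP = ("Workspace", "Dataset", ("w", "p"))
ARROWS = [ID_W, ID_P, ("Dataset", "Dataset", ()), W, P, WP]
VIA_PROJECT = frozenset({P, WP})


def compose(f, g):
    return (g[0], f[1], g[2] + f[2])  # f ∘ g: follow g, then f


def arrows_into(o):
    return [m for m in ARROWS if m[1] == o]


def restrict(phi, f, x):
    """Φ(f)(x) for x ∈ Φ(cod f): apply the link restrictions from the target back."""
    for link in reversed(f[2]):
        x = phi["maps"][link][x]
    return x


def compatible(phi, s, fam):
    """Φ(g)(x_f) = x_{f ∘ g} for every f in S and every composable g."""
    return all(restrict(phi, g, fam[f]) == fam[compose(f, g)]
               for f in s for g in arrows_into(f[0]))


def is_sheaf_for(phi, s, o):
    """Every compatible family on S has exactly one amalgamation in Φ(o)."""
    arrows = sorted(s)
    for values in product(*(sorted(phi["sets"][f[0]]) for f in arrows)):
        fam = dict(zip(arrows, values))
        if compatible(phi, s, fam):
            glued = [x for x in phi["sets"][o]
                     if all(restrict(phi, f, x) == fam[f] for f in s)]
            if len(glued) != 1:
                return False
    return True


def presheaf(dataset_rows, p_map):
    """Hypothetical presheaf: row ids at each level; w restricts as the identity."""
    rows = {"r1", "r2"}
    return {"sets": {"Workspace": rows, "Project": rows, "Dataset": dataset_rows},
            "maps": {"w": {r: r for r in rows}, "p": p_map}}


consistent = presheaf({"r1", "r2"}, {"r1": "r1", "r2": "r2"})
# Two Dataset states that look identical through every authorized path.
stale = presheaf({"r1", "r2", "r2-stale"}, {"r1": "r1", "r2": "r2", "r2-stale": "r2"})
# A compatible family (r2 seen via Project) that no Dataset state explains.
missing = presheaf({"r1"}, {"r1": "r1"})

for name, phi in [("consistent", consistent), ("stale", stale), ("missing", missing)]:
    print(name, is_sheaf_for(phi, VIA_PROJECT, "Dataset"))
# consistent True, stale False, missing False
```

`stale` fails uniqueness (two Dataset states restrict to the same compatible family), and `missing` fails existence (a compatible family has no Dataset state behind it).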

4. The Sheafification Adjunction

This is the central result.

Theorem 4.1 : Sheafification Adjunction

There exists an adjunction

a : [OntSch^op, Set] ⇄ Sh(OntSch, J) : i

where i is the inclusion and a is sheafification. The functor a takes any presheaf (possibly containing unauthorized data) and produces the nearest sheaf: the secured version. The counit ε : a ∘ i → Id is an isomorphism (sheaves are already secured).

Construction 4.2 : Plus Construction

The sheafification a(Φ) is built by iterating the "plus construction." Define:

Φ⁺(O) = colim_{S ∈ J(O)} Match(S, Φ)

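where Match(S, Φ) is the set of compatible families on the covering sieve S, and the colimit runs over J(O) ordered by reverse inclusion (a filtered diagram, since covering sieves are closed under finite intersection). Applying plus twice gives a(Φ) = (Φ⁺)⁺: the first pass yields a separated presheaf (gluings are unique), the second a sheaf (gluings exist).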

In security terms: an element of Φ⁺(O) is a compatible family of views on some covering (authorized) sieve, with two families identified when they agree on a smaller covering sieve. So a(Φ) keeps data visible through any authorized path and strips everything else: if there is any authorized way to see a piece of data, it survives; if there is none, it is gone. No per-case filters. One functor, one enforcement point.
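A runnable sketch of the plus construction, under a loudly stated simplification: each J(O) is assumed finite. Covering sieves are closed under finite intersection, so the least covering sieve M_O = ⋂ J(O) is the last stage of the colimit and Φ⁺(O) ≅ Match(M_O, Φ); the code uses that shortcut instead of a general filtered colimit. The toy site, the presheaf `stale` and the policy `J_proj` (Dataset covered by the paths through Project) are hypothetical.

```python
from itertools import product

# Hypothetical toy site (an assumption): Workspace --w--> Project --p--> Dataset.
OBJECTS = ["Workspace", "Project", "Dataset"]
EDGES = {"w": ("Workspace", "Project"), "p": ("Project", "Dataset")}
ID = {o: (o, o, ()) for o in OBJECTS}
W = ("Workspace", "Project", ("w",))
P = ("Project", "Dataset", ("p",))
WP = ("Workspace", "Dataset", ("w", "p"))
ARROWS = list(ID.values()) + [W, P, WP]


def compose(f, g):
    return (g[0], f[1], g[2] + f[2])  # f ∘ g: follow g, then f


def arrows_into(o):
    return [m for m in ARROWS if m[1] == o]


def restrict(phi, f, x):
    for link in reversed(f[2]):  # Φ(f) applies the last link's restriction first
        x = phi["maps"][link][x]
    return x


def matching_families(phi, s):
    """Match(S, Φ): compatible families on S, each encoded as a frozenset of (f, x_f)."""
    arrows = list(s)
    for values in product(*(list(phi["sets"][f[0]]) for f in arrows)):
        fam = dict(zip(arrows, values))
        if all(restrict(phi, g, fam[f]) == fam[compose(f, g)]
               for f in arrows for g in arrows_into(f[0])):
            yield frozenset(fam.items())


def least_cover(J, o):
    # Assumes J(o) is finite: the intersection of all covers is itself a cover.
    return frozenset.intersection(*J[o])


def plus(phi, J):
    sets = {o: set(matching_families(phi, least_cover(J, o))) for o in OBJECTS}
    maps = {}
    for link, (a, b) in EDGES.items():
        h = (a, b, (link,))
        # Restrict a family on M_b along h; M_a lies inside h*(M_b) by stability.
        maps[link] = {fam: frozenset((g, dict(fam)[compose(h, g)])
                                     for g in least_cover(J, a))
                      for fam in sets[b]}
    return {"sets": sets, "maps": maps}


def sheafify(phi, J):
    return plus(plus(phi, J), J)  # a(Φ) = (Φ⁺)⁺


def sizes(phi):
    return {o: len(phi["sets"][o]) for o in OBJECTS}


rows = {"r1", "r2"}
stale = {"sets": {"Workspace": rows, "Project": rows, "Dataset": rows | {"r2-stale"}},
         "maps": {"w": {r: r for r in rows},
                  "p": {"r1": "r1", "r2": "r2", "r2-stale": "r2"}}}
trivial = {o: {frozenset(arrows_into(o))} for o in OBJECTS}
J_proj = {**trivial, "Dataset": trivial["Dataset"] | {frozenset({P, WP})}}

print(sizes(sheafify(stale, trivial)))  # {'Workspace': 2, 'Project': 2, 'Dataset': 3}
print(sizes(sheafify(stale, J_proj)))   # {'Workspace': 2, 'Project': 2, 'Dataset': 2}
```

Under the trivial topology every presheaf is already a sheaf, so nothing changes. Under `J_proj` the stale Dataset state is merged with the state it cannot be told apart from along the authorized paths, and sheafifying the result again leaves the sizes unchanged, as Proposition 5.2 predicts.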

5. Properties of the Adjunction

Three properties make this useful in practice, not just in theory.

Proposition 5.1 : Compositionality

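If you cannot access O₂, traversing a link L : O₁ → O₂ does not help. A path g into O₁ followed by L reaches O₂ via L ∘ g, and by definition g ∈ L*(S) exactly when L ∘ g ∈ S. So the paths through L that the user's sieve S on O₂ authorizes are precisely those in L*(S): following L grants nothing that S does not already grant. Stability supplies the other half: whenever S is covering, so is L*(S), so authorization that does hold at O₂ carries over consistently to O₁. You cannot escape a restriction by following links. ∎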
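The pullback argument is small enough to run. This sketch assumes a hypothetical toy schema and a hypothetical user sieve S on Dataset that authorizes only the path through Workspace; `pullback` implements L*(S) = {g | L ∘ g ∈ S} straight from the definition.

```python
# Hypothetical toy site (an assumption): Workspace --w--> Project --p--> Dataset.
ID_W, ID_P = ("Workspace", "Workspace", ()), ("Project", "Project", ())
W = ("Workspace", "Project", ("w",))
L = ("Project", "Dataset", ("p",))          # the link being traversed
WL = ("Workspace", "Dataset", ("w", "p"))
ARROWS = [ID_W, ID_P, ("Dataset", "Dataset", ()), W, L, WL]


def compose(f, g):
    return (g[0], f[1], g[2] + f[2])  # f ∘ g: follow g, then f


def arrows_into(o):
    return [m for m in ARROWS if m[1] == o]


def pullback(f, s):
    """f*(S) = {g | f ∘ g ∈ S}."""
    return frozenset(g for g in arrows_into(f[0]) if compose(f, g) in s)


# Hypothetical user sieve on Dataset: only the path through Workspace is authorized.
S = frozenset({WL})
print(pullback(L, S) == frozenset({W}))  # True: standing at Project is not enough
# Following L never adds access: g is authorized after L exactly when L ∘ g is in S.
print(all((g in pullback(L, S)) == (compose(L, g) in S)
          for g in arrows_into("Project")))  # True
```

Standing at Project and following L directly (the identity path on Project) is not authorized, because L itself is not in S.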

Proposition 5.2 : Idempotence

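a ∘ i ≅ Id. Sheafifying an already-secured state does nothing. If Φ is a sheaf, each matching family on a covering sieve is the family of restrictions of a unique element of Φ(O), so every stage of the colimit is isomorphic to Φ(O) and Φ⁺ ≅ Φ. In particular a(a(Φ)) ≅ a(Φ) for every presheaf Φ, so it is safe to apply a at every query boundary. ∎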

Proposition 5.3 : Minimality

a strips exactly the unauthorized data, nothing more. By the universal property of the left adjoint, any map from Φ to (the inclusion of) a sheaf factors uniquely through the unit η_Φ : Φ → i(a(Φ)). So a(Φ) is the nearest sheaf: the closest secured state. ∎

Taken together: security propagates correctly, re-applying is harmless, and no data is over-stripped. The query engine can apply a at every boundary (API, subquery, cache) without correctness concerns.

6. Concrete Security Mechanisms

Every standard security primitive maps into this framework as data defining the topology J. No new machinery is needed, and the same adjunction handles all of them.

6.1 Projects as Open Sets

A project is a collection of entity types forming a security boundary. Users belong to projects; their access is scoped to the project sieve. For each project P, the covering sieves on entity types within P include the project membership sieve (the set of access paths originating from within P).

6.2 Organizations as Clopen Partitions

Organizations are hard silos: simultaneously open (complete within themselves) and closed (impermeable from outside). This requires that OntSch contain no morphisms between entity types of different organizations; then no sieve on an entity type in organization A can include a morphism from organization B, and cross-organization access is topologically excluded.

6.3 Roles as Chains of Sub-Sieves

Within a project, roles form an ordered chain: Owner ≥ Editor ≥ Viewer ≥ Discoverer. Each role defines a sub-sieve; higher roles see strictly more. Role checks become sieve inclusions rather than ad-hoc predicates.
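Role checks as sieve inclusions, in a minimal sketch. It assumes a hypothetical toy schema Workspace --w--> Project --p--> Dataset, whose four sieves on Dataset happen to form a chain, and maps the four roles onto them; the assignment (including giving Discoverer the empty sieve on Dataset contents) is purely illustrative.

```python
# Hypothetical toy site (an assumption): Workspace --w--> Project --p--> Dataset.
# Its four sieves on Dataset form a chain; the role assignment below is invented.
P = ("Project", "Dataset", ("p",))
WP = ("Workspace", "Dataset", ("w", "p"))
ID_D = ("Dataset", "Dataset", ())

ROLE_SIEVES = {  # highest role first
    "Owner": frozenset({ID_D, P, WP}),  # maximal sieve
    "Editor": frozenset({P, WP}),
    "Viewer": frozenset({WP}),
    "Discoverer": frozenset(),
}


def may_access(role, path):
    """A role check is membership of the access path in the role's sieve."""
    return path in ROLE_SIEVES[role]


def at_least(role, other):
    """role ≥ other exactly when other's sieve is included in role's sieve."""
    return ROLE_SIEVES[other] <= ROLE_SIEVES[role]


order = list(ROLE_SIEVES)
# Each role's sieve strictly contains the next lower role's sieve.
print(all(ROLE_SIEVES[lo] < ROLE_SIEVES[hi] for hi, lo in zip(order, order[1:])))  # True
print(may_access("Editor", P), may_access("Viewer", P))  # True False
```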

6.4 Markings as Sub-Topology Generators

A marking m (PII, PHI, classified, ...) generates a sub-topology J_m ⊆ J: the covering sieves are those where every path has clearance for m. Multiple markings compose by intersection, J ∩ J_m₁ ∩ J_m₂ ∩ …, and an intersection of Grothendieck topologies is again a Grothendieck topology. Adding a new marking type intersects the effective topology with one more sub-topology and composes with everything that already exists, with no code changes.
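Objectwise intersection of covering-sieve sets can be checked against the axioms on a small example. This sketch uses a hypothetical two-project toy site, P1 --p1--> D <--p2-- P2; the marking topologies `J_m1` and `J_m2` are invented so that the check is non-trivial, and the brute-force checker only scales to tiny categories.

```python
from itertools import combinations

# Hypothetical toy site (an assumption): P1 --p1--> D <--p2-- P2.
# Morphisms are (source, target, links); identities have no links.
OBJECTS = ["P1", "P2", "D"]
P1D, P2D = ("P1", "D", ("p1",)), ("P2", "D", ("p2",))
ARROWS = [(o, o, ()) for o in OBJECTS] + [P1D, P2D]


def compose(f, g):
    return (g[0], f[1], g[2] + f[2])  # f ∘ g: follow g, then f


def arrows_into(o):
    return [m for m in ARROWS if m[1] == o]


def all_sieves(o):
    into = arrows_into(o)
    subsets = (frozenset(c) for r in range(len(into) + 1)
               for c in combinations(into, r))
    return [s for s in subsets
            if all(compose(f, g) in s for f in s for g in arrows_into(f[0]))]


def pullback(f, s):
    return frozenset(g for g in arrows_into(f[0]) if compose(f, g) in s)


def is_topology(J):
    for o in OBJECTS:
        if frozenset(arrows_into(o)) not in J[o]:
            return False                                      # maximality
        for s in J[o]:
            if any(pullback(f, s) not in J[f[0]] for f in arrows_into(o)):
                return False                                  # stability
            if any(r not in J[o] and all(pullback(f, r) in J[f[0]] for f in s)
                   for r in all_sieves(o)):
                return False                                  # transitivity
    return True


def intersect(*topologies):
    """Objectwise intersection of covering-sieve sets."""
    return {o: set.intersection(*(t[o] for t in topologies)) for o in OBJECTS}


MAX = {o: frozenset(arrows_into(o)) for o in OBJECTS}
EMPTY = frozenset()
# Two hypothetical marking topologies: D is covered via P1 (resp. P2) alone.
J_m1 = {"P1": {MAX["P1"]}, "P2": {MAX["P2"], EMPTY},
        "D": {MAX["D"], frozenset({P1D, P2D}), frozenset({P1D})}}
J_m2 = {"P1": {MAX["P1"], EMPTY}, "P2": {MAX["P2"]},
        "D": {MAX["D"], frozenset({P1D, P2D}), frozenset({P2D})}}

both = intersect(J_m1, J_m2)
print(is_topology(J_m1), is_topology(J_m2), is_topology(both))  # True True True
print(both["D"] == {MAX["D"], frozenset({P1D, P2D})})           # True
```

In `J_m1`, stability forces P2 to be covered by the empty sieve, because pulling {p1} back along p2 gives nothing; the intersection drops that degenerate cover and keeps only the covers both markings accept.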

6.5 Row-Level Security as Sub-Presheaf

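For user u, a characteristic morphism χ_u : Φ → Ω classifies the sub-presheaf Φ_u ↪ Φ of visible rows; at each O it sends a row to the sieve of paths along which that row becomes visible. Only rows whose truth value is "true" (the maximal sieve) belong to Φ_u; everything else, including rows that are only partially visible, is stripped. Enforcement sheafifies Φ_u with respect to the user's effective topology, and since a is left exact it preserves the inclusion, so a(Φ_u) ↪ a(Φ). Typical policies ("only rows you own," "only rows in your region") become characteristic morphisms; the sheaf condition ensures consistency across paths.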
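In a presheaf topos, Ω(O) is the set of sieves on O, and the characteristic map of a sub-presheaf sends a row to the sieve of paths along which it lands in the sub-presheaf. The sketch below computes that on a hypothetical toy schema with a made-up visible sub-presheaf `VISIBLE` for one user; the names are illustrative.

```python
# Hypothetical toy site (an assumption): Workspace --w--> Project --p--> Dataset,
# with rows restricting as the identity along both links.
ID = {o: (o, o, ()) for o in ["Workspace", "Project", "Dataset"]}
W = ("Workspace", "Project", ("w",))
P = ("Project", "Dataset", ("p",))
WP = ("Workspace", "Dataset", ("w", "p"))
ARROWS = list(ID.values()) + [W, P, WP]

ROWS = {o: {"r1", "r2", "r3"} for o in ID}  # Φ
MAPS = {link: {r: r for r in ("r1", "r2", "r3")} for link in ("w", "p")}
# Hypothetical sub-presheaf Φ_u of rows visible to user u (closed under restriction).
VISIBLE = {"Dataset": {"r1"}, "Project": {"r1", "r2"}, "Workspace": {"r1", "r2", "r3"}}


def arrows_into(o):
    return [m for m in ARROWS if m[1] == o]


def restrict(f, x):
    for link in reversed(f[2]):
        x = MAPS[link][x]
    return x


def chi(o, x):
    """χ_u at o: the sieve of arrows f into o along which x becomes visible."""
    return frozenset(f for f in arrows_into(o) if restrict(f, x) in VISIBLE[f[0]])


def kept(o):
    """Rows whose truth value is the maximal sieve ("true")."""
    return {x for x in ROWS[o] if chi(o, x) == frozenset(arrows_into(o))}


print(sorted(kept("Dataset")))       # ['r1']
print(chi("Dataset", "r3") == {WP})  # True: partially true, still stripped
```

Row r3 evaluates neither to "true" (the maximal sieve) nor to "false" (the empty sieve), yet it is still stripped at Dataset: membership in the sub-presheaf requires truth value exactly "true".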

6.6 Column-Level Security as Schema Projection

Column-level security is a projection in Sch: the user sees S_u(O) ↪ S(O). Columns outside S_u are invisible. This composes orthogonally with row-level security; both are enforced by a single sheafification step.

7. The Interaction Theorem

The real payoff of the topological approach:

Theorem 7.1 : Composition of Mechanisms

All security mechanisms (projects, organizations, roles, markings, row-level filters, column-level masks) combine into a single Grothendieck topology J. Sheafification with respect to J enforces all of them simultaneously. Adding a new mechanism means adding one more piece of topological data to J (a family of generating covering sieves, or one more sub-topology to intersect with); the interaction with existing mechanisms is handled by the closure properties of topologies.

Every query passes through a. One adjunction, one topology, one enforcement path. No per-case logic for "role X in project Y with marking Z."

Corollary 7.2

Implementing a new security mechanism (a new marking, a new row filter, a new access model) requires only a change to the data defining J. No changes to the query pipeline or other mechanisms are needed. The architecture is both extensible and compositional by construction.

Summary

Security is a Grothendieck topology on the schema: sieves are access paths, sheaves are policy-consistent states, and the adjunction a ⊣ i enforces the policy compositionally, idempotently, and minimally. All concrete mechanisms (projects, organizations, roles, markings, row/column filters) are topological data composing into a single J, with the secured topos Ont_sec = Sh(OntSch, J).

Next: Component 6: The Lineage Category, provenance as a free category on a DAG.