Security as a topological structure on the schema.
Security in a data platform is typically a pile of ad-hoc rules: role checks, row filters, column masks, marking-based restrictions. Each mechanism is coded separately, and their interactions are handled case by case or, more often, not handled at all. The categorical formulation replaces all of this with one structure: a Grothendieck topology J on OntSch. The topology says which data is visible; a single adjunction, built on sheafification, enforces the policy, and everything else is a special case.
A sieve on O ∈ Ob(OntSch) is a set S of morphisms with codomain O that is closed under precomposition: if f ∈ S and g is composable, then f ∘ g ∈ S. Think of it as a "right ideal" of arrows into O.
Two extremes: the maximal sieve (all morphisms into O, i.e. "full access") and the empty sieve ("no access"). Sieves form a complete lattice under inclusion; the meet of two sieves is their intersection.
Concretely, a sieve on entity type O is a set of access paths. If you can reach O through some chain of link traversals, those chains form a sieve. Closure under precomposition means: if a path reaches O, any extension of that path does too.
Take O = Dataset. A sieve might include the path Project → Dataset (direct access) and, by closure, Workspace → Project → Dataset (extended access). The maximal sieve includes everything; the empty sieve permits nothing. Intersecting a "viewer" sieve with a "project P" sieve gives paths satisfying both constraints.
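The Dataset example can be made concrete with a small sketch. It assumes a hypothetical three-link schema (the `Team` entity type and the helper names `paths_into`, `generate_sieve`, `is_sieve` are illustrative, not part of the platform) and models morphisms into an entity type as paths in an acyclic link graph, so precomposition is "prepend more links at the source."

```python
# Hypothetical toy schema: each pair (src, dst) is a link between entity types.
EDGES = {("Workspace", "Project"), ("Project", "Dataset"), ("Team", "Dataset")}

def paths_into(target, edges):
    """All paths ending at `target`, as tuples of nodes; (target,) is the identity."""
    found = {(target,)}
    frontier = [(target,)]
    while frontier:
        path = frontier.pop()
        for src, dst in edges:
            if dst == path[0] and src not in path:  # assumption: ignore cycles
                longer = (src,) + path
                if longer not in found:
                    found.add(longer)
                    frontier.append(longer)
    return frozenset(found)

def generate_sieve(generators, edges):
    """Smallest sieve containing `generators`: close under precomposition."""
    return frozenset(q + p[1:] for p in generators for q in paths_into(p[0], edges))

def is_sieve(paths, edges):
    """Closure check: prepending any link to a member must stay inside the set."""
    return all((src,) + p in paths
               for p in paths for src, dst in edges if dst == p[0])

maximal = paths_into("Dataset", EDGES)   # "full access" (contains the identity)
empty = frozenset()                      # "no access"
project = generate_sieve({("Project", "Dataset")}, EDGES)
viewer = generate_sieve({("Project", "Dataset"), ("Team", "Dataset")}, EDGES)
both = project & viewer                  # meet of two sieves = intersection
```

Here `project` contains both Project → Dataset and Workspace → Project → Dataset, and `both` is again a sieve, which is the lattice property used when constraints are combined.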
A Grothendieck topology J on OntSch assigns to each object O a collection J(O) of covering sieves, satisfying:
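For reference, the three standard axioms of a Grothendieck topology can be written as follows. This is the textbook formulation, stated here on the assumption that it is the one intended; the pullback notation h*(S) matches the f*(S) used later in this section.

```latex
\begin{enumerate}
  \item \textbf{Maximality.} The maximal sieve
        $\{\, f \mid \operatorname{cod}(f) = O \,\}$ belongs to $J(O)$.
  \item \textbf{Stability.} If $S \in J(O)$ and $h : O' \to O$, then
        $h^{*}(S) = \{\, g \mid \operatorname{cod}(g) = O',\ h \circ g \in S \,\} \in J(O')$.
  \item \textbf{Transitivity.} If $S \in J(O)$ and $R$ is a sieve on $O$ such that
        $f^{*}(R) \in J(\operatorname{dom} f)$ for every $f \in S$, then $R \in J(O)$.
\end{enumerate}
```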
Read these as security axioms:
On a topological space, open sets satisfy analogous axioms. A Grothendieck topology generalizes this to arbitrary categories: "opens" become sieves, "open covers" become covering families. OntSch has no points or open sets in the classical sense; the topology is given purely in terms of morphisms.
A presheaf Φ : OntSch^op → Set is a sheaf for J if for every covering sieve S ∈ J(O) and every compatible family {x_f ∈ Φ(dom(f))}_{f ∈ S}, there exists a unique x ∈ Φ(O) with Φ(f)(x) = x_f for every f ∈ S.
Compatibility means: for all f ∈ S and composable g, Φ(g)(x_f) = x_{f ∘ g}. In the security reading, a sheaf is a data state consistent with the access policy: the data seen through any authorized path agrees with the data seen through any other authorized path to the same object. No contradictions between views. A presheaf that violates this would show different data along different access paths; sheafification eliminates such states.
Ont_sec = Sh(OntSch, J), the full subcategory of sheaves. It is a topos, inheriting all the structure of Ont but restricted to policy-consistent states.
A Dataset accessible via two paths: Project → Dataset and Workspace → Project → Dataset. A sheaf requires that the data restricts consistently along both paths. A presheaf assigning different "versions" along different paths fails the sheaf condition. Sheafification collapses the inconsistency, keeping only what agrees across all authorized paths.
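A toy check in the spirit of this example is sketched below. It is a deliberately crude stand-in, not the sheaf condition itself: it assumes each access path yields a concrete snapshot (a set of row identifiers), treats "consistent" as "all snapshots equal," and models "keeping only what agrees" as the intersection of snapshots. The names `is_consistent`, `agreed_rows`, and the row labels are hypothetical.

```python
DIRECT = ("Project", "Dataset")
EXTENDED = ("Workspace", "Project", "Dataset")

def is_consistent(views, paths):
    """True if every listed path sees the same snapshot."""
    return len({views[p] for p in paths}) <= 1

def agreed_rows(views, paths):
    """Rows seen identically along every listed path (simplified 'collapse')."""
    snapshots = [views[p] for p in paths]
    return frozenset.intersection(*snapshots) if snapshots else frozenset()

# A policy-consistent state: both paths show the same rows.
good = {DIRECT: frozenset({"r1", "r2"}), EXTENDED: frozenset({"r1", "r2"})}
# An inconsistent state: the two paths disagree about r2 and r3.
bad = {DIRECT: frozenset({"r1", "r2"}), EXTENDED: frozenset({"r1", "r3"})}

paths = [DIRECT, EXTENDED]
good_ok = is_consistent(good, paths)
bad_ok = is_consistent(bad, paths)
collapsed = agreed_rows(bad, paths)
```

In this simplified model only `r1` survives the collapse of `bad`, since it is the only row both paths agree on.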
This is the central result.
There exists an adjunction
a : [OntSch^op, Set] ⇄ Sh(OntSch, J) : i
where i is the inclusion and a is sheafification. The functor a takes any presheaf (possibly containing unauthorized data) and produces the nearest sheaf: the secured version. The counit ε : a ∘ i → Id is an isomorphism (sheaves are already secured).
The sheafification a(Φ) is built by iterating the "plus construction." Define:
Φ^+(O) = colim_{S ∈ J(O)} Match(S, Φ)
where Match(S, Φ) is the set of compatible families on the covering sieve S. Applying plus twice, a(Φ) = (Φ^+)^+, first gives separation (uniqueness), then sheafness (existence).
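Spelled out, the ingredients of the plus construction look as follows. This is the standard textbook form, given as a reminder rather than as anything specific to OntSch: the colimit is taken over J(O) ordered by reverse inclusion, so two matching families are identified when they agree on a common covering refinement, and η denotes the canonical comparison map (a name introduced here for illustration).

```latex
\[
\operatorname{Match}(S,\Phi)
  = \bigl\{\, (x_f)_{f \in S} \;\bigm|\; x_f \in \Phi(\operatorname{dom} f),\
      \Phi(g)(x_f) = x_{f \circ g} \text{ for all composable } g \,\bigr\}
  \;\cong\; \operatorname{Hom}(S,\Phi),
\]
\[
\Phi^{+}(O) = \operatorname*{colim}_{S \in J(O)} \operatorname{Match}(S,\Phi),
\qquad
\eta_O : \Phi(O) \to \Phi^{+}(O),\quad
x \mapsto \bigl[\,(\Phi(f)(x))_{f \,:\, \operatorname{cod}(f) = O}\,\bigr],
\]
\[
\Phi \text{ separated} \iff \eta \text{ monic},
\qquad
\Phi \text{ a sheaf} \iff \eta \text{ an isomorphism},
\qquad
a(\Phi) = (\Phi^{+})^{+}.
\]
```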
In security terms: a(Φ) keeps data visible through any authorized path and strips everything else. The colimit over covering sieves means: if there is any authorized way to see a piece of data, it survives; if there is none, it is gone. No per-case filters. One functor, one enforcement point.
Three properties make this useful in practice, not just in theory.
If you cannot access O₂, traversing a link L : O₁ → O₂ does not help. Stability says f*(S) is covering whenever S is. If O₂ has no covering sieve including the user, neither does the pullback along L. You cannot escape a restriction by following links. ∎
a ∘ i ≅ Id. Sheafifying an already-secured state does nothing. If Φ is a sheaf, matching families are just restrictions of the unique global element; the colimit collapses. Safe to apply a at every query boundary without worry. ∎
a strips exactly the unauthorized data and nothing more. By the universal property of the left adjoint, any map from Φ to a sheaf factors uniquely through a(Φ). So a(Φ) is the nearest sheaf: the closest secured state. ∎
Taken together: security propagates correctly, re-applying is harmless, and no data is over-stripped. The query engine can apply a at every boundary (API, subquery, cache) without correctness concerns.
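The "re-applying is harmless" and "nothing is over-stripped" properties can be illustrated with a toy enforcement function. This is a plain filter standing in for a, not an implementation of sheafification; it assumes a state is a mapping from data items to the set of paths that reach them, and the name `enforce` and the path labels are hypothetical.

```python
def enforce(state, authorized):
    """Toy stand-in for `a`: keep an item only if some authorized path reaches it,
    and remember only the authorized paths for the items that survive."""
    return {item: paths & authorized
            for item, paths in state.items()
            if paths & authorized}

state = {
    "row1": frozenset({"Project→Dataset"}),
    "row2": frozenset({"Admin→Dataset"}),
    "row3": frozenset({"Project→Dataset", "Admin→Dataset"}),
}
authorized = frozenset({"Project→Dataset"})

secured = enforce(state, authorized)        # row2 is stripped
again = enforce(secured, authorized)        # idempotence: nothing changes
# Minimality in this model: every item with an authorized path survives.
kept_all = all(item in secured
               for item, paths in state.items() if paths & authorized)
```

Because applying `enforce` a second time returns the same state, calling it at every boundary is safe in this model, which mirrors the argument above.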
Every standard security primitive maps into this framework as data defining the topology J. No new machinery is needed, and the same adjunction handles all of them.
A project is a collection of entity types forming a security boundary. Users belong to projects; their access is scoped to the project sieve. For each project P, the covering sieves on entity types within P include the project membership sieve (the set of access paths originating from within P).
Organizations are hard silos: simultaneously open (complete within themselves) and closed (impermeable from outside). No sieve on an entity type in organization A includes morphisms from organization B. Cross-organization access is topologically excluded.
Within a project, roles form an ordered chain: Owner ≥ Editor ≥ Viewer ≥ Discoverer. Each role defines a sub-sieve; higher roles see strictly more. Role checks become sieve inclusions rather than ad-hoc predicates.
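A minimal sketch of role checks as inclusions, assuming each role adds some access paths on top of the roles below it. The path labels and the helper names `role_sieve` and `permits` are hypothetical.

```python
from enum import IntEnum

class Role(IntEnum):
    DISCOVERER = 0
    VIEWER = 1
    EDITOR = 2
    OWNER = 3

# Hypothetical path labels that each role adds on top of the roles below it.
ADDED_PATHS = {
    Role.DISCOVERER: {"catalog→Dataset"},
    Role.VIEWER: {"read→Dataset"},
    Role.EDITOR: {"write→Dataset"},
    Role.OWNER: {"grant→Dataset"},
}

def role_sieve(role):
    """Paths available to `role`: everything added at or below it in the chain."""
    return frozenset().union(*(ADDED_PATHS[r] for r in Role if r <= role))

def permits(role, path):
    """A role check is membership in the role's sub-sieve."""
    return path in role_sieve(role)

chain = [Role.DISCOVERER, Role.VIEWER, Role.EDITOR, Role.OWNER]
# "Higher roles see strictly more": each sub-sieve is a proper subset of the next.
strictly_nested = all(role_sieve(lo) < role_sieve(hi)
                      for lo, hi in zip(chain, chain[1:]))
```

With this encoding, comparing two roles reduces to a subset test, so no role-specific predicate is needed.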
A marking m (PII, PHI, classified, ...) generates a sub-topology J_m ⊂ J: the covering sieves are those where every path has clearance for m. Multiple markings compose by intersection: J ∩ J_{m₁} ∩ J_{m₂} ∩ …. Adding a new marking type extends the topology by another generator and composes with everything that already exists, with no code changes.
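Composition by intersection can be sketched as follows. The model is simplified: it works with sets of paths rather than collections of covering sieves, and assumes each path carries the set of markings it is cleared for. The path labels and the function names are hypothetical.

```python
def cleared_paths(marking, path_clearances):
    """Paths that hold clearance for a single marking (toy version of J_m)."""
    return frozenset(p for p, held in path_clearances.items() if marking in held)

def effective_paths(markings, path_clearances):
    """Intersect the per-marking sets; with no markings, every path remains."""
    result = frozenset(path_clearances)
    for m in markings:
        result &= cleared_paths(m, path_clearances)
    return result

# Hypothetical clearances per access path.
PATH_CLEARANCES = {
    "analyst→Dataset": {"PII"},
    "clinician→Dataset": {"PII", "PHI"},
    "guest→Dataset": set(),
}

pii_only = effective_paths(["PII"], PATH_CLEARANCES)
pii_and_phi = effective_paths(["PII", "PHI"], PATH_CLEARANCES)
reordered = effective_paths(["PHI", "PII"], PATH_CLEARANCES)
```

Supporting a new marking in this sketch means adding it to the clearance data; `effective_paths` itself does not change, and the order in which markings are applied does not matter.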
For user u, a characteristic morphism χ_u : Φ(O) → Ω determines which rows are visible. Sheafification with respect to the user's effective topology includes this restriction: elements of Φ(O) on which χ_u evaluates to false are stripped. Typical policies ("only rows you own," "only rows in your region") become characteristic morphisms; the sheaf condition ensures consistency across paths.
Column-level security is a projection in Sch: the user sees S_u(O) ↪ S(O). Columns outside S_u are invisible. This composes orthogonally with row-level security: both are enforced by a single sheafification step.
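A small sketch of how row and column restrictions combine, assuming rows are plain dictionaries, the row predicate plays the role of the characteristic morphism, and the visible column set plays the role of S_u. The function names and sample data are hypothetical. One assumption is worth stating: the two steps commute only when the predicate reads visible columns; otherwise the filter has to run before the mask.

```python
def row_filter(rows, chi):
    """Keep rows on which the characteristic predicate `chi` is true."""
    return [r for r in rows if chi(r)]

def column_mask(rows, visible):
    """Project every row onto the visible columns."""
    return [{k: v for k, v in r.items() if k in visible} for r in rows]

def secure(rows, chi, visible):
    """Single enforcement step: filter on full rows, then project."""
    return column_mask(row_filter(rows, chi), visible)

rows = [
    {"id": 1, "owner": "alice", "salary": 100},
    {"id": 2, "owner": "bob", "salary": 200},
]
owned_by_alice = lambda r: r["owner"] == "alice"   # "only rows you own"
visible = {"id", "owner"}                          # `salary` is masked

result = secure(rows, owned_by_alice, visible)
# The predicate reads only a visible column, so the order can be swapped.
swapped = row_filter(column_mask(rows, visible), owned_by_alice)
```

Both orders give the same single row with only `id` and `owner`, which is the orthogonality claimed above under the stated assumption.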
The real payoff of the topological approach:
All security mechanisms (projects, organizations, roles, markings, row-level filters, column-level masks) combine into a single Grothendieck topology J. Sheafification with respect to J enforces all of them simultaneously. Adding a new mechanism means adding a generator to J; the interaction with existing mechanisms is handled by the closure properties of topologies.
Every query passes through a. One adjunction, one topology, one enforcement path. No per-case logic for "role X in project Y with marking Z."
Implementing a new security mechanism (a new marking, a new row filter, a new access model) requires extending J. No changes to the query pipeline or other mechanisms. The architecture is both extensible and compositional by construction.
Security is a Grothendieck topology on the schema: sieves are access paths, sheaves are policy-consistent states, and the adjunction a ⊣ i enforces the policy compositionally, idempotently, and minimally. All concrete mechanisms (projects, organizations, roles, markings, row/column filters) are topological data composing into a single J, with the secured topos Ont_sec = Sh(OntSch, J).
Next: Component 6: The Lineage Category, provenance as a free category on a DAG.