Component 2: The Presheaf

Data as a contravariant functor on the schema.

The schema (Component 1) declares what kinds of things can exist. A presheaf fills that skeleton with actual data. One presheaf is one snapshot of the world as seen through the schema. Rows become set elements, foreign keys become linking functions, and functoriality keeps multi-hop navigation honest.

1. Presheaves on a Category

We need the general notion first. A presheaf on a category is a systematic assignment: each object receives a set, each morphism receives a set map, and these assignments respect composition. The one twist is that the maps go in the opposite direction.

Definition 1.1 : Presheaf

A presheaf on a category C is a functor F : C^op → Set. Given f : A → B in C, the presheaf yields F(f) : F(B) → F(A); note the reversal. Functoriality requires: (1) F(id_A) = id_{F(A)}, and (2) F(g ∘ f) = F(f) ∘ F(g) for composable f : A → B and g : B → C. The second condition is contravariance: composition flips.
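The contravariance condition can be checked by hand on a small example. The following is a minimal sketch, assuming a toy category with objects A, B, C and morphisms f : A → B, g : B → C; the sets and maps are made up for illustration, and functions between finite sets are encoded as Python dicts.

```python
# Toy presheaf F on a category with f: A -> B and g: B -> C (illustrative data only).
F_sets = {"A": {0, 1}, "B": {"x", "y", "z"}, "C": {"u", "v"}}

# Contravariance: F(f) goes from F(B) to F(A), and F(g) goes from F(C) to F(B).
F_f = {"x": 0, "y": 1, "z": 1}
F_g = {"u": "x", "v": "z"}


def compose(outer, inner):
    """Return outer ∘ inner for dict-encoded functions (apply inner first)."""
    return {k: outer[v] for k, v in inner.items()}


# F(g ∘ f) must equal F(f) ∘ F(g), a function F(C) -> F(A).
F_gf = compose(F_f, F_g)
print(F_gf)
```

Composing in the other order, `compose(F_g, F_f)`, does not type-check: the values of `F_f` live in F(A), which is not the domain of `F_g`, so the lookup fails. The reversal in condition (2) is forced by the types.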

Remark 1.2 : Why contravariant?

Consider a link L : Person → Company ("each person works at a company"). The covariant reading gives a function w : Φ(Person) → Φ(Company): look up where someone works. The contravariant reading gives Φ(L) : Φ(Company) → Φ(Person), which navigates the same link from companies back toward their employees (the full set of employees of a company c is the preimage w⁻¹(c), which Section 6 treats as a pullback). Both encode the same information. We choose contravariant because it places Φ in the presheaf category, which is a topos (Component 3); this gives us limits, colimits, exponentials, and a subobject classifier for free.

2. The Ontology Presheaf

Definition 2.1 : Ontology Instance

An ontology instance is a presheaf Φ : OntSch^op → Set.

On objects: for each entity type O, the set Φ(O) is the collection of all actual instances of that type. If O = Person, then Φ(Person) is every person currently in the system.

On morphisms: for each link type L : O_i → O_j, the function Φ(L) : Φ(O_j) → Φ(O_i) implements referential navigation. Covariant reading: w_L : Φ(O_i) → Φ(O_j) is the foreign key lookup.

Functoriality: for composable links L₁ : O₁ → O₂ and L₂ : O₂ → O₃, we have Φ(L₂ ∘ L₁) = Φ(L₁) ∘ Φ(L₂). In other words, following a two-hop path in the schema corresponds to composing the linking functions in reverse order. Multi-hop traversals compose by construction.

Remark 2.2

The functoriality condition is doing more than it looks. It says that however you decompose a multi-hop path into individual links, composing the corresponding functions gives the same answer. Without this, a query like "find all persons → companies → orders" could give different results depending on how you decompose the path. Functoriality rules that out.

3. Representable Presheaves and Yoneda

Among all presheaves on OntSch, the representable ones play a distinguished role. They are the presheaves "generated by" a single entity type, and the Yoneda lemma connects them to the ontology in a way that is both deep and immediately useful.

Definition 3.1 : Representable Presheaf

For O ∈ Ob(OntSch), the representable presheaf is y(O) = OntSch(-, O) : OntSch^op → Set. At each entity type O′, it returns the set OntSch(O′, O) of all link paths from O′ to O, that is, every way to navigate to O.

Theorem 3.2 : Yoneda Lemma

For any presheaf Φ : OntSch^op → Set and any entity type O:

Ont(y(O), Φ) ≅ Φ(O)

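Here Ont denotes the presheaf category [OntSch^op, Set] (Component 3), so Ont(y(O), Φ) is the set of natural transformations from y(O) to Φ. These correspond bijectively to elements of Φ(O): a natural transformation α is sent to α_O(id_O). To give a morphism from y(O) to Φ is therefore to pick a single instance of type O. The bijection is natural in both variables, so there are no arbitrary choices involved.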
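On a finite schema the bijection can be verified by brute force. The sketch below assumes a toy schema with two entity types, "P" (Person) and "C" (Company), and a single link L : P → C; the instance data in `phi_sets` and `phi_maps` is invented for illustration. It enumerates every family of component functions y(O)(A) → Φ(A), keeps those satisfying naturality, and compares the count with the size of Φ(O).

```python
from itertools import product

# Toy schema category (assumed): objects P, C; morphisms id_P, id_C, L: P -> C.
objects = ["P", "C"]
hom = {("P", "P"): ["id_P"], ("C", "C"): ["id_C"], ("P", "C"): ["L"], ("C", "P"): []}
source = {"id_P": "P", "id_C": "C", "L": "P"}
target = {"id_P": "P", "id_C": "C", "L": "C"}
# compose[(h, f)] = h ∘ f (apply f first).
compose = {("id_P", "id_P"): "id_P", ("id_C", "id_C"): "id_C",
           ("L", "id_P"): "L", ("id_C", "L"): "L"}

# A presheaf Phi: for f: A -> B, phi_maps[f] is a function Phi(B) -> Phi(A).
phi_sets = {"P": ["p1", "p2", "p3"], "C": ["c1", "c2"]}
phi_maps = {"id_P": {x: x for x in phi_sets["P"]},
            "id_C": {x: x for x in phi_sets["C"]},
            "L": {"c1": "p1", "c2": "p3"}}


def representable(O):
    """y(O): A |-> Hom(A, O); y(O)(f) sends h: B -> O to h ∘ f: A -> O."""
    sets = {A: hom[(A, O)] for A in objects}
    maps = {f: {h: compose[(h, f)] for h in sets[target[f]]} for f in source}
    return sets, maps


def functions(dom, cod):
    """All functions dom -> cod, encoded as dicts."""
    return [dict(zip(dom, imgs)) for imgs in product(cod, repeat=len(dom))]


def natural_transformations(y_sets, y_maps):
    """Brute-force all natural transformations from (y_sets, y_maps) to Phi."""
    found = []
    for comps in product(*(functions(y_sets[A], phi_sets[A]) for A in objects)):
        alpha = dict(zip(objects, comps))
        # Naturality for f: A -> B and h in y(B): alpha_A(y(f)(h)) == Phi(f)(alpha_B(h)).
        if all(alpha[source[f]][y_maps[f][h]] == phi_maps[f][alpha[target[f]][h]]
               for f in source for h in y_sets[target[f]]):
            found.append(alpha)
    return found


for O in objects:
    count = len(natural_transformations(*representable(O)))
    print(O, count, len(phi_sets[O]))
```

For each entity type the two printed counts agree, which is the Yoneda bijection on this small instance. Each surviving transformation is determined by where it sends the identity on O.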

Remark 3.3

This is why typed APIs "just work." An SDK function that takes an entity type O as a type parameter and returns objects of that type is exactly a morphism y(O) → Φ. The lemma guarantees that such morphisms are in bijection with actual objects, so the SDK gives type-safe access to exactly the right data, no more and no less. Auto-generated SDKs in TypeScript or Python are, in categorical language, externalizations of the representable functor into those type systems.
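As a rough illustration of the typed-access idea, here is a minimal sketch in Python. The `Ontology` class, its `add` and `get` methods, and the `Person` and `Company` dataclasses are all hypothetical names invented for this example; they are not the API of any real SDK.

```python
from dataclasses import dataclass
from typing import Dict, Type, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Person:
    id: str
    name: str


@dataclass(frozen=True)
class Company:
    id: str
    name: str


class Ontology:
    """Hypothetical store: one table of instances per entity type."""

    def __init__(self) -> None:
        self._store: Dict[type, Dict[str, object]] = {}

    def add(self, obj) -> None:
        self._store.setdefault(type(obj), {})[obj.id] = obj

    def get(self, entity_type: Type[T], object_id: str) -> T:
        # The type parameter selects Phi(O); the id selects one element of it.
        return self._store[entity_type][object_id]  # type: ignore[return-value]


ont = Ontology()
ont.add(Person("p1", "Ada"))
ont.add(Company("c1", "Acme"))

person = ont.get(Person, "p1")  # a static checker infers the type Person
print(person.name)
```

The entity type passed to `get` plays the role of O, and the returned value is an element of Φ(O). Asking for `ont.get(Company, "p1")` fails with a `KeyError`, since "p1" is not an instance of that type.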

4. The Semantic Functor

Raw data does not arrive as a presheaf. Datasets are tables with rows and columns, stored in files or databases. The semantic functor is the bridge: it interprets tabular data as an ontology instance, mapping rows to objects and foreign keys to link functions.

Definition 4.1 : The Data Category

The category Data has pairs (D, S) as objects, each a dataset together with its schema, and transforms f : (D₁, S₁) → (D₂, S₂) as morphisms: deterministic computations reading from D₁ and producing rows conforming to S₂. Composition is pipeline chaining; identity is the copy transform.

Definition 4.2 : Semantic Functor

Sem : Data → [OntSch^op, Set] maps the data layer into the ontology. For a dataset (D, S) backing entity type O: each row becomes an element of Φ(O), each column becomes a property, each foreign key column becomes a link function. The result is a presheaf Φ = Sem(D, S).

Proposition 4.3 : Faithfulness on Primary Keys

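For each entity type O backed by dataset D, the component Sem_O : V_D(t) ↪ Φ(O) is an injection on primary keys, where V_D(t) is the set of rows of D visible at transaction t (Definition 5.2). One row becomes exactly one object, and distinct primary keys give distinct objects. When D is the sole backing datasource, this is a bijection.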

Faithfulness means the interpretation is deterministic and loses nothing on keys: there is never ambiguity about which object a row produces, and distinct keys never collapse into one object. Naturality of Sem with respect to data transforms says that transforming raw data and then indexing into the ontology gives the same result as indexing first and then applying the induced ontology transition. The functor preserves the pipeline structure.
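The row-to-object interpretation can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the actual indexing implementation: the function name `sem`, the column names, and the sample rows are hypothetical, and each dataset is assumed to have a single primary key column.

```python
def sem(rows, primary_key, foreign_keys):
    """Interpret table rows as objects keyed by primary key, plus link functions.

    Returns (objects, links), where objects maps each primary key to its
    properties and links maps each foreign key column to a lookup function.
    """
    objects = {}
    for row in rows:
        pk = row[primary_key]
        if pk in objects:
            # Injectivity on primary keys: two rows may not yield the same object.
            raise ValueError(f"duplicate primary key: {pk!r}")
        # Non-foreign-key columns become properties of the object.
        objects[pk] = {k: v for k, v in row.items() if k not in foreign_keys}
    # Each foreign key column becomes a function from objects to linked keys.
    links = {fk: {row[primary_key]: row[fk] for row in rows} for fk in foreign_keys}
    return objects, links


person_rows = [
    {"person_id": "p1", "name": "Ada", "company_id": "c1"},
    {"person_id": "p2", "name": "Bob", "company_id": "c2"},
    {"person_id": "p3", "name": "Cy", "company_id": "c1"},
]

objects, links = sem(person_rows, "person_id", ["company_id"])
print(len(objects), links["company_id"]["p3"])
```

With this dataset as the only backing source, the number of objects equals the number of rows, matching the bijection in Proposition 4.3.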

5. Versioned Presheaves

Data changes, and each dataset has a history of transactions: inserts, updates, deletes, snapshots. The presheaf structure extends naturally to track this: we put a presheaf on the transaction poset, getting a "versioned state" that varies over time.

Definition 5.1 : Transaction Poset

For a dataset D, the transaction poset T(D) = (t₁ < t₂ < …) is the totally ordered set of all transactions on D. As a thin category, it has a unique morphism t_i → t_j whenever t_i ≤ t_j.

Definition 5.2 : Versioned State

The versioned state of D is V_D : T(D)^op → Set, mapping each transaction t to the set of rows visible at that point. The restriction maps are induced by transaction semantics: SNAPSHOT replaces all rows, APPEND adds, UPDATE modifies, DELETE removes.

Composing with the semantic functor gives Sem ∘ V_D : T(D)^op → [OntSch^op, Set], a presheaf of presheaves. At each moment in time, you get a complete ontology snapshot. Temporal queries and point-in-time semantics fall out of this structure: a query "as of t" is an ordinary query evaluated against the snapshot at transaction t.
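The set of rows visible at each transaction can be computed by replaying the log. The sketch below assumes rows are keyed by primary key and that a transaction is a pair (kind, rows); this encoding and the sample log are illustrative assumptions, not a description of any particular storage engine.

```python
def apply_transaction(state, txn):
    """Return the rows (keyed by primary key) visible after one transaction."""
    kind, rows = txn
    if kind == "SNAPSHOT":
        return dict(rows)              # replace all rows
    new_state = dict(state)            # never mutate earlier versions
    if kind == "APPEND":
        new_state.update(rows)         # add rows
    elif kind == "UPDATE":
        for key, row in rows.items():
            if key in new_state:       # modify only rows that already exist
                new_state[key] = row
    elif kind == "DELETE":
        for key in rows:
            new_state.pop(key, None)   # remove rows if present
    else:
        raise ValueError(f"unknown transaction kind: {kind!r}")
    return new_state


def versioned_state(log):
    """Return the list [V(t1), V(t2), ...] of visible row sets, one per transaction."""
    states, state = [], {}
    for txn in log:
        state = apply_transaction(state, txn)
        states.append(state)
    return states


log = [
    ("SNAPSHOT", {"p1": {"name": "Ada"}, "p2": {"name": "Bob"}}),
    ("APPEND", {"p3": {"name": "Cy"}}),
    ("UPDATE", {"p2": {"name": "Robert"}}),
    ("DELETE", {"p1": None}),
]
V = versioned_state(log)
print([sorted(v) for v in V])
```

A point-in-time query then reads from `V[i]` for the chosen transaction. Earlier entries are left unchanged by later transactions, so history stays queryable.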

6. Pullback Queries

The presheaf structure makes queries compositional. The fundamental operation is pullback along a link: given a subset of instances at one entity type, recover the instances at a linked type that point into that subset.

Definition 6.1 : Pullback Along a Link

For L : O₁ → O₂ and a subset B ⊆ Φ(O₂), the pullback is L*(B) = {x ∈ Φ(O₁) | w_L(x) ∈ B}, the preimage of B under the linking function w_L.

Example 6.2 : Concrete Query Decomposition

"All persons at US companies with transactions over $1M." Using links L_emp : Person → Company and L_txn : Person → Transaction, this decomposes as:

Filter: B₁ = {c ∈ Φ(Company) | country(c) = "US"}.
Filter: B₂ = {t ∈ Φ(Transaction) | amount(t) > 1M}.
Pull back: L_emp*(B₁) and L_txn*(B₂).
Intersect.

Each step is a pullback or a meet in the subobject lattice, all operations the presheaf structure guarantees exist.
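The decomposition above can be traced on a small instance. This is a minimal sketch with made-up data: the identifiers, countries, and amounts are illustrative, and the linking functions w_emp and w_txn are encoded as dicts.

```python
def pullback(w, domain, subset):
    """L*(B): the elements x of domain whose image w[x] lies in subset."""
    return {x for x in domain if x in w and w[x] in subset}


# Illustrative instance data (assumed for this example).
persons = {"p1", "p2", "p3"}
country = {"c1": "US", "c2": "DE"}                         # property of Company
amount = {"t1": 2_500_000, "t2": 40_000, "t3": 1_200_000}  # property of Transaction
w_emp = {"p1": "c1", "p2": "c2", "p3": "c1"}               # L_emp: Person -> Company
w_txn = {"p1": "t1", "p2": "t3", "p3": "t2"}               # L_txn: Person -> Transaction

# Filter, filter, pull back, intersect.
B1 = {c for c, k in country.items() if k == "US"}
B2 = {t for t, a in amount.items() if a > 1_000_000}
result = pullback(w_emp, persons, B1) & pullback(w_txn, persons, B2)
print(result)
```

Here p1 is the only person who both works at a US company and is linked to a transaction over $1M, so the intersection contains p1 alone.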

Remark 6.3

Pullbacks along composable links L₁ : O₁ → O₂ and L₂ : O₂ → O₃ satisfy L₁*(L₂*(B)) = (L₂ ∘ L₁)*(B) for any B ⊆ Φ(O₃), so multi-hop queries reduce to single pullbacks. Note the order: B is pulled back along L₂ first, then along L₁. Filters form a Boolean algebra, and pullbacks distribute over meets. The query algebra is not hand-designed; it is derived from the category.
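Both laws can be checked directly on a toy instance. The sketch assumes links L₁ : Person → Company and L₂ : Company → Country; the Country entity type and all data below are hypothetical, introduced only to have a two-hop path.

```python
def pullback(w, domain, subset):
    """Preimage of subset under the dict-encoded function w, within domain."""
    return {x for x in domain if w[x] in subset}


people = {"p1", "p2", "p3"}
companies = {"c1", "c2"}
w1 = {"p1": "c1", "p2": "c2", "p3": "c1"}   # L1: Person -> Company
w2 = {"c1": "US", "c2": "DE"}               # L2: Company -> Country (assumed)
w21 = {p: w2[w1[p]] for p in people}        # the composite L2 ∘ L1

# Composition law: pull back along L2 first, then along L1.
B = {"US"}
two_step = pullback(w1, people, pullback(w2, companies, B))
one_step = pullback(w21, people, B)

# Pullback distributes over meets (intersections) of filters.
C1, C2 = {"c1"}, {"c1", "c2"}
meet_then_pull = pullback(w1, people, C1 & C2)
pull_then_meet = pullback(w1, people, C1) & pullback(w1, people, C2)

print(two_step == one_step, meet_then_pull == pull_then_meet)
```

Both comparisons hold for any choice of data, since each side is a preimage under an ordinary function. The toy instance only makes the bookkeeping visible.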

Summary

The presheaf Φ : OntSch^op → Set is the data poured into the schema skeleton. Representables give type-safe APIs via Yoneda. Sem bridges raw data to the ontology, faithful on primary keys. Versioned presheaves track temporal state. Pullback queries compose by construction.

Next: Component 3: The Presheaf Category, the ambient topos in which Φ lives, and why it matters.