Data as a contravariant functor on the schema.
The schema (Component 1) declares what kinds of things can exist. A presheaf fills that skeleton with actual data. One presheaf is one snapshot of the world as seen through the schema. Rows become set elements, foreign keys become linking functions, and functoriality keeps multi-hop navigation honest.
We need the general notion first. A presheaf on a category is a systematic assignment: each object receives a set, each morphism receives a set map, and these assignments respect composition. The one twist is that the maps go in the opposite direction.
A presheaf on a category C is a functor F : C^op → Set. Given f : A → B in C, the presheaf yields F(f) : F(B) → F(A); note the reversal. Functoriality requires (1) F(id_A) = id_{F(A)} and (2) F(g ∘ f) = F(f) ∘ F(g). The second condition is contravariance: composition flips.
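As a minimal sketch of the two conditions, assume a hypothetical toy category A → B → C with arrows f and g (these names and the element labels are illustrative, not part of the schema). Finite sets are Python sets and set maps are dicts; the check below confirms that F(g ∘ f) agrees with F(f) ∘ F(g).

```python
# Hypothetical toy category: A --f--> B --g--> C (names are illustrative).
F_obj = {"A": {"a1", "a2"}, "B": {"b1", "b2"}, "C": {"c1"}}

# A presheaf reverses arrows: F(f) : F(B) -> F(A), F(g) : F(C) -> F(B).
F_f = {"b1": "a1", "b2": "a2"}
F_g = {"c1": "b2"}
F_gf = {"c1": "a2"}  # F(g . f) : F(C) -> F(A)


def compose(outer, inner):
    """Return outer . inner for functions stored as dicts."""
    return {x: outer[y] for x, y in inner.items()}


def is_function(mapping, domain, codomain):
    """Check that a dict is a total function domain -> codomain."""
    return set(mapping) == domain and set(mapping.values()) <= codomain


well_typed = (
    is_function(F_f, F_obj["B"], F_obj["A"])
    and is_function(F_g, F_obj["C"], F_obj["B"])
    and is_function(F_gf, F_obj["C"], F_obj["A"])
)

# Contravariance: F(g . f) == F(f) . F(g)
contravariant = compose(F_f, F_g) == F_gf
```

Both `well_typed` and `contravariant` evaluate to `True` for this data; changing `F_gf` to send `c1` to `a1` would break the second condition.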
Consider a link L : Person → Company ("each person works at a company"). The covariant reading gives a function w : Φ(Person) → Φ(Company), which looks up where someone works. The contravariant reading gives Φ(L) : Φ(Company) → Φ(Person), which navigates from a company back to its employees. Both encode the same information. We choose contravariant because it makes the ambient category a topos (Component 3), which gives us limits, colimits, exponentials, and a subobject classifier for free.
An ontology instance is a presheaf Φ : OntSch^op → Set.
On objects: for each entity type O, the set Φ(O) is the collection of all actual instances of that type. If O = Person, then Φ(Person) is every person currently in the system.
On morphisms: for each link type L : O_i → O_j, the function Φ(L) : Φ(O_j) → Φ(O_i) implements referential navigation. Covariant reading: w_L : Φ(O_i) → Φ(O_j) is the foreign key lookup.
Functoriality: for composable links L₁ : O₁ → O₂ and L₂ : O₂ → O₃, we have Φ(L₂ ∘ L₁) = Φ(L₁) ∘ Φ(L₂). In other words, following a two-hop path in the schema corresponds to composing the linking functions in reverse order. Multi-hop traversals compose by construction.
The functoriality condition is doing more than it looks. It says that however you decompose a path into hops, composing the hop-by-hop functions gives the same function as the composite link. Without this, a query like "find all persons → companies → orders" could give different results depending on how you decompose the path. Functoriality rules that out.
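A small sketch of this point in the covariant (foreign key lookup) reading, assuming three hypothetical links Person → Company → Country → Region with made-up data. Grouping the hops either way yields the same end-to-end lookup.

```python
# Hypothetical link functions in the covariant reading (illustrative data).
works_at = {"alice": "acme", "bob": "acme", "carol": "globex"}  # Person -> Company
hq_in = {"acme": "US", "globex": "DE"}                          # Company -> Country
region_of = {"US": "AMER", "DE": "EMEA"}                        # Country -> Region


def then(first, second):
    """Follow `first`, then `second` (functions stored as dicts)."""
    return {x: second[y] for x, y in first.items()}


# Two ways to decompose the same three-hop path.
left = then(then(works_at, hq_in), region_of)
right = then(works_at, then(hq_in, region_of))

same_answer = left == right
```

Here `same_answer` is `True`: the result depends only on the path, not on how it was bracketed.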
Among all presheaves on OntSch, the representable ones play a distinguished role. They are the presheaves "generated by" a single entity type, and the Yoneda lemma connects them to the ontology in a way that is both deep and immediately useful.
For O ∈ Ob(OntSch), the representable presheaf is y(O) = OntSch(-, O) : OntSch^op → Set. At each entity type O′, it returns the set of all link paths from O′ to O, that is, every way to navigate to O.
For any presheaf Φ : OntSch^op → Set and any entity type O:
Ont(y(O), Φ) ≅ Φ(O)
Natural transformations from the representable into Φ correspond bijectively to elements of Φ(O). To give a morphism from y(O) to Φ is to pick an element of Φ(O), that is, one instance of type O. The bijection is natural in both variables, so there are no arbitrary choices involved.
This is why typed APIs "just work." An SDK function that takes an entity type O as a type parameter and returns objects of that type is exactly a morphism y(O) → Φ. The lemma guarantees that such morphisms are in bijection with actual objects, so the SDK gives type-safe access to exactly the right data, no more and no less. Auto-generated SDKs in TypeScript or Python are, in categorical language, externalizations of the representable functor into those type systems.
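The bijection can be checked by brute force on a tiny case. The sketch below assumes a hypothetical schema with two entity types P and C and a single link L : P → C (plus identities), and an arbitrary made-up presheaf Φ in the contravariant direction. It enumerates all candidate components of a natural transformation y(C) → Φ and keeps those satisfying the naturality square.

```python
from itertools import product

# Hypothetical schema: one link L : P -> C, plus identities.
# Presheaf Phi (contravariant): Phi(L) : Phi(C) -> Phi(P). Data is made up.
Phi = {"C": ["c1", "c2", "c3"], "P": ["p1", "p2"]}
Phi_L = {"c1": "p1", "c2": "p1", "c3": "p2"}

# Representable y(C): y(C)(C) = {id_C}, y(C)(P) = {L}.
# y(C)(L) sends id_C to id_C . L = L.
y_C = {"C": ["id_C"], "P": ["L"]}
y_C_L = {"id_C": "L"}


def natural_transformations():
    """Enumerate all component pairs and keep the natural ones."""
    found = []
    for a, b in product(Phi["C"], Phi["P"]):
        alpha_C = {"id_C": a}  # component at C
        alpha_P = {"L": b}     # component at P
        # Naturality square for L: alpha_P . y(C)(L) == Phi(L) . alpha_C
        if all(alpha_P[y_C_L[h]] == Phi_L[alpha_C[h]] for h in y_C["C"]):
            found.append((alpha_C, alpha_P))
    return found


nts = natural_transformations()

# Yoneda: each natural transformation is determined by the image of id_C.
yoneda_elements = [alpha_C["id_C"] for alpha_C, _ in nts]
```

For this data there are exactly as many natural transformations as elements of Φ(C), and `yoneda_elements` lists each element of Φ(C) once.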
Raw data does not arrive as a presheaf. Datasets are tables with rows and columns, stored in files or databases. The semantic functor is the bridge: it interprets tabular data as an ontology instance, mapping rows to objects and foreign keys to link functions.
Data has pairs (D, S) as objects (a dataset with its schema) and transforms f : (D₁, S₁) → (D₂, S₂) as morphisms: deterministic computations reading from D₁ and producing rows conforming to S₂. Composition is pipeline chaining; identity is the copy transform.
Sem : Data → [OntSch^op, Set] maps the data layer into the ontology. For a dataset (D, S) backing entity type O: each row becomes an element of Φ(O), each column becomes a property, each foreign key column becomes a link function. The result is a presheaf Φ = Sem(D, S).
For each entity type O backed by dataset D, the component Sem_O : V_D(t) ↪ Φ(O) is an injection on primary keys, where V_D(t) is the set of rows of D visible at transaction t. One row becomes exactly one object. When D is the sole backing datasource, this is a bijection.
Faithfulness means the interpretation is deterministic: there is never ambiguity about which object a row produces. Naturality of Sem with respect to data transforms says that transforming raw data and then indexing into the ontology gives the same result as indexing first and then applying the induced ontology transition; the functor preserves the pipeline structure.
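A rough sketch of what Sem does to a single table, under stated assumptions: rows are Python dicts, the column names (`person_id`, `company_id`, `name`) are hypothetical, and the helper `sem` is an illustration rather than any real API. Rows become objects keyed by primary key, non-key columns become properties, and each foreign key column becomes a link function in the covariant (lookup) reading.

```python
# Hypothetical tables; column names are illustrative only.
person_rows = [
    {"person_id": 1, "name": "Alice", "company_id": 10},
    {"person_id": 2, "name": "Bob", "company_id": 10},
    {"person_id": 3, "name": "Carol", "company_id": 20},
]
company_rows = [
    {"company_id": 10, "name": "Acme"},
    {"company_id": 20, "name": "Globex"},
]


def sem(rows, primary_key, foreign_keys=()):
    """Interpret a table as (objects, properties, link functions)."""
    objects = {row[primary_key] for row in rows}
    if len(objects) != len(rows):
        # Rows -> objects must be injective on primary keys.
        raise ValueError("primary key is not unique")
    fk = set(foreign_keys)
    properties = {
        row[primary_key]: {
            col: val
            for col, val in row.items()
            if col != primary_key and col not in fk
        }
        for row in rows
    }
    links = {col: {row[primary_key]: row[col] for row in rows} for col in fk}
    return objects, properties, links


persons, person_props, person_links = sem(person_rows, "person_id", ["company_id"])
companies, company_props, _ = sem(company_rows, "company_id")

# Link function for Person -> Company; its values land in the Company objects.
w_emp = person_links["company_id"]
```

A duplicate primary key raises an error here, which is the sketch's stand-in for the injectivity requirement.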
Data changes, and each dataset has a history of transactions: inserts, updates, deletes, snapshots. The presheaf structure extends naturally to track this: we put a presheaf on the transaction poset, getting a "versioned state" that varies over time.
For a dataset D, the transaction poset T(D) = (t₁ < t₂ < …) is the totally ordered set of all transactions on D. As a thin category, it has a unique morphism t_i → t_j whenever t_i ≤ t_j.
The versioned state of D is V_D : T(D)^op → Set, mapping each transaction t to the set of rows visible at that point. The restriction maps are induced by transaction semantics: SNAPSHOT replaces all rows, APPEND adds, UPDATE modifies, DELETE removes.
Composing with the semantic functor gives Sem ∘ V_D : T(D)^op → [OntSch^op, Set], a presheaf of presheaves. At each moment in time, you get a complete ontology snapshot. Temporal queries and point-in-time semantics fall out of this structure with no additional work.
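The object part of the versioned state can be sketched by folding a transaction log, assuming rows are keyed by primary key and each transaction is a hypothetical `(kind, payload)` pair. The exact semantics of each transaction type below are an assumption for illustration; the sketch computes the row sets at each transaction and does not model the restriction maps.

```python
def apply_transaction(rows, txn):
    """Return the rows (keyed by primary key) visible after one transaction."""
    kind, payload = txn
    if kind == "SNAPSHOT":
        return dict(payload)          # replace everything
    new_rows = dict(rows)             # copy so earlier states stay intact
    if kind in ("APPEND", "UPDATE"):
        new_rows.update(payload)      # add new keys / overwrite existing ones
    elif kind == "DELETE":
        for key in payload:
            new_rows.pop(key, None)
    else:
        raise ValueError(f"unknown transaction type: {kind}")
    return new_rows


def versioned_state(transactions):
    """List the visible rows at t_1, t_2, ... by folding the log in order."""
    states, rows = [], {}
    for txn in transactions:
        rows = apply_transaction(rows, txn)
        states.append(rows)
    return states


# Hypothetical transaction log.
log = [
    ("SNAPSHOT", {1: "Alice", 2: "Bob"}),
    ("APPEND", {3: "Carol"}),
    ("UPDATE", {2: "Robert"}),
    ("DELETE", [1]),
]
V_D = versioned_state(log)
```

A point-in-time query is then an index into `V_D`: `V_D[1]` is the state after the second transaction, unaffected by later updates and deletes.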
The presheaf structure makes queries compositional. The fundamental operation is pullback along a link: given a subset of instances at one entity type, recover the instances at a linked type that point into that subset.
For L : O₁ → O₂ and a subset B ⊆ Φ(O₂), the pullback is L*(B) = {x ∈ Φ(O₁) | w_L(x) ∈ B}, the preimage of B under the linking function.
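"All persons at US companies with transactions over $1M." Using links L_emp : Person → Company and L_txn : Person → Transaction, this decomposes as L_emp*(U) ∧ L_txn*(M), where U ⊆ Φ(Company) is the set of US companies and M ⊆ Φ(Transaction) is the set of transactions over $1M.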
Each step is a pullback or a meet in the subobject lattice, all operations the presheaf structure guarantees exist.
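One possible reading of the example query as code, assuming link functions are stored as dicts in the covariant (lookup) reading and using made-up data. The property tables `company_country` and `txn_amount` are hypothetical. The pullback is a preimage and the meet is a set intersection.

```python
def pullback(link, subset):
    """L*(B): elements whose image under the link function lies in B."""
    return {x for x, y in link.items() if y in subset}


# Hypothetical link functions and properties.
w_emp = {"alice": "acme", "bob": "globex", "carol": "acme"}  # Person -> Company
w_txn = {"alice": "t1", "bob": "t2", "carol": "t3"}          # Person -> Transaction
company_country = {"acme": "US", "globex": "DE"}
txn_amount = {"t1": 2_500_000, "t2": 4_000_000, "t3": 10_000}

# Filters on the target entity types.
us_companies = {c for c, country in company_country.items() if country == "US"}
big_txns = {t for t, amount in txn_amount.items() if amount > 1_000_000}

# Two pullbacks, then a meet (intersection) in the subsets of Person.
result = pullback(w_emp, us_companies) & pullback(w_txn, big_txns)
```

With this data `result` is `{"alice"}`: Carol works at a US company but her transaction is too small, and Bob's large transaction is at a non-US company.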
For composable links L₁ : O₁ → O₂ and L₂ : O₂ → O₃ and a subset B ⊆ Φ(O₃), pullbacks satisfy L₁*(L₂*(B)) = (L₂ ∘ L₁)*(B), so multi-hop queries reduce to single pullbacks. Filters form a Boolean algebra and pullbacks distribute over meets. The query algebra is not hand-designed; it is derived from the category.
The presheaf Φ : OntSch^op → Set is the data poured into the schema skeleton. Representables give type-safe APIs via Yoneda. Sem bridges raw data to the ontology, faithful on primary keys. Versioned presheaves track temporal state. Pullback queries compose by construction.
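Both laws can be spot-checked on small data. The sketch assumes hypothetical links L₁ : Person → Company and L₂ : Company → Country stored as lookup dicts, with B a subset of countries; the inner pullback is taken along L₂ first, then along L₁.

```python
def pullback(link, subset):
    """L*(B): elements whose image under the link function lies in B."""
    return {x for x, y in link.items() if y in subset}


# Hypothetical link functions (illustrative data).
w1 = {"alice": "acme", "bob": "globex", "carol": "acme"}  # L1 : Person -> Company
w2 = {"acme": "US", "globex": "DE"}                       # L2 : Company -> Country
composite = {x: w2[y] for x, y in w1.items()}             # L2 . L1 : Person -> Country

B = {"US"}

# Multi-hop law: pulling back hop by hop equals one pullback along the composite.
two_step = pullback(w1, pullback(w2, B))
one_step = pullback(composite, B)
composition_law = two_step == one_step

# Pullback distributes over meets (intersections) of filters.
B1, B2 = {"acme", "globex"}, {"acme"}
meet_law = pullback(w1, B1 & B2) == (pullback(w1, B1) & pullback(w1, B2))
```

Both `composition_law` and `meet_law` hold here, and they hold for any choice of data because preimages commute with composition and intersection.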
Next: Component 3: The Presheaf Category, the ambient topos in which Φ lives, and why it matters.