The shape of the domain as a small category.
We start at the bottom. Every column in every table has a type, drawn from a fixed collection of primitives. These types come with a natural subtyping structure (integers embed into longs, dates embed into timestamps), and this ordering propagates upward into the rest of the framework.
Let Type be the category whose objects are primitive types:
Ob(Type) = {String, Integer, Long, Double, Boolean, Date, Timestamp, Binary, GeoPoint, GeoShape, TimeSeries, Array(τ), Struct({li : τi}), Attachment, Marking}
Morphisms are subtype inclusions and coercions: if τ1 is a subtype of τ2, there is a unique morphism τ1 → τ2. The familiar chains Integer → Long → Double and Date → Timestamp are examples. Because there is at most one morphism between any two objects, Type is a thin category.
The subtype ordering makes Type a bounded lattice: ⊤ = String (every type coerces to String) and ⊥ = Void (the empty type, adjoined to the list above as an initial object).
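Type has all finite meets ∧ and joins ∨. The meet τ1 ∧ τ2 is the greatest common subtype; the join τ1 ∨ τ2 is the least common supertype. The bounds guarantee that some common supertype (String) and some common subtype (Void) always exist; that a least, respectively greatest, one exists is a requirement on the coercion rules, not a consequence of finiteness alone. In practice, the join is what matters: when you merge two columns, the result type is τ1 ∨ τ2.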
For any chain τ1 → τ2 → … → τk, the composite is unique. Thinness kills ambiguity: there is never more than one way to coerce along a chain.
Type compatibility is a preorder, coercion chains compose by transitivity, and you can use a value of type τ1 wherever τ2 is expected as long as τ1 → τ2 exists. Nothing deep, but getting it right once means we never have to think about it again.
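The subtype order and the join can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation: it covers only a hypothetical fragment of the type list, and apart from the two chains, the top String, and the bottom Void, the names (`TYPES`, `DIRECT`, `supertypes`, `is_subtype`, `join`) are assumptions introduced here.

```python
# Hypothetical fragment of the coercion graph. Only the chains
# Integer -> Long -> Double and Date -> Timestamp, the top String, and the
# bottom Void come from the text; the choice of fragment is an assumption.
TYPES = {"Void", "Boolean", "Integer", "Long", "Double",
         "Date", "Timestamp", "String"}
DIRECT = {"Integer": {"Long"}, "Long": {"Double"}, "Date": {"Timestamp"}}


def supertypes(t):
    """Every u with a morphism t -> u, including t itself (the identity)."""
    if t == "Void":
        return set(TYPES)          # bottom: a subtype of everything
    seen = {t, "String"}           # top: everything coerces to String
    stack = [t]
    while stack:
        for nxt in DIRECT.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_subtype(a, b):
    """True when a morphism a -> b exists; thinness means at most one."""
    return b in supertypes(a)


def join(a, b):
    """Least common supertype: the result type when merging two columns."""
    common = supertypes(a) & supertypes(b)
    least = [c for c in common if all(is_subtype(c, d) for d in common)]
    if len(least) != 1:
        raise ValueError(f"no least common supertype for {a} and {b}")
    return least[0]
```

In this fragment, `join("Integer", "Long")` evaluates to `"Long"`, while two unrelated types such as `"Integer"` and `"Date"` meet only at the top, `"String"`.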
A schema is just a list of columns with their types, and the categorical way to say this is surprisingly clean.
A schema is a functor S : n → Type, where n = {1, 2, …, n} is the discrete category on n objects. For each k ∈ n, S(k) = τk gives the type of column k. Equivalently:
S ≅ τ1 × τ2 × … × τn
A named schema is a pair (S, name) where name : n → Label gives each column a human-readable label.
The category Sch has:
Schema morphisms capture every structural transformation of tabular data: drop columns, rename them, cast types. Sch sits inside [FinSet^op, Type] as a full subcategory.
A "Person" schema: S : 3 → Type with S(1) = String, S(2) = Integer, S(3) = Date, named (name, age, dob). In product form: String × Integer × Date.
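As a rough sketch, the "Person" example can be written down directly: the functor S becomes a lookup from a 1-based column index to a type, and the naming map is a parallel tuple of labels. The class name `NamedSchema` and its methods are assumptions made for illustration, and types are represented as plain strings.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NamedSchema:
    """A schema S : n -> Type together with a label for each column."""
    types: Tuple[str, ...]   # types[k - 1] is S(k)
    names: Tuple[str, ...]   # names[k - 1] is the label of column k

    def __post_init__(self):
        if len(self.types) != len(self.names):
            raise ValueError("every column needs exactly one label")

    def S(self, k):
        """Type of column k, using the 1-based indexing of the text."""
        return self.types[k - 1]

    def product_form(self):
        """The same schema written as a product of its column types."""
        return " × ".join(self.types)


person = NamedSchema(
    types=("String", "Integer", "Date"),
    names=("name", "age", "dob"),
)
```

Here `person.S(2)` is `"Integer"` and `person.product_form()` is `"String × Integer × Date"`, matching the product form given above.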
Now we assemble the central object. The idea: entity types become objects, relationship types become morphisms, and the whole thing lives in a single small category.
The ontology schema category OntSch is a small category where:
Multiple morphisms between the same pair of objects are allowed (a person can be both an employee of and a shareholder in the same company), so OntSch is really a finite directed multigraph, viewed as a category.
Three distinct link types, some sharing codomains. Note that a path like Person to Company to Product is not composable when arrows point in different directions.
The triple (SO, pkO, titleO) is doing all the work: SO gives the table shape, pkO identifies the uniqueness constraint, and titleO picks the column used for display in UIs. That is everything you need to fully specify an entity type.
Each entity type carries a schema; that much is clear. But link types do not induce maps between schemas in any natural way. The right structure is a span.
The property functor S : Ob(OntSch) → Ob(Sch) sends each entity type O to its property schema SO. Strictly speaking, S is only a function on objects: it does not extend to morphisms, because a link L : Oi → Oj tells you nothing about how the columns of Oi and Oj relate. What the link does specify is which columns participate in the join.
For a link type L : Oi → Oj, the foreign key is a span in Sch, S(Oi) ← FKL → S(Oj), with legs π1 : FKL → S(Oi) and π2 : FKL → S(Oj).
FKL is the key schema; π1 picks out the foreign key column(s) from the source; π2 embeds the primary key column(s) of the target.
For n:1 links, FKL is just the primary key type of Oj, living as a column in S(Oi). For m:n links, FKL is a separate join schema, potentially backed by its own dataset.
If we strip OntSch of composition and identities, keeping only the raw graph of entity types and link types, we get a quiver. The free category on this quiver recovers composition and gives us something extra: a universal property that pins down multi-hop traversal.
Let Q = (V, E, s, t) where V = Ob(OntSch), E = link types, and s, t assign source and target. The free category Path(Q) has objects V and morphisms = composable paths of link types, with composition by concatenation and the empty path as identity.
For any category C and graph morphism F : Q → U(C) (where U forgets composition), there exists a unique functor F̃ : Path(Q) → C extending F.
In plain terms: once you decide what each entity type and each link type mean in some target category, all multi-hop traversals are forced. You don't get to choose how two-hop paths behave; the universal property decides for you. Link traversal "just works."
Morphisms in Path(Q) are sequences L1 ∘ L2 ∘ … ∘ Lk with matching endpoints. Identity is the empty path. Composition is concatenation, nothing more.
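A small Python sketch may make the free category and its universal property concrete. Everything named here is hypothetical: the two link types in `LINKS`, the lookup tables, and the helpers `edge`, `identity`, `then`, and `extend`. Paths are stored in traversal order (first link first), which is an assumption about how the ∘ notation above is read, and the target category is taken to be sets and functions.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical quiver: link type name -> (source, target).
LINKS = {
    "employed_by": ("Person", "Company"),
    "makes": ("Company", "Product"),
}


@dataclass(frozen=True)
class Path:
    """A morphism of Path(Q): a composable sequence of link types."""
    source: str
    target: str
    links: Tuple[str, ...] = ()


def identity(obj):
    """The empty path at an entity type."""
    return Path(obj, obj)


def edge(name):
    """A single link type viewed as a path of length one."""
    source, target = LINKS[name]
    return Path(source, target, (name,))


def then(p, q):
    """Traverse p, then q. Composition is concatenation of link lists."""
    if p.target != q.source:
        raise ValueError(f"cannot follow {p.target} with {q.source}")
    return Path(p.source, q.target, p.links + q.links)


def extend(on_links):
    """Extend an interpretation of single links to every path.

    on_links maps each link name to a function on values. The result is
    forced: a path acts by applying its links in traversal order.
    """
    def run(path, value):
        for name in path.links:
            value = on_links[name](value)
        return value
    return run


# Hypothetical interpretation of each link as a lookup.
EMPLOYER = {"alice": "acme"}
PRODUCT = {"acme": "anvil"}
traverse = extend({
    "employed_by": EMPLOYER.__getitem__,
    "makes": PRODUCT.__getitem__,
})
two_hop = then(edge("employed_by"), edge("makes"))
```

With these tables, `traverse(two_hop, "alice")` returns `"anvil"`: only the single-hop lookups were specified, and the two-hop behaviour follows from them. Composing in the wrong order, as in `then(edge("makes"), edge("employed_by"))`, raises `ValueError` because the endpoints do not match.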
The 1:1, 1:n, n:1, m:n annotations on links are more than database decoration: they constrain the shape of any presheaf inhabiting the schema, and the right language for this is enrichment.
For L : Oi → Oj with card(L):
We can view OntSch as enriched over the lattice 1:1 ≤ n:1 ≤ m:n (and 1:1 ≤ 1:n ≤ m:n). The cardinality determines the implementation: n:1 as a foreign key column, m:n as a join table, 1:1 as a subset isomorphism. These are structural consequences of the enrichment, not choices an engineer makes.
An n:1 link Order → Customer stores a customer_id column in the Order table. The reverse 1:n is computed (the set of orders per customer). An m:n link Student ↔ Course requires a join table with student_id and course_id. None of this is surprising, but notice that the category theory tells you which implementation is forced.
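The cardinality order and its link to implementation can be sketched as follows. This is an illustration under stated assumptions: a link instance is modelled as a set of (source, target) pairs, "n:1" is read as "each source row has at most one target", and the names `leq`, `tightest_cardinality`, and `IMPLEMENTATION` are introduced here for the example only.

```python
from collections import Counter

# Strict part of the order 1:1 <= n:1 <= m:n and 1:1 <= 1:n <= m:n.
BELOW = {
    ("1:1", "n:1"), ("1:1", "1:n"), ("1:1", "m:n"),
    ("n:1", "m:n"), ("1:n", "m:n"),
}


def leq(a, b):
    """True when cardinality a is at least as restrictive as b."""
    return a == b or (a, b) in BELOW


def tightest_cardinality(pairs):
    """Most restrictive annotation that a set of (source, target) pairs satisfies."""
    pairs = set(pairs)
    targets_per_source = Counter(s for s, _ in pairs)
    sources_per_target = Counter(t for _, t in pairs)
    functional = all(n <= 1 for n in targets_per_source.values())
    injective = all(n <= 1 for n in sources_per_target.values())
    if functional and injective:
        return "1:1"
    if functional:
        return "n:1"
    if injective:
        return "1:n"
    return "m:n"


# Implementation per cardinality, paraphrasing the text above.
IMPLEMENTATION = {
    "1:1": "subset isomorphism",
    "n:1": "foreign key column on the source table",
    "1:n": "computed as the reverse of an n:1 link",
    "m:n": "join table",
}

orders = [("o1", "c1"), ("o2", "c1")]                  # Order -> Customer
enrolments = [("s1", "c1"), ("s1", "c2"), ("s2", "c1")]  # Student <-> Course
```

For the sample data, `tightest_cardinality(orders)` is `"n:1"` and `tightest_cardinality(enrolments)` is `"m:n"`, so `IMPLEMENTATION` selects a foreign key column for the first and a join table for the second.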
Ehresmann's notion of a sketch gives one last way to think about the schema: a sketch is a category with distinguished cones and cocones, and a model is a functor preserving them. The punchline is that models of OntSch-as-sketch are exactly the valid database states.
The sketch T = (OntSch, L, C) consists of:
A model of this sketch is a functor M : OntSch → Set that preserves the distinguished cones and cocones. The category of models is exactly the full subcategory of presheaves satisfying all referential integrity and cardinality constraints.
Mod(T) is equivalent to the full subcategory of [OntSch^op, Set] consisting of presheaves that send distinguished limit cones to limits and distinguished colimit cocones to colimits. Referential integrity and cardinality constraints are not ad hoc checks: they are the limit/colimit preservation conditions.
This answers a question that sounds simple but is surprisingly hard to make precise: "What is a valid database state?" Answer: a model of the sketch. The sketch formulation is strictly more expressive than ER diagrams, since it can encode arbitrary limit/colimit conditions, not just keys and cardinalities.
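To give the idea of "valid state" some operational flavour, here is a deliberately narrow sketch that checks only one ingredient, referential integrity for an n:1 link. It does not implement the cone and cocone conditions of the sketch. The representation (row sets keyed by entity type, each link stored as a source-key to target-key dictionary) and all names are assumptions made for illustration.

```python
# Hypothetical schema: one n:1 link type, name -> (source, target).
LINKS = {"placed_by": ("Order", "Customer")}


def dangling_references(links, rows, link_rows):
    """List every link entry whose source or target row does not exist.

    rows:      entity type -> set of primary keys present in the state
    link_rows: link name   -> dict mapping source key to target key
    """
    problems = []
    for name, (source, target) in links.items():
        for s, t in link_rows.get(name, {}).items():
            if s not in rows[source]:
                problems.append((name, "unknown source", s))
            if t not in rows[target]:
                problems.append((name, "unknown target", t))
    return problems


good_rows = {"Order": {"o1", "o2"}, "Customer": {"c1"}}
good_links = {"placed_by": {"o1": "c1", "o2": "c1"}}

bad_rows = {"Order": {"o1"}, "Customer": {"c1"}}
bad_links = {"placed_by": {"o3": "c9"}}
```

The first state produces an empty list, so under this partial check it is a candidate model; the second reports both an unknown source `"o3"` and an unknown target `"c9"`.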
The schema OntSch is a small category carrying: a property functor S on objects, cardinality annotations on morphisms, and foreign key specifications as spans. Its free path category generates multi-hop traversals by universal property. As a sketch, it determines exactly which presheaves are valid database states. There is no data here, only structure, and from this structure the entire relational backbone follows.
Next: Component 2: The Presheaf, pouring data into this skeleton.