Which Set Theory?

#1 · November 9, 2021, 1:23 pm

It is often said that the Relational Model is based on logic and set theory.

I note that there is an "embarrassment" of set theories used by mathematicians. Wikipedia lists some 15 first order set theories https://en.wikipedia.org/wiki/List_of_first-order_theories#Set_theories

My question is: Is there any particular set theory that is considered the "best" one to use for the relational model?

If there is not one "best" one, are there pros/cons for the various theories?

For example, I note that in https://en.wikipedia.org/wiki/Finitist_set_theory

> The empty set `{}` has no members, and therefore there exists no such thing as `{}`

Which would have interesting consequences if that set theory was used for the (or is that a?) relational model.

Another example is the definition of an ordered pair as used in most(?) set theories. If the short version of Kuratowski definition was chosen rather than the "full" version, then a (relational model) heading could not have an attribute with the same type name and attribute name, nor could a (relational model) tuple have an attribute with the same attribute name and attribute value (or same type name and attribute value).

My real question here is, how solid is the foundation of the relational model if we don't say *which* set theory is used in the foundation? I could not find anything in the Alice book or TTM etc that mentions say ZFC, NFU or any of the more apparently "well known" set theories.

P.S. I am no mathematician, but nonetheless find this area interesting, and am interested in any insight or pointers to further (easy) reading this group could give.

P.P.S. Hi all by the way. I did post on this forum many moons ago, and have been a very part time follower of it over the years, but have been thinking more deeply about the relational model this past year.

Paul Vernon

#2 · November 9, 2021, 2:38 pm

I agree with your observation, that if the Set Theory, in general, and its axioms, in particular, are not mentioned in a Database Textbook, then it is likely not a genuine foundation.

The idea of the foundations been based on the Set Theory might have been carried over from Mathematics. Note, however that the Set Theory foundation for mathematics has lost its allure. Nowadays, there are Univalent Foundations, Category Theory, and probably many more.

A more technical question might be where in math do we encounter n-ary relations? The Relational Clones (also known as co-clones) in the Universal Algebra is one such fascinating topic. The fact that co-clones algebra is essentially relational algebra of conjunctive queries is really intriguing... Not to mention the entire philosophical framework where we have the operational/functional perspective of Clones on one side, and the co-clones of invariant relations on the other. With Galois Connection between them...

#3 · November 9, 2021, 8:35 pm

Quote from AntC on November 9, 2021, 8:35 pm

Quote from Paul Vernon on November 9, 2021, 1:23 pm

It is often said that the Relational Model is based on logic and set theory.

Hi Paul, welcome back from your furlough.

Yeah, that's the sort of claim you see in introductory texts. They then go on to introduce SQL, which is based on neither.

I note that there is an "embarrassment" of set theories used by mathematicians. Wikipedia lists some 15 first order set theories https://en.wikipedia.org/wiki/List_of_first-order_theories#Set_theories

My question is: Is there any particular set theory that is considered the "best" one to use for the relational model?

One that's easy to understand, and easy to derive implications from. We aren't in practice troubled by Russell's Paradox/Antinomy. Codd [1972]'s notion of 'domain independence' protects from that. If I need a specific theory (not often), I go with ZFC. We do need a type system -- this is more drawn from programming theory than abstract math.

If there is not one "best" one, are there pros/cons for the various theories?

For example, I note that in https://en.wikipedia.org/wiki/Finitist_set_theory

FST doesn't meet my desiderata for "easy to understand". Also: " designed for modeling finite nested structures " So not 'flat' relations. " Unlike classical set theories such as ZFC and KPU, FST is not intended to function as a foundation for mathematics, but only as a tool in ontological modeling." So not that. (Entity-Attribute modelling being the bane of our lives.)

The empty set `{}` has no members, and therefore there exists no such thing as `{}`

We require typed sets: an empty relation value of the Suppliers schema is distinct (incomparable) to an empty relation value of the Parts Schema. That is why TTM's RM Prescription 9 goes to great length to define a Heading, which gives the type for a relation value. IOW there is not just one empty relation. There is a participant who used to hang out here who would disagree.

Which would have interesting consequences if that set theory was used for the (or is that a?) relational model.

Definitely 'a' relational model. TTM's is distinct from Codd 1972 from other models in Codd's later works; also distinct from D.L. Childs'.

Another example is the definition of an ordered pair as used in most(?) set theories. If the short version of Kuratowski definition was chosen

The short Kuratowski definition is all we need for a TTM model: the pair of attribute-type within a Heading, a Heading is a set of them; the pair of attribute-value within a tuple, a tuple is a set of them. Attribute in those pairs is a primitive unanalyzable type.

rather than the "full" version, then a (relational model) heading could not have an attribute with the same type name and attribute name,

That we don't want. Type names must be distinct from attribute names. A type name might incorporate a Heading type including attribute names -- for a Relation-Valued Attribute or a Tuple-Valued Attribute -- but that's not an attribute name simpliciter.

nor could a (relational model) tuple have an attribute with the same attribute name and attribute value (or same type name and attribute value).

Attribute values must be distinct from type names.

My real question here is, how solid is the foundation of the relational model if we don't say *which* set theory is used in the foundation? I could not find anything in the Alice book or TTM etc that mentions say ZFC, NFU or any of the more apparently "well known" set theories.

The threat to soundness of relational models comes from allowing `Null` (and consequent two-and-a-half-value logic) or allowing duplicate rows in relation values. That's why SQL gives me the heebie-jeebies.

P.S. I am no mathematician, but nonetheless find this area interesting, and am interested in any insight or pointers to further (easy) reading this group could give.

Read the TTM Prescriptions, then read the motivation for them in DTATRM. Also look at Appendix A for a possible algebra that relies on relation values as well-behaved sets.

The Alice book rather takes its set theory for granted, but you might reflect on how much of it would fall apart if it allowed duplicate rows.

P.P.S. Hi all by the way. I did post on this forum many moons ago, and have been a very part time follower of it over the years, but have been thinking more deeply about the relational model this past year.

Quote from Paul Vernon on November 9, 2021, 1:23 pm

It is often said that the Relational Model is based on logic and set theory.

Hi Paul, welcome back from your furlough.

Yeah, that's the sort of claim you see in introductory texts. They then go on to introduce SQL, which is based on neither.

I note that there is an "embarrassment" of set theories used by mathematicians. Wikipedia lists some 15 first order set theories https://en.wikipedia.org/wiki/List_of_first-order_theories#Set_theories

My question is: Is there any particular set theory that is considered the "best" one to use for the relational model?

One that's easy to understand, and easy to derive implications from. We aren't in practice troubled by Russell's Paradox/Antinomy. Codd [1972]'s notion of 'domain independence' protects from that. If I need a specific theory (not often), I go with ZFC. We do need a type system -- this is more drawn from programming theory than abstract math.

If there is not one "best" one, are there pros/cons for the various theories?

For example, I note that in https://en.wikipedia.org/wiki/Finitist_set_theory

FST doesn't meet my desiderata for "easy to understand". Also: " designed for modeling finite nested structures " So not 'flat' relations. " Unlike classical set theories such as ZFC and KPU, FST is not intended to function as a foundation for mathematics, but only as a tool in ontological modeling." So not that. (Entity-Attribute modelling being the bane of our lives.)

The empty set `{}` has no members, and therefore there exists no such thing as `{}`

We require typed sets: an empty relation value of the Suppliers schema is distinct (incomparable) to an empty relation value of the Parts Schema. That is why TTM's RM Prescription 9 goes to great length to define a Heading, which gives the type for a relation value. IOW there is not just one empty relation. There is a participant who used to hang out here who would disagree.

Which would have interesting consequences if that set theory was used for the (or is that a?) relational model.

Definitely 'a' relational model. TTM's is distinct from Codd 1972 from other models in Codd's later works; also distinct from D.L. Childs'.

Another example is the definition of an ordered pair as used in most(?) set theories. If the short version of Kuratowski definition was chosen

The short Kuratowski definition is all we need for a TTM model: the pair of attribute-type within a Heading, a Heading is a set of them; the pair of attribute-value within a tuple, a tuple is a set of them. Attribute in those pairs is a primitive unanalyzable type.

rather than the "full" version, then a (relational model) heading could not have an attribute with the same type name and attribute name,

That we don't want. Type names must be distinct from attribute names. A type name might incorporate a Heading type including attribute names -- for a Relation-Valued Attribute or a Tuple-Valued Attribute -- but that's not an attribute name simpliciter.

nor could a (relational model) tuple have an attribute with the same attribute name and attribute value (or same type name and attribute value).

Attribute values must be distinct from type names.

My real question here is, how solid is the foundation of the relational model if we don't say *which* set theory is used in the foundation? I could not find anything in the Alice book or TTM etc that mentions say ZFC, NFU or any of the more apparently "well known" set theories.

The threat to soundness of relational models comes from allowing `Null` (and consequent two-and-a-half-value logic) or allowing duplicate rows in relation values. That's why SQL gives me the heebie-jeebies.

P.S. I am no mathematician, but nonetheless find this area interesting, and am interested in any insight or pointers to further (easy) reading this group could give.

Read the TTM Prescriptions, then read the motivation for them in DTATRM. Also look at Appendix A for a possible algebra that relies on relation values as well-behaved sets.

The Alice book rather takes its set theory for granted, but you might reflect on how much of it would fall apart if it allowed duplicate rows.

P.P.S. Hi all by the way. I did post on this forum many moons ago, and have been a very part time follower of it over the years, but have been thinking more deeply about the relational model this past year.

#4 · November 9, 2021, 10:40 pm

Quote from AntC on November 9, 2021, 8:35 pm

Quote from Paul Vernon on November 9, 2021, 1:23 pm

It is often said that the Relational Model is based on logic and set theory.

Another example is the definition of an ordered pair as used in most(?) set theories. If the short version of Kuratowski definition was chosen

The short Kuratowski definition is all we need for a TTM model: the pair of attribute-type within a Heading, a Heading is a set of them; the pair of attribute-value within a tuple, a tuple is a set of them. Attribute in those pairs is a primitive unanalyzable type.

I should add that the Kuratowski definition is not adequate to characterise rows in SQL -- especially rows in query results, which might have duplicate column names or unnamed columns. IOW queries can produce results that can't be stored back into the database. Bizarre.

SQL also at least used to support position-based access to columns: SELECT 1 2 3 FROM T. I've no idea how you'd model that as an ordered thingummy in set theory.

#5 · November 9, 2021, 11:00 pm

Quote from AntC on November 9, 2021, 11:00 pm

Quote from Vadim on November 9, 2021, 2:38 pm

The idea of the foundations been based on the Set Theory might have been carried over from Mathematics. Note, however that the Set Theory foundation for mathematics has lost its allure. Nowadays, there are Univalent Foundations, Category Theory, and probably many more.

Yeah Naa Naa. Vadim you're answering a question that was not put. First-order theory is adequate to describe a relational model -- that is to model relations. Even your fancy theories are not adequate, I suspect, to model the lump of quivering blancmange that is SQL.

We also stay within First Order Predicate Logic to model a Data Manipulation Algebra (such as A) up to <TCLOSE> -- as previous discussions here have elucidated. But to go to generalised Transitive Closure (RECURSIVE CTE) needs higher-order. When we embed the DMA in a general-purpose programming language (such as a D) then we have to worry about Turing Completeness/the Halting Problem/potentially infinite calculations like a DBMS serving requests for ever, which needs co-algebras and maybe Category Theory, the whole nine yards.

Category theorists are obsessed with higher-order functions (functions passed as arguments to functions), functions over types, functions from values to types (Dependent Programming), storing functions inside data structures (Existential quantification), extracting them and applying them to other fields in the data structure, and expecting to preserve type safety. Good luck to them, but modelling relations doesn't need to worry.

A more technical question might be where in math do we encounter n-ary relations? The Relational Clones (also known as co-clones) in the Universal Algebra is one such fascinating topic. The fact that co-clones algebra is essentially relational algebra of conjunctive queries is really intriguing... Not to mention the entire philosophical framework where we have the operational/functional perspective of Clones on one side, and the co-clones of invariant relations on the other. With Galois Connection between them...

I'd suspect none of that algebraic purity applies with SQL. I don't think sprinkling around Galois Connection magic dust is any help to the jobbing relationalist.

Quote from Vadim on November 9, 2021, 2:38 pm

The idea of the foundations been based on the Set Theory might have been carried over from Mathematics. Note, however that the Set Theory foundation for mathematics has lost its allure. Nowadays, there are Univalent Foundations, Category Theory, and probably many more.

Yeah Naa Naa. Vadim you're answering a question that was not put. First-order theory is adequate to describe a relational model -- that is to model relations. Even your fancy theories are not adequate, I suspect, to model the lump of quivering blancmange that is SQL.

We also stay within First Order Predicate Logic to model a Data Manipulation Algebra (such as A) up to <TCLOSE> -- as previous discussions here have elucidated. But to go to generalised Transitive Closure (RECURSIVE CTE) needs higher-order. When we embed the DMA in a general-purpose programming language (such as a D) then we have to worry about Turing Completeness/the Halting Problem/potentially infinite calculations like a DBMS serving requests for ever, which needs co-algebras and maybe Category Theory, the whole nine yards.

Category theorists are obsessed with higher-order functions (functions passed as arguments to functions), functions over types, functions from values to types (Dependent Programming), storing functions inside data structures (Existential quantification), extracting them and applying them to other fields in the data structure, and expecting to preserve type safety. Good luck to them, but modelling relations doesn't need to worry.

A more technical question might be where in math do we encounter n-ary relations? The Relational Clones (also known as co-clones) in the Universal Algebra is one such fascinating topic. The fact that co-clones algebra is essentially relational algebra of conjunctive queries is really intriguing... Not to mention the entire philosophical framework where we have the operational/functional perspective of Clones on one side, and the co-clones of invariant relations on the other. With Galois Connection between them...

I'd suspect none of that algebraic purity applies with SQL. I don't think sprinkling around Galois Connection magic dust is any help to the jobbing relationalist.

#6 · November 9, 2021, 11:21 pm

Vadim/AntC. Thanks for the replies.

I rather liked Elementary Set Theory with a Universal Set as linked from https://en.wikipedia.org/wiki/New_Foundations as it ticked the "easy to understand" box (well, not all of it, but then, as I said, I'm no mathematician)

P.S. I'm not (really very) interested in SQL :-)

#7 · November 9, 2021, 11:40 pm

Quote from AntC on November 9, 2021, 11:00 pm

I'd suspect none of that algebraic purity applies with SQL. I don't think sprinkling around Galois Connection magic dust is any help to the jobbing relationalist.

It is not about purity. It is about insight.

However, my hand waving warrants some explanation, or an example, at least. Consider the binary operation of addition, on one hand, and the binary operation "less than <", on the other. Here they are, conveniently organized in a table:

	u	v	u < v
x	1	2	true
y	1	7	true
x+y	2	9	true

The less than relation is compatible with (i.e. invariant to) the addition operation as evidenced by the bold Boolean value in the bottom right corner. I'm currently puzzled by the fact that the Functional Dependency {x,y} -> {x+y} induced by the addition operation is oriented the weird way vertically, as opposed to what we have in Database Dependency Theory...

#8 · November 10, 2021, 12:56 am

Quote from dandl on November 10, 2021, 12:56 am

Quote from AntC on November 9, 2021, 11:00 pm

Quote from Vadim on November 9, 2021, 2:38 pm

The idea of the foundations been based on the Set Theory might have been carried over from Mathematics. Note, however that the Set Theory foundation for mathematics has lost its allure. Nowadays, there are Univalent Foundations, Category Theory, and probably many more.

Yeah Naa Naa. Vadim you're answering a question that was not put. First-order theory is adequate to describe a relational model -- that is to model relations. Even your fancy theories are not adequate, I suspect, to model the lump of quivering blancmange that is SQL.

But a formal model of SQL (data and queries) is a necessary prerequisite for going beyond SQL.

We also stay within First Order Predicate Logic to model a Data Manipulation Algebra (such as A) up to <TCLOSE> -- as previous discussions here have elucidated. But to go to generalised Transitive Closure (RECURSIVE CTE) needs higher-order. When we embed the DMA in a general-purpose programming language (such as a D) then we have to worry about Turing Completeness/the Halting Problem/potentially infinite calculations like a DBMS serving requests for ever, which needs co-algebras and maybe Category Theory, the whole nine yards.

Not so.As I have shown here: http://www.andl.org/2021/07/formal-definitions-for-an-extended-relational-algebra/:

FOPL via set-builder notation can formally define SPJRUN (select project join rename union negate)

It can define extend and aggregate if you include App-A relcons, which are actually simple functions with a relational skin

It can define TCLOSE as the sum of an infinite series

It can define GTC using both the infinite series and relcon.

This is the definition of an Extended RA, which is still insufficient to define all the queries possible in SQL.

I'd suspect none of that algebraic purity applies with SQL. I don't think sprinkling around Galois Connection magic dust is any help to the jobbing relationalist.

I think it's both possible and necessary to model SQL. It's just too damn useful and pervasive to do otherwise.

Quote from AntC on November 9, 2021, 11:00 pm

Quote from Vadim on November 9, 2021, 2:38 pm

The idea of the foundations been based on the Set Theory might have been carried over from Mathematics. Note, however that the Set Theory foundation for mathematics has lost its allure. Nowadays, there are Univalent Foundations, Category Theory, and probably many more.

Yeah Naa Naa. Vadim you're answering a question that was not put. First-order theory is adequate to describe a relational model -- that is to model relations. Even your fancy theories are not adequate, I suspect, to model the lump of quivering blancmange that is SQL.

But a formal model of SQL (data and queries) is a necessary prerequisite for going beyond SQL.

We also stay within First Order Predicate Logic to model a Data Manipulation Algebra (such as A) up to <TCLOSE> -- as previous discussions here have elucidated. But to go to generalised Transitive Closure (RECURSIVE CTE) needs higher-order. When we embed the DMA in a general-purpose programming language (such as a D) then we have to worry about Turing Completeness/the Halting Problem/potentially infinite calculations like a DBMS serving requests for ever, which needs co-algebras and maybe Category Theory, the whole nine yards.

Not so.As I have shown here: http://www.andl.org/2021/07/formal-definitions-for-an-extended-relational-algebra/:

FOPL via set-builder notation can formally define SPJRUN (select project join rename union negate)
It can define extend and aggregate if you include App-A relcons, which are actually simple functions with a relational skin
It can define TCLOSE as the sum of an infinite series
It can define GTC using both the infinite series and relcon.

This is the definition of an Extended RA, which is still insufficient to define all the queries possible in SQL.

I'd suspect none of that algebraic purity applies with SQL. I don't think sprinkling around Galois Connection magic dust is any help to the jobbing relationalist.

I think it's both possible and necessary to model SQL. It's just too damn useful and pervasive to do otherwise.

Andl - A New Database Language - andl.org

#9 · November 10, 2021, 4:59 am

Quote from Vadim on November 9, 2021, 11:40 pm

Quote from AntC on November 9, 2021, 11:00 pm

I'd suspect none of that algebraic purity applies with SQL. I don't think sprinkling around Galois Connection magic dust is any help to the jobbing relationalist.

It is not about purity. It is about insight.

However, my hand waving warrants some explanation, or an example, at least. Consider the binary operation of addition, on one hand, and the binary operation "less than <", on the other. Here they are, conveniently organized in a table:

u v u < v

x 1 2 true

y 1 7 true

x+y 2 9 true

The less than relation is compatible with (i.e. invariant to) the addition operation as evidenced by the bold Boolean value in the bottom right corner. I'm currently puzzled by the fact that the Functional Dependency {x,y} -> {x+y} induced by the addition operation is oriented the weird way vertically, as opposed to what we have in Database Dependency Theory...

That's because you have a table trying to show two different relations orthogonally. For the + relation, you need {x, y, (x+y)} to be attributes (conventionally shown as columns -- but of course TTM deliberately abstracts away from a spatial organisation, precisely because thinking in columns and rows will confound you).

Not sure what you mean by less than being invariant. Throw in a few negative integers, that'll soon mess it up. For example { x (-5), y (+10), x+y (+5)}. Now you can't swizzle that into {u, v} columns such that the invariant holds.

#10 · November 10, 2021, 5:05 am

Quote from dandl on November 10, 2021, 12:56 am

Quote from AntC on November 9, 2021, 11:00 pm

Quote from Vadim on November 9, 2021, 2:38 pm

The idea of the foundations been based on the Set Theory might have been carried over from Mathematics. Note, however that the Set Theory foundation for mathematics has lost its allure. Nowadays, there are Univalent Foundations, Category Theory, and probably many more.

Yeah Naa Naa. Vadim you're answering a question that was not put. First-order theory is adequate to describe a relational model -- that is to model relations. Even your fancy theories are not adequate, I suspect, to model the lump of quivering blancmange that is SQL.

But a formal model of SQL (data and queries) is a necessary prerequisite for going beyond SQL.

No I don't want to "go beyond SQL". I want to go nowhere near so far as SQL. Specifically: no Null, no duplicate rows, no duplicate column names nor anonymous columns.

I want only relation values as sets-of-tuples with Headings per TTM.

TTM Forum

The Forum for Discussion about The Third Manifesto and Related Matters

Which Set Theory?