The Forum for Discussion about The Third Manifesto and Related Matters


Relational Lattice: 2019 Report Card

Quote from AntC on January 5, 2020, 12:06 pm

Thanks yes, I've discovered that the official term for what I called 'powerset' is Boolean Algebra. A Boolean Algebra is, amongst other things, a 'relatively complemented' lattice, as it shows here. So no, we don't need a unary complement; a relative complement will do.

The desire to have an operation whose interpretation doesn't raise any attribute-set dependence concerns is understandable. From an algebraic perspective, however, the binary relative complement operation is more complicated than the unary complement. When Prover9 manipulates expressions in the search space, it definitely has to do more work with a binary operation than with a unary one -- that is just elementary combinatorics.

The computational complexity limits of the theorem prover (that you also stumbled upon in your last posting) directed me towards some soul-searching; more on that in the epilogue of this message...

And in fact the Litak axioms provide (nearly) enough. Here are a couple of theorems proven from the Litak et al. axioms:

% interval [x ^ R00, x] (unique relative complement) proven from Litak.
x ^ y = y & y ^ (x ^ R00) = (x ^ R00)
& y ^ z1 = (x ^ R00) & y v z1 = x
& y ^ z2 = (x ^ R00) & y v z2 = x
-> z1 = z2.

% heading subinterval [x ^ R00, R00] unique relative complement.
% proven from Litak.
R00 ^ y = y & y ^ (x ^ R00) = (x ^ R00)
& y ^ z1 = (x ^ R00) & y v z1 = R00
& y ^ z2 = (x ^ R00) & y v z2 = R00
-> z1 = z2.
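
For intuition, the same uniqueness statement can be brute-forced in an ordinary powerset lattice (meet = intersection, join = union), where relative complements within any interval are unique because the lattice is distributive. A throwaway Python check; the representation and names are mine, not from any of the papers:

# Brute-force sanity check in the powerset lattice of {0,1,2}, with
# meet = intersection and join = union: within any interval [lo, x],
# the relative complement of y exists and is unique -- the same shape
# as the Prover9 goals above.

from itertools import chain, combinations

U = [0, 1, 2]
powerset = [frozenset(c) for c in
            chain.from_iterable(combinations(U, n) for n in range(len(U) + 1))]

for x in powerset:
    for lo in (s for s in powerset if s <= x):
        for y in (s for s in powerset if lo <= s <= x):   # y in [lo, x]
            # candidates z with y meet z = lo and y join z = x
            zs = [z for z in powerset if (y & z) == lo and (y | z) == x]
            assert zs == [lo | (x - y)]   # exists, and is unique
print("relative complements are unique in every interval of a powerset")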

There is also an artistic part of me enjoying how the Boolean algebra axioms look (e.g. De Morgan's law). I simply haven't come across anything remotely as nice expressed in terms of the relative complement. But keep on pushing, maybe there are some gems out there!

Why de-emphasize the Boolean algebra/power set structure on attributes & tuples? In section 2.3 of the Litak, Mikulas & Hidders paper the authors propose a generic recipe for manufacturing relational lattices out of other mathematical structures. Following their idea, I pondered what the relational lattice might be where the attributes are organized into the lattice of all linear subspaces. Tadeusz, however, was not very impressed by "...the fact that a class of mathematical objects [relational lattices] can be found in hindsight in various mathematical disguises...".

 

I'm not interested in mathematical disguises. Most of the Litak et al papers seem to be written in a foreign language. They don't need any more disguising to be inscrutable. I'm interested in being able to reason about queries. If you're going to throw out the Boolean Algebra, that'll make it even less tractable than I'm finding already. So no, thanks.

Litak has written an essay on his contributions to the field of logic. The field is quite extensive and deep, and the author admits that the gap between the research frontier and teaching has grown to uncomfortable proportions. Mastering all these skills requires serious commitment and time that not many of us are willing to sacrifice.

From my perspective, however, [mathematical] logic has negligible impact on the natural sciences. Set theory, even less so. Consequently, the utility of databases in science is dubious, as they are essentially glorified storage engines. The culprit, in my opinion, is that nature invented a very flexible attribute naming scheme. Linear spaces permeate physics, but the choice of basis vectors is completely arbitrary. The results that one obtains with a particular basis are invariant under a change of basis. This is something that the relational model lacks, and consequently it has very poor modelling power for natural phenomena. Having said that, however, there are serious research efforts combining the relational model with linear algebra...

Quote from Vadim on January 6, 2020, 8:05 pm
Quote from AntC on January 5, 2020, 12:06 pm

Thanks yes, I've discovered that the official term for what I called 'powerset' is Boolean Algebra. A Boolean Algebra is, amongst other things, a 'relatively complemented' lattice, as it shows here. So no, we don't need a unary complement; a relative complement will do.

The desire to have an operation whose interpretation doesn't raise any attribute-set dependence concerns is understandable. From an algebraic perspective, however, the binary relative complement operation is more complicated than the unary complement. When Prover9 manipulates expressions in the search space, it definitely has to do more work with a binary operation than with a unary one -- that is just elementary combinatorics.

Hmm? Despite my ugly axioms, I'm getting sub-1 minute proofs for complements, see below.

The computational complexity limits of the theorem prover (that you also stumbled upon in your last posting) directed me towards some soul-searching; more on that in the epilogue of this message...

There is also an artistic part of me enjoying how the Boolean algebra axioms look (e.g. De Morgan's law). I simply haven't come across anything remotely as nice expressed in terms of the relative complement. But keep on pushing, maybe there are some gems out there!

But De Morgan's law doesn't work within a Relational Lattice, because absolute complement (defined the usual lattice way) is ambiguous for empty relations. Perhaps you'd like gems like these (defining absolute complement the usual lattice way):

  • The absolute complement of a non-empty relation is empty;
  • The absolute complement of a non-empty relation is uniquely determined;
  • The absolute complement of an empty relation is non-empty;
  • The double absolute complement of an empty relation is the identity.

They're all theorems, proven from the Litak axioms.

The trouble with your complement corresponding to Appendix A <NOT> is that it needs mention of R11 -- the domain-dependent relation with all attributes and all possible tuples for that heading; and R11 cannot be pinned down any better than R00. Which is to say you can define R11 and R00 together via your Fundamental Decomposition (identity) Axioms, but you just get two undefined elements. The trouble with a complement over headings is that to complement an empty relation you need a non-empty result (but it doesn't matter what content); to make it determinate you could stipulate maximum possible content; but then that's domain-dependent and needs R11.

 

Litak has written an essay on his contributions to the field of logic. The field is quite extensive and deep, and the author admits that the gap between the research frontier and teaching has grown to uncomfortable proportions. Mastering all these skills requires serious commitment and time that not many of us are willing to sacrifice.

Lattice theory has been claimed to be 'the new dawn' for decades (at least since Birkhoff). It also gets 'rediscovered' as the new dawn repeatedly. I can explain that: the texts are so obtusely written as to be useless. It's easier to reinvent the wheel (whereupon you think you've discovered something) than to figure out what the texts are saying. Very likely that's what I'm doing with Relational Lattices: I've dipped into the Litak et al papers a bit more, and I discover (for example) them talking about 'Boolean Algebras'. I remain in the dark as to what they're saying about them or whether it's useful.

There's a similar obtuseness with Category Theory's application to computer science. I think Monads would have remained a backwater without Phil Wadler's breadth of vision to see how they solved the type-safety and encapsulation difficulties behind Continuation-passing style (which had been folk art up to the late 1980's). Now Haskellers are writing whole libraries full of Category-theoretic abstractions.

From my perspective, however, [mathematical] logic has negligible impact on the natural sciences. Set theory, even less so.

Huh?

Consequently, the utility of databases in science is dubious, as they are essentially glorified storage engines.

If a database isn't a storage engine, it's useless to anybody. I dispute that SQL (if that's what you mean) makes anything glorified: quite the reverse. NoSQL usually makes the engine worse than SQL. Most practicing scientists seem to think that Excel is a storage engine. Isn't something like Datalog applying mathematical logic to natural sciences? It is a storage engine amongst other things.

The culprit, in my opinion, is that nature invented a very flexible attribute naming scheme. Linear spaces permeate physics, but the choice of basis vectors is completely arbitrary. The results that one obtains with a particular basis are invariant under a change of basis. This is something that the relational model lacks, and consequently it has very poor modelling power for natural phenomena. Having said that, however, there are serious research efforts combining the relational model with linear algebra...

"An L[inear]A[lgebra]-oriented system may emphasize matrix operations in the programming interface but require awkward gymnastics with column and
row indices to implement even simple SPJ queries. A relational system, in contrast, obscures matrix properties suitable for optimization and algorithm selection."

Sounds like they want Childs's Indexed Sets, which are very much applying a matrix approach to Relationally-modelled data. "The object of Lara is the Associative Table, a data structure capturing core properties of relations, tensors, and keyvalues" -- yep, it's Childs. Saying "awkward gymnastics" sounds like the authors think SQL is a "relational system". "[blah, blah] and no RDBMS applies this optimization." -- yep, it's SQL, which allows Nulls into the database and then has to abandon logic.

Vadim, your views on Relational Lattice are so far out of whack with the discipline I expect in applying mathematical logic to natural science/data science that I see this as more smoke-blowing pseudo-science. The Relational Model provides a perfectly general abstraction for representing any structured data at all. That's not to say a particular implementation of the RM is the most efficient/ergonomic for all data retrieval/data analysis tasks, particularly if you're trying to extract patterns and inferences 'in the large' from high-volume semi-structured data.

Quote from AntC on January 7, 2020, 6:04 am
Quote from Vadim on January 6, 2020, 8:05 pm

Consequently, the utility of databases in science is dubious, as they are essentially glorified storage engines.

If a database isn't a storage engine, it's useless to anybody. I dispute that SQL (if that's what you mean) makes anything glorified: quite the reverse. NoSQL usually makes the engine worse than SQL. Most practicing scientists seem to think that Excel is a storage engine.

Excel is a storage engine. And it's an easy storage engine. Almost everything else -- SQL, NoSQL, Tableau, Qlikview, Hadoop, you-name-it -- is, for anyone but a theoretician or computer power-user (which most of the world is not interested in becoming), too damn hard.

Even Excel is too damn hard, but most scientists or other technical people have achieved some notional facility with it, i.e., they can open a spreadsheet or CSV file, change values in cells, maybe create a pivot table or a graph. Python is also making certain inroads here, in precisely the same way that BASIC (both original and Microsoft's "Visual" variants) once did. But it's all hard. Very hard. It's rather disheartening to see how much intelligent users outside our technical domains struggle to do even the simplest things.

Last year and the year before I was the "database expert" on a data science project, which largely meant acting as go-between for a data provider's DBA and our data analyst. The DBA advised me that we were granted access to a certain SQL DBMS as long as we didn't export its data except in aggregate or summary form. Graphs and statistics with anonymised headings, fine; downloading/exposing/saving raw data in any form, no, absolutely not. Raw data could be held (briefly) in volatile storage -- RAM -- but nowhere else.

I explained that to the analyst, told him that he'd have to pull data in real time via ODBC, analyse it with whatever he liked, but not save any source data anywhere and only emit summaries which the DBA and I would check for anonymity.

"Excellent," he said, "ODBC is fine; I used to know SQL and used it all the time. Can you get me a CSV file of all the data?"

Me: "Eh? No. It's 128 tables and probably a terabyte of data and we're not allowed to store it on anything. You have to retrieve it in real-time via ODBC and analyse it in RAM."

Him: "Ah, I understand. Yes. Good. When can you get me the CSV file?"

Me: "You can't have it. It's too big and too many and it's not allowed. You've got ODBC access -- I'll create the queries, if you like; tell me what you need -- and you can analyse it with whatever you like."

Him: "I see. I understand. I get it now. Can you put the CSV file on this USB stick?"

Me: "... .. ! .. ."

I couldn't really fault the guy; he's a scientific statistician, not a technical person.

The problem with almost all database technology is that it's all too technical for most people, not just statisticians (actually, they're a pretty bright bunch), but even technical programmers. A certain proportion of NoSQL (or Excel, or -- per my colleague above -- CSV; and no, I don't know what he intended to use to read the CSV file) advocacy comes from otherwise-competent people who don't "get" SQL. It's not that they don't like SQL or balk at its flaws and limitations; they don't even get that far. They don't understand it.

I don't see anything in this or similar topics to suggest how relational lattice theory will address that conceptual/technical hurdle, because that is a bigger problem -- in many practical respects -- than any theoretical limitation of the relational (or any other) model.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on January 7, 2020, 10:40 am
Quote from AntC on January 7, 2020, 6:04 am
Quote from Vadim on January 6, 2020, 8:05 pm

Consequently, the utility of databases in science is dubious, as they are essentially glorified storage engines.

If a database isn't a storage engine, it's useless to anybody. I dispute that SQL (if that's what you mean) makes anything glorified: quite the reverse. NoSQL usually makes the engine worse than SQL. Most practicing scientists seem to think that Excel is a storage engine.

Excel is a storage engine. ... [database technology] They don't understand it.

Thanks Dave. This isn't a technology issue; it's a mental-picture issue: what these people don't understand is that data is structured / how data is structured / why structured data might have 128 tables as opposed to one big spreadsheet with heaps of repetitions (and probably anomalies).

I don't see anything in this or similar topics to suggest how relational lattice theory will address that conceptual/technical hurdle, because that is a bigger problem -- in many practical respects -- than any theoretical limitation of the relational (or any other) model.

Relational Lattice theory, as I'm investigating it, is a 'drop in' replacement for Appendix A. So no, it doesn't address the conceptual hurdles. Your user-facing D (with a Relational Lattice back end) would probably look just like any other D. Neither do Lattices lift any theoretical limitations (indeed they seem to come with some extra limitations); they're another mindset for thinking about how to combine the structure in your 128 tables.

I don't know what Vadim (or Tadeusz) are speculating about. I don't think we need fancier theory; we need to make the current theory work. But since it doesn't work (and Tadeusz et al's papers seem to have an explanation why it could never work -- again to the extent I understand the forest of terminology), I'm pretty much abandoning it after this brain dump.

Quote from dandl on January 5, 2020, 7:38 am
  • The importance of Natural JOIN/Appendix A <AND> as a powerful operation amongst relations. NatJOIN is closed over relations, whereas Codd's Cartesian product and his 'union-compatible' operators are partial functions.
  • The natural dual of that, Inner UNION, aka ISBL UNION aka SQL UNION CORRESPONDING, as a closed-over-relations counterpart of Codd's UNION, avoiding being restricted to only 'union-compatible' operands. I think this is just as valid a generalisation as Appendix A's <OR>; it has the merit of being domain-independent; and it captures the effect of projecting a relation on the heading of another, which leads on to ...
  • Treating relation values as 'standing for' their headings, thereby avoiding introducing into the algebra a need for attribute names or sets thereof as an additional syntactic or semantic construct. Critical to getting this to work is ...
  • The importance of a distinguished relation value denoting the relation with an empty heading and no tuples [RM Pro 5], that is DUM aka Tropashko's R00. Then relation r NatJOIN DUM gives a relation with the same heading as r but empty; r NatJOIN DUM = r is a test for r being empty; r InnerUNION DUM = DUM is another test; r InnerUNION DUM = DEE is a test for r being non-empty; (r InnerUNION DUM) NatJOIN DUM = r NatJOIN DUM is a test for r having an empty heading; r NatJOIN DUM = s NatJOIN DUM is a test for r, s having the same heading; r InnerUNION (s NatJOIN DUM) gives r projected on the attributes in common with s, but ignoring s's content.
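
To make those tests concrete, here's a minimal Python sketch, assuming a toy representation of relations as (heading, body) pairs; the names are illustrative, not from Appendix A or the papers:

# Relations as (heading, body): heading a frozenset of attribute names,
# body a frozenset of tuples, each tuple a frozenset of (attribute, value)
# pairs.

def compatible(t1, t2):
    d1, d2 = dict(t1), dict(t2)
    return all(d1[a] == d2[a] for a in d1.keys() & d2.keys())

def natjoin(r, s):                    # lattice meet, Appendix A <AND>
    (h1, b1), (h2, b2) = r, s
    return (h1 | h2, frozenset(t1 | t2 for t1 in b1 for t2 in b2
                               if compatible(t1, t2)))

def innerunion(r, s):                 # lattice join, SQL UNION CORRESPONDING
    (h1, b1), (h2, b2) = r, s
    h = h1 & h2
    return (h, frozenset(frozenset(p for p in t if p[0] in h)
                         for t in b1 | b2))

def rel(heading, *rows):              # convenience constructor
    return (frozenset(heading),
            frozenset(frozenset(zip(heading, row)) for row in rows))

DUM = rel([])                         # empty heading, no tuples
DEE = rel([], ())                     # empty heading, one empty tuple

r = rel(['a', 'b'], (1, 2), (3, 4))
s = rel(['b', 'c'], (2, 9))
r0 = natjoin(r, DUM)                  # r's heading, emptied

assert r0 == (r[0], frozenset())              # same heading as r, but empty
assert natjoin(r0, DUM) == r0                 # r0 is empty
assert natjoin(r, DUM) != r                   # r is not empty
assert innerunion(r0, DUM) == DUM             # empty test, the other way
assert innerunion(r, DUM) == DEE              # non-empty test
assert natjoin(innerunion(DEE, DUM), DUM) == natjoin(DEE, DUM)  # empty heading
assert natjoin(innerunion(r, DUM), DUM) != natjoin(r, DUM)      # r's isn't
assert natjoin(r, DUM) == natjoin(rel(['a', 'b']), DUM)         # same heading
assert innerunion(r, natjoin(s, DUM)) == rel(['b'], (2,), (4,)) # project on b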

I like these a lot, except for one problem. Where do relation values come from? How do you express the idea that attribute names form a set? How do you express relation value equality, so as to avoid duplicates?

The natural numbers can be generated from 0 and +1. Do you have something like that in mind? Please show how.

OK let's do this, just for fun; and to show off my growing understanding of a Boolean Algebra embedded in a sub-Lattice -- what Litak et al call a 'fibre'.

My counterpart to 0 is ... DUM: the empty set of tuples with an empty heading. Although it's sets all the way down, I should point out these sets are a multi-sorted ontology/logic:

  • A relation body is a set of tuples, which is a different sort to:
  • A heading is a set of <A, T> ordered pairs, which is a different sort to:
  • A tuple is a set of <A, v> ordered pairs (or we might say <A, T, v> triples but where the T is implicit/inferred from the v: each v belongs to a single type [RM Pre 2]), which is a different sort to:
  • An ordered pair is actually a two-element set, using the Kuratowski definition: {A, {A, T}} or (different sort) {A, {A, v}}, where we identify the attribute A as the element that appears at both the outer and nested levels.
  • Attribute names are a different sort. We assume a supply of attribute names, with an ability to test two attribute names for equality.
  • We assume a supply of types (whose names might be arbitrarily complex type expressions) and values within those types. We assume nothing about types/values more than an ability to test two type names for equality [RM Pre 1]; and an ability to test two values for equality [RM Pre 8], providing those values are drawn from the same type. 'Types are orthogonal to model'.

 

  • We assume NatJOIN/Appendix A <AND> defined using setbuilder, as per the FORMAL DEFINITIONS. (Needs a bit of adjusting for the Kuratowski ordered pairs: exercise for the reader.) Note the setbuilder observes the sort of the components, and applies equality tests only between components of the same sort.
  • We assume InnerUNION defined in similar setbuilder style. (Another exercise for the reader, but there are examples in Vadim's papers and Litak et al; see also the sketch after this list.) Again the setbuilder observes sort-hygiene.
  • Then equality testing between relations or between headings or between tuples is derived from equality testing between elements of a set: ZFC axiom of extensionality.
  • Note that both operators take as operands 2 sets each of tuples of the same heading (relations), but each operand potentially of a different heading, and produce as results sets of tuples of the same heading (relations).
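
For the reader who'd rather not do the exercise, here's one way the two setbuilder definitions come out, as a minimal Python sketch (Kuratowski encoding elided; representation and names are mine):

# Tuples are frozensets of <A, v> pairs; relations are (heading, body) pairs.

def natjoin(r, s):
    """<AND>: { t1 | t2 : t1 in body(r), t2 in body(s),
                t1 and t2 agree on the attribute names they share }"""
    (h1, b1), (h2, b2) = r, s
    def agree(t1, t2):
        d1, d2 = dict(t1), dict(t2)   # equality tested sort-by-sort:
        return all(d1[a] == d2[a]     # names with names, values with values
                   for a in d1.keys() & d2.keys())
    return (h1 | h2,
            frozenset(t1 | t2 for t1 in b1 for t2 in b2 if agree(t1, t2)))

def innerunion(r, s):
    """{ t restricted to heading(r) /\ heading(s) :
         t in body(r) or t in body(s) }"""
    (h1, b1), (h2, b2) = r, s
    h = h1 & h2
    return (h, frozenset(frozenset((a, v) for (a, v) in t if a in h)
                         for t in b1 | b2))

# Relation equality is then plain set equality -- ZFC extensionality:
x = (frozenset({'a'}), frozenset({frozenset({('a', 1)})}))
y = (frozenset({'a'}), frozenset({frozenset({('a', 1)})}))
assert x == y
assert natjoin(x, y) == x == innerunion(x, y)   # both idempotent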

OK Starting from DUM, I have two counterparts to +1:

  • Add an attribute: pick a name from the supply of attribute names; pick a type from the supply of types (that gives a degree 1 heading); form the empty relation of degree 1. I.e. one relation for each attribute name. Make sure to form three empty relations with headings {X INT}, {Y INT}, {Z INT}.
  • Add a value: from the degree-1 empty relation, pick each value from that attribute's type; form a singleton degree-1 relation for each value.

And we're done -- oh apart from the let-all-hell-break-loose stage (David pay attention, this is where "relation values come from"):

  • Take the empty degree-1 relations and pairwise NatJOIN them; join the joins (i.e. transitive closure); etc carry on until you reach a fixed point of generating no more relation values. (This gives a Boolean Algebra structure of empty relations.)
  • Take any two singleton degree-1 relations with different attribute names. InnerUNION them, that gives DEE.
  • More generally, take the singleton degree-1 relations of the same attribute name and pairwise InnerUNION them; union the unions (i.e. transitive closure); etc carry on until you reach a fixed point of generating no more relation values. (This gives a Boolean Algebra structure of degree-1 relations for each distinct attribute name.)
  • OK you see the game? Take pairwise any/all of the relations so far generated; both NatJOIN them and InnerUNION them; join the joins and union the unions; and join the unions; and union the joins; ... (i.e. transitive closure); ... carry on until you reach a fixed point of generating no more relation values.
  • That process will generate a lot of duplicates, simply discard them using the test for equality between relations.
  • That process will generate lots of relations with heading {X INT, Y INT, Z INT}. Amongst them will be a relation with all possible INT values for X, Y and an INT value for Z equal to the sum of X, Y. Mark that one PLUS. (This is a little like the identical-looking, randomly-filled books in Borges' 'Library of Babel'. There will be many 'false PLUSes' with one error in Z, or two errors ..., or missing one combo for X, Y, or two combos ... That's why David should be careful to mark the true one. But don't throw the erroneous ones out, they might be useful for some other formula.)
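
On a deliberately tiny universe the game is actually runnable. A brute-force Python sketch, assuming two attributes {X, Y} over a two-value 'type' {0, 1} (with INT the process terminates only in the abstract infinite time mentioned just below); representation and names as in the sketches above:

def natjoin(r, s):
    (h1, b1), (h2, b2) = r, s
    def agree(t1, t2):
        d1, d2 = dict(t1), dict(t2)
        return all(d1[a] == d2[a] for a in d1.keys() & d2.keys())
    return (h1 | h2, frozenset(t1 | t2 for t1 in b1 for t2 in b2
                               if agree(t1, t2)))

def innerunion(r, s):
    (h1, b1), (h2, b2) = r, s
    h = h1 & h2
    return (h, frozenset(frozenset(p for p in t if p[0] in h)
                         for t in b1 | b2))

DUM = (frozenset(), frozenset())
ATTRS, VALUES = ['X', 'Y'], [0, 1]

# Seeds: DUM; "add an attribute" (one empty degree-1 relation per attribute);
# "add a value" (one singleton degree-1 relation per attribute/value pair).
universe = {DUM}
universe |= {(frozenset({a}), frozenset()) for a in ATTRS}
universe |= {(frozenset({a}), frozenset({frozenset({(a, v)})}))
             for a in ATTRS for v in VALUES}

# Close under both operators until no new relation value appears; duplicates
# are discarded for free because `universe` is a set of values.
grew = True
while grew:
    grew = False
    snapshot = list(universe)
    for r in snapshot:
        for s in snapshot:
            for t in (natjoin(r, s), innerunion(r, s)):
                if t not in universe:
                    universe.add(t)
                    grew = True

# Every relation over headings within {X, Y}: 2 + 4 + 4 + 16 possible values.
assert len(universe) == 26
assert (frozenset(), frozenset({frozenset()})) in universe  # DEE was generated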

Now our useful relational operators for our user-facing D will be defined in terms of NatJOIN, InnerUNION, DUM. Since we've generated all possible relation values using all combinations of those components, we must have produced every possible query result (in abstract mathematical infinite time, of course). So no real-life query could take as operand nor produce as result anything outside the relations we have. QED.

I think that answers the points below. The algebra just treats those relation values as already available; it doesn't need to talk about headings/tuples/attributes because that all got dealt to by construction.

I've put that last bullet at some length to demonstrate the sort of algebraic and equational reasoning about programs/expressions that I was hoping for. Contrast that with Codd's Algebra (or the sets of operators derived from it as used within SQL engines), where the reasoning and equivalence of operators is much more ad hoc and informal, and hung around with many hand-wavy side conditions like: these two expressions are equivalent providing r, s have no attributes in common. We can express that condition in the Lattice algebra: (r InnerUNION s) NatJOIN DUM = DUM; we can't express it using 'traditional' RA operators.
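
For what it's worth, that condition checks out mechanically on the toy representation from the sketches above (names mine):

# (r InnerUNION s) NatJOIN DUM = DUM  iff  r and s share no attribute names.

def innerunion(r, s):
    (h1, b1), (h2, b2) = r, s
    h = h1 & h2
    return (h, frozenset(frozenset(p for p in t if p[0] in h)
                         for t in b1 | b2))

def natjoin_dum(r):          # r NatJOIN DUM: keep r's heading, empty the body
    return (r[0], frozenset())

DUM = (frozenset(), frozenset())
r = (frozenset({'a'}), frozenset({frozenset({('a', 1)})}))
s = (frozenset({'b'}), frozenset({frozenset({('b', 2)})}))
t = (frozenset({'a', 'c'}), frozenset({frozenset({('a', 1), ('c', 3)})}))

assert natjoin_dum(innerunion(r, s)) == DUM   # disjoint headings: test passes
assert natjoin_dum(innerunion(r, t)) != DUM   # both mention 'a': test fails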

These operators also have nice properties for heading-inference over expressions (i.e. type inference); whereas Appendix A or Tutorial D operators using (sets of) attribute names really have to be limited to name literals: reasoning over name variables (if it were allowed) would be intractable.

Then if the only entities in the domain of discourse of the algebra are relations (not sets of attributes, not attribute names nor values), and the semantics for NatJOIN/InnerUNION have built-in tests for same attribute names and same attribute values, we can treat other scalar operators as relations in the style of Appendix A PLUS, and expressions in the algebra need only talk about whole relations.

That's hand-wavy. I don't think you can do that in the same style as the above.

Then we can use FOPL to express the logic of our relational expressions, and reason about their properties in FOPL, and use theorem provers to check our reasoning:

  • The FOPL quantifies over relations (and only those).
  • NatJOIN, InnerUNION, NOTMATCHING, PROJECT, REMOVE, etc are operations/functions over relations returning relations.
  • Equivalences between operators we can express as FOPL equivalences over terms invoking those operators: r NOTMATCHING (r NOTMATCHING s) = r MATCHING s, r COMPOSE s = (r NatJOIN s) REMOVE (r InnerUNION s), r REMOVE (r REMOVE s) = r PROJECT s are equivalences given in Appendix A but needing a great deal of handwaving and scare-quotes. (They're checked concretely in the sketch after this list.)
  • We take FOPL with equality = between relations, as I've already assumed above.
  • We express subset comparisons between relations using the relational operators: r ⊆ s ≡ (r NatJOIN DUM = s NatJOIN DUM) & r NatJOIN s = r.
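
Those three equivalences can be checked concretely on the same toy representation. The operator definitions below are mine; note NOTMATCHING can't be built from the two lattice operators alone (they're monotone, it isn't), so it's defined directly by setbuilder:

def compatible(t1, t2):
    d1, d2 = dict(t1), dict(t2)
    return all(d1[a] == d2[a] for a in d1.keys() & d2.keys())

def natjoin(r, s):
    (h1, b1), (h2, b2) = r, s
    return (h1 | h2, frozenset(t1 | t2 for t1 in b1 for t2 in b2
                               if compatible(t1, t2)))

def innerunion(r, s):
    (h1, b1), (h2, b2) = r, s
    h = h1 & h2
    return (h, frozenset(frozenset(p for p in t if p[0] in h)
                         for t in b1 | b2))

def remove(r, s):      # project r away from s's attributes
    h = r[0] - s[0]
    return (h, frozenset(frozenset(p for p in t if p[0] in h) for t in r[1]))

def project(r, s):     # project r onto the attributes it shares with s
    h = r[0] & s[0]
    return (h, frozenset(frozenset(p for p in t if p[0] in h) for t in r[1]))

def matching(r, s):    # semijoin: tuples of r joining to at least one s tuple
    return (r[0], frozenset(t for t in r[1]
                            if any(compatible(t, u) for u in s[1])))

def notmatching(r, s): # antijoin: tuples of r joining to no s tuple
    return (r[0], frozenset(t for t in r[1]
                            if not any(compatible(t, u) for u in s[1])))

def compose(r, s):     # join, then remove the shared attributes
    return remove(natjoin(r, s), project(r, s))

def rel(heading, *rows):
    return (frozenset(heading),
            frozenset(frozenset(zip(heading, row)) for row in rows))

r = rel(['a', 'b'], (1, 2), (3, 4), (5, 6))
s = rel(['b', 'c'], (2, 7), (4, 8), (9, 9))

assert notmatching(r, notmatching(r, s)) == matching(r, s)
assert compose(r, s) == remove(natjoin(r, s), innerunion(r, s))
assert remove(r, remove(r, s)) == project(r, s)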

As I've said before, I don't think you can create new relation values within FOPL. You would have to show how, in some detail.

 

It's the NatJOIN/InnerUNION that create relation values; they're functions/operators within the FOPL. "create"? Arguably they're merely selecting a value already generated via the above transitive closures.

The natural numbers can be generated from 0 and +1. Do you have something like that in mind? Please show how.

OK let's do this, just for fun; and to show off my growing understanding of a Boolean Algebra embedded in a sub-Lattice -- what Litak et al call a 'fibre'.

Excellent! But I don't propose to acquire that knowledge. Hopefully this will not depend on it.

My counterpart to 0 is ... DUM: the empty set of tuples with an empty heading. Although it's sets all the way down, I should point out these sets are a multi-sorted ontology/logic:

  • A relation body is a set of tuples, which is a different sort to:
  • A heading is a set of <A, T> ordered pairs, which is a different sort to:
  • A tuple is a set of <A, v> ordered pairs (or we might say <A, T, v> triples but where the T is implicit/inferred from the v: each v belongs to a single type [RM Pre 2]), which is a different sort to:

OK.

  • An ordered pair is actually a two-element set, using the Kuratowski definition: {A, {A, T}} or (different sort) {A, {A, v}}, where we identify the attribute A as the element that appears at both the outer and nested levels.

I have no idea what this means. I hope it doesn't matter too much.

  • Attribute names are a different sort. We assume a supply of attribute names, with an ability to test two attribute names for equality.
  • We assume a supply of types (whose names might be arbitrarily complex type expressions) and values within those types. We assume nothing about types/values more than an ability to test two type names for equality [RM Pre 1]; and an ability to test two values for equality [RM Pre 8], providing those values are drawn from the same type. 'Types are orthogonal to model'.

OK. We need a type system for attributes: a type is a named set of values. We do not need the whole TTM machinery to go with it, so let's not.

  • We assume NatJOIN/Appendix A <AND> defined using setbuilder, as per the FORMAL DEFINITIONS. (Needs a bit of adjusting for the Kuratowski ordered pairs: exercise for the reader.) Note the setbuilder observes the sort of the components, and applies equality tests only between components of the same sort.
  • We assume InnerUNION defined in similar setbuilder style. (Another exercise for the reader, but there's examples in Vadim's papers and Litak et al.) Again the setbuilder observes sort-hygiene.
  • Then equality testing between relations or between headings or between tuples is derived from equality testing between elements of a set: ZFC axiom of extensionality.
  • Note that both operators take as operands 2 sets each of tuples of the same heading (relations), but each operand potentially of a different heading, and produce as results sets of tuples of the same heading (relations).

No problem. Why muddy it with K, V and L?

OK Starting from DUM, I have two counterparts to +1:

  • Add an attribute: pick a name from the supply of attribute names; pick a type from the supply of types (that gives a degree 1 heading); form the empty relation of degree 1. I.e. one relation for each attribute name. Make sure to form three empty relations with headings {X INT}, {Y INT}, {Z INT}.
  • Add a value: from the degree-1 empty relation, pick each value from that attribute's type; form a singleton degree-1 relation for each value.

Fine so far, but the XYZ stuff is hokey. Later.

And we're done -- oh apart from the let-all-hell-break-loose stage (David pay attention, this is where "relation values come from"):

  • Take the empty degree-1 relations and pairwise NatJOIN them; join the joins (i.e. transitive closure); etc carry on until you reach a fixed point of generating no more relation values. (This gives a Boolean Algebra structure of empty relations.)

I get it, but isn't there something more systematic?

  • Take any two singleton degree-1 relations with different attribute names. InnerUNION them, that gives DEE.

The hard way, over and over again? Isn't there a shorter way?

  • More generally, take the singleton degree-1 relations of the same attribute name and pairwise InnerUNION them; union the unions (i.e. transitive closure); etc carry on until you reach a fixed point of generating no more relation values. (This gives a Boolean Algebra structure of degree-1 relations for each distinct attribute name.)
  • OK you see the game? Take pairwise any/all of the relations so far generated; both NatJOIN them and InnerUNION them; join the joins and union the unions; and join the unions; and union the joins; ... (i.e. transitive closure); ... carry on until you reach a fixed point of generating no more relation values.
  • That process will generate a lot of duplicates, simply discard them using the test for equality between relations.

I get it, but surely something more systematic? How can you be sure you didn't miss any?

  • That process will generate lots of relations with heading {X INT, Y INT, Z INT}. Amongst them will be a relation with all possible INT values for X, Y and an INT value for Z equal to the sum of X, Y. Mark that one PLUS. (This is a little like the identical-looking, randomly-filled books in Borges' 'Library of Babel'. There will be many 'false PLUSes' with one error in Z, or two errors ..., or missing one combo for X, Y, or two combos ... That's why David should be careful to mark the true one. But don't throw the erroneous ones out, they might be useful for some other formula.)

This is weird. When you get into functions, there are just as many of them as there are values in a type. You can't keep track of them this way.

Now our useful relational operators for our user-facing D will be defined in terms of NatJOIN, InnerUNION, DUM. Since we've generated all possible relation values using all combinations of those components, we must have produced every possible query result (in abstract mathematical infinite time, of course). So no real-life query could take as operand nor produce as result anything outside the relations we have. QED.

I think that answers the points below. The algebra just treats those relation values as already available; it doesn't need to talk about headings/tuples/attributes because that all got dealt to by construction.

Despite the above comments, I'll happily give you relation values. That was roughly what I thought you had in mind. It needs to be more systematic, but I think it can work.

But I don't buy the XYZ thing. Why not create function relation values to order using set builder notation?

As I've said before, I don't think you can create new relation values within FOPL. You would have to show how, in some detail.

It's the NatJOIN/InnerUNION that create relation values; they're functions/operators within the FOPL. "create"? Arguably they're merely selecting a value already generated via the above transitive closures.

Types are sets of values: all the values already exist. You can either create the values using something like the above, or just make them already exist by virtue of the type system.

No, the question is how to select a new relation value. Given R1 and R2, how do you get R3 as the result of some query, where R3 has attribute values that do not occur in R1 or R2?

Andl - A New Database Language - andl.org
Learning about Relational Lattice
Undoubtedly, the biggest block to wider understanding of Relational Lattices amongst Relational theorists (that is, if there's anything worth understanding wider -- maybe I've just failed to find it because of the blocks) is the learning materials.
  • Wikipedia has a reasonable (for Wikipedia) coverage of abstract/mathematical lattice theory. It's spread across various pages, with haphazard linkages, gaps, inconsistencies and non-sequiturs. Coverage is more thorough than (say) PlanetMath or Wolfram. The external linkages are either more simplistic than WP or densely abstract and incomprehensible.
  • The Litak et al papers are more in the densely abstract vein: there might be some good explanations (I find a few more each time I persevere); but it's just too much hard work. Furthermore they seem to be hell-bent on inventing new notation and new terminology (on top of the already abstruse terminology). We do not need another set of symbols for NatJOIN/lattice meet, InnerUNION/lattice join, DUM aka R00. We particularly do not need an implicit precedence in expressions between lattice meet/join: please put explicit parens.
  • The idea of Relational Lattices was first discussed by Vadim on c.d.t., where it was met with blank incomprehension by the Relational theorists AFAICT. (Some of them lurk here, perhaps they could comment.)
  • Vadim's papers and web pages are just awful: badly organised, poor English, wacky ideas mixed in with some good stuff. The papers with Marshall are a little better.

To learn about Relational Lattices, I essentially took the formulas from Vadim's material (in Prover9 format), the Litak axioms translated into Prover9 with explicit parens, and got Prover9/Mace4 to tell me the consequences. It turns out many of the formulas Vadim gives are just unprovable. We want them to be true under an interpretation of the logic corresponding to relations; they would be true if the logic were actually able to talk about headings and tuples. The words in between the formulas are mostly wrong or misleading or baffling. It's only in the past month or so that I've learnt from Vadim he has a different mental model of a Relational Lattice vs Litak's. I find Vadim's approach just out-and-out wrong. Perhaps if I re-read Vadim's materials with that mental model I'd find it making more sense, but I can't see any reason to bother.

Vadim's QBQL tool is something of a red herring: it has hard-coded logic (around R00/DUM especially), so is likely to produce results inconsistent with formal proofs.

What is remarkable is both how powerful the Litak axioms turn out to be; and yet too weak to prevent structures that could never be a structure of Relations, or to thoroughly characterise the properties of DUM -- to the extent their axioms are consistent with taking DUM = DEE -- patent nonsense.

Quote from AntC on January 4, 2020, 10:28 am

... Reasoning was intractable because, as it turned out, defining sufficient properties to identify DUM is intractable. An explanation for why is for another post.

  • ...

Next I should, I suppose, apologise to Litak et al. Their two main axioms AxRH1, AxRH2 involve DUM/R00, which they call H just to add to the confusion, along with their using 'header' when they mean 'heading', grr. But only a mealy-mouthed apology, because they fail to spell out the useful consequences of those axioms:

  • The sub-lattice of relations with the same heading is distributive. Actually the distributiveness is more powerful than that but still weaker than saying the sub-lattice forms a powerset.
  • The sub-lattice of empty relations is distributive. ditto.
  • The M3 'diamond' cannot appear as an embedded sub-lattice anywhere, even across elements with different headings or between empty and non-empty elements.

Given how much my earlier post relied on DUM/R00 when talking about headings and empty relations, you'd think that would easily put R00 in its place. But no, it doesn't even prove my equivalence above wrt non-empty relations. Their axioms fail to have the consequence that R00 is distinct from R01! ...

Some more surprising/derivable consequences:

  • Taking two relations with no attributes in common, of which one is non-empty, you can cross-product the two, then project away the non-empty one's attributes to get back the other (whether or not it was empty). Surprising because it's not dealing with the empty or same-heading relations that are axiomatised with the Boolean algebras.
  • The RL1 axiom gets used in nearly all proofs I run. Surprising because it's derivable from the two base axioms AxRH1, AxRH2, which make heavy use of DUM/R00; and yet RL1 doesn't mention DUM/R00. From it derives the proof that there can't be an embedded M3 sub-lattice -- which again doesn't mention DUM/R00.

I've added axioms to patch things up wrt DUM/R00; but Litak [private correspondence] thinks the form of them is unacceptable because they're not (quasi-)equational. Indeed the [Litak et al 2015] paper seems to be saying that would make the system undecidable. And yes most 'useful' stuff I try to prove with those added axioms comes to no conclusion.

So this claim (from the Litak et al 2015 paper Section 1.1, but essentially the same as earlier papers) is just bunkum:

Apart from an obvious mathematical interest, Birkhoff-style equational inference is the basis for certain query optimization techniques where algebraic expressions represent query evaluation plans and are rewritten by the optimizer into equivalent but more efficient expressions. As for quasiequations, i.e., definite Horn clauses over equalities, reasoning over many database constraints such as key constraints and inclusion dependencies [1, Ch. 9] (also known as foreign keys) can be reduced to quasiequational reasoning. Note that an optimizer can consider more equivalent alternatives for the original expression if it can take the specified database constraints into account.

Nearly everything needed to describe keys/foreign keys/constraints needs to talk about headings, IOW needs to reason with DUM/R00. If Litak et al can't adequately characterise that (or not without making the system undecidable), it is useless for "query optimization techniques".

Decomposition/Partitioning Theorems: domain independent
I see on another thread Vadim is producing decomposition equations using his domain-dependent (and schema-dependent) absolute complement operators.
I think we already can express all the decomposition we need, and get it proved from the Litak et al axioms, with R00/DUM of course. (Again this illustrates their surprising power, despite their indeterminacy.)
Quote from AntC on January 10, 2020, 1:23 am
...

 

Some more surprising/derivable consequences:

  • Taking two relations with no attributes in common, of which one is non-empty, you can cross-product the two, then project away the non-empty one's attributes to get back the other (whether or not it was empty). Surprising because it's not dealing with the empty or same-heading relations that are axiomatised with the Boolean algebras.
  • The RL1 axiom gets used in nearly all proofs I run. Surprising because it's derivable from the two base axioms AxRH1, AxRH2, which make heavy use of DUM/R00; and yet RL1 doesn't mention DUM/R00. From it derives the proof that there can't be an embedded M3 sub-lattice -- which again doesn't mention DUM/R00.

We can always 'vertically' partition a relation (as with a Join Dependency), and non-loss join the partitions back to what we started with:

  • r = s1 NatJOIN s2.

This is more general than a partition: s1, s2 could each have tuples not in the original r, providing all and only the tuples in common are in r (considered on the attributes in common). There are some trivial partitions: r = r NatJOIN r; r = r NatJOIN DEE.

Similarly we can always 'horizontally' partition a relation. Let's generalise to:

  • r = t1 InnerUNION t2.

t1, t2 could each have attributes not in the original r, providing all and only the attributes in common are in r. t1, t2 could also have tuples in common (considered on the attributes in r). There are some trivial partitions: r = r InnerUNION r; r = r InnerUNION empty(r) -- i.e. the empty relation with the same heading as r, = r NatJOIN R00.

Combining these orthogonal decompositions/partitions gives us four quadrants (loosely speaking), with two ways of recombining them back to where we started:

  • r = s1 NatJOIN s2 = t1 InnerUNION t2.
  • = ((s1 NatJOIN t1) NatJOIN (s2 NatJOIN t1)) InnerUNION ((s1 NatJOIN t2) NatJOIN (s2 NatJOIN t2)).
  • = ((t1 InnerUNION (s1 NatJOIN R00)) InnerUNION (t2 InnerUNION (s1 NatJOIN R00)))
    NatJOIN ((t1 InnerUNION (s2 NatJOIN R00)) InnerUNION (t2 InnerUNION (s2 NatJOIN R00))).

The idiom (t1 InnerUNION (s1 NatJOIN R00)) is projecting t1 on the heading of s1. Both those recomposition equations are proven (although the last takes some chunky work dealing with R00), as are a bunch of less interesting equations such as t1 NatJOIN s1 = t1.
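
Here's a concrete check of both recomposition equations in Python (a single example, so merely illustrative, not a proof), constructing s1, s2 so the join dependency holds and t1, t2 as a horizontal split; representation and names as in the earlier sketches:

def compatible(t1, t2):
    d1, d2 = dict(t1), dict(t2)
    return all(d1[a] == d2[a] for a in d1.keys() & d2.keys())

def natjoin(r, s):
    (h1, b1), (h2, b2) = r, s
    return (h1 | h2, frozenset(u1 | u2 for u1 in b1 for u2 in b2
                               if compatible(u1, u2)))

def innerunion(r, s):
    (h1, b1), (h2, b2) = r, s
    h = h1 & h2
    return (h, frozenset(frozenset(p for p in t if p[0] in h)
                         for t in b1 | b2))

def rel(heading, *rows):
    return (frozenset(heading),
            frozenset(frozenset(zip(heading, row)) for row in rows))

R00 = (frozenset(), frozenset())

r  = rel(['a', 'b'], (0, 0), (0, 1), (1, 0), (1, 1))
s1 = rel(['a'], (0,), (1,))            # vertical: r = s1 NatJOIN s2
s2 = rel(['b'], (0,), (1,))
t1 = rel(['a', 'b'], (0, 0), (0, 1))   # horizontal: r = t1 InnerUNION t2
t2 = rel(['a', 'b'], (1, 0), (1, 1))

assert natjoin(s1, s2) == r and innerunion(t1, t2) == r
assert natjoin(t1, s1) == t1           # one of the 'less interesting' ones

# First recomposition: quadrant joins, then inner union.
q1 = natjoin(natjoin(s1, t1), natjoin(s2, t1))
q2 = natjoin(natjoin(s1, t2), natjoin(s2, t2))
assert innerunion(q1, q2) == r

# Second recomposition: project the t's onto the s headings, then join.
def proj_onto(t, s):                   # t InnerUNION (s NatJOIN R00)
    return innerunion(t, natjoin(s, R00))

f1 = innerunion(proj_onto(t1, s1), proj_onto(t2, s1))
f2 = innerunion(proj_onto(t1, s2), proj_onto(t2, s2))
assert natjoin(f1, f2) == r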

Again, despite the Litak axioms focussing on the Boolean algebras wrt headings (empty relations) and same-heading relations, they can be used to bridge disjoint/overlapping headings s1, s2 and non-empty/disjoint/overlapping different-heading content t1, t2. I'm just not seeing that absolute complements or the (very scary) domain-dependent über-relation R11 with all possible attributes and all possible tuples for those attributes are needed.
