The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

Towards algebraic dependency theory?

If you ever taught database theory class, then you have probably been bewildered by the fact that the two course sections -- relational algebra and constraints -- are quite disconnected. That presents an additional challenge for a student, who can easily become disoriented with topics that have so little in common. On the other hand, imagine being able to explain functional dependencies with relational algebra toolkit. Algebraic methods, such adding expression on both sides of equality, were taught much earlier in student life, than the set based logic (if ever), this is why algebraic reasoning reins superior (at least at the level of college curriculum).

This posting is built upon two papers:

  • "Horn Clauses and Database Dependencies" by Ronald Fagin
  • "Algebraic Dependencies" by Yannakakis & Papadimitrou
Both papers have the following shortcomings as far as the stated goal of constraints algebraization is concerned. Fagin expresses constraints via faithfulness property, which is not algebraic. Yannakakis & Papadimitrou expand the relations with something what seems to be an unary operation, but the technical virtues of this operation escape me. For example, is attribute renaming expressible in their system?
I suggest amending Fagin's direct product ⊗ as follows:
  1. The header of the result, which is formally R_{00} \bowtie (x ⊗ y) , where R_{00} is the empty relation with empty signature (also known as TABLE_DUM), is defined as  R_{00} \bowtie (x \vee y). This is pretty straightforward generalization to make the operation total.
  2. Fagin defines x ⊗ y content as the set of all attribute pairs \langle t_x,t_y \rangle.  The values constitute a new type, outside of the ones already presented in the database. I suggest to keep the original value t_x if both tuples match, because this would allow expressing the equality predicate "p=q" (p and q here are attribute names, not relation variables).
Also, since a value of new type is created anyway, it can be sugarcoated in two ways:
  • It can be made symmetric (then, perhaps, the set notation \{t_x,t_y\} would have been more appropriate). Then, it would satisfy
       x ⊗ y = y ⊗ x
  • It can be made associate to satisfy
       x ⊗ (y ⊗ z) = (x ⊗ y) ⊗ z
Here is couple of axioms for the ⊗:
x ⊗ (y v z) = (x ⊗ y) v (x ⊗ z).
x < y -> x ⊗ z < y ⊗ z.
And the Heath theorem:
x ^ y ^ z = R ^ R00 &
x v y = R00 &
((R v (x ^ y))⊗(R v (x ^ y)))^(R v x) = R v (x ^ y)
->
R = (R v (x ^ y)) ^ (R v (x ^ z))
There are two more conditions onto the header relations x, y, and z which seems to be redundant
y v z = R00
x v z = R00

The functional dependency is expressed via direct product as

((R v (x ^ y))⊗(R v (x ^ y)))^(R v x) = R v (x ^ y)

This is essentially a subset condition expressed via lattice order.

Unfortunately, I'm unable to exhibit a lattice based axiom system where Heath theorem can be derived. Here is a lattice system amended with domain dependent unary negation expressed in Prover9/Mace4:
% Standard lattice axioms:
x ^ y = y ^ x.
(x ^ y) ^ z = x ^ (y ^ z).
x ^ (x v y) = x.
x v y = y v x.
(x v y) v z = x v (y v z).
x v (x ^ y) = x.

% Litak et.al.:
x ^ (y v z) = (x ^ (z v (R00 ^ y))) v (x ^ (y v (R00 ^ z))).
(x ^ y) v (x ^ z) = x ^ ( ((x v z) ^ y) v ((x v y) ^ z) ).
(R00 ^ (x ^ (y v z))) v (y ^ z) = ((R00 ^ (x ^ y)) v z) ^ ((R00 ^ (x ^ z)) v y).

% unary postfix <NOT>:
x' ^ x = R00 ^ x.
x'' = x.

% Various versions of weak distributivity:
(x ^ (y v z)) ^ ((x ^ y) v (x ^ z))' = ((((z v y) ^ y') ^ x)' v (z ^ x))'.
(x ^ (y v z)) ^ ((x ^ y) v (x ^ z))' = (((z' ^ x) ^ (z v y))' v (y ^ x))'.
(x ^ (y v z)) ^ ((x ^ y) v (x ^ z))' = ((y' ^ x)' v (z ^ x))' ^ (z v y).
(x ^ (y v z)) ^ ((x ^ y) v (x ^ z))' = ((z' ^ x)' v (y ^ x))' ^ (z v y).
(x ^ (y v z)) ^ ((x ^ y) v (x ^ z))' = (((((z ^ x) v (y ^ x)))' ^ x) ^ (z v y)).
(x ^ (y v z)) ^ ((x ^ y) v (x ^ z))' = ((((z ^ x) v (y ^ x)) v (((z v y) ^ x))'))'.

% Direct product:
x * y = y * x.
%x * (y * z) = (x * y) * z.
x * (y v z) = (x * y) v (x * z).

R00 ^ (x * y) = R00 ^ (y v x).
(x * x) ^ x = x.
(x * x') ^ x = x ^ R00.

% formulas(goals):

(x ^ y) ^ z = R ^ R00 &
x v y = R00 &
y v z = R00 &
x v z = R00 &
((R v (x ^ y))*(R v (x ^ y)))^(R v x) = R v (x ^ y) 
->
R = (R v (x ^ y)) ^ (R v (x ^ z)).

There is a major flaw with this approach. The direct product operation generates new domain values, which undermines self-checking your progress via Mace4 counterexample generation.

 

If you ever taught database theory class, then you have probably been bewildered by the fact that the two course sections -- relational algebra and constraints -- are quite disconnected.

That would have been a database theory class not using D&D's texts? Also a class that didn't pay any attention to the very first sentence of the wikipedia article on the RM:

The relational model (RM) for database management is an approach to managing data using a structure and language consistent with first-order predicate logic, ...

Expressions in Relational Algebra can be mechanically translated into expressions of first-order logic (presuming we give a FOL 'characteristic predicate' for each base relation).

Constraints (aka Dependencies) can likewise be expressed equivalently either using RA or FOL. The convention in D&D's texts is to express constraints as IS_EMPTY(expr), in which expr is a RA expression.

Vadim, I suggest you start back at the beginning by explaining what you mean by "the stated goal of constraints algebraization". Who sets this goal? Can you quote how they state it? Why should anybody at TTM or in the teaching of database theory be interested in it?

Dependency theory is well-developed: we know FunDeps + MVDs are axiomatisable, but if you add JDs inference is not decidable; indeed IIRC you've described it as the 'jewel in the crown' of the Relational Model. Then it's quite a surprise for you to now say there's some "shortcomings".

Actually, there is even more compelling operator: "Predicate squared".

Given a relation

p q
a 1
b 1
c 2

its square is the Fagin's product of the relation with itself. Again, the single element sets are identified with their content (informally, we just drop the curly brackets):

p q
a 1
{a,b} 1
b 1
{a,c} {1,2}
{b,c} {1,2}
c 2

Squaring is unary algebraic operation, therefore it should act easier on computer theorem provers, compared to its predecessor -- binary direct product.

With this newly introduced operation the Functional Dependency constraint

x \xrightarrow{\text{R}} y

is expressed as

R^2 \wedge (x \vee R) \le R^2 \wedge (y \vee R)

The relations at the both sides of the lattice order \le have the same signature, therefore it is essentially tuple subset operation.

Expressed in this form Armstrong transitivity is immediate, and reflexivity

x \le y  \xRightarrow   R^2 \wedge (x \vee R) \le R^2 \wedge (y \vee R)

is easy as well. Speaking of reflexivity, do you see now how the flip of the ordering in the classic notation of Armstrong reflexivity from premise to the conclusion is naturally interpreted via the lattice ordering acting in the opposite direction on relation headers and tuples? Please also note, that the lattice of partitions behind the dependency theory have been discovered a while ago, but what we witness here is that the lattice structure behind Functional dependencies is exactly the relational lattice. Unlike partition lattice it fits perfectly into Relational Algebra.

 

 

 

 

 

Quote from vadim tropashko on July 19, 2019, 5:40 pm
Quote from AntC on July 18, 2019, 7:42 am

... Dependency theory is well-developed: we know FunDeps + MVDs are axiomatisable, but if you add JDs inference is not decidable; indeed IIRC you've described it as the 'jewel in the crown' of the Relational Model. Then it's quite a surprise for you to now say there's some "shortcomings".

Sorry Vadim, I don't understand any of your response. I don't see how it relates to what you quote from me.

Edit: at Vadim's request (on another thread) removed quote from Vadim + discussion of terminology.

 

...: the elements of the lattice are modelling relations in the sense of Codd. They're not modelling predicates. In particular, lattice meet ≡ relational join models that Codd's Natural Join (which has as special cases Cartesian Product, Intersection of sets of tuples) produces 'flat' relations, i.e. another element of the lattice, not pairings of tuples as per mathematicians' Cartesian Product.

Quote from vadim tropashko on July 17, 2019, 5:29 pm
...
I suggest amending Fagin's direct product ⊗ as follows:
  1. The header of the result, which is formally R_{00} \bowtie (x ⊗ y) , where R_{00} is the empty relation with empty signature (also known as TABLE_DUM), is defined as  R_{00} \bowtie (x \vee y). This is pretty straightforward generalization to make the operation total.
There you've used R00 aka DUM. And that runs into a shortcoming which as far as I can see is a total block to any further algebraisation: there is no definition of R00 in terms of lattice meet/join that fixes the value. Using R00 in definitions is therefore layering in more indeterminacy.
Unfortunately, I'm unable to exhibit a lattice based axiom system where Heath theorem can be derived.
Your definitions (I've elided) rely on R00. You're unable to exhibit a lattice based axiom system where R00's existence and uniqueness can be derived. Therefore you're unable to exhibit an axiom system where anything defined in terms of R00 can be derived. There's nothing specific to Heath's theorem here. You're not saying anything about Functional Dependencies, you're just churning out consequences of a broken axiom system.
Here is a lattice system amended with domain dependent unary negation expressed in Prover9/Mace4:
Unary negation expressed in terms of R00. Therefore not fixed. Therefore no help.
There is a major flaw with this approach. The direct product operation generates new domain values, which undermines self-checking your progress via Mace4 counterexample generation.

 

Again, I think that's the least of the troubles. Vadim this is really pointless exploration. Please just stop.

Note that without being able to fix R00, you don't have a definition of 'generalised difference' aka Tutorial D's NOT MATCHING. So you don't have anything as expressive as Appendix A's algebra.

Quote from Vadim on July 19, 2019, 5:40 pm

Actually, there is even more compelling operator: "Predicate squared".

Given a relation

p q
a 1
b 1
c 2

its square is the Fagin's product of the relation with itself. Again, the single element sets are identified with their content (informally, we just drop the curly brackets):

p q
a 1
{a,b} 1
b 1
{a,c} {1,2}
{b,c} {1,2}
c 2

I'm not seeing this is using Fagin's product operator: in all the Figures in his paper giving examples of products, the results contains sequences of values not sets. That is angle brackets not curly braces. And all of the results contain pairs not singletons. That is:

* There are no "single element sets". There are no sets. There's nothing with a single element.

* We can't "just drop the curly brackets" because there are no curly brackets.

What your result looks like is some sort of partitioning or permutation of the input relation. I don't see you can use any of the thinking in Fagin's paper to support your reasoning.

 

 

That was my amendment of Fagin's definition. Again, he introduced the concept of faithfulness, while I aimed at FD expressed purely by algebraic means. Therefore, I replaced his pairs with sets. As a side effect it makes the amended direct product commutative. Then, I dropped curly brackets around singleton sets, which allowed to compare these values against the values in the original relation.

Alternatively, we may keep the pairs, but identify each <a,a> with a.