The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

Expressing FDs without RENAME (nor COUNT)

This is something of a half-answer to Vadim's thread on algebraic dependency theory; and a musing on David B's comment on the 'Uniqueness of R00' thread:

Appendix A goes to a lot of trouble to remove RENAME, but it does so by sleight of hand: the attribute names are simply moved somewhere else.

As Philip Kelly pointed out a long time ago, Appendix A doesn't really eliminate RENAME (I agree with "sleight of hand"). Instead it defines RENAME in terms of a COMPOSE with a relation constant that has a header of the from/to attribute names, both same attribute type, and tuples with every value in the domain of the attribute type paired same-same.

Is that relation constant just like Appendix A's PLUS? Consider this: PLUS has attribute names {X, Y, Z}; suppose we want to 'apply' it to a relation with attribute names {A, B} (and others) to EXTEND with an attribute {C} that is sum of {A, B}. How do we 'adapt' PLUS's {X, Y, Z} to {A, B, C}? We need to RENAME PLUS's X to A and Y to B and Z to C. How do we achieve the RENAMEs? If RENAME's relation constant really were like PLUS, with attributes named (say) {V, U}, we'd take that relcon and RENAME V to X and U to A, in order to 'apply' it to PLUS. But oh no! We can't use a RENAME to implement RENAME.

Instead, we have to take it that there's an additional primitive concept not acknowledged in Appendix A, commonly known as 'equi-relation' that builds the needed relcon from two given attribute names (and their attribute type).

If this followed Hall, Hitchcock, Todd 1975, that would be an 'algorithmic relation'.

In QBQL, the notation is (from memory) [X = A]. Note an equi-relation can be used to RENAME X to A or A to X; or to restrict a relation with both attributes to tuples with those attribute values equal; or to extend a relation having attribute X by attribute A being a copy of X's value or vice versa. That's why QBQL doesn't follow Tutorial D's syntax of using :=.

I tried very hard to produce an equivalent of equi-relations within the lattice algebra; and eventually gave up. I was trying specifically to be able to express a Functional Dependency constraint.

What I did achieve (I think, but I'm posting here to get opinion) is expressing a bijection relation. And I think that's sufficient to express an FD. I didn't get to the algebraisation of FD I was looking for; but with a great deal of hindsight, perhaps that's because I didn't have R00 adequately defined. OK let's assume I can fix R00 sufficient for the properties I need in the following. Can I express an FD thusly?

* Relation R with FD X -> Y

* Assume X, Y are sets of attributes, subsets of the attributes in heading of R, neither empty, X, Y disjoint.

(Those assumptions express the 'useful' varieties of FDs; they can all be expressed as axioms in the lattice algebra providing we can fix R00. We can later consider relaxing some of them for trivial FDs etc if/when we can get this off the ground.)

Suppose we have a bijection relation yy' with heading {Y, Y'} where those are disjoint non-empty sets of attributes. Note I am not assuming Y' has the same number of attributes as Y or that there is any 'correspondence' between individual attributes. Merely that forall relations z with attributes including Y but not Y' or with attributes including Y' but not Y:

* (z COMPOSE yy') COMPOSE yy' = z.

Think of this as: I can move attributes Y in z to some computed holding slot Y'; I can bring them back from Y' to Y and get the answer I started with; or equivalently I can move attributes Y' in z to some computed holding slot Y; I can bring them back from Y to Y' and get the answer I started with. Emphasise: starting with any relation z that has attributes Y not Y', or Y' not Y. The bijectivity is crucial to having yy' defined as we need. It means that in yy' for each distinct sub-tuple value of heading Y there is a distinct sub-tuple value of heading Y'.

Then we express the above FD constraint as:

1. project {X, Y} from R;

2. COMPOSE the projection with bijection yy' to get a relation with heading {X, Y'}

3. JOIN that back to R (i.e. JOIN on attributes X in common), then we have attributes {X, Y, Y', and others}

4. Take that join NOT MATCHING yy'.

5. The result must be empty, if the FD holds.

I hope you can see how these steps correspond to expressing a FD with the traditional RENAME at step 2. What 'goes wrong' if the FD doesn't hold is that step 1. produces tuples with same X value but differing Y values; step 2 maps them to differing Y' values; step 3 cross-multiplies each X value with all the differing Y' values; then step 4 finds those differing Y's don't match the bijection yy'.

Note these steps don't explicitly reverse the COMPOSE with yy' from {X, Y') back to {X, Y}, but step 4's NOT MATCHING implicitly relies on that.

If step 4 amounts to WHERE y <> y' (that is, such an inequality test included for each attribute in y and all of them ORed together) then that's the technique SIRA_PRISE uses to turn key specifications into the general database constraint form IS_EMPTY(<rel_exp>) and derive the enforcement strategy from that.

(The nice thing is that it also works for keys declared on views.)

I've posted that a number of times on the forum already.

 

While I certainly admire your ability to follow the logic through to a correct final conclusion, I don't find the steps at all easy to understand. I would not like to find things like this in code.

What I think I would prefer is to treat all operations that mention attributes as declarative in nature, set out as declarations and type manipulations before getting into the part with the relational expressions. Something like:

Type
   T1 is R1 & R2,  // common attributes
   T2 is R2 - R1,  // non-common of R2
   R1{A5} match R2{A8},     // make joinable
   R1{A17} no match R2{A17} // keep separate
   R1{A,B,C} match T3{X,Y,Z};
R4 := R1{T1} union R2{T1}  // projection
R3 := R1 join PLUS;

The intention is to confine the use of attribute names to the type system and keep them out of the executable code. The underlying database might indeed have the named attributes as per this code, or different attributes, or none at all. The compiler inserts whatever conversions are needed: performing RENAMEs or replacing attributes by ordinals as needed.

As per Alice, I prefer to think in terms of a database that has ordinal columns and no attribute names, and then a separate set of functions that map a set of attribute names to the ordinals of the database. The database relvars are typeless; the compiler applies a type system for the convenience of authoring code.

Andl - A New Database Language - andl.org
Quote from Erwin on July 25, 2019, 10:58 am

If step 4 amounts to WHERE y <> y' (that is, such an inequality test included for each attribute in y and all of them ORed together) ...

Step 4 takes the place of the inequality test you'd use if RENAMing. That is, that's my intent. But I'm asking if I've got it right using an  injective relcon instead.

Quote from dandl on July 25, 2019, 11:06 am

While I certainly admire your ability ...

Let's hold back on the admiring until someone tells me if I've got it right.

to follow the logic through to a correct final conclusion, I don't find the steps at all easy to understand. I would not like to find things like this in code.

And there I was thinking I'd already made it clear the algebra underpins the surface D/the compiler would be generating the algebra code; the end-programmer wouldn't be writing that code direct.

What I think I would prefer is to treat all operations that mention attributes as declarative in nature,

Again in the surface D we'd be declaring a Functional Dependency. My steps are what the compiler needs to do to check the constraint holds.

 

 

There is no need for negation (aka NOT MATCHING). Lattice order (tuple subset) would suffice.

"p->q"=[p q r]      -- to reinforce the names in double quotes idea
        0 a 0
        0 a 1
        1 c 0
        1 c 1
        2 a 0
;
R = "p->q" v [p q];                -- don't need the third attribute

S =(R ^ (R /^ "q=q1")) v [q q1];   -- define S
S;                                 -- print S
S ^ "q=q1" = S.                    -- assertion equivalent to: S < "q=q1"  
                                   -- (had to use lattice equality instead of lattice order due
                                       to bug in QBQL when comparing relation S with equivalence predicate "q=q1")

T = (R ^ (R /^ "p=p1")) v [p p1];
T;
T ^ "p=p1" = T.

Output:

S=[q  q1]
   a  a
   c  c
;
T=[p  p1]
   0  0
   0  2
   1  1
   2  0
   2  2
;
Time = 14
*** False Assertion ***
T^"p=p1"=T

Quote from AntC on July 25, 2019, 3:29 pm
Quote from Erwin on July 25, 2019, 10:58 am

If step 4 amounts to WHERE y <> y' (that is, such an inequality test included for each attribute in y and all of them ORed together) ...

Step 4 takes the place of the inequality test you'd use if RENAMing. That is, that's my intent. But I'm asking if I've got it right using an  injective relcon instead.

OK, to re-iterate,

1. project {X, Y} from R;

I am personally not concerned with this projection.  I am concerned with some relation schema in which some FD is supposed/required to hold and that relation schema could well be the schema of a projection view, so I think what I have to say applies regardless of your point 1.

2. COMPOSE the projection with bijection yy' to get a relation with heading {X, Y'}

My step 2 is to RENAME but it makes no difference : in both your and mine case the result is a relation in which only the attributes of the concerned FD's LHS have "kept their name" and the others haven't.

3. JOIN that back to R (i.e. JOIN on attributes X in common), then we have attributes {X, Y, Y', and others}

The "and others" does not apply in my case because of my remark on point 1., but it makes no difference.

4. Take that join NOT MATCHING yy'.

So you get a relation from the original where all of the attributes in X (the FD's LHS) match but not all of the (corresponding) attributes between (those in) y and y' match.  (Which is indicative of two distinct tuples in the original in which all of the attributes in X match.)  Yes that's my step 4.

5. The result must be empty, if the FD holds.

Yes that's how I do it too.

 

So with this post you officially have the confirmation that you've got it right from someone who is a member of both the set of non-algebraicists and that of non-logicians.

The same simple stupid silly software engineer who spent a whole day improving the JOIN strategy written out in PL/1 to match a millions-of-records flatfile to another 150K records one in such a way that it would not consume more than 3600 CPU seconds, only to find that when he was ready to test whether it really worked and he hadn't introduced any new errors (standard consequence of software maintenance), operations had already decided to give the job a try with a limit of 6 CPU hrs.  I had to go weeping to the systems guy of all folks to get any form of consolation.

Quote from Erwin on July 25, 2019, 9:20 pm
Quote from AntC on July 25, 2019, 3:29 pm
Quote from Erwin on July 25, 2019, 10:58 am

If step 4 amounts to WHERE y <> y' (that is, such an inequality test included for each attribute in y and all of them ORed together) ...

Step 4 takes the place of the inequality test you'd use if RENAMing. That is, that's my intent. But I'm asking if I've got it right using an  injective relcon instead.

...

So with this post you officially have the confirmation that you've got it right from someone who is a member of both the set of non-algebraicists and that of non-logicians.

Thank you. The parts for which I think I need confirmation from an algebraicist/logician is whether my definition of what makes a relation represent a bijective function is robust. Note the intent is that any suitably bijective relation will do. So the compiler-writer's/implementor's guilty little secret will be that if they need 'just any' bijective relation, they can implement that as a RENAME.

OK let's recap wrt expressing dependencies in the lattice algebra:

* we have (maybe) a way to express FDs. I don't hear Vadim celebrating: this has been a difficulty for several years.

* JD constraints are easily expressed as 'lossless join decomposition'

* Therefore MVDs are easy as a special case of JDs.

* FDs 'come with' a MVD: if the definitions I've given for FDs are correct, we should be able to use the algebra to infer that MVD. Needs R00 robustly defined, of course.

* INDs are easily expressed using the lattice ordering equivalent of subset -- providing the two relations have the same attribute names for the attributes involved in the IND.

What if the IND is between differently-named attributes? Conventionally you'd use a RENAME to get the same attrib names. Instead, can I use the bijective relation idea? I think it needs both apply the bijection from subset relation to superset relation's attributes and check the subset as lattice ordering; and apply the bijection 'in reverse' from superset relation's attributes to subset's and again check the superset as lattice ordering. Why both directions? Because it would be possible to 'cheat' in the construction of the bijection if it were only to apply in one direction.

There's an ugly case of INDs where some of the attributes are same-named, some different. Then we need to split out the differently-named for the bijection treatment.

Now I'm entertaining a radical thought: is RENAME (or equivalently equi-relations) essential to relational completeness? Could we weaken the requirement to being for bijective relations? Consider

* The first appearance of (what was effectively) RENAME in the literature is in Codd 1972 'Relational Completeness ...' to give a definition of Natural Join in terms of Cartesian Product with various renamings. TTM avoids that need by making JOIN aka <AND> the primitive, then (trivially) defining Cartesian Product in terms of <AND>.

* FDs can be defined without RENAME tbc.

* INDs can be defined without RENAME tbc.

* Another common use for RENAME is where self-joining a relation (e.g. representing a hierarchy), to avoid too much joining. Again we could use a bijection to move attributes out of the way, then bring them back, or use the bijection relation in some restriction role, as I've done for FDs.

 

Thoughts?

Quote from Vadim on July 25, 2019, 4:25 pm

There is no need for negation (aka NOT MATCHING). Lattice order (tuple subset) would suffice.

Indeed. But this being a TTM Forum, I'm following TTM style of expressing a constraint as IS_EMPTY( relexpr ).

I'm completely flummoxed by the QBQL you give.

EDIT: On further staring at the code I diml perceive you might be doing some of what I thought you weren't. But I'm still guessing: do not expect anybody here to be able to read QBQL code and understand what it's doing. I've strikethroughed some of my original post.

Is this a proof that agrees with what I'm trying, or a disproof? It ends with "False assertion" but that seems to apply to something with an expression involving a relation named "p=p1" -- which appears to be something about the determinant of the FD. What does renaming the determinant have to do with anything? ADDIT: Perhaps it's to show that a FD p->q holds but not q->p. I.e. the second leg example in my next paragraph.

If this were to be some sort of example or counter-example, it would surely need two legs: one relation value where the FD holds; another where the FD doesn't hold.

If "q=q1" is denoting a RENAME, you seem to have misunderstood the point: I'm aiming to express FDs without RENAME. If it's denoting a bijection: a) don't put "=" in the name, that's utterly confusing; b) where is all the machinery that expresses its properties as a bijection, or that gives its content as an example of a bijection?

I'm greatly surprised that a relatively simple piece of code has revealed a bug in QBQL. This rather calls into question any proofs or counter-examples using QBQL.

In short: I've used up a lot of time staring at this code. I'm really not sure what it's doing, but I suspect it has no bearing on the problem at hand. It certainly doesn't count as any proof or disproof without much more work and explanation of steps.

...

If you think it has some point, please try the whole exercise again.

I think it's not too hard to get rid of RENAME and indeed all use of attributes in RA expressions if we allow ourselves just two constructs. In the spirit of relvars I propose:

1. A relfun is a relational function

relfun TIMES(x rational, y rational, z rational) {
   return {z / y, z / x, x * y}  // believe!!
}
relfun TIMES_GROSS(QTY rational, WEIGHT rational, GROSS rational) {
   return TIMES  // equivalent to rename
}
relfun WHERE_SMITH(SNAME character) {
   return {"SMITH"}  // relational literal
}
relfun EXTEND_CREDIT(S# character, credit rational) {
   return {"S1",1000}, {"S2",2000}, {"S3",1500}, etc // relation literal
}
relfun PROJECT_CITY(CITY character) { 
   // empty, never used
}

2. A reltype is a relational type_of, which we will use often so we make it an operator (#).

relvar v1 = S{#PROJECT_CITY}  // project
relvar v2 = S join WHERE_SMITH  // restrict
relvar s3 = P join EXTEND_CREDIT // extend literal
relvar s4 = P join SP join TIMES_GROSS // extend computational

There are still lots of loose ends before this is a genuine language proposal, but I'm convinced that if you can define relfuns and reltypes as shown above you won't need either RENAME or attributes in RA expressions, and that appeals to me.

 

Andl - A New Database Language - andl.org