The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

Meaning of a relation

OK, so what you are saying is that I have to consider each tuple in isolation according to the predicate I have assigned to it, but the predicate must always be of the type that the truth value of a tuple is determined by a logical "and" of the truth value of all the members.

As you apply operations, you have to consider carefully how that changes the predicate. Right, this, I think, is exactly what my question is about. Still seems to me like a pretty good analogy to a "unit of measure", which only a few languages have facilities for but is hugely helpful to avoid bugs. If it's possible to state in a computable way, of course.

When it comes to the relation in its entirety, we should perhaps logically think of each tuple as "may have this value" rather than of the set of tuples "one of these tuples may be true", but I'm not entirely convinced it matters. There are of course more options than "one of" or "all of", depending on if I have covered the entire universe of possibilities or not and what operations have been applied. Seems to me this also may change with the operation, so perhaps also a part of the "unit"?

The comment about UNION not being OR, yes, exactly, but in "A" the <or> also covers the same use as union. I think that just depends on the interpretation of the relation. If it's "one/some of" then <or> is a perfect name, as you say. It's just in the more usual database sense where I have an "all of" relation that the name doesn't match and perhaps I should really use an append or insert instead. If the choice of operator affects the interpretation of the relation (the "unit"?) then two distinct operators could have exactly the same effect on the tuples but widely different meaning.

Quote from dandl on January 12, 2021, 12:33 am
Quote from tobega on January 11, 2021, 12:05 pm

Anyway, at that time I finally got the impetus to start developing a syntax for my language, and the following year I had a working interpreter. There has always been an idea that I wanted to provide relations as a native construct, perhaps instead of Maps and Sets, and I have now started looking at some of the 2020 AdventOfCode problems and re-implementing them using relational algebra, adding syntax for it to Tailspin, as my language is called.

There is no single agreed relational algebra. My version of the (Extended) RA has 9 query operators: SPJRUN (select, project, rename, join, union, not), new, while and aggregate, plus assignment.

What's yours?

Well, I'm here, so I suppose my first approximation would aim at creating a "D", which I suppose leads me to a set-based, null-free RA, and probably "A", essentially, but I'm happy to consider other options if it helps make more semantic sense.

Yes, the aim is to create a 'D'. In doing that you have to decide on:

  • a type system: roughly as per TTM, although there is room for some variation.
  • a set of relational operators: not explicitly defined by TTM, 6-9 query operators, plus assignment.
  • a set of general language features: don't invent the wheel, copy something you know and like.
  • a set of database-specific features: roughly as per TTM, mainly to do with the catalog and operations on it.

Its a big job, probably 6-12 months. Are you up for it? There are easier roads to follow.

Thanks for your concern!

I already have a working programming language and have already added some relational algebra features.

If you're interested, here is the solution to day 7 of adventofcode 2020 with commented out sketches of solutions in sql and pseudocode https://github.com/tobega/aoc2020/blob/main/a7ra.tt The problem statement is here: https://adventofcode.com/2020/day/7

That solution is runnable. I had also planned to add features to solve day 6, but I decided to pause and reflect over both syntax and consequences of grouping and relation-valued attributes, which is leading further down the rabbit-hole. The tentative, still not runnable, solution is https://github.com/tobega/aoc2020/blob/main/a6ra.tt and the problem is, of course, at https://adventofcode.com/2020/day/6

Quote from tobega on January 12, 2021, 5:56 am

Sorry Tobe, most of your response is vague hand-waving. I'm not sure you understand what you're saying. I certainly don't. Please read Chapter 2 of 'Data Types and the Relational Model' (DTATRM), free downloadable from the TTM website. Especially 'Predicates and propositions' and 'Relation values vs relation variables'.

OK, so what you are saying is that I have to consider each tuple in isolation according to the predicate I have assigned to it, but the predicate must always be of the type that the truth value of a tuple is determined by a logical "and" of the truth value of all the members.

As you apply operations, you have to consider carefully how that changes the predicate. Right, this, I think, is exactly what my question is about. Still seems to me like a pretty good analogy to a "unit of measure", which only a few languages have facilities for but is hugely helpful to avoid bugs. If it's possible to state in a computable way, of course.

"unit of measure" is just a particular application of a type system. Yes strong static typing is helpful to avoid bugs. Most programming languages don't come 'batteries supplied' with units of measuring typing, but you can build one in modern languages.

I'm still not getting your analogy. What is it in the RA that "avoids bugs"? What is it in RA that corresponds to an ill-typed expression?

When it comes to the relation in its entirety, we should perhaps logically think of each tuple as "may have this value" rather than of the set of tuples "one of these tuples may be true", but I'm not entirely convinced it matters. There are of course more options than "one of" or "all of", depending on if I have covered the entire universe of possibilities or not and what operations have been applied. Seems to me this also may change with the operation, so perhaps also a part of the "unit"?

Chapter 2 explains the 'Closed World Assumption', and why TTM subscribes to it "noncontroversially". Whatever you're advocating there might be incoherent or controversial -- if I understood it.

The comment about UNION not being OR, yes, exactly, but in "A" the <or> also covers the same use as union.

I was careful to say SQL's UNION, one of whose 'features' is that it can return duplicate tuples in the result (look at UNION ALL). Consider what SQL's UNION does about nullable columns. In RA a relation is a set (no duplicates).

I think that just depends on the interpretation of the relation. If it's "one/some of" then <or> is a perfect name, as you say. It's just in the more usual database sense where I have an "all of" relation that the name doesn't match and perhaps I should really use an append or insert instead. If the choice of operator affects the interpretation of the relation (the "unit"?) then two distinct operators could have exactly the same effect on the tuples but widely different meaning.

If two relation values are identical, their <AND> vs their <OR> return the same result. That doesn't mean those two operators express the same operation. I've no idea what you mean by "have exactly the same effect". 2 + 2 = 4 = 2 × 2. What "effect" does + have on 2 that's different or the same vs ×?

Quote from tobega on January 10, 2021, 8:54 pm

...
But the tuple is indistinguishable from a tuple with the normal(?) interpretation "the employee works in the department on the given project". So what is missing? How do we keep track of when indistinguishable things have different meanings and what do these different meanings imply for further relational operations?

...

Even worse, I now have to accept that not every tuple is true, I now have to interpret the relation itself as having <or> between the tuples, so to speak. Which actually works with the "normal" interpretation of the tuples above, so maybe that should have been the interpretation all along, i.e. at least one of these tuples is true? But again, how to distinguish that from the interpretation that all tuples are true and what does it imply for further operations?

Best answered using an example, I believe.

Take two sets, E = {n | n mod 2 == 0 ("n is even")} and P = {n | n is prime}.

Enumerating these sets (and assuming only natural numbers are concerned), we have E = {2 4 6 8 10 ...} and P = {2 3 5 7 11 13 ...}.

If we see the number 2 appear in E then we know that "2 is even".  And if we see the number 2 appear in P then we know "2 is prime".  A bit more formally, the number 2 thus satisfies the predicate for E as well as the predicate for P.

The union of the two sets is Y = {n | n is even or n is prime}.

If we see the number 2 appear in that set, we know "2 is even or 2 is prime".  the disjunction is inside the predicate, so we do not know which of the two disjuncts cause the appearance.  All we know is that the disjunction is true (for the number 2).  If we want to know precisely which one(s) is(are) true, we have to inspect E resp. P.

That leads me to the first misconception that is identifiable in your thinking as expressed by "the "normal" meaning of tuples".  There is no such thing.  Just as there is no "normal meaning" of the number two.  "Meaning" is added by taking the predicate (/membership condition) into the mix.  Only if you know the predicate for Y (which is disjunctive in this case) will you know what "meaning" you can/should assign to ***THE APPEARANCE*** of the number 2 in that set.  Meaning goes along with ***appearance*** in the extension of a predicate.

Now replace E = {n | ... } with EMP = { TUP{EID} | The employee identified by EID works on project ALPHA} and P = {n | ...} with P = {TUP{EID} | The employee identified by EID works in dept. SALES}, then observe Y = EMP UNION PRJ = {TUP{EID} | The employee identified by EID works on project ALPHA OR works in dept. SALES} and then redo this whole story for yourself.

What does it say if TUP{EID:1} appears in Y? in EMP ? in PRJ ?  It's three times the very same tuple, thus three times the very same value, yet three different "meanings".  Which one of those three would you think should be the "normal" one ?

I believe I also saw mention of "meaning of a relation".  The "meaning of a relation" is the conjunction (the AND) of the propositions represented by each individual tuple [and under the CWA, also of the negations of all the propositions that correspond to tuples that could have been present in the relation, but aren't].  That proposition is the instantiation of the relation's predicate with the particular attribute values in the tuple.  Where it then gets confusing is if that relation predicate is itself disjunctive.

Defining sets can be done by predication (as in the E,P,Y,EMP,PRJ examples above) or by enumeration, say we just define the set E = {2 3}.  Such a set can also be defined by predication (i.e. such a set also has a membership condition) but that predicate/membership condition is nothing more than a verbose repetition of the enumerated members : E = {n | n=2 OR n=3}.

If we see the number 2 appear then we know "2=2 OR 2=3" and if we see the number 3 appear then we know "3=2 OR 3=3".  And if both appear in the set then the entire set tells us "(2=2 OR 2=3) AND (3=2 OR 3=3) [AND NOT(4=2 OR 4=3) AND NOT ...]".  Ditto if it's sets of tuples and the "entire set" is thus a relation.

 

Sorry if I've been unclear, when I've carelessly said "relation" I have been meaning a specific instance of a set of tuples, which I suppose should really be termed a "relation value"

Never mind my "unit" analogy, lets just term it the predicate of the relation, then, but, in the same way as it is pointless to add quantities of different units, it is pointless to combine two tuples with different predicates into the same set, even if they have the same datatypes. And, just as when you multiply quantities with different units the unit changes, so the predicate changes as you join the relation with another relation. Just saying that keeping track of the predicate seems to be very much akin to keeping track of the unit.

Now I have a problem because AntC and Erwin are contradicting each other. Take the example again from "A"'s <or> operator in my original post:

{{employee: E, department: D, project: J},{employee: E, department: D, project: K},{employee: E, department: D, project: L},{employee: E, department: B, project: J},{employee: E, department: C, project: J}}

Now AntC claims that each tuple must mean "Employee X works in department Y on project Z" (which is what I carelessly termed the "normal" interpretation), in which case we should also add "It may be true that..." in front of that, and consequently not all the tuples can be true, that is, the relation value is a disjunction of tuples.

Erwin, on the other hand, claims that all tuples must always be true (and the relation value represents the conjunction of all the tuples) but there is no "normal" interpretation of a tuple and it is perfectly fine for the tuple to mean "Employee X works in department Y or on project Z"

So what's it to be?

The thing I mentioned about two operators doing "the same thing" but having different meaning may depend on the outcome of the other current question, but consider that I have a set consisting of the tuple {employee: E, department: D} and another set consisting of the tuple {employee: E, department: C} where both mean "Employee X works in department Y"

I can put these together into a set containing both, with the <or> operator or the union operator, the outcome will look exactly the same as regards the set of tuples.

In one case, I might know that both are true, that E indeed shares their time between the departments, so I claim it would be more correct to use "union", and the relation value represents a conjunction of tuples.

In the other case, I know that perhaps only one of the statements is true, so it is eminently suitable to use <or> and the relation value represents a disjunction of tuples.

Quote from tobega on January 12, 2021, 1:09 pm

Sorry if I've been unclear, when I've carelessly said "relation" I have been meaning a specific instance of a set of tuples, which I suppose should really be termed a "relation value"

That's fine. Often we shorted "relation value" to "relation". But if you mean "relation variable", it's dangerous to shorten that to "relation", use the TTM term 'relvar'. What's the difference? A relvar has an associated predicate, and probably declared constraints that are a consequence of its predicate. In particular in TTM a relvar is required to have a key, see below.

A relvar has a relation (value) at all times, but that value might change from time to time from database operations -- just as the value for any program variable changes from time to time with assignments.

 

Never mind my "unit" analogy, lets just term it the predicate of the relation, then, but, in the same way as it is pointless to add quantities of different units, it is pointless to combine two tuples with different predicates into the same set,

Just wrong. I can add the number of apples to the number of oranges. The answer is not a number of apples nor a number of oranges; but its a number of pieces of fruit (for example). I can see that adding a length to a time is unlikely to have a useful meaning/interpretation, but that doesn't make it out-and-out wrong: it needs some meaningful interpretation.

Just wrong wrt the RM. Just wrong in so many ways. Do you mean "combine two tuples" or combine two relation values? How/why are you combining tuples? Do they have the same heading? What are you even saying? Especially since you don't say how you're combining them: is that UNION or INTERSECTION or what?

You appear not to have read Ch 2, as I asked. You are wasting our time.

even if they have the same datatypes. And, just as when you multiply quantities with different units the unit changes, so the predicate changes as you join the relation with another relation. Just saying that keeping track of the predicate seems to be very much akin to keeping track of the unit.

Now I have a problem because AntC and Erwin are contradicting each other.

I think that highly unlikely (and insulting). We are trying to explain the same thing using different wording, because your wording is a mess of vagueness and contradictions. You appear not to have read Ch 2, as I asked.

Take the example again from "A"'s <or> operator in my original post:

{{employee: E, department: D, project: J},{employee: E, department: D, project: K},{employee: E, department: D, project: L},{employee: E, department: B, project: J},{employee: E, department: C, project: J}}

Now AntC claims that each tuple must mean "Employee X works in department Y on project Z"

I can't see any evidence I said that.

(which is what I carelessly termed the "normal" interpretation),

The interpretation of a tuple in a relation value depends on the predicate for that value, which is derived from the predicate for the source relvar(s) and the relational operators used to derive the value.

in which case we should also add "It may be true that..." in front of that, and consequently not all the tuples can be true, that is, the relation value is a disjunction of tuples.

No that's not a relation. I did say that already.

Erwin, on the other hand, claims that all tuples must always be true (and the relation value represents the conjunction of all the tuples) but there is no "normal" interpretation of a tuple and it is perfectly fine for the tuple to mean "Employee X works in department Y or on project Z"

So what's it to be?

Now you're just trolling. Please come back when you've read Ch 2.

Quote from tobega on January 12, 2021, 1:37 pm

The thing I mentioned about two operators doing "the same thing" but having different meaning may depend on the outcome of the other current question, but consider that I have a set consisting of the tuple {employee: E, department: D} and another set consisting of the tuple {employee: E, department: C} where both mean "Employee X works in department Y"

They mean whatever the database designer specifies that they mean. Are the two sets derived from some relvar, perhaps by some relational operations? Perhaps one means this is the employees 'home' department and the other means this is where the employee is temporarily assigned.

I can put these together into a set containing both, with the <or> operator or the union operator, the outcome will look exactly the same as regards the set of tuples.

Fine. What is the predicate for that derived relation value?

In one case, I might know that both are true, that E indeed shares their time between the departments, so I claim it would be more correct to use "union", and the relation value represents a conjunction of tuples.

Hunh? "union" produces a disjunction. It's "intersection" that produces a conjunction. If you applied "intersection" you'd get an empty relation value. Representing a predicate (perhaps) "No employee is temporarily assigned to their home department".

In the other case, I know that perhaps only one of the statements is true, so it is eminently suitable to use <or> and the relation value represents a disjunction of tuples.

You appear to be saying you might use <or> or you might use <or>. What on earth are you trying to say?

What if your input relations each contain 2 tuples, the ones you identify plus {employee: F, department: D} and {employee: F, department: C} respectively? Then are you trying to build a predicate in which exactly one of the four is true? Or exactly one for each distinct Employee? What if one only of the input relations has {employee: G, department: D}? What predicate do you want now?

The relation value, in the absence of knowing the query that produced it doesn't "mean" anything. There's no reason to expect two relation values that happen to be equal but are derived from different queries "mean" the same thing.

Now I've answered more than enough of this from you. If you don't show evidence you've read Ch 2, I'm not going to answer more.

Quote from AntC on January 13, 2021, 1:40 am
Quote from tobega on January 12, 2021, 1:09 pm

Sorry if I've been unclear, when I've carelessly said "relation" I have been meaning a specific instance of a set of tuples, which I suppose should really be termed a "relation value"

That's fine. Often we shorted "relation value" to "relation". But if you mean "relation variable", it's dangerous to shorten that to "relation", use the TTM term 'relvar'. What's the difference? A relvar has an associated predicate, and probably declared constraints that are a consequence of its predicate. In particular in TTM a relvar is required to have a key, see below.

A relvar has a relation (value) at all times, but that value might change from time to time from database operations -- just as the value for any program variable changes from time to time with assignments.

 

Never mind my "unit" analogy, lets just term it the predicate of the relation, then, but, in the same way as it is pointless to add quantities of different units, it is pointless to combine two tuples with different predicates into the same set,

Just wrong. I can add the number of apples to the number of oranges. The answer is not a number of apples nor a number of oranges; but its a number of pieces of fruit (for example). I can see that adding a length to a time is unlikely to have a useful meaning/interpretation, but that doesn't make it out-and-out wrong: it needs some meaningful interpretation.

Actually you are not adding apples and oranges, you are first converting them to a common unit "fruit", i.e. M apple(s) * 1 fruit/apple + N orange(s) * 1 fruit/orange = (M+N) fruit(s).

Unfortunately we lose some information in the process and it is not easy to convert back.

Just wrong wrt the RM. Just wrong in so many ways. Do you mean "combine two tuples" or combine two relation values? How/why are you combining tuples? Do they have the same heading? What are you even saying? Especially since you don't say how you're combining them: is that UNION or INTERSECTION or what?

You appear not to have read Ch 2, as I asked. You are wasting our time.

even if they have the same datatypes. And, just as when you multiply quantities with different units the unit changes, so the predicate changes as you join the relation with another relation. Just saying that keeping track of the predicate seems to be very much akin to keeping track of the unit.

Now I have a problem because AntC and Erwin are contradicting each other.

I think that highly unlikely (and insulting). We are trying to explain the same thing using different wording, because your wording is a mess of vagueness and contradictions. You appear not to have read Ch 2, as I asked.

Take the example again from "A"'s <or> operator in my original post:

{{employee: E, department: D, project: J},{employee: E, department: D, project: K},{employee: E, department: D, project: L},{employee: E, department: B, project: J},{employee: E, department: C, project: J}}

Now AntC claims that each tuple must mean "Employee X works in department Y on project Z"

I can't see any evidence I said that.

My bad, I misread your statement below:

No. Always the logical connective between tuples in the same relation is AND -- by which I don't mean the A algebra's <AND>. That is, AND between the assertions formed by substituting each tuple's attribute values into the relation's predicate.

The mention of the "attribute values" threw me off, but you are saying the same as Erwin that there must be an "AND" between tuples so that all tuples must be true.

(which is what I carelessly termed the "normal" interpretation),

The interpretation of a tuple in a relation value depends on the predicate for that value, which is derived from the predicate for the source relvar(s) and the relational operators used to derive the value.

in which case we should also add "It may be true that..." in front of that, and consequently not all the tuples can be true, that is, the relation value is a disjunction of tuples.

No that's not a relation. I did say that already.

Erwin, on the other hand, claims that all tuples must always be true (and the relation value represents the conjunction of all the tuples) but there is no "normal" interpretation of a tuple and it is perfectly fine for the tuple to mean "Employee X works in department Y or on project Z"

So what's it to be?

Now you're just trolling. Please come back when you've read Ch 2.

Right, you do both seem to be saying the same thing, that in the example the only acceptable interpretation (the only way to have a valid relation value?) is to say that:

"All the tuples are true and each tuple means that Employee X works in Department Y OR on project Z"

The problem I have with that is that when I project on (Employee, Department), I get a relation value where it is suddenly very difficult to assert anything about the employee's relation to the department. Since it would (I presume) still be required to have "AND" between all the tuples, my only recourse seems to be to say that each tuple means "Employee X may work in Department Y".

Logically, we have now unfortunately lost some information, since the "may" could be  "doesn't" in each case. Previously I knew that the employee does in fact work in one of the departments. Unless of course there is something else that tells me that every employee must work in a department, but that is not clear.

Practically when doing algorithmic work, I think it is often more useful to think in the other way, so for the example I would want to say:

"One of the tuples is true and each tuple means Employee X works in Department Y on project Z"

The advantage of that is that under the projection on (Employee, Department), the logic trivially stays the same

"One of the tuples is true and each tuple means Employee X works in Department Y"

But perhaps Chapter 2 will show me the error of my ways, thanks for the pointer!

Quote from AntC on January 13, 2021, 1:59 am
Quote from tobega on January 12, 2021, 1:37 pm

The thing I mentioned about two operators doing "the same thing" but having different meaning may depend on the outcome of the other current question, but consider that I have a set consisting of the tuple {employee: E, department: D} and another set consisting of the tuple {employee: E, department: C} where both mean "Employee X works in department Y"

They mean whatever the database designer specifies that they mean. Are the two sets derived from some relvar, perhaps by some relational operations? Perhaps one means this is the employees 'home' department and the other means this is where the employee is temporarily assigned.

I can put these together into a set containing both, with the <or> operator or the union operator, the outcome will look exactly the same as regards the set of tuples.

Fine. What is the predicate for that derived relation value?

In one case, I might know that both are true, that E indeed shares their time between the departments, so I claim it would be more correct to use "union", and the relation value represents a conjunction of tuples.

Hunh? "union" produces a disjunction. It's "intersection" that produces a conjunction. If you applied "intersection" you'd get an empty relation value. Representing a predicate (perhaps) "No employee is temporarily assigned to their home department".

Say I get a list/set of employee to department assignments from the Main Street office and another from the High Street office. When I union those two sets, I get an AND between them (and by your own statements previously, there can only be an AND between tuples). Semantically I would say that is definitely not an OR.

Also, even if an employee has two different assignments, both are true, so still a semantic AND rather than an OR

In the other case, I know that perhaps only one of the statements is true, so it is eminently suitable to use <or> and the relation value represents a disjunction of tuples.

You appear to be saying you might use <or> or you might use <or>. What on earth are you trying to say?

What if your input relations each contain 2 tuples, the ones you identify plus {employee: F, department: D} and {employee: F, department: C} respectively? Then are you trying to build a predicate in which exactly one of the four is true? Or exactly one for each distinct Employee? What if one only of the input relations has {employee: G, department: D}? What predicate do you want now?

Indeed, that is exactly the relevant question.

This case perhaps gets a little contrived, sorry I can't make up a more believable one.

Consider that we have two places where we keep track of our employees' assignments to departments, but unfortunately we don't update them simultaneously. We do know, however, that at least one of the places always gets updated on a change of assignment. Now it makes more semantic sense to speak of OR, because in case the two sources agree, that collapses to the same tuple in the set, but if they differ, one of them is known to be true (and maybe both).

But we have run into trouble if we require an AND between all tuples, so what is a reasonable predicate here?

The relation value, in the absence of knowing the query that produced it doesn't "mean" anything. There's no reason to expect two relation values that happen to be equal but are derived from different queries "mean" the same thing.

Now I've answered more than enough of this from you. If you don't show evidence you've read Ch 2, I'm not going to answer more.

Fair enough, thanks for your patience so far!