The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

Why not relational ordering operators?

PreviousPage 2 of 3Next
Quote from dandl on May 17, 2020, 11:14 pm
Quote from Erwin on May 17, 2020, 8:47 am
Quote from dandl on May 17, 2020, 2:16 am

RM Pro 2 says:

D shall include no concept of a “relation” whose tuples are distinguishable by ordinal position. Instead, for
every relation r expressible in D, the tuples of r shall be distinguishable by value.

A relational ordering operator means the definition of a less function. As per RM Pro 2 brief discussion in DTTRM, the operator is not relational, it is a generic operator with two tuple arguments of the same type and returning a Boolean result. There is no concept of 'distinguished by ordinal position', but it would enable some useful queries, such as first, last, skip, take, lead, lag and rank.

I find the idea mooted in DTATRM and found in TD of having a separate ARRAY type absolutely pointless and unnecessary; the ordering operator solution is simple and self-evident. Without this, SQL does the job better.

What job ?

"No concept of tuples distinguished by ordinal position" does ***not*** mean that "some useful queries" are ruled out.  They are not ruled out.  Re-read the information principle.  If information is relevant to the user, then that information shall be present as ***values of attributes in tuples in relations***.  The only thing forbidden is that the system starts offering facilities that go beyond this rule, such as, e.g., "ordered" relations where "position of a tuple" has a material effect on, e.g., the results of the relation equality operator, that is to say, where "equality of ordered relations" is like java's LinkedList.equals() and not Set.equals().  What is manifestly not forbidden, is relations whose heading includes a RANK or ORDINAL_POS attribute.  Fits the IP perfectly, and is therefore the way to go for your "some useful queries".

I think you're agreeing with me. TTM does not forbid attributes of that kind, and presumably the operators required to compute them. The functionless(tuple,tuple) is such an operator.

What are the rules for determining that given some tuple x and some tuple y, that less(x, y) returns true, or returns false?

Are those rules dependent on any quality of x or y that is not part of the standard definition of a tuple within a relational model?

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

Rel defines an operator called ORDER(...) that returns an ARRAY (an ordered, random-access bag of tuples) for a relation operand. It's shorthand for declaring an ARRAY and using LOAD to populate it. There is no PRINT.

Rel also defines its inverse called UNORDER() that returns a relation given an ARRAY operand.

The more I look at that, the more it looks like a dumb idea. I think it means you can't write a relational expression to compute the rank-based functions I listed, and I think it must be why D&D added a VSS for quota queries: they knew it was a good idea but not how best to do it.

You can certainly define a system inspired by the relational model that doesn't use relations, so that you can specify row/tuple orders (and have duplicate rows?) should you want them. That's what SQL does. A correspondent of mine described such systems (including SQL) as implementing the "bagatational" model because such row-ordered, duplicate-allowing containers are bags, not relations.

In the relational model, tuples appearing in a relation in some specified ordering is meaningless, as that's not a property of a relation. By definition, a specific tuple ordering can only exist in some non-relation construct. As already noted, the only way to "order" tuples in a relation is to represent the tuple order with a rank attribute.

I have no interest in proposing bags, ordered sets or anything other than relations as defined. Tuples in a relation are not ordered, but tuples may be ordered for the purpose of certain queries, as per DTATRM.

Andl - A New Database Language - andl.org
Quote from dandl on May 17, 2020, 11:08 pm
Quote from Darren Duncan on May 17, 2020, 7:38 am

Conceptually tuples don't have a natural sorting order in the general case so applying "less" etc to them is not useful for a normal user query.

What is actually useful is having a user-defined mapping function that takes a tuple as input and results in something which does have a natural sort order, and then when a user wants to sort tuples they say use this function to facilitate it.

No, that's the wrong concept. My less function ranks two tuples for the purpose of a query, such as ranking parts by weight.

This is a more general form of how SQL does it, where the SQL form has that mapping function extract one or more attribute in an ordered sequence.

If you mean ORDER BY, it's not easy to use that for ranking.

The only value I see in a generic system-defined tuple "less" function is if one wants to put a relation in AN canonical order for the purpose of hashing the entire relation for indexing it as an RVA or such thing.

The only value? I listed a series of queries that depend on it: rank, skip/take, lag/lead, running sum etc.

David, all of the system-defined operators I'm talking about are fully generic that they take ANY 2 tuples as input and have no other inputs, so their behaviour is governed solely by the properties of 2 generic tuples.

Say for example "less( {x: 4, y: 6}, {x: 2, y: 9} )" I don't believe makes sense as a generic system-defined operator, which of those 2 tuples do you think it would reasonably order first?

Whereas I'm saying that to meaningfully sort tuples one needs a user-defined operator that has specific knowledge of particular tuple types and how to handle them.

So is your "less" function system-defined or user-defined?

The only way I see a system-defined "less" function working here is if it actually is a higher-order function taking an additional argument that is a user-defined function with knowledge of the tuple type, but then conceptually we just have a user-defined "less" anyway.

Quote from dandl on May 17, 2020, 11:21 pm

Rel defines an operator called ORDER(...) that returns an ARRAY (an ordered, random-access bag of tuples) for a relation operand. It's shorthand for declaring an ARRAY and using LOAD to populate it. There is no PRINT.

Rel also defines its inverse called UNORDER() that returns a relation given an ARRAY operand.

The more I look at that, the more it looks like a dumb idea. I think it means you can't write a relational expression to compute the rank-based functions I listed, and I think it must be why D&D added a VSS for quota queries: they knew it was a good idea but not how best to do it.

It works particularly well for its intended purpose of providing functionality similar to a SQL cursor. It can be used for creating order-dependent operations, for which it also works well.

You can certainly define a system inspired by the relational model that doesn't use relations, so that you can specify row/tuple orders (and have duplicate rows?) should you want them. That's what SQL does. A correspondent of mine described such systems (including SQL) as implementing the "bagatational" model because such row-ordered, duplicate-allowing containers are bags, not relations.

In the relational model, tuples appearing in a relation in some specified ordering is meaningless, as that's not a property of a relation. By definition, a specific tuple ordering can only exist in some non-relation construct. As already noted, the only way to "order" tuples in a relation is to represent the tuple order with a rank attribute.

I have no interest in proposing bags, ordered sets or anything other than relations as defined. Tuples in a relation are not ordered, but tuples may be ordered for the purpose of certain queries, as per DTATRM.

If "tuples may be ordered for the purpose of certain queries", then you either need to represent the ordering with an attribute in an unordered collection of tuples, or order the tuples in an ordered collection of tuples.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on May 17, 2020, 11:14 pm
Quote from Darren Duncan on May 17, 2020, 11:00 pm
Quote from Dave Voorhis on May 17, 2020, 9:22 am

You can certainly define a system inspired by the relational model that doesn't use relations, so that you can specify row/tuple orders (and have duplicate rows?) should you want them. That's what SQL does. A correspondent of mine described such systems (including SQL) as implementing the "bagatational" model because such row-ordered, duplicate-allowing containers are bags, not relations.

In the relational model, tuples appearing in a relation in some specified ordering is meaningless, as that's not a property of a relation. By definition, a specific tuple ordering can only exist in some non-relation construct. As already noted, the only way to "order" tuples in a relation is to represent the tuple order with a rank attribute.

I would say there are actually THREE interesting variants of a tuple collection here, which are defined over sets, bags, and arrays respectively.

  • A normal relation is based on a set of tuples, being unordered and without duplicates.
  • A bag of tuples like SQL's TABLE is ALSO unordered, and its properties are a fairly straightforward generalization of a relation.
  • An array of tuples is the only option that is ordered, and it also allows duplicates, this is what the result of ORDER BY is.

All 3 of the above cases can be relation-like in having the concept of a heading and a tuple-composed body with generalizations of the same behaviors.

"Relation-like"?

That borders on a 4 being 3-like because they only differ by 1.

:-)

By "relation-like" I mean that most of the properties of relations apply to the others, or the others are more general cases.  It is the same as saying that a multiset is set-like.

Quote from Dave Voorhis on May 17, 2020, 11:18 pm
Quote from dandl on May 17, 2020, 11:14 pm
Quote from Erwin on May 17, 2020, 8:47 am
Quote from dandl on May 17, 2020, 2:16 am

RM Pro 2 says:

D shall include no concept of a “relation” whose tuples are distinguishable by ordinal position. Instead, for
every relation r expressible in D, the tuples of r shall be distinguishable by value.

A relational ordering operator means the definition of a less function. As per RM Pro 2 brief discussion in DTTRM, the operator is not relational, it is a generic operator with two tuple arguments of the same type and returning a Boolean result. There is no concept of 'distinguished by ordinal position', but it would enable some useful queries, such as first, last, skip, take, lead, lag and rank.

I find the idea mooted in DTATRM and found in TD of having a separate ARRAY type absolutely pointless and unnecessary; the ordering operator solution is simple and self-evident. Without this, SQL does the job better.

What job ?

"No concept of tuples distinguished by ordinal position" does ***not*** mean that "some useful queries" are ruled out.  They are not ruled out.  Re-read the information principle.  If information is relevant to the user, then that information shall be present as ***values of attributes in tuples in relations***.  The only thing forbidden is that the system starts offering facilities that go beyond this rule, such as, e.g., "ordered" relations where "position of a tuple" has a material effect on, e.g., the results of the relation equality operator, that is to say, where "equality of ordered relations" is like java's LinkedList.equals() and not Set.equals().  What is manifestly not forbidden, is relations whose heading includes a RANK or ORDINAL_POS attribute.  Fits the IP perfectly, and is therefore the way to go for your "some useful queries".

I think you're agreeing with me. TTM does not forbid attributes of that kind, and presumably the operators required to compute them. The functionless(tuple,tuple) is such an operator.

What are the rules for determining that given some tuple x and some tuple y, that less(x, y) returns true, or returns false?

Are those rules dependent on any quality of x or y that is not part of the standard definition of a tuple within a relational model?

It's a user defined function. In the DTATRM example it's defined by

RANK P BY ( DESC WEIGHT AS WEIGHT_RANK )

In Andl it's defined by .order()

// ordered and grouped on CITY descending, with subtotalling/running sum
S .order(%-CITY) .select{ * 
ord:=ord(),
ordg:=ordg(),
lag:=lag(STATUS,1), 
lead:=lead(STATUS,1), 
nth:=nth(STATUS,1), 
sum:=sum(STATUS), // running sum within group
max:=hi(STATUS),
min:=lo(STATUS),
ave:=fold(+,STATUS)/fold(+,1),
}

In C# it might look like this (City descending):

less = (t1,t2) => t1.City > t2.City

 

Andl - A New Database Language - andl.org
Quote from Darren Duncan on May 17, 2020, 11:25 pm
Quote from dandl on May 17, 2020, 11:08 pm
Quote from Darren Duncan on May 17, 2020, 7:38 am

Conceptually tuples don't have a natural sorting order in the general case so applying "less" etc to them is not useful for a normal user query.

What is actually useful is having a user-defined mapping function that takes a tuple as input and results in something which does have a natural sort order, and then when a user wants to sort tuples they say use this function to facilitate it.

No, that's the wrong concept. My less function ranks two tuples for the purpose of a query, such as ranking parts by weight.

This is a more general form of how SQL does it, where the SQL form has that mapping function extract one or more attribute in an ordered sequence.

If you mean ORDER BY, it's not easy to use that for ranking.

The only value I see in a generic system-defined tuple "less" function is if one wants to put a relation in AN canonical order for the purpose of hashing the entire relation for indexing it as an RVA or such thing.

The only value? I listed a series of queries that depend on it: rank, skip/take, lag/lead, running sum etc.

David, all of the system-defined operators I'm talking about are fully generic that they take ANY 2 tuples as input and have no other inputs, so their behaviour is governed solely by the properties of 2 generic tuples.

Say for example "less( {x: 4, y: 6}, {x: 2, y: 9} )" I don't believe makes sense as a generic system-defined operator, which of those 2 tuples do you think it would reasonably order first?

Whereas I'm saying that to meaningfully sort tuples one needs a user-defined operator that has specific knowledge of particular tuple types and how to handle them.

So is your "less" function system-defined or user-defined?

User-defined. See my answer elsewhere.

The only way I see a system-defined "less" function working here is if it actually is a higher-order function taking an additional argument that is a user-defined function with knowledge of the tuple type, but then conceptually we just have a user-defined "less" anyway.

Exactly. What else did you think I meant?

Andl - A New Database Language - andl.org

If "tuples may be ordered for the purpose of certain queries", then you either need to represent the ordering with an attribute in an unordered collection of tuples, or order the tuples in an ordered collection of tuples.

Well, you could do it that way but it's not actually necessary. All the queries I listed will need to treat the set of tuples as if they were in some order, but there is no actual necessity to surface that ordering as a visible attribute. The query might be to divide a group of students into 5 groups A-E of equal size based on test score. Another query would be a running sum of money amount by date. Another would be to calculate a moving average of a stock price. In each case there is no particular need to represent the ordering as an attribute, but the final calculated value will certainly be an attribute.

[Yes, I know you can write code to do any of those, but this is about adding power to the query language. We already have sufficiently powerful GP languages.]

Andl - A New Database Language - andl.org
Quote from dandl on May 18, 2020, 1:07 am
Quote from Darren Duncan on May 17, 2020, 11:25 pm

The only way I see a system-defined "less" function working here is if it actually is a higher-order function taking an additional argument that is a user-defined function with knowledge of the tuple type, but then conceptually we just have a user-defined "less" anyway.

Exactly. What else did you think I meant?

Honestly, I was thinking of "less" in its traditional sense of being a dyadic operator whose only arguments are the 2 things being compared.  But we're on the same page now.

Quote from dandl on May 18, 2020, 1:23 am

If "tuples may be ordered for the purpose of certain queries", then you either need to represent the ordering with an attribute in an unordered collection of tuples, or order the tuples in an ordered collection of tuples.

Well, you could do it that way but it's not actually necessary. All the queries I listed will need to treat the set of tuples as if they were in some order, but there is no actual necessity to surface that ordering as a visible attribute. The query might be to divide a group of students into 5 groups A-E of equal size based on test score. Another query would be a running sum of money amount by date. Another would be to calculate a moving average of a stock price. In each case there is no particular need to represent the ordering as an attribute, but the final calculated value will certainly be an attribute.

[Yes, I know you can write code to do any of those, but this is about adding power to the query language. We already have sufficiently powerful GP languages.]

I was referring to the final calculated value. It either needs to be an attribute -- as you're doing -- or a structure that supports ordered tuples.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
PreviousPage 2 of 3Next