The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

Function relations

There are 3 places in TD that embed formulae into the RA. They look something like this (if I have the syntax right):

S WHERE STATUS > 20
EXTEND P : { WEIGHT_LB := WEIGHT * 2.2 }
SUMMARIZE SP BY S# :  { TOT_QTY := SUM(QTY) }

My proposal is to replace all occurrences of formulae by Function Relations, akin to Algebra-A relation literals and HHT algorithmic relations. Features:

  • An FR implements a specific single function, with known positional arguments and a known return type
  • An FR may be a predicate, a computation or an aggregation, substituting for WHERE, EXTEND or SUMMARIZE respectively
  • An FR is invoked by providing names for the attributes that occupy the positions of arguments (or literal arguments) and return type, in similar fashion to PROJECT and RENAME
  • There is a library of useful FRs; the means to add user-defined FRs is expected but outside this scope
  • The compiler is free to require explicit type matching or resolve overloaded FRs, as the implementer wishes.

S MATCHING GT{STATUS,20}
P JOIN TIMES{WEIGHT,2.2,WEIGHT_LB}
SP JOIN SUM{S#,QTY}

Note that the aggregation arguments are all the grouping attributes. The return type replaces the aggregated column of that same name. RENAME it if you want.

That's it. The result is a pure relational query sub-language, relations all the way down, guaranteed safe and terminating. It can be embedded into a host language or sent to a server for execution. It can even be cross-compiled into SQL or used to implement a visual programming language as per Knime. Obviously the limiting factor is the FR library available in the target environment.

Note the guarantee of safe and terminating imposes restrictions on FRs. In particular, an FR must not throw an error, must not cause a side-effect, must not loop indefinitely, etc. An FR that cannot return a value (eg divide by zero or NaN etc) will simply not return a value for those argument values. Unsafe FRs might be permitted if really needed, but that's outside scope.

Andl - A New Database Language - andl.org
Quote from dandl on March 30, 2020, 1:39 am

There are 3 places in TD that embed formulae into the RA.

Rather than "formulae", I think you're trying to talk about (some) appearances of <attribute ref>s [DTATRM Chapter 5] or 'open expressions'  [not sure where that term got introduced, but see DBE Ch 11, bullet point in the intro "Expressions in general are of two kinds, open and closed."]

I'd be interested to know if D&D are aware that Tutorial D is a Relational Algebra, let alone the Relational Algebra.

They look something like this (if I have the syntax right):

S WHERE STATUS > 20
EXTEND P : { WEIGHT_LB := WEIGHT * 2.2 }
SUMMARIZE SP BY S# :  { TOT_QTY := SUM(QTY) }

I think those examples are too simplistic, and incomplete. Some more grist (haven't checked the syntax):

S RENAME {SCITY := CITY}

S GROUP {STUFF := {SNAME, CITY, STATUS} }   // see also UNGROUP, WRAP, UNWRAP

EXTEND (P JOIN SP) : {EXT_WEIGHT_LB := (WEIGHT * 2.2) * QTY }   // what's the <attribute ref>(s) for the RHS of :=?

My proposal is to replace all occurrences of formulae by Function Relations, akin to Algebra-A relation literals and HHT algorithmic relations.

"akin" is unclear. So are you proposing something that's a relation or not? In what respects does it behave like a relation? For example, what is its key (declared or inferred), do Functional Dependencies hold?

Are these things visible to a Tutorial D programmer, or merely under the hood in the implementation. If the latter, why should anyone care?

Features:

  • An FR implements a specific single function, with known positional arguments and a known return type

If the args are positional, use round parentheses not curly braces. Given that D&D have gone to such lengths to consistently use attribute naming rather than positional notation, it seems a backwards step to use positional arguments -- both Appendix A and HHT avoid that.

  • An FR may be a predicate, a computation or an aggregation, substituting for WHERE, EXTEND or SUMMARIZE respectively
  • An FR is invoked by providing names for the attributes that occupy the positions of arguments (or literal arguments) and return type, in similar fashion to PROJECT and RENAME

I don't understand how that works: an attribute name appearing in an argument position denotes what? You also show numeric literals in argument position. What do they denote, and how does that differ from using an attribute name? If these are read-only functions I'm looking for referential transparency.

  • There is a library of useful FRs; the means to add user-defined FRs is expected but outside this scope
  • The compiler is free to require explicit type matching or resolve overloaded FRs, as the implementer wishes.

S MATCHING GT{STATUS,20}
P JOIN TIMES{WEIGHT,2.2,WEIGHT_LB}

See my less simplistic example above. The sub-term (WEIGHT * 2.2) doesn't have an attribute name.

The result of an EXTEND must have the same cardinality as the l.h. relation. How do you guarantee that using JOIN?

SP JOIN SUM{S#,QTY}

No. That doesn't appear to be summarising SP. Explanation below is unclear.

Note that the aggregation arguments are all the grouping attributes. The return type replaces the aggregated column of that same name. RENAME it if you want.

That's it. The result is a pure relational query sub-language, relations all the way down, guaranteed safe and terminating. It can be embedded into a host language or sent to a server for execution. It can even be cross-compiled into SQL or used to implement a visual programming language as per Knime. Obviously the limiting factor is the FR library available in the target environment.

Note the guarantee of safe and terminating imposes restrictions on FRs. In particular, an FR must not throw an error, must not cause a side-effect, must not loop indefinitely, etc. An FR that cannot return a value (eg divide by zero or NaN etc) will simply not return a value for those argument values. Unsafe FRs might be permitted if really needed, but that's outside scope.

 

Quote from dandl on March 30, 2020, 1:39 am

There are 3 places in TD that embed formulae into the RA. They look something like this (if I have the syntax right):

S WHERE STATUS > 20
EXTEND P : { WEIGHT_LB := WEIGHT * 2.2 }
SUMMARIZE SP BY S# :  { TOT_QTY := SUM(QTY) }

My proposal is to replace all occurrences of formulae by Function Relations, akin to Algebra-A relation literals and HHT algorithmic relations. Features:

  • An FR implements a specific single function, with known positional arguments and a known return type
  • An FR may be a predicate, a computation or an aggregation, substituting for WHERE, EXTEND or SUMMARIZE respectively
  • An FR is invoked by providing names for the attributes that occupy the positions of arguments (or literal arguments) and return type, in similar fashion to PROJECT and RENAME
  • There is a library of useful FRs; the means to add user-defined FRs is expected but outside this scope
  • The compiler is free to require explicit type matching or resolve overloaded FRs, as the implementer wishes.

S MATCHING GT{STATUS,20}
P JOIN TIMES{WEIGHT,2.2,WEIGHT_LB}
SP JOIN SUM{S#,QTY}

I guess there might be some pedagogical value in adding such a capability. I wouldn't consider replacing expressions with Function Relations (and I don't think D & D would consider it either) as it appears to make Tutorial D less practical, less ergonomic, and less intuitive, particularly for typical real world uses that involve considerably more complex expressions than those shown here.

Note that the aggregation arguments are all the grouping attributes. The return type replaces the aggregated column of that same name. RENAME it if you want.

That's it. The result is a pure relational query sub-language, relations all the way down, guaranteed safe and terminating. It can be embedded into a host language or sent to a server for execution. It can even be cross-compiled into SQL or used to implement a visual programming language as per Knime. Obviously the limiting factor is the FR library available in the target environment.

Note the guarantee of safe and terminating imposes restrictions on FRs. In particular, an FR must not throw an error, must not cause a side-effect, must not loop indefinitely, etc. An FR that cannot return a value (eg divide by zero or NaN etc) will simply not return a value for those argument values. Unsafe FRs might be permitted if really needed, but that's outside scope.

That might be a worthwhile language in its own right, but it would be a seriously hobbled Tutorial D to the point of un-usability, at least for the sorts of things I do with it.

I should add that I agree with all of AntC's comments. Mine here are additions.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

I guess there might be some pedagogical value in adding such a capability. I wouldn't consider replacing expressions with Function Relations (and I don't think D & D would consider it either) as it appears to make Tutorial D less practical, less ergonomic, and less intuitive, particularly for typical real world uses that involve considerably more complex expressions than those shown here.

Perhaps it wasn't clear -- this a proposal for a query sub-language, not extensions or changes to TD. I used vaguely TD-like syntax rather than invent my own.

That might be a worthwhile language in its own right, but it would be a seriously hobbled Tutorial D to the point of un-usability, at least for the sorts of things I do with it.

That's the intention. This query language has no type system, so it definitely isn't TD or even another D. It's really just at the level of  SQL/DQL, not all the other stuff, just an augmented RA. It needs a host language and an extension language for all that other stuff. But we already have plenty of those.

 

Andl - A New Database Language - andl.org
Quote from dandl on March 30, 2020, 1:39 am

There are 3 places in TD that embed formulae into the RA. They look something like this (if I have the syntax right):

S WHERE STATUS > 20
EXTEND P : { WEIGHT_LB := WEIGHT * 2.2 }
SUMMARIZE SP BY S# :  { TOT_QTY := SUM(QTY) }

My proposal is to replace all occurrences of formulae by Function Relations, akin to Algebra-A relation literals and HHT algorithmic relations. Features:

  • An FR implements a specific single function, with known positional arguments and a known return type
  • An FR may be a predicate, a computation or an aggregation, substituting for WHERE, EXTEND or SUMMARIZE respectively
  • An FR is invoked by providing names for the attributes that occupy the positions of arguments (or literal arguments) and return type, in similar fashion to PROJECT and RENAME
  • There is a library of useful FRs; the means to add user-defined FRs is expected but outside this scope
  • The compiler is free to require explicit type matching or resolve overloaded FRs, as the implementer wishes.

S MATCHING GT{STATUS,20}
P JOIN TIMES{WEIGHT,2.2,WEIGHT_LB}
SP JOIN SUM{S#,QTY}

Note that the aggregation arguments are all the grouping attributes. The return type replaces the aggregated column of that same name. RENAME it if you want.

That's it. The result is a pure relational query sub-language, relations all the way down, guaranteed safe and terminating. It can be embedded into a host language or sent to a server for execution. It can even be cross-compiled into SQL or used to implement a visual programming language as per Knime. Obviously the limiting factor is the FR library available in the target environment.

Note the guarantee of safe and terminating imposes restrictions on FRs. In particular, an FR must not throw an error, must not cause a side-effect, must not loop indefinitely, etc. An FR that cannot return a value (eg divide by zero or NaN etc) will simply not return a value for those argument values. Unsafe FRs might be permitted if really needed, but that's outside scope.

I had a similar thought several years ago but when I tried to spell out to my coauthor he didn't understand me at all and I'm afraid I gave up the effort.

As a matter of fact the ISBL team had effectively the same idea back in the 1970s but they told me they couldn't make it work.  That was at a time when I didn't understand the idea either.

Hugh

 

Coauthor of The Third Manifesto and related books.

I think those examples are too simplistic, and incomplete. Some more grist (haven't checked the syntax):

S RENAME {SCITY := CITY}

S GROUP {STUFF := {SNAME, CITY, STATUS} }   // see also UNGROUP, WRAP, UNWRAP

Not rey my target, they don't evaluate a function or interact with the type system, and I"m not tackling RVA/TVAs for now.

EXTEND (P JOIN SP) : {EXT_WEIGHT_LB := (WEIGHT * 2.2) * QTY }   // what's the <attribute ref>(s) for the RHS of :=?

How would you do it in Algebra-A? Two separate FRs? Maybe this:

(P JOIN SP) JOIN TIMES{WEIGHT,2.2,XXX} JOIN TIMES{XXX,QTY,EXT_WEIGHT_LB}

or this:

(P JOIN SP) JOIN TIMES{WEIGHT,2.2,QTY,EXT_WEIGHT_LB}  // assumes a multi-argument TIMES FR

My proposal is to replace all occurrences of formulae by Function Relations, akin to Algebra-A relation literals and HHT algorithmic relations.

"akin" is unclear. So are you proposing something that's a relation or not? In what respects does it behave like a relation? For example, what is its key (declared or inferred), do Functional Dependencies hold?

It behaves as a function with relation(s) as arguments and return; the output is a relation and is consumed as such. I guess its arguments are the key, and the return value attribute is dependent on the arguments.

Features:

  • An FR implements a specific single function, with known positional arguments and a known return type

If the args are positional, use round parentheses not curly braces. Given that D&D have gone to such lengths to consistently use attribute naming rather than positional notation, it seems a backwards step to use positional arguments -- both Appendix A and HHT avoid that.

I used braces to emphasis that this is a name, not a value. The positional notation avoids the nastiness of the Algebra-A TIMES and its renames.

So WEIGHT in the first argument position for TIMES means that it is the first argument and also that it will be named WEIGHT in the generated output relation.

  • An FR may be a predicate, a computation or an aggregation, substituting for WHERE, EXTEND or SUMMARIZE respectively
  • An FR is invoked by providing names for the attributes that occupy the positions of arguments (or literal arguments) and return type, in similar fashion to PROJECT and RENAME

I don't understand how that works: an attribute name appearing in an argument position denotes what?

The name of that attribute in the output. Given:

P JOIN TIMES{WEIGHT,2.2,WEIGHT_LB}

The body of the FR as computed will be appear to be 4 tuples:{ WEIGHT 12.0, WEIGHT_LB 26.4 } and so on. As needed for the JOIN to work.

You also show numeric literals in argument position. What do they denote, and how does that differ from using an attribute name? If these are read-only functions I'm looking for referential transparency.

A literal constant value instead of a variable. Literal arguments do not appear in the output relation.

  • There is a library of useful FRs; the means to add user-defined FRs is expected but outside this scope
  • The compiler is free to require explicit type matching or resolve overloaded FRs, as the implementer wishes.

S MATCHING GT{STATUS,20}
P JOIN TIMES{WEIGHT,2.2,WEIGHT_LB}

See my less simplistic example above. The sub-term (WEIGHT * 2.2) doesn't have an attribute name.

The result of an EXTEND must have the same cardinality as the l.h. relation. How do you guarantee that using JOIN?

How could it be otherwise? As long as the FR provides a joinable tuple for each distinct pair of argument values, what else could it be?

SP JOIN SUM{S#,QTY}

No. That doesn't appear to be summarising SP. Explanation below is unclear.

Actually yes, this isn't right at all. Back to the drawing board.

 

Andl - A New Database Language - andl.org
Quote from dandl on March 30, 2020, 8:36 am

I guess there might be some pedagogical value in adding such a capability. I wouldn't consider replacing expressions with Function Relations (and I don't think D & D would consider it either) as it appears to make Tutorial D less practical, less ergonomic, and less intuitive, particularly for typical real world uses that involve considerably more complex expressions than those shown here.

Perhaps it wasn't clear -- this a proposal for a query sub-language, not extensions or changes to TD. I used vaguely TD-like syntax rather than invent my own.

Ah. Got it. Yes, I misunderstood.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

Yes, just to be clear, as long as the goal is to produce a 'super-SQL' then TTM/D is a good try at what it might look like. The arcane type system is a huge hurdle, making it almost impossible to merge with any reasonably common GP language. Tweaking the type system might be the way to go.

But my goal here is quite different. The SQL DQL/DML is extraordinarily widely used as glue for application code to get access to data, but it fails if the data isn't in an RDBMS. You can't even do an SQL query on data in a spreadsheet, which much of it seems to be. The RA tells us the basic framework of how to do it, and we can knock up an implementation of the RA in no time at all, but the embedded expression language used in WHERE and EXTEND clauses and importing a type system is now the barrier. My Relation Functions inspired by Algebra-A and HHT are a way to resolve that.

The end result should be a query sub-language that is:

  • syntactically simple: just names, literals and some punctuation; plus assignment
  • semantically simple: just relations/tables all the way down
  • firmly based on the RA, so easy to explain
  • reasonably easy to compile and implement
  • can target local execution or SQL generation.

At the moment I'm still battling Knime and Java, so this will have to wait.

Andl - A New Database Language - andl.org

I had a similar thought several years ago but when I tried to spell out to my coauthor he didn't understand me at all and I'm afraid I gave up the effort.

As a matter of fact the ISBL team had effectively the same idea back in the 1970s but they told me they couldn't make it work.  That was at a time when I didn't understand the idea either.

Thanks Hugh. And then came SQL, and everyone thought the problem is solved. Except it keeps coming back, in the places SQL can't reach.

We've all had the benefit of a lot of thinking between then and now.

Andl - A New Database Language - andl.org

You might want to look at SQLish to get some ideas, even though it's not quite what you want.  The idea there is to provide something with the full power of the RM that still looks something like SQL.