An alternative to open expressions

#1 · July 21, 2020, 1:05 am

Just to make my position clear, the 'open expression' on p3 of the Tutorial D spec is effectively a combination of two features found in other languages:

an iterator, defined by the places where an OE can appear
a parameterless anonymous function (or lambda) with names in an outer scope provided by the iterator.

The alternative I present is the relcon (per App-A) or relfun (relational function), defined as an operator in an Extended Relational Algebra (ERA), which extends App-A.

This is the formalisation of a relational constant or ‘relcon’, and relies on some previously defined function f. Separate formalisations are provided for monadic and dyadic functions. Extending them to n-adic functions is left as an exercise.

Monadic
Given attribute names X,Y, types Tx,Ty and function f with the type signature Tx->Ty then s = RELCON(X,Y,f) is defined as follows. 

Hs = {<X,Tx>, <Y,Ty>}
Bs = { ts : ∃ vx ∈ Tx, ∃ vy ∈ Ty, f(vx) = vy,
   ts = {<X,Tx,vx>, <Y,Ty,vy>} }

Dyadic
Given attribute names X,Y,Z, types Tx,Ty,Tz and function f with the type signature Tx->Ty->Tz then s = RELCON(X,Y,Z,f) is defined as follows. 

Hs = {<X,Tx>, <Y,Ty>, <Z,Tz>}
Bs = { ts : ∃ vx ∈ Tx, ∃ vy ∈ Ty, ∃ vz ∈ Tz, f(vx,vy) = vz),
   ts = {<X,Tx,vx>, <Y,Ty,vy, <Z,Tz,vz>} }

Wherever an open expression appears, it can be replaced by the shorthand AND RELCON(). Note that there is no iterator and no lambda.

Andl - A New Database Language - andl.org

#2 · July 21, 2020, 4:15 am

Quote from AntC on July 21, 2020, 4:15 am

Quote from dandl on July 21, 2020, 1:05 am

Just to make my position clear, the 'open expression' on p3 of the Tutorial D spec is effectively a combination of two features found in other languages:

an iterator, defined by the places where an OE can appear

"iterator" sounds too much like an implementation/algorithmic concern: TTM does not prescribe implementation; it wants to prescribe semantics. (I'm not saying it's always very clear as to the semantics.) The semantics is that the open expression is to apply for each tuple from a source [**], defined by the context in which the OE appears.

[**] remember OEs can appear in tuple expressions (so the 'source' is that single tuple) as well as in relational expressions (EXTEND, WHERE, ...) or aggregating expressions (SUM(SP JOIN P, QTY * WEIGHT)).

The implementation might use something similar to Google's map-reduce or Functional Programming's 'fold'. 'Iterate' to my mind carries with it a notion of state (a counter or indexed looping), which sits poorly with those other abstractions. Specifically, the map step of map-reduce is usually treated as stateless; FP's non-strict fold (aka 'right fold') can be executed in parallel, whereas there's a strict variant (aka 'left fold') that's implemented via a single accumulator which therefore can't be executed in parallel.

a parameterless anonymous function (or lambda) with names in an outer scope provided by the iterator.

The names brought into scope are specifically attribute names from an outer tuple expression or relational expression. That might be via something as simple as a relvar name; then I'm not seeing your "anonymous" is making anything clear. Neither is it useful to think of a relvar or relation expression as a function.

Oh, you mean the Open Expression denotes an anonymous function. That wasn't clear. The 'function' might be as simple as a single attribute name. Then calling that "anonymous" isn't very helpful.

Wherever an open expression appears, it can be replaced by the shorthand AND RELCON(). ...

I'm not seeing how that goes with SUM(SP, QTY) or SUM(SP JOIN P, QTY * WEIGHT). How/where do I insert AND RELCON()?

To recap:

Just to make my position clear, ...

I think you haven't. You've managed to be both too vague/high-level; and too implementation-specific.

Quote from dandl on July 21, 2020, 1:05 am

Just to make my position clear, the 'open expression' on p3 of the Tutorial D spec is effectively a combination of two features found in other languages:

an iterator, defined by the places where an OE can appear

"iterator" sounds too much like an implementation/algorithmic concern: TTM does not prescribe implementation; it wants to prescribe semantics. (I'm not saying it's always very clear as to the semantics.) The semantics is that the open expression is to apply for each tuple from a source [**], defined by the context in which the OE appears.

[**] remember OEs can appear in tuple expressions (so the 'source' is that single tuple) as well as in relational expressions (EXTEND, WHERE, ...) or aggregating expressions (SUM(SP JOIN P, QTY * WEIGHT)).

The implementation might use something similar to Google's map-reduce or Functional Programming's 'fold'. 'Iterate' to my mind carries with it a notion of state (a counter or indexed looping), which sits poorly with those other abstractions. Specifically, the map step of map-reduce is usually treated as stateless; FP's non-strict fold (aka 'right fold') can be executed in parallel, whereas there's a strict variant (aka 'left fold') that's implemented via a single accumulator which therefore can't be executed in parallel.

a parameterless anonymous function (or lambda) with names in an outer scope provided by the iterator.

The names brought into scope are specifically attribute names from an outer tuple expression or relational expression. That might be via something as simple as a relvar name; then I'm not seeing your "anonymous" is making anything clear. Neither is it useful to think of a relvar or relation expression as a function.

Oh, you mean the Open Expression denotes an anonymous function. That wasn't clear. The 'function' might be as simple as a single attribute name. Then calling that "anonymous" isn't very helpful.

Wherever an open expression appears, it can be replaced by the shorthand AND RELCON(). ...

I'm not seeing how that goes with SUM(SP, QTY) or SUM(SP JOIN P, QTY * WEIGHT). How/where do I insert AND RELCON()?

To recap:

Just to make my position clear, ...

I think you haven't. You've managed to be both too vague/high-level; and too implementation-specific.

#3 · July 21, 2020, 6:47 am

Quote from dandl on July 21, 2020, 6:47 am

Quote from AntC on July 21, 2020, 4:15 am

Quote from dandl on July 21, 2020, 1:05 am

Just to make my position clear, the 'open expression' on p3 of the Tutorial D spec is effectively a combination of two features found in other languages:

an iterator, defined by the places where an OE can appear

"iterator" sounds too much like an implementation/algorithmic concern: TTM does not prescribe implementation; it wants to prescribe semantics. (I'm not saying it's always very clear as to the semantics.) The semantics is that the open expression is to apply for each tuple from a source [**], defined by the context in which the OE appears.

The moment you say "for each" you are describing an iterator. Open expressions are not prescribed by TTM, they appear only in the language TD. I am drawing comparisons between this language and other languages, and I am saying that the semantic intent of TTM is better served by the mechanism described below than by the iterator-lambda-with OE.

[**] remember OEs can appear in tuple expressions (so the 'source' is that single tuple) as well as in relational expressions (EXTEND, WHERE, ...) or aggregating expressions (SUM(SP JOIN P, QTY * WEIGHT)).

Ah, yes. I only ever intended to propose an alternative to one use of OE, not to all of them. I do apologise, I should have made it clearer that my comments apply only to the OE as it appears in particular contexts, which I did not identify. They are more or less (a) and (d) in the list on p3 (dealing with sets of tuples in WHERE, EXTEND, UPDATE, DELETE). I have another operator to deal with aggregation, and the remaining features appear to be language specific, not relevant here.

The implementation might use something similar to Google's map-reduce or Functional Programming's 'fold'. 'Iterate' to my mind carries with it a notion of state (a counter or indexed looping), which sits poorly with those other abstractions. Specifically, the map step of map-reduce is usually treated as stateless; FP's non-strict fold (aka 'right fold') can be executed in parallel, whereas there's a strict variant (aka 'left fold') that's implemented via a single accumulator which therefore can't be executed in parallel.

I agree and I'm familiar with them, but they don't help. You still need an iterator to feed the map-reduce engine, and the OE depends on the iterator for access to attributes.

a parameterless anonymous function (or lambda) with names in an outer scope provided by the iterator.

The names brought into scope are specifically attribute names from an outer tuple expression or relational expression. That might be via something as simple as a relvar name; then I'm not seeing your "anonymous" is making anything clear. Neither is it useful to think of a relvar or relation expression as a function.

Anonymous function is the concept, lambda is what it's called in specific languages (originally Lisp, now Java, C#, etc).

Oh, you mean the Open Expression denotes an anonymous function. That wasn't clear. The 'function' might be as simple as a single attribute name. Then calling that "anonymous" isn't very helpful.

I thought I made that very clear: an OE (of this kind) is what in other languages would be the combination of

an iterator: IEnumerable in C#, iterator in C++ or Java

a parameterless anonymous function (lambda in most languages, pointer-to-function in C++)

a scope inclusion mechanism (similar to with in Pascal or using in C#).

If you're not very familiar with all those languages, the comparisons may not help.

Wherever an open expression appears, it can be replaced by the shorthand AND RELCON(). ...

I'm not seeing how that goes with SUM(SP, QTY) or SUM(SP JOIN P, QTY * WEIGHT). How/where do I insert AND RELCON()?

Sorry, they don't, see above. Maybe in a later post.

To recap:

Just to make my position clear, ...

I think you haven't. You've managed to be both too vague/high-level; and too implementation-specific.

No, this is not about implementation. The first part is about comparing features in different languages; the second is about building new shorthands on top of an App-A inspired ERA. Implementation comes much later.

Quote from AntC on July 21, 2020, 4:15 am

Quote from dandl on July 21, 2020, 1:05 am

Just to make my position clear, the 'open expression' on p3 of the Tutorial D spec is effectively a combination of two features found in other languages:

an iterator, defined by the places where an OE can appear

"iterator" sounds too much like an implementation/algorithmic concern: TTM does not prescribe implementation; it wants to prescribe semantics. (I'm not saying it's always very clear as to the semantics.) The semantics is that the open expression is to apply for each tuple from a source [**], defined by the context in which the OE appears.

The moment you say "for each" you are describing an iterator. Open expressions are not prescribed by TTM, they appear only in the language TD. I am drawing comparisons between this language and other languages, and I am saying that the semantic intent of TTM is better served by the mechanism described below than by the iterator-lambda-with OE.

[**] remember OEs can appear in tuple expressions (so the 'source' is that single tuple) as well as in relational expressions (EXTEND, WHERE, ...) or aggregating expressions (SUM(SP JOIN P, QTY * WEIGHT)).

Ah, yes. I only ever intended to propose an alternative to one use of OE, not to all of them. I do apologise, I should have made it clearer that my comments apply only to the OE as it appears in particular contexts, which I did not identify. They are more or less (a) and (d) in the list on p3 (dealing with sets of tuples in WHERE, EXTEND, UPDATE, DELETE). I have another operator to deal with aggregation, and the remaining features appear to be language specific, not relevant here.

The implementation might use something similar to Google's map-reduce or Functional Programming's 'fold'. 'Iterate' to my mind carries with it a notion of state (a counter or indexed looping), which sits poorly with those other abstractions. Specifically, the map step of map-reduce is usually treated as stateless; FP's non-strict fold (aka 'right fold') can be executed in parallel, whereas there's a strict variant (aka 'left fold') that's implemented via a single accumulator which therefore can't be executed in parallel.

I agree and I'm familiar with them, but they don't help. You still need an iterator to feed the map-reduce engine, and the OE depends on the iterator for access to attributes.

a parameterless anonymous function (or lambda) with names in an outer scope provided by the iterator.

The names brought into scope are specifically attribute names from an outer tuple expression or relational expression. That might be via something as simple as a relvar name; then I'm not seeing your "anonymous" is making anything clear. Neither is it useful to think of a relvar or relation expression as a function.

Anonymous function is the concept, lambda is what it's called in specific languages (originally Lisp, now Java, C#, etc).

Oh, you mean the Open Expression denotes an anonymous function. That wasn't clear. The 'function' might be as simple as a single attribute name. Then calling that "anonymous" isn't very helpful.

I thought I made that very clear: an OE (of this kind) is what in other languages would be the combination of

an iterator: IEnumerable in C#, iterator in C++ or Java
a parameterless anonymous function (lambda in most languages, pointer-to-function in C++)
a scope inclusion mechanism (similar to with in Pascal or using in C#).

If you're not very familiar with all those languages, the comparisons may not help.

Wherever an open expression appears, it can be replaced by the shorthand AND RELCON(). ...

I'm not seeing how that goes with SUM(SP, QTY) or SUM(SP JOIN P, QTY * WEIGHT). How/where do I insert AND RELCON()?

Sorry, they don't, see above. Maybe in a later post.

To recap:

Just to make my position clear, ...

I think you haven't. You've managed to be both too vague/high-level; and too implementation-specific.

No, this is not about implementation. The first part is about comparing features in different languages; the second is about building new shorthands on top of an App-A inspired ERA. Implementation comes much later.

Andl - A New Database Language - andl.org

#4 · July 21, 2020, 7:08 am

An operator like WHERE is pretty simple (and dare I say it, rather familiarly SQL-like.) You can say things like S WHERE STATUS = 20. No matter what conceptual hurdles there might be to appreciate (or implement) its semantics in terms of language or model theory, it's hard to beat its simplicity from a user's point of view.

How would you write S WHERE STATUS = 20 under your alternative?

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#5 · July 21, 2020, 9:09 am

Quote from Dave Voorhis on July 21, 2020, 7:08 am

An operator like WHERE is pretty simple (and dare I say it, rather familiarly SQL-like.) You can say things like

S WHERE STATUS = 20

S WHERE STATUS = 20. No matter what conceptual hurdles there might be to appreciate (or implement) its semantics in terms of language or model theory, it's hard to beat its simplicity from a user's point of view.

How would you write

S WHERE STATUS = 20

S WHERE STATUS = 20 under your alternative?

I'd write S JOIN REL{TUP{STATUS 20}}

S WHERE STATUS > 20 would be trickier.

Hugh

Coauthor of The Third Manifesto and related books.

#6 · July 21, 2020, 10:54 am

Quote from Hugh on July 21, 2020, 9:09 am

Quote from Dave Voorhis on July 21, 2020, 7:08 am

An operator like WHERE is pretty simple (and dare I say it, rather familiarly SQL-like.) You can say things like

S WHERE STATUS = 20

S WHERE STATUS = 20. No matter what conceptual hurdles there might be to appreciate (or implement) its semantics in terms of language or model theory, it's hard to beat its simplicity from a user's point of view.

How would you write

S WHERE STATUS = 20

S WHERE STATUS = 20 under your alternative?

I'd write S JOIN REL{TUP{STATUS 20}}

Indeed, but all the intuitiveness and readability of S WHERE STATUS = 20goes away, at least for a typical user.

S WHERE STATUS > 20 would be trickier.

That's what I was going to ask next. :-)

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#7 · July 22, 2020, 12:48 am

The answer comes is two different forms.

First, is the answer in terms of the algebraic formulation the shorthands will expand to. We assume the existence of an external provider of an Integer type and a library of comparison functions and literal values of that type. Attribute names are single-quoted strings. Then the answers are:

AND(RELCON('STATUS',IntegerEq(20)))

AND(RELCON('STATUS',IntegerGt(20)))

The other form is whatever the language designer chooses, provided it uses shorthands that can be expanded into the algebraic form. If you like TD/SQL syntax, then go with that. Just bear in mind that STATUS > 20 is not an open expression in this formulation, it's a shorthand for an ERA formula. There is no iterator, no 'current tuple' and symbol scopes may work differently.

And before you ask, there is no restriction on what kind of function that might be. For example:

AND(RELCON('STATUS',Eval("$1 ge 20 and $1 lt 30")))

AND(RELCON('STATUS',v => v >= 20 && v < 30))

The last one really is a lambda, and is how I implemented this in my C#-as-D project.

Andl - A New Database Language - andl.org

#8 · July 23, 2020, 8:32 am

Extending the previous post, there are actually two kinds of relcon: a predicate, and one that returns a value. They are defined formally as follows.

a)	Monadic predicate
Given attribute names X, types Tx and function f with the type signature Tx->Boolean then s = RELCON(X,f) is defined as follows. 

Hs = {<X,Tx>}
Bs = { ts : ∃ vx ∈ Tx, f(vx) = true, ts = {<X,Tx,vx>} }

b)	Monadic value returning
Given attribute names X,Y, types Tx,Ty and function f with the type signature Tx->Ty then s = RELCON(X,Y,f) is defined as follows. 

Hs = {<X,Tx>, <Y,Ty>}
Bs = { ts : ∃ vx ∈ Tx, ∃ vy ∈ Ty, f(vx) = vy,
   ts = {<X,Tx,vx>, <Y,Ty,vy>} }

In the first, the relcon only contains tuples for which the function returns true. This is the one to use as the basis for the WHERE shorthand.

In the second, the relcon contains a tuple for every distinct set of arguments, and the return value of the function. This is the one to use as the basis for the EXTEND shorthand.

Andl - A New Database Language - andl.org

#9 · July 23, 2020, 11:59 am

Quote from dandl on July 23, 2020, 8:32 am

Extending the previous post, there are actually two kinds of relcon: a predicate, and one that returns a value.

HHT 1975 (and Appendix A) see that as one algorithmic relation (or one relcon) with two uses. Supposing we have to hand a Appendix A-style relcon PLUS with attributes {X, Y, Z} such that X + Y == Z for all tuples. Consider two queries:

Given relvar R1 with attributes {X, Y, Z} amongst others: R1 WHERE (X + Y) = Z
Given relvar R2 with attributes {X, Y} amongst others, but lacking Z: EXTEND R2 : { Z := X + Y}

Both queries can be implemented in the A algebra using <AND>:

R1 <AND> PLUS
R2 <AND> PLUS

Why would I want to use a different relcon for WHERE vs for EXTEND? Similar considerations using an equi-relcon for R1 WHERE X = Z and RENAME R2 {Z := X}. (Achieve the RENAME using COMPOSE, which has an <AND> inside its implementation.)

Appendix A and HHT 1975 seem to have this already figured out.

Is it worth some different machinery for WHERE conditions mentioning only a single attribute? I can't see why: again use <AND> to a single-attribute relcon.

#10 · July 23, 2020, 2:15 pm

Quote from dandl on July 23, 2020, 2:15 pm

Quote from AntC on July 23, 2020, 11:59 am

Quote from dandl on July 23, 2020, 8:32 am

Extending the previous post, there are actually two kinds of relcon: a predicate, and one that returns a value.

HHT 1975 (and Appendix A) see that as one algorithmic relation (or one relcon) with two uses. Supposing we have to hand a Appendix A-style relcon PLUS with attributes {X, Y, Z} such that X + Y == Z for all tuples. Consider two queries:

Given relvar R1 with attributes {X, Y, Z} amongst others: R1 WHERE (X + Y) = Z

Given relvar R2 with attributes {X, Y} amongst others, but lacking Z: EXTEND R2 : { Z := X + Y}

I understand the point and I considered it carefully. While there is a superficial attraction in using the same relcon for two different purposes, it only works in practice for a narrow set of cases. I see no way to use it as the basis for a construct such as WHERE STATUS >= 20. If you have an answer for that, I'd be interested.

Both queries can be implemented in the A algebra using <AND>:

R1 <AND> PLUS

R2 <AND> PLUS

Why would I want to use a different relcon for WHERE vs for EXTEND?

Because WHERE rests naturally on a predicate function and EXTEND rests naturally on a value-returning function. I see no point in trying to combine them.

Similar considerations using an equi-relcon for R1 WHERE X = Z and RENAME R2 {Z := X}. (Achieve the RENAME using COMPOSE, which has an <AND> inside its implementation.)

The first by your method would be AND(R1,RELCON('X','Z',dup)), where dup is a function that returns its argument as its value. By my preferred method it would be AND(R1,RELCON('X','Z',equals)). Not much to choose between them.

RENAME would be REMOVE(AND(R2,RELCON('Z','X',dup)),'X'). I think App-A already points out that RENAME is not primitive. Finding a minimal set was never my goal.

Appendix A and HHT 1975 seem to have this already figured out.

By my reading they were still at an early stage in 'figuring it out'.

Is it worth some different machinery for WHERE conditions mentioning only a single attribute? I can't see why: again use <AND> to a single-attribute relcon.

That's exactly what I propose. What do you propose differently?

Quote from AntC on July 23, 2020, 11:59 am

Quote from dandl on July 23, 2020, 8:32 am

Extending the previous post, there are actually two kinds of relcon: a predicate, and one that returns a value.

HHT 1975 (and Appendix A) see that as one algorithmic relation (or one relcon) with two uses. Supposing we have to hand a Appendix A-style relcon PLUS with attributes {X, Y, Z} such that X + Y == Z for all tuples. Consider two queries:

Given relvar R1 with attributes {X, Y, Z} amongst others: R1 WHERE (X + Y) = Z

Given relvar R2 with attributes {X, Y} amongst others, but lacking Z: EXTEND R2 : { Z := X + Y}

I understand the point and I considered it carefully. While there is a superficial attraction in using the same relcon for two different purposes, it only works in practice for a narrow set of cases. I see no way to use it as the basis for a construct such as WHERE STATUS >= 20. If you have an answer for that, I'd be interested.

Both queries can be implemented in the A algebra using <AND>:

R1 <AND> PLUS

R2 <AND> PLUS

Why would I want to use a different relcon for WHERE vs for EXTEND?

Because WHERE rests naturally on a predicate function and EXTEND rests naturally on a value-returning function. I see no point in trying to combine them.

Similar considerations using an equi-relcon for R1 WHERE X = Z and RENAME R2 {Z := X}. (Achieve the RENAME using COMPOSE, which has an <AND> inside its implementation.)

The first by your method would be AND(R1,RELCON('X','Z',dup)), where dup is a function that returns its argument as its value. By my preferred method it would be AND(R1,RELCON('X','Z',equals)). Not much to choose between them.

RENAME would be REMOVE(AND(R2,RELCON('Z','X',dup)),'X'). I think App-A already points out that RENAME is not primitive. Finding a minimal set was never my goal.

Appendix A and HHT 1975 seem to have this already figured out.

By my reading they were still at an early stage in 'figuring it out'.

Is it worth some different machinery for WHERE conditions mentioning only a single attribute? I can't see why: again use <AND> to a single-attribute relcon.

That's exactly what I propose. What do you propose differently?

Andl - A New Database Language - andl.org

The Forum for Discussion about The Third Manifesto and Related Matters

An alternative to open expressions