The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

Anonymous operators revisited

Page 1 of 2Next

I recently noticed that anonymous operators could be a way to pass expressions to operators, as in TD's WHERE operator. In order to define such an operator as a user-defined operator, the evaluation of the argument expression has to be deferred. Passing operators are a possible way to do this.

Imagine an operator which finds an element of an array:

OPERATOR find (a array int, cond operator(int) RETURNS boolean) RETURNS int;
  VAR i int;
  FOR i := 0 to length(a) - 1;
    IF cond(a[i]) THEN
      RETURN i;
    END IF;
  END FOR;
  RETURN -1;
END OPERATOR;

With anonymous operators, this operator could be used as follows:

VAR a array int;
-- initialize a
VAR i int;
-- Get array element with value 1
i := find (a,
    OPERATOR (v int) RETURNS boolean;
      RETURN v = 1;
    END OPERATOR;
);

To simplify this, syntactic sugar could be added in the style of Java's lambda syntax:

VAR a array int;
-- initialize a
VAR i int;
-- Get array element with value 1
i := find (a, v -> v = 1);

When anonymous operators are used in this way, it would be useful to bind variable references in the operator definition to variables in the enclosing scope. Of course this raises some questions (e.g. should they be writeable or read-only)?

 

Quote from Rene Hartmann on March 28, 2020, 9:18 am

I recently noticed that anonymous operators could be a way to pass expressions to operators, as in TD's WHERE operator. In order to define such an operator as a user-defined operator, the evaluation of the argument expression has to be deferred. Passing operators are a possible way to do this.

Imagine an operator which finds an element of an array:

OPERATOR find (a array int, cond operator(int) RETURNS boolean) RETURNS int;
  VAR i int;
  FOR i := 0 to length(a) - 1;
    IF cond(a[i]) THEN
      RETURN i;
    END IF;
  END FOR;
  RETURN -1;
END OPERATOR;

With anonymous operators, this operator could be used as follows:

VAR a array int;
-- initialize a
VAR i int;
-- Get array element with value 1
i := find (a,
    OPERATOR (v int) RETURNS boolean;
      RETURN v = 1;
    END OPERATOR;
);

To simplify this, syntactic sugar could be added in the style of Java's lambda syntax:

VAR a array int;
-- initialize a
VAR i int;
-- Get array element with value 1
i := find (a, v -> v = 1);

When anonymous operators are used in this way, it would be useful to bind variable references in the operator definition to variables in the enclosing scope. Of course this raises some questions (e.g. should they be writeable or read-only)?

It's generally reasonable for "update" lambdas to mutate their enclosing scope in an imperative language, though it might be desirable to make it at least somewhat awkward and syntactically obvious (e.g., like Java does, somewhat) to prevent accidental unpleasantness.

But a lambda used in a WHERE clause probably shouldn't be an "update" lambda. It should almost certainly be strictly read-only, not only because the idea of mutating state inside a WHERE should be generally questionable, but also because a side effect of query optimisation and internal representations might be that the lambda gets unpredictably executed more than once for some tuples and perhaps not at all for others.

Of course, a reasonable D could define WHERE so that it only accepts a lambda argument.

Should it be useful in subsequent discussion, here is my overview of lambdas implemented in Rel: https://reldb.org/c/index.php/read/anonymous-and-first-class-operators/

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Rene Hartmann on March 28, 2020, 9:18 am

I recently noticed that anonymous operators could be a way to pass expressions to operators, as in TD's WHERE operator. In order to define such an operator as a user-defined operator, the evaluation of the argument expression has to be deferred.

Recently? The technique you show below was supported in ALGOL 60. See for example 'Jensen's Device' in the appendix to the language Report; and of course this was run-of-the-mill with LISP (1964-ish) LAMBDA expressions.

"deferred" is an obtuse way to put it. If you're passing an operator/function/expression to a function, you're doing functional programming (which might or might not be within an explicitly FP language). A non-FP language example would be Google's map/reduce, where you pass a function that effects the mapping and a function that effects the reducing/aggregation. Then the way to describe the behaviour is not "evaluation ... has to be deferred": because at the point of passing the operator, it doesn't know what arguments its going to get. Instead say the function to which the operator is passed applies the operator (to arguments evaluated inside its scope) and evaluates the expression.

Passing operators are a possible way to do this.

See also 'partial application' and 'operator sections' in FP.

Imagine an operator which finds an element of an array:

OPERATOR find (a array int, cond operator(int) RETURNS boolean) RETURNS int;
  VAR i int;
  FOR i := 0 to length(a) - 1;
    IF cond(a[i]) THEN
      RETURN i;
    END IF;
  END FOR;
  RETURN -1;
END OPERATOR;

Hmm. There might be many elements of the array that meet cond( ).

And ugh if not found you return an invalid index. That's how Null got invented.

With anonymous operators, this operator could be used as follows:

VAR a array int;
-- initialize a
VAR i int;
-- Get array element with value 1
i := find (a,
    OPERATOR (v int) RETURNS boolean;
      RETURN v = 1;
    END OPERATOR;
);

To simplify this, syntactic sugar could be added in the style of Java's lambda syntax:

Yeah. Java got lambdas from FP; and I guess had to rather kludge it to make it fit Java's peculiarities. Why not go back to the original sources?

VAR a array int;
-- initialize a
VAR i int;
-- Get array element with value 1
i := find (a, v -> v = 1);

When anonymous operators are used in this way, it would be useful to bind variable references in the operator definition to variables in the enclosing scope.

You perhaps mean ALGOL's call-by simple name [per Jensen's device] vs call-by value. Or you perhaps mean partial evaluation/operator sections. Or perhaps you mean non-local scope: a bad idea.

Here's the standard library routine in Haskell

find :: Foldable t => (a -> Bool) -> t a -> Maybe a
find p = listToMaybe . concatMap (\ x -> if p x then [x] else [])

First line is the type signature. Working from the middle towards the outsides, find takes two arguments of which the second t a is a structure t (which might be an array)  over elements of type a; the first (a -> Bool) is a condition from element type a returning Bool. find's result is type Maybe a aka an 'option' type: it either returns the leftmost element it's found, tagged as Just x or it returns bare tag Nothing. What it doesn't return is a value that's out-of-type. The Foldable t => is a type-level restriction that the structure t supports iterating over, as with map, fold, concatMap.

The implementation (second line) actually builds a lazy structure with all the elements that match, then (somewhat arbitrarily) returns the leftmost. There's an explicit lambda expression passed as an argument (\ x -> ...) . The condition/predicate argument p to find gets applied to successive elements of the structure as (if p x then ...). Example use: note that String "hello" is a linear structure/sequence of characters.

Data.Foldable> find (== 'l') "hello"
Just 'l'
Data.Foldable> find (== 'g') "hello"
Nothing

(== 'l') is an operator section/partial application: operator == partially applied to its r.h. operand only; 'l' is single character "ell".

In general in Haskell, operations over Foldable structures lend themselves to parallel evaluation across the elements. That would be overkill in this example. So it'd be easy enough to produce a variant of find specialised to Arrays (or Lists) that returned a Maybe index/offset, rather than an element.

Of course this raises some questions (e.g. should they be writeable or read-only)?

 

No they shouldn't be writeable. Or do you want an invocation that fails to find anything to update the array/structure? Or to do arbitrary IO? Use functions/expressions that are guaranteed by their type to have no side-effects. Jensen's device turned out to be horrible in its side-effects.

Well, TD is not an FP language, and I don't want to turn it into one. The idea is to allow more genericity by making it possible to define user-defined operators that accept expressions, similar to the built-in WHERE operator. Adopting a few FP-ish features seems a possible way to do this.

Another idea is that, if the user-defined operator being passed has a tuple argument, an expression is accepted as a valid argument, and variables are bound against tuple attributes.

So the following code:

VAR a ARRAY TUPLE { foo CHAR, bar CHAR };
-- initialize a ...
VAR i int;
i := find (a,
    OPERATOR (t TUPLE { * }) RETURNS boolean;
      RETURN t.foo = 'Iwantthis';
    END OPERATOR;
);

could be simplified to:

VAR a ARRAY TUPLE { foo CHAR, bar CHAR };
-- initialize a ...
VAR i int;
i := find (a, foo = 'Iwantthis');

That way we would be even closer to WHERE.

Quote from Rene Hartmann on March 28, 2020, 12:09 pm

Well, TD is not an FP language, and I don't want to turn it into one.

I wasn't getting into language wars or suggesting TD should become an FP language. The obvious improvement to any language over relations-as-sets is to borrow ideas of map/reduce/fold with anonymous functions/lambdas applying to the elements of the set, viz. tuples. That's what Hall, Hitchcock and Todd 1975 were tackling -- way ahead of their time.

The idea is to allow more genericity by making it possible to define user-defined operators that accept expressions, similar to the built-in WHERE operator.

Hmm. The 'Open expression' [D&D's term] on rhs of WHERE is not an argument/not first-class. It (and 'Open expressions' generally) are strange beasts in terms of programming language typing and semantics.

Adopting a few FP-ish features seems a possible way to do this.

Another idea is that, if the user-defined operator being passed has a tuple argument, an expression is accepted as a valid argument, and variables are bound against tuple attributes.

So the following code:

VAR a TUPLE { foo CHAR, bar CHAR };
-- initialize a ...
VAR i int;
i := find (a,
    OPERATOR (t TUPLE { * }) RETURNS boolean;
      RETURN t.foo = 'Iwantthis';
    END OPERATOR;
);

(I'll overlook your wanting to turn Tutorial D into SQL: t.foo ?? foo FROM t.)

We've frequently considered such genericity: it seems to be beyond Tutorial D. What's the type of the OPERATOR in your example? It mentions attribute foo but it seems it's required to apply over tuples with possibly more attributes. Furthermore it requires foo to be of type CHAR. How do you express that in the type signature (explicit or inferred) for the OPERATOR?

could be simplified to:

VAR a TUPLE { foo CHAR, bar CHAR };
-- initialize a ...
VAR i int;
i := find (a, foo = 'Iwantthis');

That way we would be even closer to WHERE.

Getting closer to WHERE is probably not a Good Thing, see above. Is your find a function? What's its type/it would have to be ad-hoc polymorphic. What's that foo = 'Iwantthis' bit? What is its type?

Addit: em I think you've pushed your find beyond its original intent. It's going to return an Int having been applied to a TUPLE? Perhaps you intend VAR a be an array of TUPLE?

In passing:the RA and TTM/D implementations typically treat the argument of WHERE/EXTEND or equivalent much like a lambda, with the arguments being the attributes of the 'current tuple'. Andl implements aggregation using fold(), and that takes a function as an argument (to be iterated as per OO Pre 6).

Otherwise, these are routine issues that arise in the design of any language. Nothing very new.

Andl - A New Database Language - andl.org
Quote from dandl on March 28, 2020, 1:10 pm

In passing:the RA and TTM/D implementations typically treat the argument of WHERE/EXTEND or equivalent much like a lambda, with the arguments being the attributes of the 'current tuple'.

Then what type to they infer for the lambda expression? Must have attribute foo at type CHAR; might have other attributes at types unknown. I guess they infer a distinct type at each WHERE/EXTEND/etc expression:

S WHERE S# = S#('S3')
SP WHERE S# = S#('S3')

Those two S# = S#('S3') are distinct lambda expressions, despite their superficial identicality.

What I like about the Appendix A treatment is that it transforms WHERE to <AND>, and transforms the 'Open expression' to a relation literal (or an algorithmic relation, in HHT 1975 terms). OK a relation literal is first-class, a very familiar beast.

And what I like about Tropashko's treatment on top of Appendix A is that it transforms all RA operators to take only relations as operands and return only relations as result. No more of those weird subscripty-things in angle brackets that are in Codd's RA. And we can in effect treat attribute-comma-lists as first-class, manipulate them as sets of attributes, etc.

Quote from AntC on March 28, 2020, 1:58 pm
Quote from dandl on March 28, 2020, 1:10 pm

In passing:the RA and TTM/D implementations typically treat the argument of WHERE/EXTEND or equivalent much like a lambda, with the arguments being the attributes of the 'current tuple'.

Then what type to they infer for the lambda expression? Must have attribute foo at type CHAR; might have other attributes at types unknown. I guess they infer a distinct type at each WHERE/EXTEND/etc expression:

S WHERE S# = S#('S3')
SP WHERE S# = S#('S3')

Those two S# = S#('S3') are distinct lambda expressions, despite their superficial identicality.

What I like about the Appendix A treatment is that it transforms WHERE to <AND>, and transforms the 'Open expression' to a relation literal (or an algorithmic relation, in HHT 1975 terms). OK a relation literal is first-class, a very familiar beast.

And what I like about Tropashko's treatment on top of Appendix A is that it transforms all RA operators to take only relations as operands and return only relations as result. No more of those weird subscripty-things in angle brackets that are in Codd's RA. And we can in effect treat attribute-comma-lists as first-class, manipulate them as sets of attributes, etc.

At the expense of giving up a certain degree of static type safety.

Tradeoffs, as usual.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from AntC on March 28, 2020, 12:52 pm

Addit: em I think you've pushed your find beyond its original intent. It's going to return an Int having been applied to a TUPLE? Perhaps you intend VAR a be an array of TUPLE?

Yes, it must be an array of TUPLE, I forgot the ARRAY. I edited my post accordingly.

find() would be OPERATOR find (a ARRAY TUPLE { * }, cond operator(TUPLE { * }) RETURNS boolean) RETURNS int;

 

Quote from AntC on March 28, 2020, 1:58 pm
Quote from dandl on March 28, 2020, 1:10 pm

In passing:the RA and TTM/D implementations typically treat the argument of WHERE/EXTEND or equivalent much like a lambda, with the arguments being the attributes of the 'current tuple'.

Then what type to they infer for the lambda expression? Must have attribute foo at type CHAR; might have other attributes at types unknown. I guess they infer a distinct type at each WHERE/EXTEND/etc expression:

S WHERE S# = S#('S3')
SP WHERE S# = S#('S3')

Those two S# = S#('S3') are distinct lambda expressions, despite their superficial identicality.

Yes they are. But there would seem to be nothing wrong with:

S WHERE FX
LET FX := T => T.S# = S#('S3')

What I like about the Appendix A treatment is that it transforms WHERE to <AND>, and transforms the 'Open expression' to a relation literal (or an algorithmic relation, in HHT 1975 terms). OK a relation literal is first-class, a very familiar beast.

IMO HHT is closer than App-A. Those are not relation literals, those are relation functions. Something like

TIMES{A,B}) => { A,B,C := A*B }

And what I like about Tropashko's treatment on top of Appendix A is that it transforms all RA operators to take only relations as operands and return only relations as result. No more of those weird subscripty-things in angle brackets that are in Codd's RA. And we can in effect treat attribute-comma-lists as first-class, manipulate them as sets of attributes, etc.

That would certainly be my preference. I wasn't aware that his was a completed work.

Andl - A New Database Language - andl.org
Page 1 of 2Next