The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

What are the nodes in a Visual RA?

PreviousPage 3 of 4Next
Quote from dandl on March 22, 2020, 12:37 am
Quote from Erwin on March 21, 2020, 4:57 pm
Quote from dandl on March 21, 2020, 12:47 pm

As presented in TD [EXTEND] is Turing Complete, so it can be used to write unsafe and non-terminating queries.

I'm curious.  You have an example of a query where the EXTEND is "unsafe" or "non-terminating" ?  Or an idea of the kind of strategy one would have to follow if one wanted to actually write one such ?

Only way I see is to include an "unsafe" or "non-terminating" expression in the extend list itself.  But that is not a problem of the RA part, that's a problem of the "scalar algebra" part ...

As far as I know any query in TD is 'unsafe', in the sense that it may fail to terminate, may return different outputs for identical inputs or may trigger an arbitrary error. EXTEND is just a special case of 'unsafeness' because the new value is provided by the invocation of an operator, which itself is 'unsafe'.

The particular language features responsible include loops, recursion, global state accessible to local scope, and language constructs that trigger runtime errors.

These are all true. They're common to all general-purpose programming languages. TD is an illustration of a general-purpose programming language with integrated persistence and operators based on the relational model. It is not a query language, though you can write expressions in it that are notionally equivalent to those in a query language.

But can it crash, fail to terminate, exhibit impure or non-idempotent behaviour, etc.?

Of course. With programming language power comes a certain responsibility. Take away the need for that responsibility and you lose power.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on March 21, 2020, 2:39 pm
Quote from dandl on March 21, 2020, 12:47 pm
Quote from Dave Voorhis on March 21, 2020, 8:52 am
Quote from dandl on March 20, 2020, 11:10 pm

An AST does not represent data flow, it is merely a diagrammatic representation of function application. LISP without the parentheses. Wrong paradigm.

No, but it can be subverted to become dataflow-like.  Given an EXTEND expression like p := 2 * (y + 3), it obviously converts to an AST, but the AST is notionally akin to a dataflow something like:

[y] --\
      [+] --\
[3] --/     [*] --> [p}
[2] --------/

I.e., the value of y and the value of the literal 3 flow into the '+' operator, the result of which flows into the '*' operator, etc.

I understood what you meant, but it doesn't help. If the nodes and edges are constrained to operate on tables or even rows, you can't easily subvert them to operate on simple values. Certainly Knime won't allow it.

Ah, a limitation of KNIME.

More a limitation of the dataflow paradigm. Knime has variables and loops and other stuff, but this is very much about seeing how far the pure dataflow can get.

So far, my efforts at using text expressions hinder transparency. With the EXTEND model you see the output of a node that adds a new value, but you can't see what it's inputs are except by parsing the text. I don't find that a sweet spot at all.

Different strokes for different folks, I guess.

I'm pondering a node that acts purely as a function. It takes a relation as input but chooses which attributes it will use as inputs/arguments. It outputs a relation comprising only the function arguments and result. It does not EXTEND an input relation, it merely computes a NEW VALUE. Then as per A and HHT, you are free to JOIN that output as you wish.

I don't mind the computed function being very simple and just picked from a list, or arbitrarily complex and written using a sub-language, as long as the inputs and outputs are transparent.

Yes, calling functions is reasonable, though if you integrate editing them into the environment you pretty much wind up with something like Rel's visual query language -- nodes for RA operators and text for scalar (and certain sub-) expressions.

The point is: getting away from the EXTEND paradigm. As per App-A and HHT you can model TIMES as a relation operator: relation in, relation out.

  • The input is some relation with one or more numeric attributes, say {{ A=3,B=7,other... },{ A=2,B=11,other...}, ...}
  • The output is {{ A=3, B=7, C=21 },{ A=2,B=11,C=22}, ...}

The output is join-compatible with the input.

I didn't know EXTEND was a paradigm; I thought it was an operator. Are you trying to get away from it because of other KNIME limitations?

There is nothing quite like EXTEND 'as an operator' in any other language I know, so it's hard to know exactly what abstraction it represents. It has a heritage in SQL, but it's much more powerful. As presented in TD it's Turing Complete, so it can be used to write unsafe and non-terminating queries, which is definitely not a normal expectation of the RA.

True, but TD is a conventional programming language with database capabilities, rather than a query language.

The abstraction EXTEND represents is EXTEND.

The abstraction EXTEND represents is JOIN, specialised to the case where the r.h. operand has attribute(s) not in common with the l.h. operand, and which are Functionally Determined from those that are in common.

We might express the FD columns using a formula and what looks like assignment to one of the attributes not in common. That formula might involve relational operators to other relvars/queries from the database, arbitrary computation, possibly non-terminating computation, possibly TCLOSE over some relvar. But it's still a JOIN across an FD. Otherwise EXTEND would have the potential for returning a different number of tuples to what started in the l.h. operand.

I don't get why David says "There is nothing quite like EXTEND 'as an operator' in any other language I know". There is map over a collection in any number of languages, not just FP, and in particular the map part of map-reduce. map can return a collection where the elements are of different type to what it started with -- in particular with extra components/fields.

There is nothing quite like EXTEND 'as an operator' in any other language I know, so it's hard to know exactly what abstraction it represents. It has a heritage in SQL, but it's much more powerful. As presented in TD it's Turing Complete, so it can be used to write unsafe and non-terminating queries, which is definitely not a normal expectation of the RA.

True, but TD is a conventional programming language with database capabilities, rather than a query language.

The abstraction EXTEND represents is EXTEND.

The abstraction EXTEND represents is JOIN, specialised to the case where the r.h. operand has attribute(s) not in common with the l.h. operand, and which are Functionally Determined from those that are in common.

Good summary. I agree. If you project away all the LH attributes that are not arguments to the function, you have a a relational Function (new value) operator.

We might express the FD columns using a formula and what looks like assignment to one of the attributes not in common. That formula might involve relational operators to other relvars/queries from the database, arbitrary computation, possibly non-terminating computation, possibly TCLOSE over some relvar. But it's still a JOIN across an FD. Otherwise EXTEND would have the potential for returning a different number of tuples to what started in the l.h. operand.

I don't get why David says "There is nothing quite like EXTEND 'as an operator' in any other language I know". There is map over a collection in any number of languages, not just FP, and in particular the map part of map-reduce. map can return a collection where the elements are of different type to what it started with -- in particular with extra components/fields.

Sorry, my bad. I did intend 'query language'. EXTEND-like constructs are widespread in GP languages. I'm looking for EXTEND-like features in a language that has roughly the capability of App-A, essentially a FOL RA augmented by Function. Then if the function is constrained to be pure and safe (akin to a lookup), the query language is too.

Andl - A New Database Language - andl.org
Quote from Dave Voorhis on March 22, 2020, 1:01 am
Quote from dandl on March 22, 2020, 12:55 am

Which is true, but I don't know of any practical language, query or programming, that prevents non-termination.

I was under the impression that SQL queries, including fix-point recursion in CTE, always terminate. [I'm specifically excluding SQL procedural variants.] SQL is highly practical, even though there are queries beyond its capabilities.

An SQL query has no state other than the database (unless you want to include database 'settings' too). SQL queries can fail of course, but usually for highly predictable reasons. Does that make them 'unsafe'?

I was assuming SQL with at least user-defined functions, because I don't think there are any practical (and popular) SQL implementations without them. Except SQLite, maybe?

I wasn't. Every SQL has one or more extension languages, in SQLite it happens to be C.

If you (somehow) constrain the functions to be pure, safe and terminating, the query language is too.

A safe, terminating query language is a most practical thing to have. It can execute a vast range of queries (but not all), and it lends itself to optimisation. The only thing it really can't do is replace a GP language, but we already have plenty of those so no problem!

 

Andl - A New Database Language - andl.org
Quote from dandl on March 22, 2020, 3:36 am
Quote from Dave Voorhis on March 22, 2020, 1:01 am
Quote from dandl on March 22, 2020, 12:55 am

Which is true, but I don't know of any practical language, query or programming, that prevents non-termination.

I was under the impression that SQL queries, including fix-point recursion in CTE, always terminate. [I'm specifically excluding SQL procedural variants.] SQL is highly practical, even though there are queries beyond its capabilities.

An SQL query has no state other than the database (unless you want to include database 'settings' too). SQL queries can fail of course, but usually for highly predictable reasons. Does that make them 'unsafe'?

I was assuming SQL with at least user-defined functions, because I don't think there are any practical (and popular) SQL implementations without them. Except SQLite, maybe?

I wasn't. Every SQL has one or more extension languages, in SQLite it happens to be C.

If you (somehow) constrain the functions to be pure, safe and terminating, the query language is too.

A safe, terminating query language is a most practical thing to have. It can execute a vast range of queries (but not all), and it lends itself to optimisation. The only thing it really can't do is replace a GP language, but we already have plenty of those so no problem!

For some time, it's been recognised in academia (and generally in industry, at least among those who consider such things), that guaranteed termination in a general query language is impractical, as the lack of user-defined functions -- or built-in functions written in some potentially non-terminating language -- precludes most real-world work.

You could conceivably provide a wide selection of built-in functions that have been tested -- or analysed and proven -- to always terminate, and constrain the language to use only those, but that's not a safe and terminating language, that would be -- by definition -- an unsafe and non-terminating language that has been constrained and proven to be safe and terminating as far as we know.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on March 22, 2020, 10:00 am
Quote from dandl on March 22, 2020, 3:36 am
Quote from Dave Voorhis on March 22, 2020, 1:01 am
Quote from dandl on March 22, 2020, 12:55 am

Which is true, but I don't know of any practical language, query or programming, that prevents non-termination.

I was under the impression that SQL queries, including fix-point recursion in CTE, always terminate. [I'm specifically excluding SQL procedural variants.] SQL is highly practical, even though there are queries beyond its capabilities.

An SQL query has no state other than the database (unless you want to include database 'settings' too). SQL queries can fail of course, but usually for highly predictable reasons. Does that make them 'unsafe'?

I was assuming SQL with at least user-defined functions, because I don't think there are any practical (and popular) SQL implementations without them. Except SQLite, maybe?

I wasn't. Every SQL has one or more extension languages, in SQLite it happens to be C.

If you (somehow) constrain the functions to be pure, safe and terminating, the query language is too.

A safe, terminating query language is a most practical thing to have. It can execute a vast range of queries (but not all), and it lends itself to optimisation. The only thing it really can't do is replace a GP language, but we already have plenty of those so no problem!

For some time, it's been recognised in academia (and generally in industry, at least among those who consider such things), that guaranteed termination in a general query language is impractical, as the lack of user-defined functions -- or built-in functions written in some potentially non-terminating language -- precludes most real-world work.

Do you have a reference for that? I've used a lot of query tools that depend on SQL queries to retrieve  data via ODBC, JDBC or similar, and never found much need to write SQL functions. It's quite different if your job is the care and feeding of a database, but there are thousands of potential consumers of data for whom plain SQL is plenty.

My goal is to provide tools for non-programmers to retrieve and massage data obtained from disparate data sources. An SQL function that works on one database is unlikely to work on another, and SQLite is high on the list of needing to be supported. I think your concept of 'real world work' is heavily coloured by what you think is important, not what other people actually need.

You could conceivably provide a wide selection of built-in functions that have been tested -- or analysed and proven -- to always terminate, and constrain the language to use only those, but that's not a safe and terminating language, that would be -- by definition -- an unsafe and non-terminating language that has been constrained and proven to be safe and terminating as far as we know.

No, by definition that is safe and terminating, but may fail to satisfy that definition because of errors in implementation. We can't prove there are no bugs.

But it matters not. I'm not trying to construct proofs, I'm trying to find how far you can get in providing general access to data for use by clever people who are not programmers. I'm assuming a user who is comfortable with Excel, interacting with a GUI and writing the odd formula. For this project I care only about people who are their own customer, not about people who write systems for others. A general purpose programming language is too high a barrier for this purpose, and in my opinion, totally unnecessary.

 

It's easy to prove that a function is safe and terminating, if you really want to. You can write them in a language that provides no external state, no external scope, no general loops, no general recursion (JEXL seems to go close). You can allow access to well-defined and tested system libraries, such as maths and string functions. You can quite easily get to the point where any breach of the safe and terminating guarantee is a bug, to be treated as such.

Andl - A New Database Language - andl.org
Quote from dandl on March 23, 2020, 4:26 am
Quote from Dave Voorhis on March 22, 2020, 10:00 am
Quote from dandl on March 22, 2020, 3:36 am
Quote from Dave Voorhis on March 22, 2020, 1:01 am
Quote from dandl on March 22, 2020, 12:55 am

Which is true, but I don't know of any practical language, query or programming, that prevents non-termination.

I was under the impression that SQL queries, including fix-point recursion in CTE, always terminate. [I'm specifically excluding SQL procedural variants.] SQL is highly practical, even though there are queries beyond its capabilities.

An SQL query has no state other than the database (unless you want to include database 'settings' too). SQL queries can fail of course, but usually for highly predictable reasons. Does that make them 'unsafe'?

I was assuming SQL with at least user-defined functions, because I don't think there are any practical (and popular) SQL implementations without them. Except SQLite, maybe?

I wasn't. Every SQL has one or more extension languages, in SQLite it happens to be C.

If you (somehow) constrain the functions to be pure, safe and terminating, the query language is too.

A safe, terminating query language is a most practical thing to have. It can execute a vast range of queries (but not all), and it lends itself to optimisation. The only thing it really can't do is replace a GP language, but we already have plenty of those so no problem!

For some time, it's been recognised in academia (and generally in industry, at least among those who consider such things), that guaranteed termination in a general query language is impractical, as the lack of user-defined functions -- or built-in functions written in some potentially non-terminating language -- precludes most real-world work.

Do you have a reference for that?

No, it's not like we were all advocating strictly non-terminating query languages until someone wrote a definitive paper and we all changed our minds. It was an obvious evolution driven by pragmatism.

I've used a lot of query tools that depend on SQL queries to retrieve  data via ODBC, JDBC or similar, and never found much need to write SQL functions. It's quite different if your job is the care and feeding of a database, but there are thousands of potential consumers of data for whom plain SQL is plenty.

The consumers are almost certainly using a large set of pre-built functions. You can hardly do anything in a business database without tripping over date and time functions, followed by string functions, and both are dwarfed by mathematical operators.

Again, you can make all of them built-in, define the language to preclude user-defined functions, and test and analyse your way to some reasonable promises about non-termination, but it's not a non-terminating language.

My goal is to provide tools for non-programmers to retrieve and massage data obtained from disparate data sources. An SQL function that works on one database is unlikely to work on another, and SQLite is high on the list of needing to be supported. I think your concept of 'real world work' is heavily coloured by what you think is important, not what other people actually need.

No, it's coloured by 30+ years developing database-driven applications for users.

You could conceivably provide a wide selection of built-in functions that have been tested -- or analysed and proven -- to always terminate, and constrain the language to use only those, but that's not a safe and terminating language, that would be -- by definition -- an unsafe and non-terminating language that has been constrained and proven to be safe and terminating as far as we know.

No, by definition that is safe and terminating,

No, by definition it is not safe and terminating, because the language used to write the functions is not safe and terminating.

Indeed, that's why "safe and terminating" is a theoretical ideal: the host machine can't guarantee safe and terminating. You can only make promises based on testing, but not guarantees based on choice of structure, architecture, or model.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from dandl on March 23, 2020, 4:26 am

It's easy to prove that a function is safe and terminating, if you really want to. You can write them in a language that provides no external state, no external scope, no general loops, no general recursion (JEXL seems to go close). You can allow access to well-defined and tested system libraries, such as maths and string functions. You can quite easily get to the point where any breach of the safe and terminating guarantee is a bug, to be treated as such.

In fact you can get hugely more power than that by total functional programming, which limits recursion to structural recursion and provides corecursion over codata (separated from data by the type system), allowing the use of stream I/O.  The result is not Turing-complete, but a huge number of algorithms can still be written.   Wikipedia; Turner's paper; LTU discussion.  Note particularly that Turner's arguments against ⊥ in programming are precisely D & D's arguments against NULL, though he doesn't go into as much detail about the disaster of 3VL.  He also says that programmers made a choice fifty years ago to go for Turing-completeness at the expense of a lot of silly and disastrous programs, and thinks it is time to reconsider that.

Someone named Douglas McClean mentioned in a later LTU that he was working on a toy functional D-alike, but I haven't been able to find out anything else about it.

 

Thank you for those references.

There has been a major move in the C# community to adopt a similar style. In both Andl and C#I find I can write lots of code with no destructive assignment or explicit loops. I've had some similar success with Java (definitely harder than C#) and JavaScript (with React). It would indeed be an interesting thing if the default behaviour of our GP languages had been a functional style, and you had to explicitly call out parts of the program that manipulated state.

Both Excel (spreadsheets and expressions) and SQL (as a query language) are immensely powerful and widely used sub-TC languages. I think that's a space worth exploring.

Andl - A New Database Language - andl.org
Quote from johnwcowan on March 23, 2020, 1:32 pm
Quote from dandl on March 23, 2020, 4:26 am

It's easy to prove that a function is safe and terminating, if you really want to. You can write them in a language that provides no external state, no external scope, no general loops, no general recursion (JEXL seems to go close). You can allow access to well-defined and tested system libraries, such as maths and string functions. You can quite easily get to the point where any breach of the safe and terminating guarantee is a bug, to be treated as such.

In fact you can get hugely more power than that by total functional programming, which limits recursion to structural recursion and provides corecursion over codata (separated from data by the type system), allowing the use of stream I/O.  The result is not Turing-complete, but a huge number of algorithms can still be written.   Wikipedia; Turner's paper; LTU discussion.  Note particularly that Turner's arguments against ⊥ in programming are precisely D & D's arguments against NULL, though he doesn't go into as much detail about the disaster of 3VL.  He also says that programmers made a choice fifty years ago to go for Turing-completeness at the expense of a lot of silly and disastrous programs, and thinks it is time to reconsider that.

 

Thanks John, Turner's ideas (I've tried volunteering them here before) are quite subtle; they've not seen much uptake in the FP community.

I have to disagree about comparing total functional programming/avoiding ⊥ aka 'bottom' with excising SQL NULL. NULL does not give you partial functions (although it often seems that way): you can always test whether some column is NULL and form a total function including the test. Whereas you can't test whether some expression yields bottom or is partially defined -- or rather you can try to test for bottom, but that gives a function that is partially defined/never returns. There is no 3VL with bottom: there are the conventional 2 values plus program non-termination/interrupted by the run-time system.

D&D's arguments are that naive programmers end up in total confusion with 3VL; sophisticated programmers can work round it at cost of obfuscated code and subtle bugs; and the PL's type 'system' (if that's the right word for SQL) can give none of the static guarantees we'd like a type system to give.

PreviousPage 3 of 4Next