The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

What are the nodes in a Visual RA?

Page 1 of 4Next

Assume you are defining some elements in a visual programming language, which presents as a directed graph: nodes connected by edges in a GUI. You wish to define nodes that implement the Relational Algebra, but you cannot use any kind of familiar textual syntactic elements. What nodes do you need? Assume that:

  • Nodes and edges are closed over relations: every edge carries a relation from one node to another, nodes have an arbitrary (but fixed) number of input and output relations
  • Nodes are configured by an arbitrary set of selections using normal GUI elements: text boxes, check boxes, buttons, etc, but textual language elements are prohibited
  • You are free to write implementation code in your choice of favorite language, but not a language processor
  • Input data and output viewers are provided by the system and are not your concern.

It's easy enough to define:

  • Join (with Semijoin, Antijoin as options)
  • Project
  • Rename
  • Union (Minus, Intersection, Difference as options)
  • Divide (and variants)
  • Transitive Closure

But what about:

  • Selection (WHERE in TD and SQL)
  • New Value (EXTENDS in TD)
  • Aggregation (SUMMARIZE in TD, GROUP BY in SQL)
  • Generalised Transitive Closure (while in Alice and Andl, CTE in SQL)

These are usually expressed in a programming language that implement what we might call an attribute algebra: expressions evaluating to individual values. Without being able to write an expression, is it possible to define nodes along the lines of function relations (as per Algebra A and HHT) and rely on JOIN to combine them with other data?

For context, I'm trying to do exactly this in Knime. It really brings the distinction between the RA and the attribute algebra into stark relief.

Andl - A New Database Language - andl.org
Quote from dandl on March 18, 2020, 10:46 pm

Assume you are defining some elements in a visual programming language, which presents as a directed graph: nodes connected by edges in a GUI. You wish to define nodes that implement the Relational Algebra, but you cannot use any kind of familiar textual syntactic elements. What nodes do you need? Assume that:

  • Nodes and edges are closed over relations: every edge carries a relation from one node to another, nodes have an arbitrary (but fixed) number of input and output relations
  • Nodes are configured by an arbitrary set of selections using normal GUI elements: text boxes, check boxes, buttons, etc, but textual language elements are prohibited
  • You are free to write implementation code in your choice of favorite language, but not a language processor
  • Input data and output viewers are provided by the system and are not your concern.

It's easy enough to define:

  • Join (with Semijoin, Antijoin as options)
  • Project
  • Rename
  • Union (Minus, Intersection, Difference as options)
  • Divide (and variants)
  • Transitive Closure

But what about:

  • Selection (WHERE in TD and SQL)
  • New Value (EXTENDS in TD)
  • Aggregation (SUMMARIZE in TD, GROUP BY in SQL)
  • Generalised Transitive Closure (while in Alice and Andl, CTE in SQL)

These are usually expressed in a programming language that implement what we might call an attribute algebra: expressions evaluating to individual values. Without being able to write an expression, is it possible to define nodes along the lines of function relations (as per Algebra A and HHT) and rely on JOIN to combine them with other data?

For context, I'm trying to do exactly this in Knime. It really brings the distinction between the RA and the attribute algebra into stark relief.

That's exactly what the visual query language in Rel does. I haven't found anything better for formulating and debugging massively complex queries, because query subexpressions can be grouped visually and evaluated independently.

Though note that it's a work-in-progress and has some obviously rough bits.

Selection, EXTEND, and aggregation are straightforward -- they have dialogs that allow specifying arbitrary expressions. I haven't implemented generalised transitive closure, the assumption being that developers can write user-defined operators in the usual fashion.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

Indeed, but can you solve the problem as I set it, with no "dialogs that allow specifying arbitrary expressions"?

HHT, on rereading, makes a case for having two distinct kinds of relations.

  • Factual: contains tuples asserting truth in accordance with a business-related predicate
  • Functional: no stored tuples, just the application of a function to some arguments.

Why not? But is there enough in A or HHT to say how?

[BTW I prefer new-value, based on Alice. EXTEND is very much a TD peculiarity.]

Andl - A New Database Language - andl.org

In Datalog, factual relations are called external, and functional relations, which include views, are called internal.  There is this difference between functional relations and views, however: a view has only a finite number of tuples, whereas an ordinary functional relation such as < may have an infinite number (or at least bounded only by the representable ranges of numbers).

Quote from dandl on March 19, 2020, 3:11 am

Indeed, but can you solve the problem as I set it, with no "dialogs that allow specifying arbitrary expressions"?

HHT, on rereading, makes a case for having two distinct kinds of relations.

  • Factual: contains tuples asserting truth in accordance with a business-related predicate
  • Functional: no stored tuples, just the application of a function to some arguments.

Why not? But is there enough in A or HHT to say how?

[BTW I prefer new-value, based on Alice. EXTEND is very much a TD peculiarity.]

With no dialogs allowing arbitrary expressions, you're going to have to define arbitrary expressions somehow.  You can define arbitrary expressions using nodes and connections, same as for RA operators.  Think abstract syntax trees, but drawn in software by the user.

I've tried that before, in an icons-on-strings visual Java I wrote almost 20 years ago. It works, but it has the usual problem of visual programming languages: even relatively simple but real expressions require complex diagrams that border on unreadable.

It's far better if you can mix RA operators with conventional text-based expressions.  That seems to hit a sweet spot in terms of providing visual clarity without undue clutter.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from johnwcowan on March 19, 2020, 2:10 pm

In Datalog, factual relations are called external, and functional relations, which include views, are called internal.  There is this difference between functional relations and views, however: a view has only a finite number of tuples, whereas an ordinary functional relation such as < may have an infinite number (or at least bounded only by the representable ranges of numbers).

But Datalog is a programming language and does not resolve this tension between 'set of instantiated tuples' and 'function to be iterated over set of tuples'. You cannot instantiate an infinite or unbounded relation, so how do you represent that visually?

Andl - A New Database Language - andl.org
Quote from Dave Voorhis on March 19, 2020, 3:04 pm
Quote from dandl on March 19, 2020, 3:11 am

Indeed, but can you solve the problem as I set it, with no "dialogs that allow specifying arbitrary expressions"?

HHT, on rereading, makes a case for having two distinct kinds of relations.

  • Factual: contains tuples asserting truth in accordance with a business-related predicate
  • Functional: no stored tuples, just the application of a function to some arguments.

Why not? But is there enough in A or HHT to say how?

[BTW I prefer new-value, based on Alice. EXTEND is very much a TD peculiarity.]

With no dialogs allowing arbitrary expressions, you're going to have to define arbitrary expressions somehow.  You can define arbitrary expressions using nodes and connections, same as for RA operators.  Think abstract syntax trees, but drawn in software by the user.

I've tried that before, in an icons-on-strings visual Java I wrote almost 20 years ago. It works, but it has the usual problem of visual programming languages: even relatively simple but real expressions require complex diagrams that border on unreadable.

It's far better if you can mix RA operators with conventional text-based expressions.  That seems to hit a sweet spot in terms of providing visual clarity without undue clutter.

An AST does not represent data flow, it is merely a diagrammatic representation of function application. LISP without the parentheses. Wrong paradigm.

So far, my efforts at using text expressions hinder transparency. With the EXTEND model you see the output of a node that adds a new value, but you can't see what it's inputs are except by parsing the text. I don't find that a sweet spot at all.

I'm pondering a node that acts purely as a function. It takes a relation as input but chooses which attributes it will use as inputs/arguments. It outputs a relation comprising only the function arguments and result. It does not EXTEND an input relation, it merely computes a NEW VALUE. Then as per A and HHT, you are free to JOIN that output as you wish.

I don't mind the computed function being very simple and just picked from a list, or arbitrarily complex and written using a sub-language, as long as the inputs and outputs are transparent.

Andl - A New Database Language - andl.org
Quote from dandl on March 19, 2020, 11:57 pm

But Datalog is a programming language

Not in the usual sense.  It is a query-and-update language like SQL, not a Turing-complete language like Tutorial D.  It is equivalent in power to the RA + TCLOSE - DIFFERENCE, although DIFFERENCE can be provided by adding stratified or otherwise restricted negation.

so how do you represent that visually?

I can't help there, as I am physiologically incapable of visualizing anything:  "seeing, I see not."

Quote from dandl on March 20, 2020, 12:14 am
Quote from Dave Voorhis on March 19, 2020, 3:04 pm
Quote from dandl on March 19, 2020, 3:11 am

Indeed, but can you solve the problem as I set it, with no "dialogs that allow specifying arbitrary expressions"?

HHT, on rereading, makes a case for having two distinct kinds of relations.

  • Factual: contains tuples asserting truth in accordance with a business-related predicate
  • Functional: no stored tuples, just the application of a function to some arguments.

Why not? But is there enough in A or HHT to say how?

[BTW I prefer new-value, based on Alice. EXTEND is very much a TD peculiarity.]

With no dialogs allowing arbitrary expressions, you're going to have to define arbitrary expressions somehow.  You can define arbitrary expressions using nodes and connections, same as for RA operators.  Think abstract syntax trees, but drawn in software by the user.

I've tried that before, in an icons-on-strings visual Java I wrote almost 20 years ago. It works, but it has the usual problem of visual programming languages: even relatively simple but real expressions require complex diagrams that border on unreadable.

It's far better if you can mix RA operators with conventional text-based expressions.  That seems to hit a sweet spot in terms of providing visual clarity without undue clutter.

An AST does not represent data flow, it is merely a diagrammatic representation of function application. LISP without the parentheses. Wrong paradigm.

No, but it can be subverted to become dataflow-like.  Given an EXTEND expression like p := 2 * (y + 3), it obviously converts to an AST, but the AST is notionally akin to a dataflow something like:

[y] --\
      [+] --\
[3] --/     [*] --> [p}
[2] --------/

I.e., the value of y and the value of the literal 3 flow into the '+' operator, the result of which flows into the '*' operator, etc.

So far, my efforts at using text expressions hinder transparency. With the EXTEND model you see the output of a node that adds a new value, but you can't see what it's inputs are except by parsing the text. I don't find that a sweet spot at all.

Different strokes for different folks, I guess.

I'm pondering a node that acts purely as a function. It takes a relation as input but chooses which attributes it will use as inputs/arguments. It outputs a relation comprising only the function arguments and result. It does not EXTEND an input relation, it merely computes a NEW VALUE. Then as per A and HHT, you are free to JOIN that output as you wish.

I don't mind the computed function being very simple and just picked from a list, or arbitrarily complex and written using a sub-language, as long as the inputs and outputs are transparent.

Yes, calling functions is reasonable, though if you integrate editing them into the environment you pretty much wind up with something like Rel's visual query language -- nodes for RA operators and text for scalar (and certain sub-) expressions.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

An AST does not represent data flow, it is merely a diagrammatic representation of function application. LISP without the parentheses. Wrong paradigm.

No, but it can be subverted to become dataflow-like.  Given an EXTEND expression like p := 2 * (y + 3), it obviously converts to an AST, but the AST is notionally akin to a dataflow something like:

[y] --\
      [+] --\
[3] --/     [*] --> [p}
[2] --------/

I.e., the value of y and the value of the literal 3 flow into the '+' operator, the result of which flows into the '*' operator, etc.

I understood what you meant, but it doesn't help. If the nodes and edges are constrained to operate on tables or even rows, you can't easily subvert them to operate on simple values. Certainly Knime won't allow it.

So far, my efforts at using text expressions hinder transparency. With the EXTEND model you see the output of a node that adds a new value, but you can't see what it's inputs are except by parsing the text. I don't find that a sweet spot at all.

Different strokes for different folks, I guess.

I'm pondering a node that acts purely as a function. It takes a relation as input but chooses which attributes it will use as inputs/arguments. It outputs a relation comprising only the function arguments and result. It does not EXTEND an input relation, it merely computes a NEW VALUE. Then as per A and HHT, you are free to JOIN that output as you wish.

I don't mind the computed function being very simple and just picked from a list, or arbitrarily complex and written using a sub-language, as long as the inputs and outputs are transparent.

Yes, calling functions is reasonable, though if you integrate editing them into the environment you pretty much wind up with something like Rel's visual query language -- nodes for RA operators and text for scalar (and certain sub-) expressions.

The point is: getting away from the EXTEND paradigm. As per App-A and HHT you can model TIMES as a relation operator: relation in, relation out.

  • The input is some relation with one or more numeric attributes, say {{ A=3,B=7,other... },{ A=2,B=11,other...}, ...}
  • The output is {{ A=3, B=7, C=21 },{ A=2,B=11,C=22}, ...}

The output is join-compatible with the input.

Andl - A New Database Language - andl.org
Page 1 of 4Next