The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

Another language with relational constructs added

http://blog.hydromatic.net/2020/02/25/morel-a-functional-language-for-data.html

 

So is this a D? Does it have:

  • relational and tuple types
  • type/heading inference across relational operations
  • natural join
  • grouping and aggregation
  • (generalised) transitive closure?
Andl - A New Database Language - andl.org
Quote from tobega on November 2, 2021, 12:24 pm

http://blog.hydromatic.net/2020/02/25/morel-a-functional-language-for-data.html

Thanks @tobega, I've a feeling I've seen 'Morel' before -- presumably discussed here. I have to say the author has some ... errm ... idiosyncratic ideas about programming and programming languages, and indeed about the relational model.

SQL has several deficiences relating to nested collections, ... After several months trying to figure out how to add these features to SQL, ...

Talk of "nested collections" is a shibboleth for somebody who doesn't understand the relational model nor normalisation. We don't nest stuff in RM -- that's a hierarchical model.

Interactivity

People tend to write SQL queries in a REPL (read-eval-print loop). Small functional programs can be written in the same way, so there is a good fit there.

"People"? IOW amateurs? No, professional programmers write (using a text editor) SQL embedded in higher-order languages, calling the SQL statements in the style of subroutines. REPLs (which every programming language has, not specifically SQL or Functional) are for ad-hoc testing of your code. 'Small functional programs' written in a REPL is perhaps using the language in 'calculator mode'. That's not "writing" code.

If you want to build ad-hoc database queries, I much prefer QBE style point-and-click (Query By Example), because raw SQL is horribly verbose [see below], and suffers fat finger syndrome. Most QBEs can then 'save' your query to a repository, where it'll appear as (stylised) SQL code.

Conciseness

SQL is concise. Many useful queries are only a few lines long.

I've heard SQL called many things. But 'concise' never. Indeed Chris and (especially) Hugh have comprehensively critiqued SQL for its verboseness and syntactic repetitiveness. Consider how much repetition/inability to name sub-expressions there is in a typical SELECT ... GROUP BY ... HAVING ... query.

I guess conciseness is in the eye of the beholder/concise compared to what?

Functional programming languages also tend to be concise, for similar reasons to SQL: they are strongly typed, and the language has good type inference. Type inference means that you don’t need to explicitly specify types often, if ever.

Yes functional languages tend to be concise compared to Java or COBOL or ... SQL. That is absolutely not "for similar reasons" to SQL. Being strongly typed is nothing to do with it (and SQL _isn't_ strongly typed). It has a lot to do with the ability to abstract patterns and get type-safe higher-order functions/capabilities that OOP languages express through 'generics', and capabilities that SQL just doesn't have.

In SQL you _do_ "need to explicitly specify types". That's usually in the schema decl/CREATE TABLE .... Having done that, it's true you typically don't mention types in queries.

In substantial FP applications of course you "explicitly specify types", or you'd go mad. Indeed in 'roughing-out' an application you start by writing the types of the major components, to get a compiler check that it hangs together. Only in the REPL/calculator mode would you put snippets of code without explicit typing.

It's not clear how much genuine FP is in Morel anyway:

By the way, the interpreter is written in Java, and is quite a nice implementation of Standard ML for the JVM,

So the author has invented some syntax and written an interpreter in Java. Does Standard ML really get a look in? The typical way to embed 'Domain Specific Languages' inside a FP, is to layer on domain-specific data structures, and provide functions to access them. The snippets of Morel code don't look at all to me like function calls. There's already an extensive body of work embedding database query languages in FP languages. The code looks like function calls, that get translated under the hood to SQL/passed as strings to a database engine. That's the industrial-strength way to do it.

Presumably Morel has its own data model and access engine. (I guess declared in Java.) Accessing that is never going to be performant over industrial volumes of data.

The author seems to have heard of Codd/'Relationally Complete'/mentions "relational algebra". Then something like Codd's algebra or calculus would be a better fit to FP style. Or set comprehensions -- which FP languages support natively.

separating the “query” parts of a program, that consist only of relational operators, from the “looping” parts of a program. This is unproven at this point, ...

"Unproven" only to somebody who's never seen the extensive literature within FP of folds (map-reduce) or comprehensions -- which rely on higher-order functions. I think this guy doesn't know what he's talking about.

 

Quote from dandl on November 2, 2021, 11:06 pm

So is this a D?

Nope. The quoted snippet of code is

from e in hr.emps,
    d in hr.depts
where e.deptno = d.deptno
yield {e.id, e.deptno, ename = e.name, dname = d.name};

So this is SQL spelled funny. The author says

The equivalent query in SQL looks very similar:

Quite. A Tutorial D equivalent might be (taking some guesses about the schema)

(emps RENAME {ename := name}) JOIN (depts RENAME {dname := name}) { id, deptno, ename, dname }

(And I'd criticise that for not being concise enough.) A Codd algebra might be

σ{id, deptno, ename, dname}( ρ{ename\name}(emps) ⊗ ρ{dname\name}(depts) )       -- using ⊗ for join/Codd's bowtie

Does it have:

  • relational and tuple types
  • type/heading inference across relational operations
  • natural join
  • grouping and aggregation
  • (generalised) transitive closure?

I'd guess: nope, nope, nope, nope, nope -- in that order.

Or in the words of that Aussie kid in the Mitre 10 advert: "You're dreaming".

I'm saying "nope" because of the naïve SQL equivalent given:

SELECT e.id, e.deptno, e.name AS ename, d.name AS dname
FROM hr.emps AS e,
    hr.depts AS d
WHERE e.deptno = d.deptno;

(Those explicit ASs are cute.) Why isn't that ... FROM hr.emps AS e JOIN hr.depts AS d ON e.deptno = d.deptno ?

Quote from dandl on November 2, 2021, 11:06 pm

So is this a D? Does it have:

  • relational and tuple types
  • type/heading inference across relational operations
  • natural join
  • grouping and aggregation
  • (generalised) transitive closure?

But does it have support for nulls?  Duplicate rows?  Does it use a 3-valued logic?

I see nothing about updating and constraints, or even how to make a database.

Hugh

Coauthor of The Third Manifesto and related books.
Quote from Hugh on November 3, 2021, 11:56 am

But does it have support for nulls?  Duplicate rows?  Does it use a 3-valued logic?

Hi Hugh, good questions. On a quick scan (which is all I'm prepared to give it), there's no mention of NULL or NIL, no examples with 'empty' values, no talk of outer joins (but then there's no talk of any sort of joins -- the examples are all Cartesian products).

OTOH if nulls aren't supported, and given the write-ups are aimed at an SQL audience, I'd expect it to say so.

Re Duplicate rows, there's a blog post about aggregates/grouping (so contra my earlier response to @dandl, there is some capability), which seems to be aware of SQL's problems with duplicates, and rather flounders around (it's crying out for 'Relation-Valued Attributes') without saying one way or the other. I'm guessing it doesn't allow duplicate rows in query results.

I see nothing about updating and constraints, or even how to make a database.

You can enter table literals at the command prompt, and assign them to a variable. Or put the assignment into a sourcefile. AFAICT those tablevars can't be UPDATEd, just destructively re-assigned to; assignments don't persist beyond the end of the session. (I guess you could call some code to write the current value into a sourcefile.)

Re "make a database", here's some snippets from the docos:

  • Foreign values allow external data, such as the contents of a JDBC database, to be handled as if it is in memory;
  • We will use Apache Calcite's Table API to represent an external collection,
  • A schema (in the JDBC sense of the word)

Others here will be able to tell if a JDBC schema can declare constraints. I see no way within Morel to manage constraints, nor take advantage of them in executing queries. OTOH

  • Evaluating that [query] would entail preparing a JDBC query.
  • We will use [Apache?] RelBuilder to build relational expressions.

It's sensible to subcontract query execution/optimisation to the database engine.


 

After a flurry of blog posts in the first half of last year, the project seems to be more or less dormant. Curiously there's been some very recent activity on its github repository (to do with introducing JOIN?), but no releases or announcements. I guess that's the typical fate of 'hobby' projects.

 


Re @dandl's q about 'tuple types': Morel's 'records' are a very thin veneer over positional tuples. Indeed, for every tuple type you can access the fields either using a label, or an integer:

emps.deptno ≡ emps.3

The position seems not to be alphabetic by label name, but rather canonical according the the declaration/schema for the table.

Again, this is very like (old) SQL-style -- but I thought SQL had deprecated numeric field offsets? Is this more JDBC style?

 

Quote from dandl on November 2, 2021, 11:06 pm

So is this a D? Does it have:

  • relational and tuple types
  • type/heading inference across relational operations
  • natural join
  • grouping and aggregation
  • (generalised) transitive closure?

 

 

@AntC thanks for thorough assessment. I agree it seems very much to be very little, but interesting to note nonetheless as an indicator of a possible whiff of a breeze in this direction.

I have looked at the reference given by tobega.

Morel's use of a functional language for relational algebra is fine in principle, since the point with both is that when you execute a function/expression of functions, there are no side effects. What you 'get out' of the expression is determined solely by what you 'put in' the expression, so to speak.

However in Chris Date's textbooks called "An Introduction to Database Systems", his chapter on relational algebra always had a section on "What is the Algebra for ?" In it he makes the point that (quote from his 8th edition) "The fundamental intent of the algebra is to allow the writing of algebra expressions. Those expressions in turn are intended to serve a variety of purposes, including retrieval but not limited to retrieval alone". He then gives a list of 'applications' (quote) in which such expressions may be used.

He also asserts that 'relationally complete' refers to the algebra, not the 'applications'.

Since relational algebra has now long been established by TTM, I find 2 questions of interest :

    1. Is Morel relationally complete ? The answer should include answers to Hugh Darwen's questions.
    2. As relational algebra is now so well-established, a key question for a real relational DBMS now is how to express the 'applications' ? And preferably in a consistent manner.

As Morel only handles the retrieval application, for me it is disappointing.

In RAQUEL, 'applications' (called 'actions') are expressed by assignments. The nature of assignments is generalised to achieve consistency. Just as there are a variety of algebra operators, so there are a variety of assignments, in order to be able to execute a variety of applications/actions.

By differentiating between a 'relvar-before-assignment' and a 'relvar-after-assignment' in the statement's parse tree, one can create an 'assignment algebra' to go with the (relationally complete) 'operator algebra'. As both forms of algebra can be combined in a statement, it makes for a more powerful language without increasing its complexity.

Please will Morel look at things like this.

Quote from David Livingstone on November 4, 2021, 11:33 am

Morel's use of a functional language for relational algebra is fine in principle, since the point with both is that when you execute a function/expression of functions, there are no side effects. What you 'get out' of the expression is determined solely by what you 'put in' the expression, so to speak.

However in Chris Date's textbooks called "An Introduction to Database Systems", his chapter on relational algebra always had a section on "What is the Algebra for ?" In it he makes the point that (quote from his 8th edition) "The fundamental intent of the algebra is to allow the writing of algebra expressions. Those expressions in turn are intended to serve a variety of purposes, including retrieval but not limited to retrieval alone".

'fine in principle', but not all "Functional Languages" are equal, which leads to ML being not so good in practice. On retrieval, I see TTM includes a Very Strong Suggestion to support quota queries. These would be problematic in a REPL environment: ML has 'strict' semantics (call by value). It'll evaluate the whole query, rather than lazily streaming it to you page at a time.

ML also has a rather primitive IO model, which I think will be awkward for interacting with persistent storage such as ... oh, a DBMS.

He then gives a list of 'applications' (quote) in which such expressions may be used.

He also asserts that 'relationally complete' refers to the algebra, not the 'applications'.

Since relational algebra has now long been established by TTM, I find 2 questions of interest :

    1. Is Morel relationally complete ?

The author claims it has a Relationally complete query language embedded in a Turing complete general programming language (i.e. Standard ML).

... The answer should include answers to Hugh Darwen's questions.

Mmm? 'Relationally Complete' is Codd's notion, not Date&Darwen's. Does supporting Null or allowing duplicate rows make a language less than Relationally Complete? (The q is moot, because Morel appears not to.) Is there a need to declare constraints? Codd 1972 doesn't mention it.

    1. As relational algebra is now so well-established, a key question for a real relational DBMS now is how to express the 'applications' ? And preferably in a consistent manner.

As Morel only handles the retrieval application, for me it is disappointing.

To the extent Morel interacts with JDBC, I suspect supporting 'applications' involves a rather non-consistent manner of breaking out into Java or operating system commands.

In RAQUEL, 'applications' (called 'actions') are expressed by assignments. The nature of assignments is generalised to achieve consistency. Just as there are a variety of algebra operators, so there are a variety of assignments, in order to be able to execute a variety of applications/actions.

By differentiating between a 'relvar-before-assignment' and a 'relvar-after-assignment' in the statement's parse tree, one can create an 'assignment algebra' to go with the (relationally complete) 'operator algebra'. As both forms of algebra can be combined in a statement, it makes for a more powerful language without increasing its complexity.

Then is CREATE TABLE/declare types and constraints achieved via 'assignments'?

Please will Morel look at things like this.

I suspect the niche of ML programmers wanting to write SQL-in-drag is small. Most would be happy to use Datalog/Prolog. ML abbreviates 'Meta Language': "Historically, ML was conceived to develop proof tactics in the LCF theorem prover (whose language, pplambda, a combination of the first-order predicate calculus and the simply-typed polymorphic lambda calculus, had ML as its metalanguage)" sez the wiki.

Thank you for your reply, AntC.  I found it helpful.

Quote from AntC on November 5, 2021, 11:07 am
Then is CREATE TABLE/declare types and constraints achieved via 'assignments'?

Yes.