The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

Meaning of a relation

Quote from dandl on January 17, 2021, 12:08 am
Quote from tobega on January 16, 2021, 4:20 pm
Quote from dandl on January 16, 2021, 8:40 am

Joy has nothing to do with it, and a Turing complete type system is not needed. To implement D requires operations on types (Haskell can do that) and a suitable record-like starting point (standard Haskell cannot). I'll have (another) look at Julia and see if it (now) has anything suitable.

 

I'm getting increasingly curious about the details of this. What was the problem in Haskell? What operations on types are needed? Is that where C# fell?

 

TTM deals with tuple/relation types in a rather idiosyncratic way.

Hmm, it's not 'idiosyncratic' as far as (most) relational models are concerned. There's an impedance mis-match between defining headings (and tuples and relation values) as sets vs the way most ALGOL/C-derived languages define data structures as Algebraic Data Types to be accessed positionally.

You could possibly work round that if the language had enough polymorphism to support re-positioning a data structure to match some other's structure. This would have to be at the type level. The language also needs a way to examine a relation's heading to validate a to-be-extended attribute name isn't already there, and a to-be accessed attribute has the expected type.

To implement this in an existing language you have to add means to:

  • parse a heading, and either define a new one or reuse an existing one
  • define/reuse a heading inferred for values returned by operations of the RA.
  • generate/reuse a tuple/relation type and type name from a heading
  • make type information (mainly the heading) available for the implementation to use.

There is a simpler approach based on generic types, where the heading is a parameter of a tuple/relation type.

To explain why @dandl is (quite correctly) talking about types, contrast with how a program accesses SQL: it concocts a string at run-time, passes that to SQL, which passes back a string of results. That string-passing might be wrapped in various abstraction layers to avoid the most egregious errors, never the less it's all happening at run-rime. There's no compile-time guarantees the program correctly targets the schema. Indeed the schema might get changed after the program has been thoroughly tested, in which case that it runs without reporting error is probably worse than just crashing immediately.

So the way to get better assurance with SQL is via 'stored procedures' which are 'compiled' at the time of building the schema. But 'stored procedures' allow only limited forms of query -- certainly not ad-hoc or schema-changing queries, and again the request is via strings.

It's not clear to me whether anything in what @Tobe has talked about with Julia is statically type-safe or is all happening at run-time(?)

This can be implemented in C#, but there is more obligation on the user to write the code in a particular way and some of the type checking has to be deferred to runtime. I got it working, but a good solution would require a compiler mod.

Without something along these lines, you can't really include the RA in a language.

IIRC part of the C# approach used templating/macros -- which again compromise type-safety, hence the need for some runtime checking.

So the reason I used Haskell (which I guess would count as 'second-tier' by Dave's measure?) is not because it's Functional Programming but because it has a powerful type system that doesn't rely on templating/macros. Also there have been various attempts dating back to at least 2004 to implement type-indexed data structures. That is, structures where you access at runtime 'contents' via a type-level request that can be validated statically at compile time -- both that there is a field of that index-name/type, and that the content is Int/String/etc/a nested structure which is also type-indexed, as expected by the caller.

Haskell's standard 'record' (labelled fields) is not adequate, and those 2004 attempts didn't use it: each record and its set of labels is tightly tied to pre-defined data types. There's no ability to (for example) extend such a record with a new label, or compare two records to see which labels they have in common.

So what we need from a type system is (in TTM terminology) a compile-time check for 'Join-compatability' and 'same-heading', aka SQL's 'Union-compatability' roughly speaking. Plus an ability to require some heading does not include attribute-name A -- because we want to extend with an A, or rename to an A. (Note those 2004 attempts struggled to express that. Easily you could end up with a structure with multiple fields type-labelled A because they were at different positions, and Haskell is really positionally-based.)

What was possible in Haskell needed a swag of extensions. Now that in itself is not scary: the Haskell standard (latest 2010) is a long way behind the extensions. Some extensions are experimental, but the ones needed here are not: indeed, most of them have been mature, and in the two main compilers since ~1998 (the previous Haskell standard). But they do need careful use, and getting them to model the RM -- or at least something like the TTM style -- is still bleeding-edge. There is in development something called 'Dependent Haskell', with Dependent Typing based on Coq/Agda. There seems to be a veil of tears building this into Haskell, especially trying to maintain backwards compatibility. Indeed one of the difficulties is the 'features' already built into Haskell to try to mimic Dependent Typing. Arguably, it would be simpler to rip those out, build on the knowledge gained, go back to Haskell vintage ~2010 and start again.

Orthogonal to this is a wholly separate extension in a very old Haskell compiler (support ceased ~2006). Which looks much more like what TTM needs: anonymous/extensible records:

  • anonymous in the sense the structure isn't tied to a pre-declared data type; your code just barrels ahead with a 'record literal' like (a = 1, b = "hello"), and you can compare == (b = "hello", a = 1) and the answer is type-correct and True. So there's type inference which makes the label part of the type, and canonicalises the ordering of labels. You can embed structures of that type into other structures like Vectors and sets; you can embed records inside records (a = 1, b = "hello", c = (d = True, e = 3.14))
  • extensible in the sense you can x := (b = "hello", a = 1); y := (d = False | x) prepend additional labelled fields to an existing record; and z := (d = True | y) statically reject at compile time attempts to produce duplicate labels or access non-existent labels or access the content at the wrong type.
  • Another neat feature is that you can symbolically express 'this record has labels a, b, and possibly others but not c, d. And you can assign the 'possibly others' to a program variable that is type-correct, and is indeed just another record structure.

But (and apart from being unsupported) the implementation relied on a load of specialised type-inference inside the compiler. You can express 'same heading' but not 'Join-compatible'. Therefore you can't Join. The best you can do is (if you know the attributes in common) strip them off one by one, then re-assemble a concatenated record. If you get that wrong you'll at least get compile-time rejection (with an impenetrable error message).

And that's a general failing: it's not possible to write highly generic code being oblivious to the label names. (Frustratingly, the abstract semantics behind it is much richer. If only that could be implemented ...)

I guess there are kinds of popularity, defined by the size and stature of the community. The Big 5 languages have communities in the millions, and are general purpose in design and in practice. This means they have a vast infrastructure, but it also means there will be lots of niche requirements where they don't do so well. Only Java, C# and C/C++ are statically typed. Although I have been able to implement D using C# it has gaps that would need compiler mods to plug.

But for a D the popularity requirement is not so much about being general purpose but more about having an adequate community and pre-existing language libraries. The TTM requirements for D are oriented towards a database language (potentially an SQL replacement) so the lack of libraries for UI (for example) would not be an impediment. A 'real' language might well have arrays, lists and other collections, generics, metaprogramming, and could be OO or FP or something else. In principle any language on the TIOBE list of 50 should be worth considering. Haskell is at 44, Julia is as 23, above Fortran, Cobol and Ada.

Andl - A New Database Language - andl.org

 

TTM deals with tuple/relation types in a rather idiosyncratic way.

Hmm, it's not 'idiosyncratic' as far as (most) relational models are concerned. There's an impedance mis-match between defining headings (and tuples and relation values) as sets vs the way most ALGOL/C-derived languages define data structures as Algebraic Data Types to be accessed positionally.

Point taken. C language structs have a strong lineage all the way back to assembler memory layouts. The first release of a C prototype had no structs, and it not usable for its chosen purpose: writing large parts of Unix, including its own compiler. Arguably, Java/C# just took the easy way, but there are many languages with record types that have an implicit ordering.

So there is nothing intrinsically wrong with going away from positional structs, there just aren't a bunch of (statically typed) languages to show how it's done.

You could possibly work round that if the language had enough polymorphism to support re-positioning a data structure to match some other's structure. This would have to be at the type level. The language also needs a way to examine a relation's heading to validate a to-be-extended attribute name isn't already there, and a to-be accessed attribute has the expected type.

Yes, kind of. I agree with everything else here, but:

This can be implemented in C#, but there is more obligation on the user to write the code in a particular way and some of the type checking has to be deferred to runtime. I got it working, but a good solution would require a compiler mod.

Without something along these lines, you can't really include the RA in a language.

IIRC part of the C# approach used templating/macros -- which again compromise type-safety, hence the need for some runtime checking.

This is a point that needs exploring. C++ generics are text macros triggered by type matching; Java generics are a compiler layer over objects and type coercion; but C# generics create a parameterised type which is completely type safe. Generics as we know them exist because of the need to create collection types, but I don't know any other languages that serve as good examples, except perhaps Haskell.

So the reason I used Haskell (which I guess would count as 'second-tier' by Dave's measure?) is not because it's Functional Programming but because it has a powerful type system that doesn't rely on templating/macros. Also there have been various attempts dating back to at least 2004 to implement type-indexed data structures. That is, structures where you access at runtime 'contents' via a type-level request that can be validated statically at compile time -- both that there is a field of that index-name/type, and that the content is Int/String/etc/a nested structure which is also type-indexed, as expected by the caller.

And that's the crux of my point: you don't need records, they just cause trouble. In a D you will never have a 'pointer to tuple' you need to deref, the RA just isn't done that way. So I proposed:

  • Re-interpret headings as a type H (which has no values) representing a set of names (for uniqueness), each with a corresponding attribute type.
  • Re-interpret the TTM tuple and relation types as simple generic value types parameterised by a heading: T<H> or R<H>.
  • The operators of the RA are defined in terms of T<H> or R<H> and H arguments, and the return type is always a T<H> or R<H> with H inferred.

I did a complete implementation in C#, but you have to construct the headings by hand and if you get it wrong it will blow up at runtime. But it works.

Maybe it could be done in Haskell, I have no idea. But you really don't need records, they add nothing.

Andl - A New Database Language - andl.org
Quote from dandl on January 17, 2021, 1:15 pm

 

It's unlikely it would work. Here's the list again.

To implement this in an existing language you have to add means to:

    • parse a heading, and either define a new one or reuse an existing one
    • define/reuse a heading inferred for values returned by operations of the RA.
    • generate/reuse a tuple/relation type and type name from a heading
    • make type information (mainly the heading) available for the implementation to use.

I've had a trawl through the Julia docs, and it looks much as I remember it. There is an extremely powerful early stage macro facility that interacts with ASTs and can do symbol and code generation, like the Lisp it inherits from. What I did not find was any sign of interaction between this and the type system, or the kind of static type system smarts found in Haskell.

And if you really can't generate a heading that is a set of <name,type> but rather need to fake it by sorting, then chances are Julia can't do the rest either. But feel free to prove me wrong, I think everyone here would like to find that there is a language that can do this.

Well, at least I would learn more about Julia types than I ever wanted. Attempt ongoing here: https://github.com/tobega/RelationalData

I'll start with NamedTuple and treat the attributes as a set in any operation and see how it goes. Julia doesn't seem to care about order when merging NamedTuples so maybe I can get them to change the equals-method.

I assume that a good enough proof-of-concept would be to have immutable relation values that can be involved in some operations like project and join? Beyond that, if it works, would anyone be willing to help with how to interface to existing databases?

 

When it comes to Julia there isn't really a compile step as such. There is a step where the interpreted code is translated into LLVM and optimised, which happens just before it is run, and this does utilize any type information given in order to optimize things, and it could possibly complain about type problems already at this stage, but mostly type-checking would be runtime. However, it is strongly typed in the sense that if there is a mismatch between actual type and designated type (if a type has been designated), the program will fail and notify the problem.

I've been meaning to ask, what is the TTM stance on Union-types? Or, e.g. declaring an attribute as the wildcard type, Any in Julia?

Quote from tobega on January 18, 2021, 9:21 am

When it comes to Julia there isn't really a compile step as such. There is a step where the interpreted code is translated into LLVM and optimised, which happens just before it is run, and this does utilize any type information given in order to optimize things, and it could possibly complain about type problems already at this stage, but mostly type-checking would be runtime. However, it is strongly typed in the sense that if there is a mismatch between actual type and designated type (if a type has been designated), the program will fail and notify the problem.

I'm not sure what Julia is used for, Wikipedia says this:

Julia is a high-level, high-performance, dynamic programming language. While it is a general-purpose language and can be used to write any application, many of its features are well suited for numerical analysis and computational science.

Users looking for these features don't care about static typing. This is not a good sign for a data-oriented language.

Distinctive aspects of Julia's design include a type system with parametric polymorphism in a dynamic programming language; with multiple dispatch as its core programming paradigm.

This is precisely what is needed for a D, if you can figure out how to harness it, and make it static(!).

I've been meaning to ask, what is the TTM stance on Union-types? Or, e.g. declaring an attribute as the wildcard type, Any in Julia?

The TTM type system is set in an earlier era. It has 'generated types' where parametric types might be better. It talks about operators being 'logically distinct', which I read as implying multiple dispatch without saying so. It has product types without using the name, which are complicated by requirements for 'possible representations' and 'components'. But no, it does not have sum (union) types, although you can fake it using the IM (Inheritance Model). Ask Dave about that.

 

Andl - A New Database Language - andl.org
Quote from tobega on January 18, 2021, 9:21 am

When it comes to Julia there isn't really a compile step as such. There is a step where the interpreted code is translated into LLVM and optimised, which happens just before it is run, and this does utilize any type information given in order to optimize things, and it could possibly complain about type problems already at this stage, but mostly type-checking would be runtime. However, it is strongly typed in the sense that if there is a mismatch between actual type and designated type (if a type has been designated), the program will fail and notify the problem.

In a sense, that's the easy part. In any language, you can create relation/tuple and even scalar type system constructs that are run-time checked. That exists internally, at least, in most D implementations. By creating your own language, what is runtime-checked in the host language can be compile-checked in your own language.

Having type-checking happen in user-space mostly at runtime is a compromise too far, for me at least.

I've been meaning to ask, what is the TTM stance on Union-types? Or, e.g. declaring an attribute as the wildcard type, Any in Julia?

Tutorial D permits union types, but it's not as rigorous as it could be in terms of forcing obligatory compile-time handling of all the possible types of the union.

Wildcard types are notionally supportable as type ALPHA in Tutorial D, I guess.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from dandl on January 18, 2021, 12:34 pm

I've been meaning to ask, what is the TTM stance on Union-types? Or, e.g. declaring an attribute as the wildcard type, Any in Julia?

The TTM type system is set in an earlier era. It has 'generated types' where parametric types might be better. It talks about operators being 'logically distinct', which I read as implying multiple dispatch without saying so. It has product types without using the name, which are complicated by requirements for 'possible representations' and 'components'. But no, it does not have sum (union) types, although you can fake it using the IM (Inheritance Model). Ask Dave about that.

It has UNION types called UNION types under the Inheritance Model, though as I noted in a previous post, they are not as strong as they could be.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

Considering popularity, the question is really what value-add would we bring that is significant enough for potential users?

I guess the typical C or C++ programmer these days doesn't come anywhere near data in databases so anything in those languages would be pointless.

The typical Java and C# programmer only uses basic crud operations moving data back-and-forth between database and web page, what interest would they have? (In Java, many also got mired in JPA and Hibernate, they will never be able to extricate themselves.)

Same for Ruby, I suppose, Ruby on rails, everything is wired by convention. Python similar story, although maybe the data scientists are interested?

Julia was originally conceived to replace FORTRAN (fast computation but too cumbersome for quick coding) and Python (nimble for proofs-of-concept and small calculations but way too slow for anything serious) in research departments, mainly physics to begin with. Turns out it also competes with R to some extent, and the data scientists are discovering it as well.

As for compile-time checking, I agree that runtime checks are not acceptable in a compiled language, but should we really disqualify languages that aren't compiled? Users of these have developed methods for dealing with lack of compiler help, usually by them being really easy to just rinse-and-repeat, but also by testing. Uncle Bob used to be a huge fan of compile-time static type-checking but is now saying that we have already mucked about with all possible programming languages and should just pick "the last programming language" and that it should be Clojure. Not that he's right, IMO, but what if he was? And what if the trend is to non-compiled languages overall? Python, Typescript + javascript, Dart, Clojure, Ruby, Julia. (Oh, what about Typescript's type system, BTW? A lot of simple web apps are being built on Node, perhaps some time will be on Deno, but Typescript is certainly much more popular than TIOBE indicates, since its mostly just javascript and many web searches would be for basic APIs). Actually, both Typescript and the current direction of Dart indicate that users like strong typing, but not necessarily only in compiled languages.

Quote from tobega on January 18, 2021, 2:00 pm

...

The typical Java and C# programmer only uses basic crud operations moving data back-and-forth between database and web page, what interest would they have? (In Java, many also got mired in JPA and Hibernate, they will never be able to extricate themselves.)

...

Likewise Entity Framework and C#.

There is a certain breed of anti-SQL sentiment that propels use of Hibernate and EF, but the alternative isn't LINQ or Streams or some new language. The core EF/Hibernate fan doesn't like any of that, either -- though they'll grudgingly use LINQ or Streams if forced. The biggest fans of Hibernate and EF typically don't want anything remotely query-like or declarative in their code. What they want is record-by-agonising-record processing in their favourite programming language, just like they learned in (or before) first-year university.

Nested for loops (or foreach loops) for the win. :-(

I suspect that's where some of the knee-jerk NoSQL fanboyism comes from -- anything but SQL, or anything too SQL-like, which unsurprisingly LINQ gets lumped into and perhaps surprisingly (or perhaps not) Java Streams winds up there too.

Julia is getting a lot of love of late, and I know of one commercial project that is using it at the core of a relational-model -based product, but it's Datalog rather than Codd (or post-Codd, i.e., TTM et al.) RM. It's not yet released as far as I know. Or maybe it is -- I haven't checked lately. It's these guys: https://www.relational.ai/

As for compile-time checking, I agree that runtime checks are not acceptable in a compiled language, but should we really disqualify languages that aren't compiled? Users of these have developed methods for dealing with lack of compiler help, usually by them being really easy to just rinse-and-repeat, but also by testing. Uncle Bob used to be a huge fan of compile-time static type-checking but is now saying that we have already mucked about with all possible programming languages and should just pick "the last programming language" and that it should be Clojure. Not that he's right, IMO, but what if he was? And what if the trend is to non-compiled languages overall? Python, Typescript + javascript, Dart, Clojure, Ruby, Julia. (Oh, what about Typescript's type system, BTW? A lot of simple web apps are being built on Node, perhaps some time will be on Deno, but Typescript is certainly much more popular than TIOBE indicates, since its mostly just javascript and many web searches would be for basic APIs). Actually, both Typescript and the current direction of Dart indicate that users like strong typing, but not necessarily only in compiled languages.

I use "compiled" in the loose "modern" sense, i.e., not necessarily to mean the classic sense of an emitted object file and no deployed source, but to mean all source is fully checked to the extent possible (within the semantics of the language) prior to any execution.

That means the real consideration is the extent to which errors can be identified before any or all execution, i.e., fully static type checking; and the extent to which all possible run-time errors must have explicit code paths (and this too is statically checked before any execution.)

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org