Big Data/Data Science/AI, etc
Quote from Dave Voorhis on June 8, 2019, 10:14 am
Quote from dandl on June 8, 2019, 1:09 am
Quote from Dave Voorhis on June 7, 2019, 3:15 pm
We created a Big Data undergrad course a few years ago. I was its programme leader. We attracted so few students that we shut it down. It turns out that 16/17-year-old kids have no idea what "data science" is, but whatever it is, it sounds like no fun at all.
I don't think it works at undergrad. It looks more like a Masters to me.
Yes. We offer a Masters in Big Data Analytics and it's reasonably successful.
Quote from dandl on June 8, 2019, 1:09 am
Quote from Dave Voorhis on June 7, 2019, 3:15 pm
It's inspired by using Rel for "production" desktop data management, and having reflected at length on its strengths and limitations. In particular, I've recognised there are things it doesn't do at all (or does badly) but that it needs to do really, really well. The solution won't be a D and won't even be a new language per se, but it will respect the ideals of TTM, even though it won't necessarily embrace all the prescriptions and proscriptions. It will help programmers and "power users" work with data and build data-oriented applications, using popular languages and tools in a new but familiar -- and hopefully very appealing and powerful -- way.
Is it a secret, or do you plan to blog about it?
I intend to 'blog about it when I have something to release. Otherwise, I'm just arm-waving about vapourware.
Quote from johnwcowan on June 8, 2019, 2:48 pm
Quote from Dave Voorhis on June 8, 2019, 10:14 am
Quote from dandl on June 8, 2019, 1:09 am
Is it a secret, or do you plan to blog about it?
I intend to 'blog about it when I have something to release. Otherwise, I'm just arm-waving about vapourware.
"What is now proved was once, only imagin'd." —William Blake, "Proverbs of Hell"
Quote from johnwcowan on June 12, 2019, 6:11 pm
Quote from David Livingstone on June 7, 2019, 10:36 am
A lot of computing is conditioned by compiled languages. There everything has to be static so that a programmer can write program code that works for all (the limited range of) possible cases. Interpreted languages can be dynamic. Therefore a programmer can write code that inspects each case first, and then processes the data on the basis of what is found.
Perhaps we just need to add more dynamic typing to relational DBMSs?
David Livingstone
There is no such necessary relationship between interpreted languages and dynamic typing. Common Lisp and Scheme are dynamically typed languages, but there are implementations of both that compile to machine code (via C or not) ahead of time. Cecil, also dynamically typed, has as far as I know only a compiler implementation. Often a good deal of runtime type checking can be compiled away.
That isn't even considering the JIT compilers for Smalltalk, Self, Java and other JVM languages, C# and other CLR languages, Pure, Julia, etc.
Quote from David Livingstone on June 14, 2019, 6:48 pm
Quote from johnwcowan on June 12, 2019, 6:11 pm
There is no such necessary relationship between interpreted languages and dynamic typing.
I take your point. The relationship is not NECESSARY.
To clarify, perhaps: what I was thinking of were languages which allow metadata about variables, etc., to be obtained via some operator/assignment/... in the language - often called 'Reflection' these days - AND then allow that data to be used within an expression or statement in the language.
For example, in such a relational DB language one could write an expression to obtain the sets of attributes (names and types) of several existing DB relvars, then write an expression which derives a new set of attributes from those sets (using Unions, Intersects, and Differences as required), and finally create a new DB relvar with that new set of derived attributes.
All I meant to say was that, in my experience, programmers who use compiled languages all the time tend not to think in ways consistent with the above example.
But as you point out, every language is different in what it provides, and every programmer has different experiences which condition their typical way of thinking about coding.
David Livingstone
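A minimal value-level sketch of the kind of reflective heading derivation described above, assuming a heading is just a set of (name, type) pairs read from a catalogue at run time. The relvar names "S" and "SP" and the headingOf lookup are hypothetical, not any particular DBMS's API:

import qualified Data.Set as Set
import Data.Set (Set)

type AttrName = String
type AttrType = String
type Heading  = Set (AttrName, AttrType)

-- Hypothetical catalogue lookup: a real system would read the database schema here.
headingOf :: String -> Heading
headingOf "S"  = Set.fromList [("S#", "CHAR"), ("SNAME", "CHAR"), ("CITY", "CHAR")]
headingOf "SP" = Set.fromList [("S#", "CHAR"), ("P#", "CHAR"), ("QTY", "INT")]
headingOf _    = Set.empty

-- Derive new headings from existing ones with ordinary set operations; a reflective
-- DB language could then create a relvar with any of the derived headings.
joinHeading, commonAttrs, removedAttrs :: Heading
joinHeading  = headingOf "S" `Set.union` headingOf "SP"          -- all attributes of S and SP
commonAttrs  = headingOf "S" `Set.intersection` headingOf "SP"   -- shared attributes
removedAttrs = headingOf "S" `Set.difference` headingOf "SP"     -- attributes of S not in SP

main :: IO ()
main = mapM_ (print . Set.toList) [joinHeading, commonAttrs, removedAttrs]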
Quote from johnwcowan on June 12, 2019, 6:11 pmQuote from David Livingstone on June 7, 2019, 10:36 a
There is no such necessary relationship between interpreted languages and dynamic typing.
I take your point. The relationship is not NECESSARY.
To clarify perhaps - what I was thinking of were languages which allow meta data about variables, etc, to be obtained via some operator/assignment/... in the language - often called 'Reflection' these days - AND then allow that data to be used within an expression or statement in the language.
For example in such a relational DB language, one could write an expression(s) to obtain the sets of attributes (names and types) of several existing DB relvars, and then (say) write an expression which derives a new set of attributes from those sets (using Unions, Intersects, Differences, as required), and then finally create a new DB relvar with that new set of derived attributes.
All I meant to say was that in my experience, the habits of thought of programmers who use compiled languages all the time, tend not to think in ways consistent with the above example.
But as you point out, every language is different in what it provides, and every programmer has different experiences which condition their typical way of thinking about coding.
David Livingstone
Quote from dandl on June 15, 2019, 12:50 am
Quote from David Livingstone on June 14, 2019, 6:48 pm
Quote from johnwcowan on June 12, 2019, 6:11 pm
There is no such necessary relationship between interpreted languages and dynamic typing.
I take your point. The relationship is not NECESSARY.
To clarify, perhaps: what I was thinking of were languages which allow metadata about variables, etc., to be obtained via some operator/assignment/... in the language - often called 'Reflection' these days - AND then allow that data to be used within an expression or statement in the language.
For example, in such a relational DB language one could write an expression to obtain the sets of attributes (names and types) of several existing DB relvars, then write an expression which derives a new set of attributes from those sets (using Unions, Intersects, and Differences as required), and finally create a new DB relvar with that new set of derived attributes.
All I meant to say was that, in my experience, programmers who use compiled languages all the time tend not to think in ways consistent with the above example.
But as you point out, every language is different in what it provides, and every programmer has different experiences which condition their typical way of thinking about coding.
Here I sit firmly on the fence, but I reject your core proposition. I love using dynamic languages for small programs, because I get results fast. Ruby is my preferred poison, but I've used them all.
And I hate using dynamic languages for larger programs, because they break so easily. My habit is to rely heavily on the compiler and IDE to catch typos, reduce typing, and allow brutal refactoring. When I write JavaScript I have to waste heaps of time fixing dumb bugs and writing silly unit tests because I don't have a compiler to do it for me. JS sucks, TS rules.
But your core proposition is a purported advantage for dynamic languages in managing de novo types, and this I reject. There is none. Static compilation can easily figure out the types of newly created relvars and other types. Either the programmer wrote them and can rely on the compiler to check them, or the language provides compile-time type operators and can generate them so they're never wrong. Haskell is one example of a language that does this, but there are others. An industrial D should be one of them.
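A small illustration of the point about the compiler catching mistakes in de novo types. The Supplier record, the attribute names, and the cityOf/cityOfDyn helpers are made up for the example:

import qualified Data.Map.Strict as Map

-- A statically declared "relvar heading": a typo in an attribute name is a compile-time error.
data Supplier = Supplier { sno :: String, sname :: String, city :: String }

cityOf :: Supplier -> String
cityOf = city                       -- writing 'cty' here would be rejected by the compiler

-- The dynamic counterpart: the same typo compiles fine and only shows up at run time.
cityOfDyn :: Map.Map String String -> Maybe String
cityOfDyn = Map.lookup "cty"        -- silently returns Nothing for every tuple

main :: IO ()
main = print (cityOfDyn (Map.fromList [("S#", "S1"), ("CITY", "London")]))  -- Nothing, because of the typo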
Quote from johnwcowan on June 15, 2019, 1:15 am
Quote from dandl on June 15, 2019, 12:50 am
And I hate using dynamic languages for larger programs, because they break so easily. My habit is to rely heavily on the compiler and IDE to catch typos, reduce typing, and allow brutal refactoring. When I write JavaScript I have to waste heaps of time fixing dumb bugs and writing silly unit tests because I don't have a compiler to do it for me. JS sucks, TS rules.
~~ shrug ~~ To each their own. I have lots of experience with both dynamic and static languages at both large and small scales, and I don't have strong preferences for either. Then again, I am not obsessed with speed of development.
But your core proposition is a purported advantage for dynamic languages in managing de novo types, and this I reject. There is none. Static compilation can easily figure out the types of newly created relvars and other types. Either the programmer wrote them and can rely on the compiler to check them, or the language provides compile-time type operators and can generate them so they're never wrong. Haskell is one example of a language that does this, but there are others. An industrial D should be one of them.
Reflection is basically an escape hatch by which static languages can do dynamic things. That's distinct from type inference, which is an unmitigated good, and all statically typed programming languages should have it.
Quote from AntC on June 15, 2019, 5:01 am
Quote from David Livingstone on June 14, 2019, 6:48 pm
Quote from johnwcowan on June 12, 2019, 6:11 pm
There is no such necessary relationship between interpreted languages and dynamic typing.
I take your point. The relationship is not NECESSARY.
To clarify, perhaps: what I was thinking of were languages which allow metadata about variables, etc., to be obtained via some operator/assignment/... in the language - often called 'Reflection' these days - AND then allow that data to be used within an expression or statement in the language.
For example, in such a relational DB language one could write an expression to obtain the sets of attributes (names and types) of several existing DB relvars, then write an expression which derives a new set of attributes from those sets (using Unions, Intersects, and Differences as required), and finally create a new DB relvar with that new set of derived attributes.
David (L), I think you're being unduly pessimistic about the capabilities of static type inference. The headings (and keys) of relations/relvars are known statically. Then we can infer the heading (and keys) of any derived relational expression. Thanks to TTM's approach of headings being sets, as you say, we need only a few relational operators:
- The Union of two headings is the heading of the JOIN of relations with those headings.
- For the Difference of two headings there's no operation within "the usual operators of the relational algebra" (RM Pre 18), but if your algebra is expressively complete [**], you can take the heading from 'generalised REMOVE': r1 REMOVE r2 is r1 with all attributes in common with r2 projected away.
- For the Intersection (of headings), there's already ISBL generalised UNION ≡ SQL UNION CORRESPONDING ≡ Tropashko's Inner Union: note this is union of the tuples but intersection of the headings, that is: project each operand relation on the attributes in common, then union the projected tuples.
All I meant to say was that, in my experience, programmers who use compiled languages all the time tend not to think in ways consistent with the above example.
I use compiled/static languages all the time. I've already built libraries of operators that derive the Union/Intersection/Difference of headings statically. All I did was borrow work from people doing that (in Haskell) fifteen years ago. In my experience, programmers who use dynamically-typed languages all the time tend not to understand how powerful static type inference can be.
Haskell also has powerful facilities for 'Reflection'/generic programming -- statically type-safe. But I didn't need to draw on any of that.
[Note **] 'expressively complete'. I deliberately avoided Codd's 'relationally complete' there. I find his 1972 definitions fraught with difficulties, and almost certainly inadequate (insofar as it's possible to make them precise). For me, 'complete' includes at least "closed over relations"; then IMO all relational operators should take only relations as operands (as do JOIN, UNION, [NOT] MATCHING), even if we allow some operations with (sets of) attribute names baked in (such as project/remove, RENAME/EXTEND). Arguably those baked-in attribute names are not operands, because (at least in Tutorial D) they must be name literals/constants, and (sets of) attribute names are not first-class.
Then I prefer Tropashko's set of operators, which can be proved to be expressively complete under a lattice interpretation of relations. In fact we need only one operator -- I choose JOIN -- all others can be defined in terms of it, using lattice properties.
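For readers who haven't seen headings manipulated at compile time, here is a minimal type-level sketch in the spirit of what is described above. It is not AntC's library; the family names UnionH, DiffH, IntersectH and the example headings are made up for illustration, and real record libraries (HList and its descendants) do considerably more:

{-# LANGUAGE DataKinds, TypeFamilies, TypeOperators #-}
module HeadingOps where

import GHC.TypeLits (Symbol)

-- Is attribute x present in heading xs?
type family Member (x :: Symbol) (xs :: [Symbol]) :: Bool where
  Member x '[]       = 'False
  Member x (x ': xs) = 'True
  Member x (y ': xs) = Member x xs

type family If (c :: Bool) (t :: [Symbol]) (e :: [Symbol]) :: [Symbol] where
  If 'True  t e = t
  If 'False t e = e

-- Union of headings: the heading of a JOIN.
type family UnionH (xs :: [Symbol]) (ys :: [Symbol]) :: [Symbol] where
  UnionH '[]       ys = ys
  UnionH (x ': xs) ys = If (Member x ys) (UnionH xs ys) (x ': UnionH xs ys)

-- Difference of headings: the heading of r1 REMOVE r2.
type family DiffH (xs :: [Symbol]) (ys :: [Symbol]) :: [Symbol] where
  DiffH '[]       ys = '[]
  DiffH (x ': xs) ys = If (Member x ys) (DiffH xs ys) (x ': DiffH xs ys)

-- Intersection of headings: the heading of an InnerUnion.
type family IntersectH (xs :: [Symbol]) (ys :: [Symbol]) :: [Symbol] where
  IntersectH '[]       ys = '[]
  IntersectH (x ': xs) ys = If (Member x ys) (x ': IntersectH xs ys) (IntersectH xs ys)

-- Example headings (attribute names only; attribute types elided for brevity).
type S  = '["S#", "SNAME", "CITY"]
type SP = '["S#", "P#", "QTY"]

-- UnionH S SP     evaluates to '["SNAME", "CITY", "S#", "P#", "QTY"]
-- IntersectH S SP evaluates to '["S#"]
-- DiffH S SP      evaluates to '["SNAME", "CITY"]

The point is only that the heading of each result is computed by the type checker, so a misspelled attribute in a derived relvar surfaces at compile time rather than at run time.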
Quote from johnwcowan on June 15, 2019, 10:23 pm
Quote from AntC on June 15, 2019, 5:01 am
Then I prefer Tropashko's set of operators, which can be proved to be expressively complete under a lattice interpretation of relations. In fact we need only one operator -- I choose JOIN -- all others can be defined in terms of it, using lattice properties.
Tropashko gives UNION and JOIN as his minimal base at the expense of a number of infinite relations, but how do you reduce UNION to JOIN?
I wonder if the nullology prescription should be extended to omnology (totology? ugly either way) as well. Has anyone done the work to demonstrate that all the operators do the Right Thing on relations (necessarily virtual) with an infinite number of tuples? Tropashko particularly depends on EQ and NEQ, the obvious relations whose tuples are every pair of the form (k, k) and the complement of it.
I also would like an explanation of TENSOR_PRODUCT as applied to relations (I know from nothing about tensors).
Quote from AntC on June 16, 2019, 2:20 am
Quote from johnwcowan on June 15, 2019, 10:23 pm
Quote from AntC on June 15, 2019, 5:01 am
Then I prefer Tropashko's set of operators, which can be proved to be expressively complete under a lattice interpretation of relations. In fact we need only one operator -- I choose JOIN -- all others can be defined in terms of it, using lattice properties.
Tropashko gives UNION and JOIN as his minimal base ...
Careful: that's his 'Inner Union', not SQL's (or D's or set theory's) UNION. And he defines several other operations in terms of those. (Those are definitions, not reductions.)
... at the expense of a number of infinite relations, but how do you reduce UNION to JOIN?
I said "defined in terms of", not "reduce". We're in a lattice structure (where the base partially ordered set is the set of relations). Not quite sure why you're bringing in "infinite": I expect the set of relations to be finitely denumerable. Then when I say "I choose JOIN", I mean I define the partial ordering (following Tropashko) using relational JOIN as lattice meet. Then InnerUnion is lattice join. Note that in the usual definitions of lattice structures those two operations are taken as axiomatic, with the absorption laws, and then associativity is derived. I prefer JOIN defined as closed over the base set of relations; the operation must be commutative, associative, and idempotent.
- Lattice (partial) ordering: (r1 <= r2) =df r1 == (r1 JOIN r2) -- standard definition
- The identity for JOIN is DEE (by definition): forall r. DEE JOIN r == r -- i.e. lattice top
- (r1 InnerUnion r2) =df r3 s.t. (r1 == (r1 JOIN r3)) ∧ (r2 == (r2 JOIN r3)), and r3 is the least such (by lattice ordering), that is: forall r3'. (r3' == (r1 JOIN r3')) ∧ (r3' == (r2 JOIN r3')) ⇒ (r3 == (r3 JOIN r3')).
There's a side condition on JOIN (which applies mutatis mutandis to every definition in terms of JOIN): the operands to JOIN must be join-compatible, that is, attribute names in common must be at the same attribute type. Otherwise the result of JOIN is undefined. Then operands to InnerUnion must be join-compatible. (Tropashko doesn't apply this side condition: his attributes are untyped.)
We can show InnerUnion exists and is unique for all pairs of relations (providing those JOINs are well-defined). We can also show DEE exists and is unique.
I wonder if the nullology prescription should be extended to omnology (totology? ugly either way) as well. Has anyone done the work to demonstrate that all the operators do the Right Thing on relations (necessarily virtual) with an infinite number of tuples? Tropashko particularly depends on EQ and NEQ, the obvious relations whose tuples are every pair of the form (k, k) and the complement of it.
Again I'm not sure what you're getting at. Tropashko's operators are domain-independent. (Unlike Appendix A's <OR>, <NOT>.) Then yes, it goes fine with infinite relations. AFAIR his definitional semantics doesn't draw on dis-equality of relations, only on equality. Must admit I'm a little rusty on it. To be precise, he does explore/model a lattice structure 'at right angles' to the JOIN-based one, with DUM aka R00 as lattice bottom for Appendix A's <OR> and his R11 as lattice top. That won't go well with infinite relations; but none of that modelling is essential to his core operators.
If we follow Hall, Hitchcock, Todd 1975's idea of 'algorithmic relations' then those are potentially infinite. But HHT give rules for constructing queries (similar to the rules for 'safe queries') to make sure you never expose an infinite relation outermost in a query. Those rules are easily expressed using type inference for the relational operators (treating algorithmic relations as a different type constructor vs grounded relations). That type inference is statically type safe, David L please note.
I also would like an explanation of TENSOR_PRODUCT as applied to relations (I know from nothing about tensors).
There used to be a contributor here who talked in terms of cylindric algebras/Tarski semantics for relations. I never managed to get my head round that.
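To make the lattice reading above concrete, here is a toy value-level model. The representation is an assumption of the sketch, not Tropashko's formulation: a tuple is a map from attribute name to value, attribute types are ignored, and empty relations lose their heading. The names joinR, leqR, innerUnion and the sample relations are made up for illustration:

import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Map.Strict (Map)
import Data.Set (Set)

type Tuple    = Map String String   -- attribute name -> value
type Relation = Set Tuple

-- DEE: the relation with no attributes and one (empty) tuple.
dee :: Relation
dee = Set.singleton Map.empty

-- Natural JOIN: keep pairs of tuples that agree on their common attributes.
joinR :: Relation -> Relation -> Relation
joinR r1 r2 = Set.fromList
  [ Map.union t1 t2
  | t1 <- Set.toList r1
  , t2 <- Set.toList r2
  , and (Map.elems (Map.intersectionWith (==) t1 t2)) ]

-- Lattice ordering derived from JOIN, exactly as in the post: r1 <= r2 iff r1 == r1 JOIN r2.
leqR :: Relation -> Relation -> Bool
leqR r1 r2 = r1 == joinR r1 r2

-- InnerUnion: project both operands onto their common attributes, then union the tuples.
-- (In this toy model the heading is read off the tuples, so empty relations lose theirs.)
innerUnion :: Relation -> Relation -> Relation
innerUnion r1 r2 = Set.map (restrict common) r1 `Set.union` Set.map (restrict common) r2
  where
    heading r  = Set.unions [Set.fromList (Map.keys t) | t <- Set.toList r]
    common     = heading r1 `Set.intersection` heading r2
    restrict h = Map.filterWithKey (\k _ -> Set.member k h)

main :: IO ()
main = do
  let s  = Set.fromList [ Map.fromList [("S#", "S1"), ("CITY", "London")]
                        , Map.fromList [("S#", "S2"), ("CITY", "Paris")] ]
      sp = Set.fromList [ Map.fromList [("S#", "S1"), ("P#", "P1")] ]
  print (joinR dee s == s)       -- True: DEE is the identity for JOIN
  print (joinR s sp)             -- the single matching tuple: S1, London, P1
  print (innerUnion s sp)        -- both operands projected on the common attribute S#
  print (leqR (joinR s sp) s)    -- True: s JOIN sp sits below s in the derived ordering

Running main shows DEE acting as the identity for joinR and the join of the two sample relations sitting below each operand in the ordering derived from JOIN.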
Quote from johnwcowan on June 16, 2019, 4:27 am
Quote from AntC on June 16, 2019, 2:20 am
Careful: that's his 'Inner Union', not SQL's (or D's or set theory's) UNION. And he defines several other operations in terms of those.
Ah. Clearly I need to read the paper more closely.
I said "defined in terms of", not "reduce".
What's the distinction? Where I come from, if you can define A in terms of B (using a constructive definition, at least), you have reduced A to B.
Not quite sure why you're bringing in "infinite": I expect the set of relations to be finitely denumerable.
No, it's the relations themselves that T uses in his definitions of RENAME and MINUS that contain an infinite number of tuples. Similarly, the niladic version of INTERSECT in the TTM book requires a heading to be supplied, and it is the universal relation for that heading; i.e. all possible tuples that meet the type restrictions. This is given as the only case where INTERSECT and JOIN differ.
There's a side condition on JOIN (which applies mutatis mutandis to every definition in terms of JOIN): the operands to JOIN must be join-compatible, that is, attribute names in common must be at the same attribute type.
See my post on type-indexed rows, which makes that tautologous.
Again I'm not sure what you're getting at. Tropashko's operators are domain-independent. (Unlike Appendix A's <OR>, <NOT>.) Then yes, it goes fine with infinite relations.
But TTM, as noted above, requires relations of infinite cardinality already. Once you have exposed such things, you have to know what they are going to do when operated on.