The Forum for Discussion about The Third Manifesto and Related Matters

Tuples FTW

Quote from Hugh on April 28, 2021, 10:36 am
Quote from tobega on April 28, 2021, 6:47 am

On the subject of type system for a language capable of hosting a D (and also for Tailspin, of course), we have observed that Tuples must be structurally typed, i.e. the attributes they contain define them as the product type of those attributes.

As a counterpoint to a previous thread here, I propose that Tuples be THE way to create product types.

The latest insight (or train-wreck) that I had, is that we should let attributes define types, i.e. instead of saying that an attribute has a type, we say that an attribute is a type. I think this fits very nicely with the natural join and that we take the position that things with the same name are the same kind of things. It also fits in with a good practice to create specific types for specific things, even if in Java it is a bit of a pain to e.g. create a SupplierName class that simply wraps a String.

So we would declare that there is a type called PNAME of the base type string, and the type called SNAME of the base type string, and you just use them as attributes in the Tuples, the type and the attribute have the same name.

Obviously you cannot assign an SNAME value to a PNAME attribute without casting it. But you could e.g. have a COMPANY_NAME and have SNAME be of the type COMPANY_NAME, which would enable assigning between the two.

So, comments? Good idea? Insane idea?

I've seen other replies.  It doesn't look like a good idea to me but in any case clarification is needed.  Please give examples of type definitions for, e.g., SNAME and PNAME, preferably using TD-like syntax.  I assume you imagine a relation type definition to be like TD's but with just attribute type names as heading components: REL{SNO, SNAME, CITY

What do you think a value of an attribute type looks like? Please give a literal denoting the supplier name Smith.

What are the implications for the relational RENAME operator?

Hugh we've discussed ideas very like this before. I do think TTM's type system is not rich enough, compared to modern languages (or even to languages extant at the time TTM was written). I think this should be legitimate Tutorial D:

S1 := SNo 'S1'         // type <SNo, Char> -- i.e. a <A, T> pair
P1 := PNo 'P1'
Q50 := Qty 50

SP := REL{ TUP{ S1, P1, Q50 } }    // heading as per the standard relvar SP example

IOW, TTM's <A, T, v> triples should be allowed as first-class values, of first-class type <A, T>. Those should not be condemned to second-class appearance only within TUP/REL values and Headings.

Addit: Then an <A, T, v> triple functions more like a value wrapped in a Selector; and we don't need the verbose TTM form TUP{ SNo SNo('S1'), ... }.

Quote from AntC on April 28, 2021, 11:50 am
Quote from tobega on April 28, 2021, 6:47 am

On the subject of type system for a language capable of hosting a D (and also for Tailspin, of course), we have observed that Tuples must be structurally typed, i.e. the attributes they contain define them as the product type of those attributes.

As a counterpoint to a previous thread here, I propose that Tuples be THE way to create product types.

Careful: in most languages, 'product type' means an ordered product. Type (Int, Bool) is distinct from (Bool, Int). So do you mean TUPLE{ SNAME 'S1', PNAME 'P1'} is type distinct from TUPLE{ PNAME 'P1', SNAME 'S1'}?

The latest insight (or train-wreck) that I had, is that we should let attributes define types, i.e. instead of saying that an attribute has a type, we say that an attribute is a type. I think this fits very nicely with the natural join and that we take the position that things with the same name are the same kind of things. It also fits in with a good practice to create specific types for specific things, even if in Java it is a bit of a pain to e.g. create a SupplierName class that simply wraps a String.

So we would declare that there is a type called PNAME of the base type string, and the type called SNAME of the base type string, and you just use them as attributes in the Tuples, the type and the attribute have the same name.

No, wrong: we have to cope with ad-hoc attribute naming, where type X Int is distinct from X String. Furthermore we don't want bare String or Int being allowed as types of attributes. We always want there to be a wrapper, and the wrapper to wrap a single 'payload' type.

Take a look at Haskell (or most Functional Languages') 'datatype renamings' section 4.2.3.

newtype X a = X a  -- where a (parametric) denotes some arbitrary type

In a nominal typing system, type name X Int is distinct from type Int. The newtype construct says they are to share the same PhysRep.
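As a minimal sketch of the point about nominal typing (not code from the thread): the wrapper types SName and PName below are hypothetical definitions, named after the attributes under discussion, and the value names are made up for illustration.

```haskell
-- Hypothetical wrapper types for illustration; the names follow the thread,
-- but these definitions are an assumption, not anyone's actual proposal.
newtype SName = SName String deriving (Eq, Show)
newtype PName = PName String deriving (Eq, Show)

-- The parametric form quoted above: one wrapper, any payload type.
newtype X a = X a deriving (Eq, Show)

smith :: SName
smith = SName "Smith"

bolt :: PName
bolt = PName "Bolt"

-- Accepted: both operands have type SName.
sameSupplier :: Bool
sameSupplier = smith == SName "Smith"

-- Rejected by the type checker if uncommented, because SName and PName
-- are distinct types even though both wrap a String:
-- mixed :: Bool
-- mixed = smith == bolt

-- X Int and X String are distinct from each other and from bare Int/String.
xInt :: X Int
xInt = X 42

xStr :: X String
xStr = X "forty-two"
```

The comparison of a supplier name with a part name fails at compile time, which is the kind of protection the wrapper is meant to give.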

Obviously you cannot assign an SNAME value to a PNAME attribute without casting it.

Sure, because type SNAME String is distinct from PNAME String. But 'casting' is not the appropriate mechanism here: unwrap the string from one then re-wrap it into the other. In Functional languages that's achieved via 'pattern matching'. And because the compiler knows they're newtypes and therefore share the same PhysRep, that's a no-op.  Tutorial D has SNAME FROM ... -- in which presumably unwrap/rewrap is computationally more clunky.
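The unwrap-then-re-wrap step might look like the following sketch. SName and PName are again hypothetical wrapper definitions, and the function names are invented for illustration. The second version uses coerce from Data.Coerce in base, which converts between types that share a representation.

```haskell
import Data.Coerce (coerce)

-- Hypothetical wrappers, assumed for illustration only.
newtype SName = SName String deriving (Eq, Show)
newtype PName = PName String deriving (Eq, Show)

-- Unwrap by pattern matching on the constructor, then re-wrap.
supplierAsPart :: SName -> PName
supplierAsPart (SName s) = PName s

-- The same conversion stated via coerce, which relies on both
-- newtypes having String as their underlying representation.
supplierAsPart' :: SName -> PName
supplierAsPart' = coerce
```

Either way the conversion is explicit in the source text, so an accidental mix-up of the two names cannot slip through silently.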

You mean THE_SNAME(...), or in general, THE_x(...) operators?

SNAME FROM x retrieves an attribute from tuple expression x, not a member of a user-defined type.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from AntC on April 28, 2021, 11:58 am
Quote from Dave Voorhis on April 28, 2021, 9:07 am
Quote from Darren Duncan on April 28, 2021, 8:55 am
Quote from tobega on April 28, 2021, 6:47 am

It also fits in with a good practice to create specific types for specific things, even if in Java it is a bit of a pain to e.g. create a SupplierName class that simply wraps a String.

I actually advocate having just such wrapper types in normal programming.  Have lots of types that have very specific meanings and typically wrap generic types like String or Integer, and the wrappers convey more meaning.

There is a common term called "primitive obsession" which describes people who insist on using plain String/int/etc rather than the more semantically strict types described above.

Agree.

Some years ago, I showed a Rel demo database to a DBA in which I had wrapped all primitive types in user-defined types. I demonstrated how it prevented you from JOINing customer numbers to phone numbers, or multiplying phone numbers by product quantities, etc.

He thought it was marvellous. Apparently a significant source of error and time-consuming development and fixes was due to unintended JOINs and operations on what should be incompatible types, particularly when working with large and/or unfamiliar schemas.

Indeed. This is the source of the nostrum 'Natural Join is a disaster waiting to happen.' (They mean in SQL.) If by accident you have column name User or Date on two different tables, Natural Join will join by them even though one is Entered-by User and the other is Approved-by User, or Order-Date vs Delivered-Date.
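To make the accident concrete, here is a toy sketch of a natural join over untyped tuples, assuming a tuple is just a list of (attribute name, value) pairs. The data, the rename helper, and the attribute names OrderNo, EnteredBy and ApprovedBy are all invented for illustration.

```haskell
import Data.List (intersect)

-- A tuple as a list of (attribute name, value) pairs; untyped on purpose,
-- to mimic joining purely by column name.
type Tuple = [(String, String)]

-- Attribute names that appear in both tuples.
commonAttrs :: Tuple -> Tuple -> [String]
commonAttrs a b = map fst a `intersect` map fst b

-- Natural join: combine every pair of tuples that agree on all common names.
naturalJoin :: [Tuple] -> [Tuple] -> [Tuple]
naturalJoin r s =
  [ a ++ [ kv | kv@(k, _) <- b, k `notElem` map fst a ]
  | a <- r, b <- s
  , all (\k -> lookup k a == lookup k b) (commonAttrs a b) ]

-- Rename one attribute in a tuple.
rename :: String -> String -> Tuple -> Tuple
rename old new = map (\(k, v) -> (if k == old then new else k, v))

orders, approvals :: [Tuple]
orders    = [[("OrderNo", "1"), ("User", "alice")]]  -- User means entered-by
approvals = [[("OrderNo", "1"), ("User", "bob")]]    -- User means approved-by

-- Joins on OrderNo AND User, so the matching order is silently lost.
accidental :: [Tuple]
accidental = naturalJoin orders approvals

-- With role-specific names, only OrderNo is shared and the join behaves.
intended :: [Tuple]
intended = naturalJoin (map (rename "User" "EnteredBy") orders)
                       (map (rename "User" "ApprovedBy") approvals)
```

Here accidental is empty while intended contains the single combined tuple, which is the failure mode a data dictionary of role-specific names is meant to prevent.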

I favour declaring a data dictionary before declaring any tables, and such that attribute names on tables must be drawn from the dictionary. And that no dictionary field be named Date, User, Int, String, Count, Balance, Total, etc.

It's a good strategy. Unfortunately the compiler is no help. At the outset of my thoughts on safer-higher-shorter was the idea that higher might allow a language to define a data model, not just a single relation. The errors we are talking about are static and discoverable by the compiler, provided it has a valid model to work from. Obvious candidates would include:

  • FK links
  • fields suitable for join, aggregate, order, etc
  • constraints on insertion and deletion
  • inheritance

I think there are things in there that higher order type systems aspire to.

Andl - A New Database Language - andl.org
Quote from Erwin on April 28, 2021, 8:36 am
Quote from tobega on April 28, 2021, 6:47 am

On the subject of type system for a language capable of hosting a D (and also for Tailspin, of course), we have observed that Tuples must be structurally typed, i.e. the attributes they contain define them as the product type of those attributes.

As a counterpoint to a previous thread here, I propose that Tuples be THE way to create product types.

The latest insight (or train-wreck) that I had, is that we should let attributes define types, i.e. instead of saying that an attribute has a type, we say that an attribute is a type. I think this fits very nicely with the natural join and that we take the position that things with the same name are the same kind of things. It also fits in with a good practice to create specific types for specific things, even if in Java it is a bit of a pain to e.g. create a SupplierName class that simply wraps a String.

So we would declare that there is a type called PNAME of the base type string, and the type called SNAME of the base type string, and you just use them as attributes in the Tuples, the type and the attribute have the same name.

Obviously you cannot assign an SNAME value to a PNAME attribute without casting it. But you could e.g. have a COMPANY_NAME and have SNAME be of the type COMPANY_NAME, which would enable assigning between the two.

So, comments? Good idea? Insane idea?

Types are what they are and are used in TTM how they are because of how they create the connection with logic : a type is the domain the free variables in the corresponding predicate draw their values from to "generate" the corresponding proposition.  An attribute declaration inside a nonscalar type definition has TWO parts : the attribute name (Codd mentioned them as "role names") to establish user-friendly addressability of the attribute value, and the type name to establish the logic domain.

So an attribute declaration is an (attrnm, typenm) pair.  What does it look like in your proposal ?  I cannot tell because you use the same words for both attribute name and type name so I cannot tell from the example.  Is it simpler ?  Seems questionable because "simpler" must mean ditching one of those two names which must mean you lose either user-friendly addressability or the very link with the value set (logic domain) itself.  Or is it merely notionally equivalent ?

An attribute reference is just a mention of the attribute name.  Can your proposal make referencing attributes any simpler ?

I do note you seem to mention "two types of types" : your "new-style" types and "base types".  Is that going to make things simpler ?  I doubt it.

Today we would specify SNAME: string and PNAME: string which is essentially wrong because as you say, a type is the domain you can draw values from and an SNAME must be only one of those strings that correspond to the name of a supplier, while PNAME must be one of those strings that correspond to  the name of a part. So therefore SNAME and PNAME should be distinct types and my proposal aims to simplify specifying that by simply interpreting the specification we already provide to implicitly create the type SNAME that consists only of the relevant strings. The attribute SNAME (wherever it is used) must correspond to the type SNAME.

What is gained is that I now cannot assign a PNAME value to an SNAME attribute (without first explicitly downcasting to the common type string)

Quote from Dave Voorhis on April 28, 2021, 12:28 pm
Quote from AntC on April 28, 2021, 11:50 am
Quote from tobega on April 28, 2021, 6:47 am

On the subject of type system for a language capable of hosting a D (and also for Tailspin, of course), we have observed that Tuples must be structurally typed, i.e. the attributes they contain define them as the product type of those attributes.

As a counterpoint to a previous thread here, I propose that Tuples be THE way to create product types.

Careful: in most languages, 'product type' means an ordered product. Type (Int, Bool) is distinct from (Bool, Int). So do you mean TUPLE{ SNAME 'S1', PNAME 'P1'} is type distinct from TUPLE{ PNAME 'P1', SNAME 'S1'}?

The latest insight (or train-wreck) that I had, is that we should let attributes define types, i.e. instead of saying that an attribute has a type, we say that an attribute is a type. I think this fits very nicely with the natural join and that we take the position that things with the same name are the same kind of things. It also fits in with a good practice to create specific types for specific things, even if in Java it is a bit of a pain to e.g. create a SupplierName class that simply wraps a String.

So we would declare that there is a type called PNAME of the base type string, and the type called SNAME of the base type string, and you just use them as attributes in the Tuples, the type and the attribute have the same name.

No, wrong: we have to cope with ad-hoc attribute naming, where type X Int is distinct from X String. Furthermore we don't want bare String or Int being allowed as types of attributes. We always want there to be a wrapper, and the wrapper to wrap a single 'payload' type.

Take a look at Haskell (or most Functional Languages') 'datatype renamings' section 4.2.3.

newtype X a = X a  -- where a (parametric) denotes some arbitrary type

In a nominal typing system, type name X Int is distinct from type Int. The newtype construct says they are to share the same PhysRep.

Obviously you cannot assign an SNAME value to a PNAME attribute without casting it.

Sure, because type SNAME String is distinct from PNAME String. But 'casting' is not the appropriate mechanism here: unwrap the string from one then re-wrap it into the other. In Functional languages that's achieved via 'pattern matching'. And because the compiler knows they're newtypes and therefore share the same PhysRep, that's a no-op.  Tutorial D has SNAME FROM ... -- in which presumably unwrap/rewrap is computationally more clunky.

You mean THE_SNAME(...), or in general, THE_x(...) operators?

SNAME FROM x retrieves an attribute from tuple expression x, not a member of a user-defined type.

Thanks Dave, I'm not recalling exactly. Doesn't THE_x( ... ) unwrap a value from a Selector? See the note I've just added to my reply to Hugh. I see SName 'Acme' as a first-class value able to stand in a TUP or assigned to a variable (of type SName String), etc. As such it's instead of a Selector-wrapping.

I want the 'Acme' from TUP{ SNo 'S1', SName 'Acme', City 'Leeds', Status 10 }, in which there are no Selector invocations, and no user-defined-types.

Mutter, mutter, well perhaps SName 'Acme' is of a user-defined type, since I want it to be a first-class value; but that appearance of SName is not a Selector invocation: it's constructing a <A, T, v> triple.

In Haskell you'd use pattern matching against the argument to a function:

foo TUP{ SNo sno, SName sname .. } =        -- var sname in scope here, bound to `Acme`

So I don't think of it as unwrapping/exposing a single attribute value so much as applying a function to the whole TUP.
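The line above is pseudo-code mixing Tutorial D and Haskell notation. In standard Haskell the closest analogue is probably a record pattern, sketched here with a hypothetical Supplier record. Note that Haskell records are nominally typed and declared up front, unlike the structurally typed TUP being discussed, so this only illustrates the pattern-matching part.

```haskell
-- Hypothetical record type standing in for the supplier tuple.
data Supplier = Supplier
  { sNo    :: String
  , sName  :: String
  , city   :: String
  , status :: Int
  } deriving (Eq, Show)

-- The pattern names only the field of interest and binds its value to n;
-- the other fields are ignored.
supplierName :: Supplier -> String
supplierName Supplier { sName = n } = n

acme :: Supplier
acme = Supplier { sNo = "S1", sName = "Acme", city = "Leeds", status = 10 }
```

Applying supplierName to acme yields the string "Acme", with the function taking the whole record as its argument rather than a single extracted attribute.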

Addit: we could expand this idea to do away with Selectors altogether (hooray!):

MyPoint := TUP{ X 1, Y 2}
MyRedPoint := TUP{ Colour 'Red', Point MyPoint }       // so Point is a TVA

MyAngle := Theta FROM Point FROM MyRedPoint            // Theta is a 'virtual' attribute

 

Quote from AntC on April 28, 2021, 1:13 pm
Quote from Dave Voorhis on April 28, 2021, 12:28 pm
Quote from AntC on April 28, 2021, 11:50 am
Quote from tobega on April 28, 2021, 6:47 am

On the subject of type system for a language capable of hosting a D (and also for Tailspin, of course), we have observed that Tuples must be structurally typed, i.e. the attributes they contain define them as the product type of those attributes.

As a counterpoint to a previous thread here, I propose that Tuples be THE way to create product types.

Careful: in most languages, 'product type' means an ordered product. Type (Int, Bool) is distinct from (Bool, Int). So do you mean TUPLE{ SNAME 'S1', PNAME 'P1'} is type distinct from TUPLE{ PNAME 'P1', SNAME 'S1'}?

The latest insight (or train-wreck) that I had, is that we should let attributes define types, i.e. instead of saying that an attribute has a type, we say that an attribute is a type. I think this fits very nicely with the natural join and that we take the position that things with the same name are the same kind of things. It also fits in with a good practice to create specific types for specific things, even if in Java it is a bit of a pain to e.g. create a SupplierName class that simply wraps a String.

So we would declare that there is a type called PNAME of the base type string, and the type called SNAME of the base type string, and you just use them as attributes in the Tuples, the type and the attribute have the same name.

No, wrong: we have to cope with ad-hoc attribute naming, where type X Int is distinct from X String. Furthermore we don't want bare String or Int being allowed as types of attributes. We always want there to be a wrapper, and the wrapper to wrap a single 'payload' type.

Take a look at Haskell (or most Functional Languages') 'datatype renamings' section 4.2.3.

newtype X a = X a  -- where a (parametric) denotes some arbitrary type

In a nominal typing system, type name X Int is distinct from type Int. The newtype construct says they are to share the same PhysRep.

Obviously you cannot assign an SNAME value to a PNAME attribute without casting it.

Sure, because type SNAME String is distinct from PNAME String. But 'casting' is not the appropriate mechanism here: unwrap the string from one then re-wrap it into the other. In Functional languages that's achieved via 'pattern matching'. And because the compiler knows they're newtypes and therefore share the same PhysRep, that's a no-op.  Tutorial D has SNAME FROM ... -- in which presumably unwrap/rewrap is computationally more clunky.

You mean THE_SNAME(...), or in general, THE_x(...) operators?

SNAME FROM x retrieves an attribute from tuple expression x, not a member of a user-defined type.

Thanks Dave, I'm not recalling exactly. Doesn't THE_x( ... ) unwrap a value from a Selector?

You mean obtain a value of an element of a value of a given type?

Yes, it does.

A selector is an operator that obtains a value of a given type.

See the note I've just added to my reply to Hugh. I see SName 'Acme' as a first-class value able to stand in a TUP or assigned to a variable (of type SName String), etc. As such it's instead of a Selector-wrapping.

I want the 'Acme' from TUP{ SNo 'S1', SName 'Acme', City 'Leeds', Status 10 }, in which there are no Selector invocations, and no user-defined-types.

Mutter, mutter, well perhaps SName 'Acme' is of a user-defined type, since I want it to be a first-class value; but that appearance of SName is not a Selector invocation: it's constructing a <A, T, v> triple.

In Haskell you'd use pattern matching against the argument to a function:

foo TUP{ SNo sno, SName sname .. } =        -- var sname in scope here, bound to `Acme`

So I don't think of it as unwrapping/exposing a single attribute value so much as applying a function to the whole TUP.

Ah, yes, you're accessing a specified attribute of a tuple by name, to get its value. So, in Tutorial D that would be ATTRIBUTE_NAME FROM tuple_expression.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Hugh on April 28, 2021, 10:36 am
Quote from tobega on April 28, 2021, 6:47 am

On the subject of type system for a language capable of hosting a D (and also for Tailspin, of course), we have observed that Tuples must be structurally typed, i.e. the attributes they contain define them as the product type of those attributes.

As a counterpoint to a previous thread here, I propose that Tuples be THE way to create product types.

The latest insight (or train-wreck) that I had, is that we should let attributes define types, i.e. instead of saying that an attribute has a type, we say that an attribute is a type. I think this fits very nicely with the natural join and that we take the position that things with the same name are the same kind of things. It also fits in with a good practice to create specific types for specific things, even if in Java it is a bit of a pain to e.g. create a SupplierName class that simply wraps a String.

So we would declare that there is a type called PNAME of the base type string, and the type called SNAME of the base type string, and you just use them as attributes in the Tuples, the type and the attribute have the same name.

Obviously you cannot assign an SNAME value to a PNAME attribute without casting it. But you could e.g. have a COMPANY_NAME and have SNAME be of the type COMPANY_NAME, which would enable assigning between the two.

So, comments? Good idea? Insane idea?

I've seen other replies.  It doesn't look like a good idea to me but in any case clarification is needed.  Please give examples of type definitions for, e.g., SNAME and PNAME, preferably using TD-like syntax.  I assume you imagine a relation type definition to be like TD's but with just attribute type names as heading components: REL{SNO, SNAME, CITY

What do you think a value of an attribute type looks like? Please give a literal denoting the supplier name Smith.

What are the implications for the relational RENAME operator?

Hugh

P.S.  Perhaps more appropriate, what about EXTEND?  In particular, I have a query that involves extension with concatenation of FirstName and LastName (with a blank in between).  How is that done?

Coauthor of The Third Manifesto and related books.
Quote from Darren Duncan on April 28, 2021, 8:51 am
Quote from tobega on April 28, 2021, 6:47 am

we should let attributes define types, i.e. instead of saying that an attribute has a type, we say that an attribute is a type.

If an attribute IS a type, then how do you handle cases where a tuple needs to have multiple attributes of the same type?  For example, a Marriage tuple referring to 2 Person?

These days I suppose you would have to model a marriage as a Set<Spouse>. But we could consider the classic Parent-Child where both parent and child are of underlying type person. In the case where we want to rename parent to grandparent and child to parent, we would have a slight inconvenience that we would have to explicitly assert the assignment/renaming, e.g. by downcasting to person. But I think that is minor compared to the greater benefit.

Of course, doing this the "right" way by on one side of the join renaming child to "tmp" and on the other renaming parent to "tmp" brings up the case that we may not want "tmp" to become a global type.

Quote from Hugh on April 28, 2021, 2:32 pm
Quote from Hugh on April 28, 2021, 10:36 am
Quote from tobega on April 28, 2021, 6:47 am

On the subject of type system for a language capable of hosting a D (and also for Tailspin, of course), we have observed that Tuples must be structurally typed, i.e. the attributes they contain define them as the product type of those attributes.

As a counterpoint to a previous thread here, I propose that Tuples be THE way to create product types.

The latest insight (or train-wreck) that I had, is that we should let attributes define types, i.e. instead of saying that an attribute has a type, we say that an attribute is a type. I think this fits very nicely with the natural join and that we take the position that things with the same name are the same kind of things. It also fits in with a good practice to create specific types for specific things, even if in Java it is a bit of a pain to e.g. create a SupplierName class that simply wraps a String.

So we would declare that there is a type called PNAME of the base type string, and the type called SNAME of the base type string, and you just use them as attributes in the Tuples, the type and the attribute have the same name.

Obviously you cannot assign an SNAME value to a PNAME attribute without casting it. But you could e.g. have a COMPANY_NAME and have SNAME be of the type COMPANY_NAME, which would enable assigning between the two.

So, comments? Good idea? Insane idea?

I've seen other replies.  It doesn't look like a good idea to me but in any case clarification is needed.  Please give examples of type definitions for, e.g., SNAME and PNAME, preferably using TD-like syntax.  I assume you imagine a relation type definition to be like TD's but with just attribute type names as heading components: REL{SNO, SNAME, CITY

What do you think a value of an attribute type looks like? Please give a literal denoting the supplier name Smith.

What are the implications for the relational RENAME operator?

Hugh

P.S.  Perhaps more appropriate, what about EXTEND?  In particular, I have a query that involves extension with concatenation of FirstName and LastName (with a blank in between).  How is that done?

Assume:

TYPE FName POSSREP {Name CHAR};
TYPE LName POSSREP {Name CHAR};

Let Users be a relvar with attributes FirstName FName and LastName LName, then:

EXTEND Users: {FullName := THE_Name(FirstName) || ' ' || THE_Name(LastName)}
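For comparison, a rough Haskell analogue of the same extension, assuming hypothetical wrapper types FName and LName and a User record. The pattern match plays the role that THE_Name plays in the Tutorial D above. This is an illustrative sketch only, not a translation that Rel or Tutorial D defines.

```haskell
-- Hypothetical wrappers corresponding to the two TYPE definitions above.
newtype FName = FName String deriving (Eq, Show)
newtype LName = LName String deriving (Eq, Show)

-- Hypothetical stand-in for one tuple of the Users relvar.
data User = User { firstName :: FName, lastName :: LName }
  deriving (Eq, Show)

-- Unwrap both names by pattern matching, then concatenate with a blank.
fullName :: User -> String
fullName (User (FName f) (LName l)) = f ++ " " ++ l

-- Pair every user with its computed full name, loosely mirroring EXTEND.
extendFullName :: [User] -> [(User, String)]
extendFullName = map (\u -> (u, fullName u))
```

Concatenating firstName and lastName directly would not type-check, since neither is a String until it has been unwrapped.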

 

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on April 28, 2021, 9:07 am
Quote from Darren Duncan on April 28, 2021, 8:55 am
Quote from tobega on April 28, 2021, 6:47 am

It also fits in with a good practice to create specific types for specific things, even if in Java it is a bit of a pain to e.g. create a SupplierName class that simply wraps a String.

I actually advocate having just such wrapper types in normal programming.  Have lots of types that have very specific meanings and typically wrap generic types like String or Integer, and the wrappers convey more meaning.

There is a common term called "primitive obsession" which describes people who insist on using plain String/int/etc rather than the more semantically strict types described above.

Agree.

Some years ago, I showed a Rel demo database to a DBA in which I had wrapped all primitive types in user-defined types. I demonstrated how it prevented you from JOINing customer numbers to phone numbers, or multiplying phone numbers by product quantities, etc.

He thought it was marvellous. Apparently a significant source of error and time-consuming development and fixes was due to unintended JOINs and operations on what should be incompatible types, particularly when working with large and/or unfamiliar schemas.

Cool, so at least there would seem to be some value in what I'm trying to achieve, whether or not the mechanism proposed is viable.
