The Forum for Discussion about The Third Manifesto and Related Matters


Tuples FTW

Quote from Dave Voorhis on May 3, 2021, 3:23 pm
Quote from Hugh on May 3, 2021, 2:25 pm
Quote from Dave Voorhis on May 1, 2021, 7:17 pm
Quote from Hugh on May 1, 2021, 3:12 pm
Quote from Dave Voorhis on April 30, 2021, 1:46 pm
Quote from Hugh on April 30, 2021, 1:12 pm
Quote from tobega on April 29, 2021, 5:22 pm
Quote from Hugh on April 29, 2021, 3:06 pm
Quote from tobega on April 28, 2021, 3:25 pm
Quote from Hugh on April 28, 2021, 2:32 pm
Quote from Hugh on April 28, 2021, 10:36 am
Quote from tobega on April 28, 2021, 6:47 am

On the subject of type system for a language capable of hosting a D (and also for Tailspin, of course), we have observed that Tuples must be structurally typed, i.e. the attributes they contain define them as the product type of those attributes.

As a counterpoint to a previous thread here, I propose that Tuples be THE way to create product types.

The latest insight (or train-wreck) that I had is that we should let attributes define types, i.e. instead of saying that an attribute has a type, we say that an attribute is a type. I think this fits very nicely with the natural join and the position that things with the same name are the same kind of things. It also fits in with the good practice of creating specific types for specific things, even if in Java it is a bit of a pain to e.g. create a SupplierName class that simply wraps a String.

So we would declare that there is a type called PNAME of the base type string, and a type called SNAME of the base type string, and you just use them as attributes in the Tuples; the type and the attribute have the same name.

Obviously you cannot assign an SNAME value to a PNAME attribute without casting it. But you could e.g. have a COMPANY_NAME and have SNAME be of the type COMPANY_NAME, which would enable assigning between the two.

So, comments? Good idea? Insane idea?

I've seen other replies.  It doesn't look like a good idea to me but in any case clarification is needed.  Please give examples of type definitions for, e.g., SNAME and PNAME, preferably using TD-like syntax.  I assume you imagine a relation type definition to be like TD's but with just attribute type names as heading components: REL{SNO, SNAME, CITY

What do you think a value of an attribute type looks like?  Please give a literal denoting the supplier name Smith.

What are the implications for the relational RENAME operator?

Hugh

P.S.  Perhaps more appropriate, what about EXTEND?  In particular, I have a query that involves extension with concatenation of FirstName and LastName (with a blank in between).  How is that done?

Answering once for both of your posts.

I wouldn't necessarily change anything from the TD syntax, if that is the syntax you want, so

TUPLE { S# S#, SNAME NAME, STATUS INTEGER, CITY CHAR }

TUPLE { S# S#('S1'), SNAME NAME('Smith'), STATUS 20, CITY 'London' }

But what would happen is that we would now automatically also have the types SNAME, STATUS and CITY defined, and the "real" type of the CITY attribute would be CITY, but it could be assigned values of the representation type CHAR.

I believe I answered the RENAME case in my reply to Darren.

As for EXTEND, I suppose that would work similarly in that you would have to explicitly assert the conversion to appropriate types. So FirstName is what? CHAR? And LastName might also be CHAR. So let's look at TD syntax:

EXTEND a ADD (CHAR(FirstName)+' '+CHAR(LastName) AS FullName)

If FullName has not been previously defined, there will now be a type "FullName CHAR", and any attribute FullName can be assumed to be of type FullName, without needing specification. But it wouldn't necessarily hurt to respecify "FullName CHAR". However, trying to specify e.g. "FullName INT" would not be allowed anywhere; you would have to forgo that option.

Oh, I didn't see this one until after I had responded to Dave Voorhis.  So I didn't misunderstand but you have now clarified.  The question now concerns the scope of this defined-on-the-fly type FullName.  It has to be local to the expression in which it is defined, otherwise I would object strongly.  And if it is local to the expression, I can't see much point.

Hugh

There wouldn't be much point in having it just local to the expression. I propose that the specific type definitions for each attribute that Dave provided are a "best practice" or at least a "good practice", so we can just let them be automatically defined from the attribute definition. This would lead the developer in the right direction for the price of a slight inconvenience on the rare (?) occasion when you would have wanted an attribute with the same name but of a different type.

Thank you for confirming the pointlessness of a type definition local to the expression in which it is defined.  In that case, this particular aspect of your idea bothers me greatly.

First, it gets into the dodgy area of expression evaluation having side-effects.  That alone is sufficient grounds for rejection afaiacs.

Secondly, what happens if "exp1 AS FullName" and "exp2 AS FullName" specifications appear in the same overall relation expression, where exp1 and exp2 are of different types?

Thirdly, what happens if "exp2 AS FullName" appears in a subsequent statement (perhaps a year or so later, if you are really thinking of global scope)?

It is unthinkable, to my mind, that somebody innocently entering an ad hoc query might invalidate somebody else's ad hoc query in this way.
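
To make the second and third objections concrete, here is a minimal Python sketch of a global registry in which the first use of an attribute name fixes its type. The class and method names (GlobalAttributeTypes, declare, AttributeTypeConflict) are hypothetical and only illustrate the scoping concern; nothing like this is specified in the thread.

```python
class AttributeTypeConflict(Exception):
    """Raised when an attribute name is reused with a different base type."""


class GlobalAttributeTypes:
    """Hypothetical global registry: the first use of a name fixes its type."""

    def __init__(self):
        self._types = {}  # attribute name -> base type name

    def declare(self, attribute, base_type):
        # The first declaration wins; later ones must agree with it.
        existing = self._types.setdefault(attribute, base_type)
        if existing != base_type:
            raise AttributeTypeConflict(
                f"{attribute} is already defined as {existing}, not {base_type}"
            )
        return existing


registry = GlobalAttributeTypes()

# One user's ad hoc query: "... AS FullName" with a CHAR expression.
registry.declare("FullName", "CHAR")

# A later, unrelated query that uses an INT expression "AS FullName".
try:
    registry.declare("FullName", "INT")
    conflict = None
except AttributeTypeConflict as exc:
    conflict = str(exc)
```

Under this assumption the later query is rejected purely because of the earlier one, which is the cross-query interference described above.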

Sorry if points like this have already been raised and possibly addressed.  I know there has been a lot of correspondence but my brief glances have given me the impression that your idea has received some interest and even sympathy in some quarters.  So I might be missing something.

Hugh


Would something like this be more palatable?

DICTIONARY;
  Customer_ID CHAR;
  Customer_Phone INT;
  Invoice_Number INT;
  Invoice_Date Date;
  Amount RATIONAL;
END DICTIONARY;

VAR Customers REAL RELATION {Customer_ID, Customer_Phone} KEY {Customer_ID};
VAR Invoices REAL RELATION {Invoice_Number, Invoice_Date, Customer_ID, Amount} KEY {Invoice_Number};

CONSTRAINT Invoices_Customers_FK  Invoices {Customer_ID} ⊆ Customers {Customer_ID};

The effect of the DICTIONARY block is to define a data dictionary of attributes, which implicitly creates the following types...

TYPE Customer_ID POSSREP {Value CHAR};
TYPE Customer_Phone POSSREP {Value INT}; 
TYPE Invoice_Number POSSREP {Value INT}; 
TYPE Invoice_Date POSSREP {Value Date}; 
TYPE Amount POSSREP {Value RATIONAL};

...and allows declarations like

VAR Invoices REAL RELATION {Invoice_Number, Invoice_Date, Customer_ID, Amount} KEY {Invoice_Number};

...to be shorthand for:

VAR Invoices REAL RELATION {Invoice_Number Invoice_Number, Invoice_Date Invoice_Date, Customer_ID Customer_ID, Amount Amount} KEY {Invoice_Number};

This allows the compiler to check that we're not going to accidentally multiply a Customer_Phone by an Invoice_Number, etc.
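
As a rough analogue in Python (an illustration only, not Tutorial D), two wrapper classes around an integer behave much like the implicitly created types: neither defines multiplication, so mixing them fails unless the wrapped values are extracted explicitly. The class names and the try_multiply helper are hypothetical, and Python reports the error at run time rather than at compile time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerPhone:
    value: int  # plays the role of the POSSREP component "Value"


@dataclass(frozen=True)
class InvoiceNumber:
    value: int


def try_multiply(a, b):
    """Return a * b, or None if the operand types define no '*' operator."""
    try:
        return a * b
    except TypeError:
        return None


phone = CustomerPhone(5551234)
invoice = InvoiceNumber(33)

# The wrappers define no arithmetic, so the accidental product is rejected.
mixed = try_multiply(phone, invoice)

# Unwrapping both operands makes the (questionable) intent explicit.
explicit = try_multiply(phone.value, invoice.value)
```

The two wrappers also compare unequal even when they hold the same integer, because the generated equality checks the class first.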

Ideally, any declaration of the form <identifier> <type> should be replaceable with <dictionary_name>, e.g. this...

VAR Invoice_Number INIT(33);

...is the same as this:

VAR Invoice_Number Invoice_Number INIT(33);

I imagine it would be an entirely optional feature; no existing Tutorial D code would be broken by adding the facility, and anyone who doesn't want to use DICTIONARY and the <dictionary_name> shorthands for <identifier> <type> could ignore them.

This is a bit too much to digest.  It would be better not to include all these shorthands to begin with (such as omitting the type on a VAR declaration to make it default to the variable name (or are we omitting the variable name and making it default to the type name?)).

Actually, the whole reason for having the DICTIONARY ... END DICTIONARY declaration and the <name> <type> --> <dictionary name> shorthands is that the shorthands are desirable. Of course, you can do everything in Tutorial D without the DICTIONARY ... END DICTIONARY declaration or the <name> <type> --> <dictionary name> shorthands...

Or, the <name> <type> --> <dictionary name> shorthands could be removed and just have DICTIONARY, but it would mean a lot of unnecessarily repetitive declarations like:

VAR Invoices REAL RELATION {Invoice_Number Invoice_Number, Invoice_Date Invoice_Date, Customer_ID Customer_ID, Amount Amount} KEY {Invoice_Number};

Anyway, it doesn't seem anything like Tobega's idea.  Is it solving or addressing the same problem(s) as Tobega's?

It is notionally addressing the same problem as Tobega's, but taking a different approach that also addresses a desire for data dictionaries, mentioned elsewhere in this thread.

I conclude from the examples that dictionary element names are independent from attributes of headings.  That's okay, but you do call a dictionary a dictionary of attributes whereas VAR Invoice_Number Invoice_Number INIT(33) shows Invoice_Number being used for something other than an attribute.

Dictionary element names define identifier type pairs. They could be restricted to attributes of headings, but it might seem arbitrarily restrictive to allow a dictionary name to be used in place of identifier type in some places but not others.

Also, it seems that I can assign an integer to an Invoice_Number variable, and I can compare an Invoice_Number value with an integer, but you don't mention assigning/comparing between Invoice_Number and Amount variables and values.  I believe you don't intend those to be legal.  What about arithmetic operations?  Can I subtract an Amount from an Invoice_Number?  Can I concatenate a FirstName with a blank and a LastName?

Assigning an integer to an Invoice_Number variable was a careless mistake. It should have been...

VAR Invoice_Number INIT(Invoice_Number(33));

...which is equivalent to:

VAR Invoice_Number Invoice_Number INIT(Invoice_Number(33));

Am I right in assuming there is no effect on RENAME and EXTEND as presently defined in TD?

It has no effect on RENAME, EXTEND, or anything else, with one exception: wherever <name> <type> can currently be used to declare a variable, parameter, or heading attribute (have I missed anything?), you could also use <dictionary_name> instead. This assumes there is a dictionary entry of the form <dictionary_name> <type>, which automatically creates TYPE <dictionary_name> POSSREP {Value <type>}. Using <dictionary_name> in a variable, parameter, or heading attribute declaration is then shorthand for <dictionary_name> <dictionary_name>.

E.g., given:

DICTIONARY;
  X CHAR;
END DICTIONARY;

The following will be implicitly created:

TYPE X POSSREP {Value CHAR};

[Addendum: I wonder if it might be useful to be able to optionally specify in the DICTIONARY section those elements that are not to be wrapped and are to be defined as the specified type. E.g., something like 'Customer_ID INT UNWRAP' means that rather than automatically creating TYPE Customer_ID POSSREP {Value INT} and declaring attributes/variables/parameters named Customer_ID as type Customer_ID, they'd be declared to be type INT.]

And a declaration like...

VAR X;

...is shorthand for:

VAR X X;

In short, it provides the "classic" features of a data dictionary, with added type safety.

Quote from Hugh on May 1, 2021, 3:16 pm
Quote from Dave Voorhis on April 30, 2021, 2:44 pm

Thinking a bit more on 'DICTIONARY', etc., maybe it would be desirable to be able to do this:

DICTIONARY;
  Customer_ID CHAR;
  Customer_Phone, Customer_Phone2 INT;
  Invoice_Number INT;
  Invoice_Date Date;
  Amount RATIONAL;
END DICTIONARY;

So you can say this...

VAR Customers REAL RELATION {Customer_ID, Customer_Phone, Customer_Phone2} KEY {Customer_ID};

...which is shorthand for:

VAR Customers REAL RELATION {Customer_ID Customer_ID, Customer_Phone Customer_Phone, Customer_Phone2 Customer_Phone} KEY {Customer_ID};

Yet another shorthand given in advance of acceptance of the base idea.  We have to be sure that works before considering this addition.

I assume you mean that multiple element names on the same dictionary element are just synonyms.  Right?  If so, I'm reminded that synonyms sometimes give rise to problems, so I think this one might need more thought.

If the DICTIONARY idea is acceptable, it's arguably necessary rather than being an (optional?) addition.

I would definitely not describe multiple element names on the same dictionary element as synonyms. Multiple element names on the same dictionary element define distinct dictionary elements of the same type.

It allows you to declare multiple attributes in a relvar to have the same type using DICTIONARY entries to specify them, which you otherwise couldn't do using DICTIONARY entries.

I misunderstood your use of multiple element names on the same dictionary element because your example gives two distinct dictionary elements using the same underlying type CHAR.

Given a DICTIONARY like...

DICTIONARY;
  Customer_ID, Customer_ID2 INT;
  Invoice_Number INT;
END DICTIONARY;

...it specifies that Customer_ID and Customer_ID2 are both of type Customer_ID which wraps an INTEGER, and are type compatible.

Customer_ID and Invoice_Number are distinct types, both of which wrap an INTEGER. They are not type compatible.
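
The expansion rules described so far can be sketched as a small Python helper that turns dictionary entries into the implied TYPE definitions and expands a shorthand heading. This is only an illustration: the function names and the list-of-tuples input format are hypothetical, and it assumes the reading given above, in which the first name of a multi-name entry also names the wrapper type.

```python
def parse_dictionary(entries):
    """entries: list of (names, base_type) pairs, one per dictionary entry.

    Assumption: the first name in each entry names the wrapper type, and
    every name in the entry is declared to be of that type.
    """
    types = {}     # wrapper type name -> base type
    elements = {}  # dictionary element name -> wrapper type name
    for names, base_type in entries:
        type_name = names[0]
        types[type_name] = base_type
        for name in names:
            elements[name] = type_name
    return types, elements


def type_definitions(types):
    """Render the implicitly created TYPE statements."""
    return [f"TYPE {t} POSSREP {{Value {b}}};" for t, b in types.items()]


def expand_heading(elements, names):
    """Expand a shorthand heading such as {A, B} into {A T_A, B T_B}."""
    return "{" + ", ".join(f"{n} {elements[n]}" for n in names) + "}"


types, elements = parse_dictionary([
    (["Customer_ID", "Customer_ID2"], "INT"),
    (["Invoice_Number"], "INT"),
])
definitions = type_definitions(types)
heading = expand_heading(
    elements, ["Customer_ID", "Customer_ID2", "Invoice_Number"]
)
```

Here Customer_ID and Customer_ID2 both map to the wrapper type Customer_ID, while Invoice_Number gets its own wrapper even though both wrap INT.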

I wouldn't object to any genuine shorthand, though I might have an opinion on its value.  It's difficult for me to judge the wisdom of this one because (a) I don't have a full understanding of the problem it seeks to address (saying it's the same as Tobega's doesn't help me),

The fundamental problem it seeks to address is avoiding type compatibility which accidentally results in error -- things like inadvertently JOINing an invoice ID and a product ID, because they're both named ID and have the same INTEGER type.

Apparently, this sort of thing is quite common in the SQL world, particularly when working with large and relatively unfamiliar schemas, like those in commercial bought-in products.

and (b) the extent to which you are addressing it isn't clear either.

If the DICTIONARY facility is used, each new entry is a unique type (unless explicitly declared otherwise). Thus, inadvertent type compatibility issues are virtually eliminated. There is also value in having a data dictionary that identifies every data element / attribute, but that's a separate benefit.

You provide a shorthand for defining types that have a single possrep with a single possrep component, without defining any operators in addition to those systematically implied  by the type definition (such as THE_Value(Customer_Id)).  That tells me that solutions for the perceived problem are already available in TD as defined.  Do dictionaries offer any additional advantages?

Yes.

It uses an explicit data dictionary, which is of value for clearly identifying every possible data element / attribute.

It reduces verbosity, and simplifies gaining safety.

And, yes, you can do everything in Tutorial D as currently defined without the DICTIONARY facility. In fact, I often use it that way in Rel. But it's verbose, and you don't get the benefits of an explicit data dictionary.

Regarding the perceived problem, is it really just concerned with ill-advised comparisons, especially those that can arise "by accident", being implicitly involved in operations such as join?

That, and ill-advised mathematical operations on numbers, and ill-advised concatenation of unrelated (i.e., different type) strings, and so on.

In other words, it gives you all the benefits you gain from type safety in general. It's simply a means to encourage type safety where, arguably, type safety should be encouraged. We really shouldn't be treating a Customer ID and a Product ID as the same type, because they're not the same type, even though it's reasonable for both to be based on an integer or a string.

You might want to outlaw taking averages of part numbers too,

Yes, and my approach implicitly outlaws it.

but if anybody really wants to go out of their way to do that they probably do have some good reason!

If they have a good reason to do it, they can. This isn't allowed:

AVG(Customers, Customer_ID)

But this is allowed:

AVG(Customers, THE_Value(Customer_ID))

It has the added benefit of making it explicit what you're doing.
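
The same contrast can be sketched in Python, with a hypothetical CustomerID wrapper class and average helper standing in for the dictionary type and AVG. Accessing .value plays the role of THE_Value; the failure is a run-time TypeError here, whereas the proposal envisages a compile-time check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerID:
    value: int  # plays the role of the POSSREP component "Value"


def average(values):
    """Plain arithmetic mean; works only on values that support '+' and '/'."""
    values = list(values)
    return sum(values) / len(values)


customers = [CustomerID(10), CustomerID(20), CustomerID(30)]

# Averaging the wrapped values directly fails: CustomerID defines no '+'.
try:
    average(customers)
    wrapped_allowed = True
except TypeError:
    wrapped_allowed = False

# Unwrapping first is permitted and makes the intent explicit.
unwrapped_average = average(c.value for c in customers)
```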

Thanks for all the clarification, with answers as I was expecting.  Just wanted to make sure.

Regarding Customer_ID, Customer_ID2 INT, it seems that the first name is special, and this is an additional shorthand for:

Customer_ID INT;
Customer_ID2 Customer_ID;

assuming that the first statement really does take effect before the second.

If a dictionary entry gives rise to an error, is the whole Dictionary statement rejected?

I note that a Dictionary statement isn't named, so I assume that you envisage no additional catalog entries and that the types defined just appear in the catalog in the usual way.  I.e., a dictionary isn't a persistent object itself.

At the moment I'm still wavering between  "no objection" and full support.

Hugh

Coauthor of The Third Manifesto and related books.
Quote from Hugh on May 4, 2021, 10:49 am
Quote from Dave Voorhis on May 3, 2021, 3:23 pm
Quote from Hugh on May 3, 2021, 2:25 pm
Quote from Dave Voorhis on May 1, 2021, 7:17 pm
Quote from Hugh on May 1, 2021, 3:12 pm
Quote from Dave Voorhis on April 30, 2021, 1:46 pm
Quote from Hugh on April 30, 2021, 1:12 pm
Quote from tobega on April 29, 2021, 5:22 pm
Quote from Hugh on April 29, 2021, 3:06 pm
Quote from tobega on April 28, 2021, 3:25 pm
Quote from Hugh on April 28, 2021, 2:32 pm
Quote from Hugh on April 28, 2021, 10:36 am
Quote from tobega on April 28, 2021, 6:47 am

On the subject of type system for a language capable of hosting a D (and also for Tailspin, of course), we have observed that Tuples must be structurally typed, i.e. the attributes they contain define them as the product type of those attributes.

As a counterpoint to a previous thread here, I propose that Tuples be THE way to create product types.

The latest insight (or train-wreck) that I had, is that we should let attributes define types, i.e. instead of saying that an attribute has a type, we say that an attribute is a type. I think this fits very nicely with the natural join and that we take the position that things with the same name are the same kind of things. It also fits in with a good practice to create specific types for specific things, even if in Java it is a bit of a pain to e.g. create a SupplierName class that simply wraps a String.

So we would declare that there is a type called PNAME of the base type string, and the type called SNAME of the base type string, and you just use them as attributes in the Tuples, the type and the attribute have the same name.

Obviously you cannot assign an SNAME value to a PNAME attribute without casting it. But you could e.g. have a COMPANY_NAME and have SNAME be of the type COMPANY_NAME, which would enable assigning between the two.

So, comments? Good idea? Insane idea?

I've seen other replies.  It doesn't look like a good idea to me but in any case clarification is needed.  Please give examples of type definitions for, e.g., SNAME and PNAME, preferably using TD-like syntax.  I assume you imagine a relation type definition to be like TD's but with just attribute type names as heading components: REL{SNO, SNAME, CITY

What do you think a value of an attribute type looks like.   Please give a literal denoting the supplier name Smith.

What are the implications for the relational RENAME operator?

Hugh

P.S.  Perhaps more appropriate, what about EXTEND?  In particular, I have a query that involves extension with concatenation of FirstName and LastName (with a blank in between).  How is that done?

Answering once for both of your posts.

I wouldn't necessarily change anything from the TD syntax, if that is the syntax you want, so

TUPLE { S# S#, SNAME NAME, STATUS INTEGER, CITY CHAR }
TUPLE { S# S#('S1'), SNAME NAME('Smith'), STATUS 20, CITY 'London' }
TUPLE { S# S#, SNAME NAME, STATUS INTEGER, CITY CHAR } TUPLE { S# S#('S1'), SNAME NAME('Smith'), STATUS 20, CITY 'London' }
TUPLE { S# S#, SNAME NAME, STATUS INTEGER, CITY CHAR }

TUPLE { S# S#('S1'), SNAME NAME('Smith'), STATUS 20, CITY 'London' }

But what would happen is that we would now automatically also have the types SNAME, STATUS and CITY defined, and the "real" type of the CITY attribute would be CITY, but it could be assigned values of the representation type CHAR.

I believe I answered the RENAME case in my reply to Darren.

As for EXTEND, I suppose that would work similarly in that you would have to explicitly assert the conversion to appropriate types. So FirstName is what? CHAR? And LastName might also be CHAR. So lets look at TD syntax:

EXTEND a ADD (CHAR(FirstName)+' '+CHAR(LastName) AS FullName)
EXTEND a ADD (CHAR(FirstName)+' '+CHAR(LastName) AS FullName)
EXTEND a ADD (CHAR(FirstName)+' '+CHAR(LastName) AS FullName)

If FullName has not been previously defined, there will now be a type "FullName CHAR", and any attribute FullName can be assumed to be of type FullName, without needing specification. But it wouldn't necessarily hurt to respecify "FullName CHAR". However, trying to specify e.g. "FullName INT" would not be allowed anywhere, you would have to forego that option.

Oh, I didn't see this one until after I had responded to Dave Voorhis.  So I didn't misunderstand but you have now clarified.  The question now concerns the scope of this defined-on-the-fly type Fullname.  It has to be local to the expression in which it is defined, otherwise I would object strongly.  And it is is local to the expression, I can't see much point.

Hugh

There wouldn't be much point in having it just local to the expression. I propose that the specific type definitions for each attribute that Dave provided is a "best practice" or at least a "good practice", so we can just let them be automatically defined from the attribute definition. This would lead the developer in the right direction for the price of a slight inconvenience on the rare (?) occasion when you would have wanted an attribute with the same name but of a different type.

Thank you for confirming the pointlessness of a type definition local to the expression in which it is defined.  In that case, this particular aspect of your idea bothers me greatly.

First, it gets into the dodgy are of expression evaluation having side-effects.  That alone is sufficient grounds for rejection afaiacs.

Secondly,  what happens if  "exp1 AS FullName" and "exp2 AS FullName" specifications appear in the same overall relation expression, where exp1 andexp2 re of different types?

Thirdly, what happens if "exp2 AS FullName" appears in a subsequent statement (perhaps a year or so later, if you are really thinking of global scope)?

It is unthinkable, to my mind, that somebody innocently entering an ad hoc query might invalidate somebody else's ad hoc query in this way.

Sorry if points like this have already been raised and possibly addressed.  I know there has been a lot of correspondence but my brief glances have given me the impression that your idea has received some interest and even sympathy in some quarters.  So I might be missing something.

Hugh

Hugh

Would something like this be more palatable?

DICTIONARY;
Customer_ID CHAR;
Customer_Phone INT;
Invoice_Number INT;
Invoice_Date Date;
Amount RATIONAL;
END DICTIONARY;
VAR Customers REAL RELATION {Customer_ID, Customer_Phone} KEY {Customer_ID};
VAR Invoices REAL RELATION {Invoice_Number, Invoice_Date, Customer_ID, Amount} KEY {Invoice_Number};
CONSTRAINT Invoices_Customers_FK Invoices {Customer_ID} ⊆ Customers {Customer_ID};
DICTIONARY; Customer_ID CHAR; Customer_Phone INT; Invoice_Number INT; Invoice_Date Date; Amount RATIONAL; END DICTIONARY; VAR Customers REAL RELATION {Customer_ID, Customer_Phone} KEY {Customer_ID}; VAR Invoices REAL RELATION {Invoice_Number, Invoice_Date, Customer_ID, Amount} KEY {Invoice_Number}; CONSTRAINT Invoices_Customers_FK Invoices {Customer_ID} ⊆ Customers {Customer_ID};
DICTIONARY;
  Customer_ID CHAR;
  Customer_Phone INT;
  Invoice_Number INT;
  Invoice_Date Date;
  Amount RATIONAL;
END DICTIONARY;

VAR Customers REAL RELATION {Customer_ID, Customer_Phone} KEY {Customer_ID};
VAR Invoices REAL RELATION {Invoice_Number, Invoice_Date, Customer_ID, Amount} KEY {Invoice_Number};

CONSTRAINT Invoices_Customers_FK  Invoices {Customer_ID} ⊆ Customers {Customer_ID};

The effect of the DICTIONARY block is to define a data dictionary of attributes, which implicitly creates the following types...

TYPE Customer_ID POSSREP {Value CHAR};
TYPE Customer_Phone POSSREP {Value INT};
TYPE Invoice_Number POSSREP {Value INT};
TYPE Invoice_Date POSSREP {Value Date};
TYPE Amount POSSREP {Value RATIONAL};
TYPE Customer_ID POSSREP {Value CHAR}; TYPE Customer_Phone POSSREP {Value INT}; TYPE Invoice_Number POSSREP {Value INT}; TYPE Invoice_Date POSSREP {Value Date}; TYPE Amount POSSREP {Value RATIONAL};
TYPE Customer_ID POSSREP {Value CHAR};
TYPE Customer_Phone POSSREP {Value INT}; 
TYPE Invoice_Number POSSREP {Value INT}; 
TYPE Invoice_Date POSSREP {Value Date}; 
TYPE Amount POSSREP {Value RATIONAL};

...and allows declarations like

VAR Invoices REAL RELATION {Invoice_Number, Invoice_Date, Customer_ID, Amount} KEY {Invoice_Number};

VAR Invoices REAL RELATION {Invoice_Number, Invoice_Date, Customer_ID, Amount} KEY {Invoice_Number}; to be shorthand for:

VAR Invoices REAL RELATION {Invoice_Number Invoice_Number, Invoice_Date Invoice_Date, Customer_ID Customer_ID, Amount Amount} KEY {Invoice_Number};
VAR Invoices REAL RELATION {Invoice_Number Invoice_Number, Invoice_Date Invoice_Date, Customer_ID Customer_ID, Amount Amount} KEY {Invoice_Number};
VAR Invoices REAL RELATION {Invoice_Number Invoice_Number, Invoice_Date Invoice_Date, Customer_ID Customer_ID, Amount Amount} KEY {Invoice_Number};

This allows the compiler to check that we're not going to accidently multiply a Customer_Phone by an Invoice_Number, etc.

Ideally, any declaration of the form <identifier> <type> should be replaceable with <dictionary_name>, e.g. this...

VAR Invoice_Number INIT(33);
VAR Invoice_Number INIT(33);
VAR Invoice_Number INIT(33);

...is the same as this:

VAR Invoice_Number Invoice_Number INIT(33);
VAR Invoice_Number Invoice_Number INIT(33);
VAR Invoice_Number Invoice_Number INIT(33);

I imagine it would an entirely optional feature; no existing Tutorial D code would be broken by adding the facility, and anyone who doesn't want to use DICTIONARY and the <dictionary_name> shorthands for <identifier> <type> could ignore them.

This is a bit too much to digest.  It would be better not to include all these shorthands to begin with (such as omitting the type on a VAR declaration to make it default to the variable name (or are we omitting the variable name and making it default to the type name?)).

Actually, the whole reason for having the DICTIONARY ... END DICTIONARY declaration and being able to use <name> <type> --> <dictionary name> shorthands is because the shorthands are desirable. Of course, you can do everything in Tutorial D without the DICTIONARY ... END DICTIONARY declaration or being able to use <name> <type> --> <dictionary name> shorthands...

Or, the <name> <type> --> <dictionary name> shorthands could be removed and just have DICTIONARY, but it would mean a lot of unnecessarily repetitive declarations like:

VAR Invoices REAL RELATION {Invoice_Number Invoice_Number, Invoice_Date Invoice_Date, Customer_ID Customer_ID, Amount Amount} KEY {Invoice_Number};
VAR Invoices REAL RELATION {Invoice_Number Invoice_Number, Invoice_Date Invoice_Date, Customer_ID Customer_ID, Amount Amount} KEY {Invoice_Number};
VAR Invoices REAL RELATION {Invoice_Number Invoice_Number, Invoice_Date Invoice_Date, Customer_ID Customer_ID, Amount Amount} KEY {Invoice_Number};

Anyway, it doesn't seem anything like Tobega's idea.  Is it solving or addressing the same problem(s) as Tobega's?

It is notionally addressing the same problem as Tobega's, but taking a different approach that also addresses a desire for data dictionaries, mentioned elsewhere in this thread.

I conclude from the examples that dictionary element names are independent from attributes of headings. That's okay, but you do call a dictionary a dictionary of attributes whereas VAR Invoice_Number Invoice_Number INIT(33) shows Invoice_Number being used for something other than an attribute.

Dictionary element names define identifier type pairs. They could be restricted to attributes of headings, but it might seem arbitrarily restrictive to allow a dictionary name to be used in place of identifier type in some places but not others.

Also, it seems that I can assign an integer to an Invoice_Number variable, and I can compare an Invoice_Number value with an integer, but you don't mention assigning/comparing between Invoice_Number and Amount variables and values. I believe you don't intend those to be legal. What about arithmetic operations? Can I subtract an Amount from an Invoice_Number? Can I concatenate a FirstName with a blank and a LastName?

Assigning an integer to an Invoice_Number variable was a careless mistake. It should have been...

VAR Invoice_Number INIT(Invoice_Number(33));

...which is equivalent to:

VAR Invoice_Number Invoice_Number INIT(Invoice_Number(33));

Am I right in assuming there is no effect on RENAME and EXTEND as presently defined in TD?

It has no effect on RENAME, EXTEND, or anything else, except where <name> <type> can currently be used to declare a variable, parameter, or heading attribute (have I missed anything?) you could also use <dictionary_name> instead of <name> <type> -- assuming there is a dictionary entry of the form <dictionary_name> <type>, which automatically creates TYPE <dictionary_name> POSSREP {Value <type>} -- so that using <dictionary_name> in a variable, parameter, or heading attribute declaration instead of <name> <type> is shorthand for <dictionary_name> <dictionary_name>.

E.g., given:

DICTIONARY;
  X CHAR;
END DICTIONARY;

The following will be implicitly created:

TYPE X POSSREP {Value CHAR}

[Addendum: I wonder if it might be useful to be able to optionally specify in the DICTIONARY section those elements that are not to be wrapped and are to be defined as the specified type. E.g., something like 'Customer_ID INT UNWRAP' means that rather than automatically creating TYPE Customer_ID POSSREP {Value INT} and declaring attributes/variables/parameters named Customer_ID as type Customer_ID, they'd be declared to be type INT.]

And a declaration like...

VAR X;

...is shorthand for:

VAR X X;

In short, it provides the "classic" features of a data dictionary, with added type safety.

Quote from Hugh on May 1, 2021, 3:16 pm
Quote from Dave Voorhis on April 30, 2021, 2:44 pm

Thinking a bit more on 'DICTIONARY', etc., maybe it would be desirable to be able to do this:

DICTIONARY;
  Customer_ID CHAR;
  Customer_Phone, Customer_Phone2 INT;
  Invoice_Number INT;
  Invoice_Date Date;
  Amount RATIONAL;
END DICTIONARY;

So you can say this...

VAR Customers REAL RELATION {Customer_ID, Customer_Phone, Customer_Phone2} KEY {Customer_ID};

...which is shorthand for:

VAR Customers REAL RELATION {Customer_ID Customer_ID, Customer_Phone Customer_Phone, Customer_Phone2 Customer_Phone} KEY {Customer_ID};

Yet another shorthand given in advance of acceptance of the base idea.  We have to be sure that works before considering this addition.

I assume you mean that multiple element names on the same dictionary element are just synonyms.  Right?  If so, I'm reminded that synonyms sometimes give rise to problems, so I think this one might need more thought.

If the DICTIONARY idea is acceptable, it's arguably necessary rather than being an (optional?) addition.

I would definitely not describe multiple element names on the same dictionary element as synonyms. Multiple element names on the same dictionary element define distinct dictionary elements of the same type.

It allows you to declare multiple attributes in a relvar to have the same type using DICTIONARY entries to specify them, which you otherwise couldn't do using DICTIONARY entries.

I misunderstood your use of multiple element names on the same dictionary element because your example gives two distinct dictionary elements using the same underlying type CHAR.

Given a DICTIONARY like...

DICTIONARY;
  Customer_ID, Customer_ID2 INT;
  Invoice_Number INT;
END DICTIONARY;

...it specifies that Customer_ID and Customer_ID2 are both of type Customer_ID which wraps an INTEGER, and are type compatible.

Customer_ID and Invoice_Number are distinct types, both of which wrap an INTEGER. They are not type compatible.
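
For readers who don't use Tutorial D, the compatibility rule just described can be sketched in Python. This is only an illustration under assumptions: the Wrapped base class and the choice to raise TypeError are hypothetical, and the class names simply mirror the dictionary entries above. It is not how Rel or Tutorial D implements anything.

```python
class Wrapped:
    """Hypothetical stand-in for TYPE <name> POSSREP {Value INT}."""

    def __init__(self, value: int):
        self.value = value

    def __eq__(self, other):
        # Only values of exactly the same wrapper type may be compared.
        if type(other) is not type(self):
            raise TypeError(
                f"cannot compare {type(self).__name__} with {type(other).__name__}"
            )
        return self.value == other.value

    def __hash__(self):
        return hash((type(self), self.value))


class Customer_ID(Wrapped):
    """Type generated from the first name of the dictionary element."""


class Invoice_Number(Wrapped):
    """A distinct type that also wraps an integer."""


# Customer_ID and Customer_ID2 are two names declared with the same type.
customer_id = Customer_ID(222)
customer_id2 = Customer_ID(222)
invoice_number = Invoice_Number(222)

same_type_equal = customer_id == customer_id2  # allowed: both are Customer_ID
```

Comparing customer_id with invoice_number raises TypeError in this sketch, even though both wrap the same integer.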

I wouldn't object to any genuine shorthand, though I might have an opinion on its value.  It's difficult for me to judge the wisdom of this one because (a) I don't have a full understanding of the problem it seeks to address (saying it's the same as Tobega's doesn't help me),

The fundamental problem it seeks to address is avoiding type compatibility which accidentally results in error -- things like inadvertently JOINing an invoice ID and a product ID, because they're both named ID and have the same INTEGER type.

Apparently, this sort of thing is quite common in the SQL world, particularly when working with large and relatively unfamiliar schemas, like those in commercial bought-in products.

and (b) the extent to which you are addressing it isn't clear either.

If the DICTIONARY facility is used, each new entry is a unique type (unless explicitly declared otherwise). Thus, inadvertent type compatibility issues are virtually eliminated. There is also value in having a data dictionary that identifies every data element / attribute, but that's a separate benefit.

You provide a shorthand for defining types that have a single possrep with a single possrep component, without defining any operators in addition to those systematically implied  by the type definition (such as THE_Value(Customer_Id)).  That tells me that solutions for the perceived problem are already available in TD as defined.  Do dictionaries offer any additional advantages?

Yes.

It uses an explicit data dictionary, which is of value for clearly identifying every possible data element / attribute.

It reduces verbosity, and simplifies gaining safety.

And, yes, you can do everything in Tutorial D as currently defined without the DICTIONARY facility. In fact, I often use it that way in Rel. But it's verbose, and you don't get the benefits of an explicit data dictionary.

Regarding the perceived problem, is it really just concerned with ill-advised comparisons, especially those that can arise "by accident", being implicitly involved in operations such as join?

That, and ill-advised mathematical operations on numbers, and ill-advised concatenation of unrelated (i.e., different type) strings, and so on.

In other words, it gives you all the benefits you gain from type safety in general. It's simply a means to encourage type safety where, arguably, type safety should be encouraged. We really shouldn't be treating a Customer ID and a Product ID as the same type, because they're not the same type, even though it's reasonable for both to be based on an integer or a string.

You might want to outlaw taking averages of part numbers too,

Yes, and my approach implicitly outlaws it.

but if anybody really wants to go out of their way to do that they probably do have some good reason!

If they have a good reason to do it, they can. This isn't allowed:

AVG(Customers, Customer_ID)

But this is allowed:

AVG(Customers, THE_Value(Customer_ID))

It has the added benefit of making it explicit what you're doing.
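
The same distinction can be sketched in Python, purely as an analogy. The names avg and the_value below are hypothetical helpers that play the roles of AVG and THE_Value; they are not Rel or Tutorial D APIs.

```python
from statistics import mean


class Customer_ID:
    """Hypothetical wrapper modelled on TYPE Customer_ID POSSREP {Value INT}."""

    def __init__(self, value: int):
        self.value = value


def the_value(customer_id: Customer_ID) -> int:
    """Plays the role of THE_Value: explicit access to the wrapped integer."""
    return customer_id.value


def avg(values):
    """Average that accepts only plain numbers, never wrapped values."""
    values = list(values)
    for v in values:
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise TypeError(f"AVG is not defined for {type(v).__name__}")
    return mean(values)


customers = [Customer_ID(10), Customer_ID(20), Customer_ID(30)]

# Unwrapping is explicit, so the intent is visible at the call site.
explicit_average = avg(the_value(c) for c in customers)
```

Calling avg(customers) directly raises TypeError in this sketch, which corresponds to the disallowed form above.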

Thanks for all the clarification, with answers as I was expecting.  Just wanted to make sure.

Regarding Customer_ID, Customer_ID2 INT, it seems that the first name is special, and this is an additional shorthand for:

Customer_ID INT;
Customer_ID2 Customer_ID;

assuming that the first statement really does take effect before the second.

The first name is (intentionally) also the name of the generated type. The second (and subsequent) names in the same element definition allow subsequent shorthand attribute/variable/parameter definitions with those names, but do not define new types because they are all the type of the first name.

So...

DICTIONARY;
  Customer_ID, Customer_ID2, Customer_Identifier, Customer_ID3 INT;
END DICTIONARY;

...defines a type...

TYPE Customer_ID POSSREP {Value INT};

...and allows definitions like...

VAR Customer_ID;
VAR Customer_ID2 INIT(Customer_ID(222));
VAR MyVar REAL RELATION {Customer_Identifier, Customer_ID3, Address CHAR} KEY {Customer_Identifier};

...which is shorthand for:

VAR Customer_ID Customer_ID;
VAR Customer_ID2 Customer_ID INIT(Customer_ID(222));
VAR MyVar REAL RELATION {Customer_Identifier Customer_ID, Customer_ID3 Customer_ID, Address CHAR} KEY {Customer_Identifier};
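
The expansion rule illustrated above is mechanical enough to sketch as a small Python function. This is a rough model under assumptions, not a description of any actual implementation: the list-of-tuples encoding of the dictionary and the function names build_lookup and expand_heading are invented for illustration.

```python
# Each dictionary element: the names it declares, plus the wrapped base type.
DICTIONARY = [
    (["Customer_ID", "Customer_ID2", "Customer_Identifier", "Customer_ID3"], "INT"),
]


def build_lookup(dictionary):
    """Map every declared name to the type generated from the element's first name."""
    lookup = {}
    for names, _base_type in dictionary:
        type_name = names[0]
        for name in names:
            if name in lookup:
                raise ValueError(f"duplicate dictionary name: {name}")
            lookup[name] = type_name
    return lookup


def expand_heading(attributes, lookup):
    """Expand bare dictionary names; explicit 'name type' pairs pass through."""
    expanded = []
    for attribute in attributes:
        parts = attribute.split()
        if len(parts) == 1:
            name = parts[0]
            expanded.append(f"{name} {lookup[name]}")
        else:
            expanded.append(attribute)
    return "{" + ", ".join(expanded) + "}"


lookup = build_lookup(DICTIONARY)
heading = expand_heading(
    ["Customer_Identifier", "Customer_ID3", "Address CHAR"], lookup
)
```

Here heading comes out as the longhand heading of MyVar shown above, with Address CHAR left untouched because it already carries an explicit type.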

If a dictionary entry gives rise to an error, is the whole Dictionary statement rejected?

Yes.

I note that a Dictionary statement isn't named, so I assume that you envisage no additional catalog entries and that the types defined just appear in the catalog in the usual way.  I.e., a dictionary isn't a persistent object itself.

A dictionary isn't named because there's at most one per database, but it might be of practical value to provide a set of ALTER DICTIONARY ... commands to add new, delete unused, or even (maybe) change type of dictionary entries.

Since a dictionary entry isn't just a type -- it also specifies possible attribute names of a specified type -- those would need to exist in the catalog in exactly the same way types, relvar definitions, and other useful metadata are kept in the catalog.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on May 4, 2021, 1:34 pm
Quote from Hugh on May 4, 2021, 10:49 am
Quote from Dave Voorhis on May 3, 2021, 3:23 pm
Quote from Hugh on May 3, 2021, 2:25 pm
Quote from Dave Voorhis on May 1, 2021, 7:17 pm
Quote from Hugh on May 1, 2021, 3:12 pm
Quote from Dave Voorhis on April 30, 2021, 1:46 pm
Quote from Hugh on April 30, 2021, 1:12 pm
Quote from tobega on April 29, 2021, 5:22 pm
Quote from Hugh on April 29, 2021, 3:06 pm
Quote from tobega on April 28, 2021, 3:25 pm
Quote from Hugh on April 28, 2021, 2:32 pm
Quote from Hugh on April 28, 2021, 10:36 am
Quote from tobega on April 28, 2021, 6:47 am

On the subject of type system for a language capable of hosting a D (and also for Tailspin, of course), we have observed that Tuples must be structurally typed, i.e. the attributes they contain define them as the product type of those attributes.

As a counterpoint to a previous thread here, I propose that Tuples be THE way to create product types.

The latest insight (or train-wreck) that I had, is that we should let attributes define types, i.e. instead of saying that an attribute has a type, we say that an attribute is a type. I think this fits very nicely with the natural join and that we take the position that things with the same name are the same kind of things. It also fits in with a good practice to create specific types for specific things, even if in Java it is a bit of a pain to e.g. create a SupplierName class that simply wraps a String.

So we would declare that there is a type called PNAME of the base type string, and the type called SNAME of the base type string, and you just use them as attributes in the Tuples, the type and the attribute have the same name.

Obviously you cannot assign an SNAME value to a PNAME attribute without casting it. But you could e.g. have a COMPANY_NAME and have SNAME be of the type COMPANY_NAME, which would enable assigning between the two.

So, comments? Good idea? Insane idea?

I've seen other replies.  It doesn't look like a good idea to me but in any case clarification is needed.  Please give examples of type definitions for, e.g., SNAME and PNAME, preferably using TD-like syntax.  I assume you imagine a relation type definition to be like TD's but with just attribute type names as heading components: REL{SNO, SNAME, CITY

What do you think a value of an attribute type looks like? Please give a literal denoting the supplier name Smith.

What are the implications for the relational RENAME operator?

Hugh

P.S.  Perhaps more appropriate, what about EXTEND?  In particular, I have a query that involves extension with concatenation of FirstName and LastName (with a blank in between).  How is that done?

Answering once for both of your posts.

I wouldn't necessarily change anything from the TD syntax, if that is the syntax you want, so

TUPLE { S# S#, SNAME NAME, STATUS INTEGER, CITY CHAR }

TUPLE { S# S#('S1'), SNAME NAME('Smith'), STATUS 20, CITY 'London' }

But what would happen is that we would now automatically also have the types SNAME, STATUS and CITY defined, and the "real" type of the CITY attribute would be CITY, but it could be assigned values of the representation type CHAR.

I believe I answered the RENAME case in my reply to Darren.

As for EXTEND, I suppose that would work similarly in that you would have to explicitly assert the conversion to appropriate types. So FirstName is what? CHAR? And LastName might also be CHAR. So let's look at TD syntax:

EXTEND a ADD (CHAR(FirstName)+' '+CHAR(LastName) AS FullName)

If FullName has not been previously defined, there will now be a type "FullName CHAR", and any attribute FullName can be assumed to be of type FullName, without needing specification. But it wouldn't necessarily hurt to respecify "FullName CHAR". However, trying to specify e.g. "FullName INT" would not be allowed anywhere, you would have to forego that option.

Oh, I didn't see this one until after I had responded to Dave Voorhis. So I didn't misunderstand but you have now clarified. The question now concerns the scope of this defined-on-the-fly type FullName. It has to be local to the expression in which it is defined, otherwise I would object strongly. And if it is local to the expression, I can't see much point.

Hugh

There wouldn't be much point in having it just local to the expression. I propose that the specific type definitions for each attribute that Dave provided is a "best practice" or at least a "good practice", so we can just let them be automatically defined from the attribute definition. This would lead the developer in the right direction for the price of a slight inconvenience on the rare (?) occasion when you would have wanted an attribute with the same name but of a different type.

Thank you for confirming the pointlessness of a type definition local to the expression in which it is defined.  In that case, this particular aspect of your idea bothers me greatly.

First, it gets into the dodgy area of expression evaluation having side-effects. That alone is sufficient grounds for rejection afaiacs.

Secondly, what happens if "exp1 AS FullName" and "exp2 AS FullName" specifications appear in the same overall relation expression, where exp1 and exp2 are of different types?

Thirdly, what happens if "exp2 AS FullName" appears in a subsequent statement (perhaps a year or so later, if you are really thinking of global scope)?

It is unthinkable, to my mind, that somebody innocently entering an ad hoc query might invalidate somebody else's ad hoc query in this way.

Sorry if points like this have already been raised and possibly addressed.  I know there has been a lot of correspondence but my brief glances have given me the impression that your idea has received some interest and even sympathy in some quarters.  So I might be missing something.

Hugh

Hugh

Would something like this be more palatable?

DICTIONARY;
  Customer_ID CHAR;
  Customer_Phone INT;
  Invoice_Number INT;
  Invoice_Date Date;
  Amount RATIONAL;
END DICTIONARY;

VAR Customers REAL RELATION {Customer_ID, Customer_Phone} KEY {Customer_ID};
VAR Invoices REAL RELATION {Invoice_Number, Invoice_Date, Customer_ID, Amount} KEY {Invoice_Number};

CONSTRAINT Invoices_Customers_FK  Invoices {Customer_ID} ⊆ Customers {Customer_ID};

The effect of the DICTIONARY block is to define a data dictionary of attributes, which implicitly creates the following types...

TYPE Customer_ID POSSREP {Value CHAR};
TYPE Customer_Phone POSSREP {Value INT}; 
TYPE Invoice_Number POSSREP {Value INT}; 
TYPE Invoice_Date POSSREP {Value Date}; 
TYPE Amount POSSREP {Value RATIONAL};

...and allows declarations like...

VAR Invoices REAL RELATION {Invoice_Number, Invoice_Date, Customer_ID, Amount} KEY {Invoice_Number};

...to be shorthand for:

VAR Invoices REAL RELATION {Invoice_Number Invoice_Number, Invoice_Date Invoice_Date, Customer_ID Customer_ID, Amount Amount} KEY {Invoice_Number};

This allows the compiler to check that we're not going to accidentally multiply a Customer_Phone by an Invoice_Number, etc.

Ideally, any declaration of the form <identifier> <type> should be replaceable with <dictionary_name>, e.g. this...

VAR Invoice_Number INIT(33);

...is the same as this:

VAR Invoice_Number Invoice_Number INIT(33);

I imagine it would be an entirely optional feature; no existing Tutorial D code would be broken by adding the facility, and anyone who doesn't want to use DICTIONARY and the <dictionary_name> shorthands for <identifier> <type> could ignore them.

This is a bit too much to digest.  It would be better not to include all these shorthands to begin with (such as omitting the type on a VAR declaration to make it default to the variable name (or are we omitting the variable name and making it default to the type name?)).

Actually, the whole reason for having the DICTIONARY ... END DICTIONARY declaration and being able to use <name> <type> --> <dictionary name> shorthands is because the shorthands are desirable. Of course, you can do everything in Tutorial D without the DICTIONARY ... END DICTIONARY declaration or being able to use <name> <type> --> <dictionary name> shorthands...

Or, the <name> <type> --> <dictionary name> shorthands could be removed and just have DICTIONARY, but it would mean a lot of unnecessarily repetitive declarations like:

VAR Invoices REAL RELATION {Invoice_Number Invoice_Number, Invoice_Date Invoice_Date, Customer_ID Customer_ID, Amount Amount} KEY {Invoice_Number};

Anyway, it doesn't seem anything like Tobega's idea.  Is it solving or addressing the same problem(s) as Tobega's?

It is notionally addressing the same problem as Tobega's, but taking a different approach that also addresses a desire for data dictionaries, mentioned elsewhere in this thread.

I conclude from the examples that dictionary element names are independent from attributes of headings. That's okay, but you do call a dictionary a dictionary of attributes whereas VAR Invoice_Number Invoice_Number INIT(33) shows Invoice_Number being used for something other than an attribute.

Dictionary element names define identifier type pairs. They could be restricted to attributes of headings, but it might seem arbitrarily restrictive to allow a dictionary name to be used in place of identifier type in some places but not others.

Also, it seems that I can assign an integer to an Invoice_Number variable, and I can compare an Invoice_Number value with an integer, but you don't mention assigning/comparing between Invoice_Number and Amount variables and values. I believe you don't intend those to be legal. What about arithmetic operations? Can I subtract an Amount from an Invoice_Number? Can I concatenate a FirstName with a blank and a LastName?

Assigning an integer to an Invoice_Number variable was a careless mistake. It should have been...

VAR Invoice_Number INIT(Invoice_Number(33));

...which is equivalent to:

VAR Invoice_Number Invoice_Number INIT(Invoice_Number(33));

Am I right in assuming there is no effect on RENAME and EXTEND as presently defined in TD?

It has no effect on RENAME, EXTEND, or anything else, except where <name> <type> can currently be used to declare a variable, parameter, or heading attribute (have I missed anything?) you could also use <dictionary_name> instead of <name> <type> -- assuming there is a dictionary entry of the form <dictionary_name> <type>, which automatically creates TYPE <dictionary_name> POSSREP {Value <type>} -- so that using <dictionary_name> in a variable, parameter, or heading attribute declaration instead of <name> <type> is shorthand for <dictionary_name> <dictionary_name>.

E.g., given:

DICTIONARY;
  X CHAR;
END DICTIONARY;

The following will be implicitly created:

TYPE X POSSREP {Value CHAR}

[Addendum: I wonder if it might be useful to be able to optionally specify in the DICTIONARY section those elements that are not to be wrapped and are to be defined as the specified type. E.g., something like 'Customer_ID INT UNWRAP' means that rather than automatically creating TYPE Customer_ID POSSREP {Value INT} and declaring attributes/variables/parameters named Customer_ID as type Customer_ID, they'd be declared to be type INT.]

And a declaration like...

VAR X;

...is shorthand for:

VAR X X;

In short, it provides the "classic" features of a data dictionary, with added type safety.

Quote from Hugh on May 1, 2021, 3:16 pm
Quote from Dave Voorhis on April 30, 2021, 2:44 pm

Thinking a bit more on 'DICTIONARY', etc., maybe it would be desirable to be able to do this:

DICTIONARY;
  Customer_ID CHAR;
  Customer_Phone, Customer_Phone2 INT;
  Invoice_Number INT;
  Invoice_Date Date;
  Amount RATIONAL;
END DICTIONARY;

So you can say this...

VAR Customers REAL RELATION {Customer_ID, Customer_Phone, Customer_Phone2} KEY {Customer_ID};

...which is shorthand for:

VAR Customers REAL RELATION {Customer_ID Customer_ID, Customer_Phone Customer_Phone, Customer_Phone2 Customer_Phone} KEY {Customer_ID};

Yet another shorthand given in advance of acceptance of the base idea.  We have to be sure that works before considering this addition.

I assume you mean that multiple element names on the same dictionary element are just synonyms.  Right?  If so, I'm reminded that synonyms sometimes give rise to problems, so I think this one might need more thought.

If the DICTIONARY idea is acceptable, it's arguably necessary rather than being an (optional?) addition.

I would definitely not describe multiple element names on the same dictionary element as synonyms. Multiple element names on the same dictionary element define distinct dictionary elements of the same type.

It allows you to declare multiple attributes in a relvar to have the same type using DICTIONARY entries to specify them, which you otherwise couldn't do using DICTIONARY entries.

I misunderstood your use of multiple element names on the same dictionary element because your example gives two distinct dictionary elements using the same underlying type CHAR.

Given a DICTIONARY like...

DICTIONARY;
  Customer_ID, Customer_ID2 INT;
  Invoice_Number INT;
END DICTIONARY;

...it specifies that Customer_ID and Customer_ID2 are both of type Customer_ID which wraps an INTEGER, and are type compatible.

Customer_ID and Invoice_Number are distinct types, both of which wrap an INTEGER. They are not type compatible.

I wouldn't object to any genuine shorthand, though I might have an opinion on its value.  It's difficult for me to judge the wisdom of this one because (a) I don't have a full understanding of the problem it seeks to address (saying it's the same as Tobega's doesn't help me),

The fundamental problem it seeks to address is avoiding type compatibility which accidentally results in error -- things like inadvertently JOINing an invoice ID and a product ID, because they're both named ID and have the same INTEGER type.

Apparently, this sort of thing is quite common in the SQL world, particularly when working with large and relatively unfamiliar schemas, like those in commercial bought-in products.

and (b) the extent to which you are addressing it isn't clear either.

If the DICTIONARY facility is used, each new entry is a unique type (unless explicitly declared otherwise). Thus, inadvertent type compatibility issues are virtually eliminated. There is also value in having a data dictionary that identifies every data element / attribute, but that's a separate benefit.

You provide a shorthand for defining types that have a single possrep with a single possrep component, without defining any operators in addition to those systematically implied  by the type definition (such as THE_Value(Customer_Id)).  That tells me that solutions for the perceived problem are already available in TD as defined.  Do dictionaries offer any additional advantages?

Yes.

It uses an explicit data dictionary, which is of value for clearly identifying every possible data element / attribute.

It reduces verbosity, and simplifies gaining safety.

And, yes, you can do everything in Tutorial D as currently defined without the DICTIONARY facility. In fact, I often use it that way in Rel. But it's verbose, and you don't get the benefits of an explicit data dictionary.

Regarding the perceived problem, is it really just concerned with ill-advised comparisons, especially those that can arise "by accident", being implicitly involved in operations such as join?

That, and ill-advised mathematical operations on numbers, and ill-advised concatenation of unrelated (i.e., different type) strings, and so on.

In other words, it gives you all the benefits you gain from type safety in general. It's simply a means to encourage type safety where, arguably, type safety should be encouraged. We really shouldn't be treating a Customer ID and a Product ID as the same type, because they're not the same type, even though it's reasonable for both to be based on an integer or a string.

You might want to outlaw taking averages of part numbers too,

Yes, and my approach implicitly outlaws it.

but if anybody really wants to go out of their way to do that they probably do have some good reason!

If they have a good reason to do it, they can. This isn't allowed:

AVG(Customers, Customer_ID)

But this is allowed:

AVG(Customers, THE_Value(Customer_ID))

It has the added benefit of making it explicit what you're doing.

Thanks for all the clarification, with answers as I was expecting.  Just wanted to make sure.

Regarding Customer_ID, Customer_ID2 INT, it seems that the first name is special, and this is an additional shorthand for:

Customer_ID INT;
Customer_ID2 Customer_ID;

assuming that the first statement really does take effect before the second.

The first name is (intentionally) also the name of the generated type. The second (and subsequent) names in the same element definition allow subsequent shorthand attribute/variable/parameter definitions with those names, but do not define new types because they are all the type of the first name.

So...

DICTIONARY;
  Customer_ID, Customer_ID2, Customer_Identifier, Customer_ID3 INT;
END DICTIONARY;

...defines a type...

TYPE Customer_ID POSSREP {Value INT};

...and allows definitions like...

VAR Customer_ID;
VAR Customer_ID2 INIT(Customer_ID(222));
VAR MyVar REAL RELATION {Customer_Identifier, Customer_ID3, Address CHAR} KEY {Customer_Identifier};

...which is shorthand for:

VAR Customer_ID Customer_ID;
VAR Customer_ID2 Customer_ID INIT(Customer_ID(222));
VAR MyVar REAL RELATION {Customer_Identifier Customer_ID, Customer_ID3 Customer_ID, Address CHAR} KEY {Customer_Identifier};

If a dictionary entry gives rise to an error, is the whole Dictionary statement rejected?

Yes.

I note that a Dictionary statement isn't named, so I assume that you envisage no additional catalog entries and that the types defined just appear in the catalog in the usual way.  I.e., a dictionary isn't a persistent object itself.

A dictionary isn't named because there's at most one per database, but it might be of practical value to provide a set of ALTER DICTIONARY ... commands to add new, delete unused, or even (maybe) change type of dictionary entries.

Since a dictionary entry isn't just a type -- it also specifies possible attribute names of a specified type -- those would need to exist in the catalog in exactly the same way types, relvar definitions, and other useful metadata are kept in the catalog.

So the first name in a dictionary entry is a type name but subsequent ones are not.  I think you need a word for what they are and perhaps some clearer syntax, such as AlsoUsedFor(n1, n2, ...).  Just a mild suggestion.

Anyway, yes, now I see that a catalog extension is needed.  I expect that would be relvar sys.Dictionary in Rel.

I still waver.

Hugh

Coauthor of The Third Manifesto and related books.
Quote from Hugh on May 4, 2021, 2:02 pm
Quote from Dave Voorhis on May 4, 2021, 1:34 pm
Quote from Hugh on May 4, 2021, 10:49 am
Quote from Dave Voorhis on May 3, 2021, 3:23 pm
Quote from Hugh on May 3, 2021, 2:25 pm
Quote from Dave Voorhis on May 1, 2021, 7:17 pm
Quote from Hugh on May 1, 2021, 3:12 pm
Quote from Dave Voorhis on April 30, 2021, 1:46 pm
Quote from Hugh on April 30, 2021, 1:12 pm
Quote from tobega on April 29, 2021, 5:22 pm
Quote from Hugh on April 29, 2021, 3:06 pm
Quote from tobega on April 28, 2021, 3:25 pm
Quote from Hugh on April 28, 2021, 2:32 pm
Quote from Hugh on April 28, 2021, 10:36 am
Quote from tobega on April 28, 2021, 6:47 am

On the subject of type system for a language capable of hosting a D (and also for Tailspin, of course), we have observed that Tuples must be structurally typed, i.e. the attributes they contain define them as the product type of those attributes.

As a counterpoint to a previous thread here, I propose that Tuples be THE way to create product types.

The latest insight (or train-wreck) that I had, is that we should let attributes define types, i.e. instead of saying that an attribute has a type, we say that an attribute is a type. I think this fits very nicely with the natural join and that we take the position that things with the same name are the same kind of things. It also fits in with a good practice to create specific types for specific things, even if in Java it is a bit of a pain to e.g. create a SupplierName class that simply wraps a String.

So we would declare that there is a type called PNAME of the base type string, and the type called SNAME of the base type string, and you just use them as attributes in the Tuples, the type and the attribute have the same name.

Obviously you cannot assign an SNAME value to a PNAME attribute without casting it. But you could e.g. have a COMPANY_NAME and have SNAME be of the type COMPANY_NAME, which would enable assigning between the two.

So, comments? Good idea? Insane idea?

I've seen other replies.  It doesn't look like a good idea to me but in any case clarification is needed.  Please give examples of type definitions for, e.g., SNAME and PNAME, preferably using TD-like syntax.  I assume you imagine a relation type definition to be like TD's but with just attribute type names as heading components: REL{SNO, SNAME, CITY

What do you think a value of an attribute type looks like?  Please give a literal denoting the supplier name Smith.

What are the implications for the relational RENAME operator?

Hugh

P.S.  Perhaps more appropriate, what about EXTEND?  In particular, I have a query that involves extension with concatenation of FirstName and LastName (with a blank in between).  How is that done?

Answering once for both of your posts.

I wouldn't necessarily change anything from the TD syntax, if that is the syntax you want, so

TUPLE { S# S#, SNAME NAME, STATUS INTEGER, CITY CHAR }

TUPLE { S# S#('S1'), SNAME NAME('Smith'), STATUS 20, CITY 'London' }

But what would happen is that we would now automatically also have the types SNAME, STATUS and CITY defined, and the "real" type of the CITY attribute would be CITY, but it could be assigned values of the representation type CHAR.

I believe I answered the RENAME case in my reply to Darren.

As for EXTEND, I suppose that would work similarly in that you would have to explicitly assert the conversion to appropriate types. So FirstName is what? CHAR? And LastName might also be CHAR. So let's look at TD syntax:

EXTEND a ADD (CHAR(FirstName)+' '+CHAR(LastName) AS FullName)

If FullName has not been previously defined, there will now be a type "FullName CHAR", and any attribute FullName can be assumed to be of type FullName, without needing specification. But it wouldn't necessarily hurt to respecify "FullName CHAR". However, trying to specify e.g. "FullName INT" would not be allowed anywhere, you would have to forego that option.

Oh, I didn't see this one until after I had responded to Dave Voorhis.  So I didn't misunderstand but you have now clarified.  The question now concerns the scope of this defined-on-the-fly type FullName.  It has to be local to the expression in which it is defined, otherwise I would object strongly.  And if it is local to the expression, I can't see much point.

Hugh

There wouldn't be much point in having it just local to the expression. I propose that the specific type definitions for each attribute that Dave provided is a "best practice" or at least a "good practice", so we can just let them be automatically defined from the attribute definition. This would lead the developer in the right direction for the price of a slight inconvenience on the rare (?) occasion when you would have wanted an attribute with the same name but of a different type.

Thank you for confirming the pointlessness of a type definition local to the expression in which it is defined.  In that case, this particular aspect of your idea bothers me greatly.

First, it gets into the dodgy area of expression evaluation having side-effects.  That alone is sufficient grounds for rejection afaiacs.

Secondly, what happens if "exp1 AS FullName" and "exp2 AS FullName" specifications appear in the same overall relation expression, where exp1 and exp2 are of different types?

Thirdly, what happens if "exp2 AS FullName" appears in a subsequent statement (perhaps a year or so later, if you are really thinking of global scope)?

It is unthinkable, to my mind, that somebody innocently entering an ad hoc query might invalidate somebody else's ad hoc query in this way.
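One possible reading of the second and third objections can be sketched in Python. The `GlobalAttributeTypes` class is hypothetical and only models the "first use defines the type globally" rule as described in this thread; it is not taken from any existing D implementation.

```python
class GlobalAttributeTypes:
    """Hypothetical global registry: an attribute name is bound to one type."""

    def __init__(self):
        self._types = {}

    def use(self, attribute, expr_type):
        """Record 'expr AS attribute'.

        The first use defines the attribute's type as a side effect;
        any later use with a different type is rejected.
        """
        bound = self._types.setdefault(attribute, expr_type)
        if bound != expr_type:
            raise TypeError(f"{attribute} is already {bound}, cannot be {expr_type}")
        return bound


registry = GlobalAttributeTypes()

# Someone's ad hoc query: ... AS FullName, where the expression is CHAR.
registry.use("FullName", "CHAR")

# A later, unrelated ad hoc query: ... AS FullName, where the expression is INT.
try:
    registry.use("FullName", "INT")
    later_query_ok = True
except TypeError:
    later_query_ok = False
```

Under this reading, whether the later query is accepted depends entirely on a query somebody else happened to run earlier.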

Sorry if points like this have already been raised and possibly addressed.  I know there has been a lot of correspondence but my brief glances have given me the impression that your idea has received some interest and even sympathy in some quarters.  So I might be missing something.

Hugh

Would something like this be more palatable?

DICTIONARY;
  Customer_ID CHAR;
  Customer_Phone INT;
  Invoice_Number INT;
  Invoice_Date Date;
  Amount RATIONAL;
END DICTIONARY;

VAR Customers REAL RELATION {Customer_ID, Customer_Phone} KEY {Customer_ID};
VAR Invoices REAL RELATION {Invoice_Number, Invoice_Date, Customer_ID, Amount} KEY {Invoice_Number};

CONSTRAINT Invoices_Customers_FK  Invoices {Customer_ID} ⊆ Customers {Customer_ID};

The effect of the DICTIONARY block is to define a data dictionary of attributes, which implicitly creates the following types...

TYPE Customer_ID POSSREP {Value CHAR};
TYPE Customer_Phone POSSREP {Value INT}; 
TYPE Invoice_Number POSSREP {Value INT}; 
TYPE Invoice_Date POSSREP {Value Date}; 
TYPE Amount POSSREP {Value RATIONAL};

...and allows declarations like...

VAR Invoices REAL RELATION {Invoice_Number, Invoice_Date, Customer_ID, Amount} KEY {Invoice_Number};

...to be shorthand for:

VAR Invoices REAL RELATION {Invoice_Number Invoice_Number, Invoice_Date Invoice_Date, Customer_ID Customer_ID, Amount Amount} KEY {Invoice_Number};

This allows the compiler to check that we're not going to accidentally multiply a Customer_Phone by an Invoice_Number, etc.

Ideally, any declaration of the form <identifier> <type> should be replaceable with <dictionary_name>, e.g. this...

VAR Invoice_Number INIT(33);

...is the same as this:

VAR Invoice_Number Invoice_Number INIT(33);

I imagine it would be an entirely optional feature; no existing Tutorial D code would be broken by adding the facility, and anyone who doesn't want to use DICTIONARY and the <dictionary_name> shorthands for <identifier> <type> could ignore them.

This is a bit too much to digest.  It would be better not to include all these shorthands to begin with (such as omitting the type on a VAR declaration to make it default to the variable name (or are we omitting the variable name and making it default to the type name?)).

Actually, the whole reason for having the DICTIONARY ... END DICTIONARY declaration and being able to use <name> <type> --> <dictionary name> shorthands is because the shorthands are desirable. Of course, you can do everything in Tutorial D without the DICTIONARY ... END DICTIONARY declaration or being able to use <name> <type> --> <dictionary name> shorthands...

Or, the <name> <type> --> <dictionary name> shorthands could be removed and just have DICTIONARY, but it would mean a lot of unnecessarily repetitive declarations like:

VAR Invoices REAL RELATION {Invoice_Number Invoice_Number, Invoice_Date Invoice_Date, Customer_ID Customer_ID, Amount Amount} KEY {Invoice_Number};

Anyway, it doesn't seem anything like Tobega's idea.  Is it solving or addressing the same problem(s) as Tobega's?

It is notionally addressing the same problem as Tobega's, but taking a different approach that also addresses a desire for data dictionaries, mentioned elsewhere in this thread.

I conclude from the examples that dictionary element names are independent from attributes of headings.  That's okay, but you do call a dictionary a dictionary of attributes whereas VAR Invoice_Number Invoice_Number INIT(33) shows Invoice_Number being used for something other than an attribute.

Dictionary element names define identifier type pairs. They could be restricted to attributes of headings, but it might seem arbitrarily restrictive to allow a dictionary name to be used in place of identifier type in some places but not others.

Also, it seems that I can assign an integer to an Invoice_Number variable, and I can compare an Invoice_Number value with an integer, but you don't mention assigning/comparing between Invoice_Number and Amount variables and values.  I believe you don't intend those to be legal.  What about arithmetic operations?  Can I subtract an Amount from an Invoice_Number?  Can I concatenate a FirstName with a blank and a LastName?

Assigning an integer to an Invoice_Number variable was a careless mistake. It should have been...

VAR Invoice_Number INIT(Invoice_Number(33));

...which is equivalent to:

VAR Invoice_Number Invoice_Number INIT(Invoice_Number(33));

Am I right in assuming there is no effect on RENAME and EXTEND as presently defined in TD?

It has no effect on RENAME, EXTEND, or anything else, except where <name> <type> can currently be used to declare a variable, parameter, or heading attribute (have I missed anything?) you could also use <dictionary_name> instead of <name> <type> -- assuming there is a dictionary entry of the form <dictionary_name> <type>, which automatically creates TYPE <dictionary_name> POSSREP {Value <type>} -- so that using <dictionary_name> in a variable, parameter, or heading attribute declaration instead of <name> <type> is shorthand for <dictionary_name> <dictionary_name>.

E.g., given:

DICTIONARY;
  X CHAR;
END DICTIONARY;

The following will be implicitly created:

TYPE X POSSREP {Value CHAR}

[Addendum: I wonder if it might be useful to be able to optionally specify in the DICTIONARY section those elements that are not to be wrapped and are to be defined as the specified type. E.g., something like 'Customer_ID INT UNWRAP' means that rather than automatically creating TYPE Customer_ID POSSREP {Value INT} and declaring attributes/variables/parameters named Customer_ID as type Customer_ID, they'd be declared to be type INT.]

And a declaration like...

VAR X;

...is shorthand for:

VAR X X;

In short, it provides the "classic" features of a data dictionary, with added type safety.
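To make the shorthand rule concrete, the expansion of <dictionary_name> into <dictionary_name> <dictionary_name> can be sketched in Python. This is only a model of the rule as described in this thread; `expand_heading` and its data shapes are hypothetical and not part of Rel or Tutorial D.

```python
def expand_heading(dictionary, heading):
    """Expand shorthand heading items into explicit (name, type) pairs.

    dictionary: maps a dictionary element name to its generated type name.
    heading:    list whose items are either a bare dictionary name (str)
                or an explicit (name, type) pair, as in existing Tutorial D.
    """
    expanded = []
    for item in heading:
        if isinstance(item, tuple):
            # Ordinary <name> <type> declaration: left untouched.
            expanded.append(item)
        elif item in dictionary:
            # Bare dictionary name: the type comes from the dictionary.
            expanded.append((item, dictionary[item]))
        else:
            raise KeyError(f"{item} is not a dictionary element; a type is required")
    return expanded


# Single-name entries: each name is also the name of its generated type.
dictionary = {"X": "X", "Customer_ID": "Customer_ID"}

heading = expand_heading(dictionary, ["Customer_ID", ("Address", "CHAR")])
```

Existing <name> <type> declarations pass through unchanged, which matches the intent that no existing Tutorial D code would be affected.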

Quote from Hugh on May 1, 2021, 3:16 pm
Quote from Dave Voorhis on April 30, 2021, 2:44 pm

Thinking a bit more on 'DICTIONARY', etc., maybe it would be desirable to be able to do this:

DICTIONARY;
  Customer_ID CHAR;
  Customer_Phone, Customer_Phone2 INT;
  Invoice_Number INT;
  Invoice_Date Date;
  Amount RATIONAL;
END DICTIONARY;

So you can say this...

VAR Customers REAL RELATION {Customer_ID, Customer_Phone, Customer_Phone2} KEY {Customer_ID};

...which is shorthand for:

VAR Customers REAL RELATION {Customer_ID Customer_ID, Customer_Phone Customer_Phone, Customer_Phone2 Customer_Phone} KEY {Customer_ID};

Yet another shorthand given in advance of acceptance of the base idea.  We have to be sure that works before considering this addition.

I assume you mean that multiple element names on the same dictionary element are just synonyms.  Right?  If so, I'm reminded that synonyms sometimes give rise to problems, so I think this one might need more thought.

If the DICTIONARY idea is acceptable, it's arguably necessary rather than being an (optional?) addition.

I would definitely not describe multiple element names on the same dictionary element as synonyms. Multiple element names on the same dictionary element define distinct dictionary elements of the same type.

It allows you to declare multiple attributes in a relvar to have the same type using DICTIONARY entries to specify them, which you otherwise couldn't do using DICTIONARY entries.

I misunderstood your use of multiple element names on the same dictionary element because your example gives two distinct dictionary elements using the same underlying type CHAR.

Given a DICTIONARY like...

DICTIONARY;
  Customer_ID, Customer_ID2 INT;
  Invoice_Number INT;
END DICTIONARY;

...it specifies that Customer_ID and Customer_ID2 are both of type Customer_ID which wraps an INTEGER, and are type compatible.

Customer_ID and Invoice_Number are distinct types, both of which wrap an INTEGER. They are not type compatible.
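The compatibility rule for this example can be modelled in a few lines of Python. The function names and the tuple layout are hypothetical; the sketch only encodes the rule stated above, that every name in an entry shares the type named by the entry's first name.

```python
def build_dictionary(entries):
    """Map each element name to (type_name, underlying_type).

    entries: list of (names, underlying_type). The first name in each
    entry doubles as the name of the generated wrapper type.
    """
    elements = {}
    for names, underlying in entries:
        type_name = names[0]
        for name in names:
            if name in elements:
                # Any error rejects the whole dictionary: nothing is returned.
                raise ValueError(f"duplicate dictionary element: {name}")
            elements[name] = (type_name, underlying)
    return elements


def type_compatible(elements, a, b):
    """Two elements are compatible only if they share a generated type."""
    return elements[a][0] == elements[b][0]


dictionary = build_dictionary([
    (["Customer_ID", "Customer_ID2"], "INT"),
    (["Invoice_Number"], "INT"),
])

same_entry = type_compatible(dictionary, "Customer_ID", "Customer_ID2")
different_entries = type_compatible(dictionary, "Customer_ID", "Invoice_Number")
```

Sharing the underlying INT plays no part in the comparison; only the generated type name does.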

I wouldn't object to any genuine shorthand, though I might have an opinion on its value.  It's difficult for me to judge the wisdom of this one because (a) I don't have a full understanding of the problem it seeks to address (saying it's the same as Tobega's doesn't help me),

The fundamental problem it seeks to address is avoiding type compatibility which accidentally results in error -- things like inadvertently JOINing an invoice ID and a product ID, because they're both named ID and have the same INTEGER type.

Apparently, this sort of thing is quite common in the SQL world, particularly when working with large and relatively unfamiliar schemas, like those in commercial bought-in products.

and (b) the extent to which you are addressing it isn't clear either.

If the DICTIONARY facility is used, each new entry is a unique type (unless explicitly declared otherwise.) Thus, inadvertent type compatibility issues are virtually eliminated. There is also value in having a data dictionary that identifies every data element / attribute, but that's a separate benefit.

You provide a shorthand for defining types that have a single possrep with a single possrep component, without defining any operators in addition to those systematically implied  by the type definition (such as THE_Value(Customer_Id)).  That tells me that solutions for the perceived problem are already available in TD as defined.  Do dictionaries offer any additional advantages?

Yes.

It uses an explicit data dictionary, which is of value for clearly identifying every possible data element / attribute.

It reduces verbosity, and simplifies gaining safety.

And, yes, you can do everything in Tutorial D as currently defined without the DICTIONARY facility. In fact, I often use it that way in Rel. But it's verbose, and you don't get the benefits of an explicit data dictionary.

Regarding the perceived problem, is it really just concerned with ill-advised comparisons, especially those that can arise "by accident", being implicitly involved in operations such as join?

That, and ill-advised mathematical operations on numbers, and ill-advised concatenation of unrelated (i.e., different type) strings, and so on.

In other words, it gives you all the benefits you gain from type safety in general. It's simply a means to encourage type safety where, arguably, type safety should be encouraged. We really shouldn't be treating a Customer ID and a Product ID as the same type, because they're not the same type, even though it's reasonable for both to be based on an integer or a string.

You might want to outlaw taking averages of part numbers too,

Yes, and my approach implicitly outlaws it.

but if anybody really wants to go out of their way to do that they probably do have some good reason!

If they have a good reason to do it, they can. This isn't allowed:

AVG(Customers, Customer_ID)

But this is allowed:

AVG(Customers, THE_Value(Customer_ID))

It has the added benefit of making it explicit what you're doing.

Thanks for all the clarification, with answers as I was expecting.  Just wanted to make sure.

Regarding Customer_ID, Customer_ID2 INT, it seems that the first name is special, and this is an additional shorthand for:

Customer_ID INT;
Customer_ID2 Customer_ID;

assuming that the first statement really does take effect before the second.

The first name is (intentionally) also the name of the generated type. The second (and subsequent) names in the same element definition allow subsequent shorthand attribute/variable/parameter definitions with those names, but do not define new types because they are all of the type named by the first name.

So...

DICTIONARY;
  Customer_ID, Customer_ID2, Customer_Identifier, Customer_ID3 INT;
END DICTIONARY;

...defines a type...

TYPE Customer_ID POSSREP {Value INT};

...and allows definitions like...

VAR Customer_ID;
VAR Customer_ID2 INIT(Customer_ID(222));
VAR MyVar REAL RELATION {Customer_Identifier, Customer_ID3, Address CHAR} KEY {Customer_Identifier};

...which is shorthand for:

VAR Customer_ID Customer_ID;
VAR Customer_ID2 Customer_ID INIT(Customer_ID(222));
VAR MyVar REAL RELATION {Customer_Identifier Customer_ID, Customer_ID3 Customer_ID, Address CHAR} KEY {Customer_Identifier};

If a dictionary entry gives rise to an error, is the whole Dictionary statement rejected?

Yes.

I note that a Dictionary statement isn't named, so I assume that you envisage no additional catalog entries and that the types defined just appear in the catalog in the usual way.  I.e., a dictionary isn't a persistent object itself.

A dictionary isn't named because there's at most one per database, but it might be of practical value to provide a set of ALTER DICTIONARY ... commands to add new, delete unused, or even (maybe) change type of dictionary entries.

Since a dictionary entry isn't just a type -- it also specifies possible attribute names of a specified type -- those would need to exist in the catalog in exactly the same way types, relvar definitions, and other useful metadata are kept in the catalog.

So the first name in a dictionary entry is a type name but subsequent ones are not.

Yes. But the type name isn't particularly important, and (ideally) for most purposes can be ignored, because you think in terms of the data element defined by the dictionary rather than the type name.

In other words, I'd rather think of it as, "The names in a dictionary entry define possible attributes, variables and parameters all of the same new type, which wraps the specified existing type. (Oh, and by the way, the name of the new type is the same as the first name in the dictionary entry.)"

  I think you need a word for what they are and perhaps some clearer syntax, such as AlsoUsedFor(n1, n2, ...).  Just a mild suggestion.

I'm not sure the special syntax is necessary, since from the user's point of view it almost never matters, and when it does, the first name is also the type name.

Maybe call the first name in an entry the primary dictionary element name and the additional names are additional dictionary element names.

Anyway, yes, now I see that a catalog extension is needed.  I expect that would be relvar sys.Dictionary in Rel.

Yes, probably sys.Dictionary for the primary name and associated metadata and sys.DictionaryAdditionalNames for additional names. Perhaps something like:

VAR sys.Dictionary REAL RELATION {PrimaryName CHAR, DeclaredType CHAR, Unwrapped BOOLEAN} KEY {PrimaryName};
VAR sys.DictionaryAdditionalNames REAL RELATION {PrimaryName CHAR, AdditionalName CHAR} KEY {PrimaryName, AdditionalName};
CONSTRAINT sys.DictionaryAdditionalNames_Dictionary_FK  sys.DictionaryAdditionalNames {PrimaryName} ⊆ sys.Dictionary {PrimaryName};
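As a sanity check on that catalog shape, here is a small Python model of how one multi-name dictionary entry might populate the two relvars. The relvar and attribute names follow the definitions above; `build_catalog`, `fk_holds`, and the use of Python sets of tuples as relations are assumptions made purely for illustration.

```python
def build_catalog(entries):
    """Split dictionary entries into the two proposed catalog relations.

    entries: list of (names, declared_type, unwrapped), where names[0]
    is the primary dictionary element name.
    """
    dictionary = set()   # rows of (PrimaryName, DeclaredType, Unwrapped)
    additional = set()   # rows of (PrimaryName, AdditionalName)
    for names, declared_type, unwrapped in entries:
        primary = names[0]
        dictionary.add((primary, declared_type, unwrapped))
        for name in names[1:]:
            additional.add((primary, name))
    return dictionary, additional


def fk_holds(dictionary, additional):
    """Check additional {PrimaryName} is a subset of dictionary {PrimaryName}."""
    return {row[0] for row in additional} <= {row[0] for row in dictionary}


sys_dictionary, sys_additional = build_catalog([
    (["Customer_ID", "Customer_ID2", "Customer_Identifier", "Customer_ID3"], "INT", False),
    (["Invoice_Number"], "INT", False),
])
```

An entry with only a primary name contributes one row to the first relation and none to the second, so the subset constraint holds trivially for it.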

 

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on May 4, 2021, 2:30 pm
Quote from Hugh on May 4, 2021, 2:02 pm
Quote from Dave Voorhis on May 4, 2021, 1:34 pm
Quote from Hugh on May 4, 2021, 10:49 am
Quote from Dave Voorhis on May 3, 2021, 3:23 pm
Quote from Hugh on May 3, 2021, 2:25 pm
Quote from Dave Voorhis on May 1, 2021, 7:17 pm
Quote from Hugh on May 1, 2021, 3:12 pm
Quote from Dave Voorhis on April 30, 2021, 1:46 pm
Quote from Hugh on April 30, 2021, 1:12 pm
Quote from tobega on April 29, 2021, 5:22 pm
Quote from Hugh on April 29, 2021, 3:06 pm
Quote from tobega on April 28, 2021, 3:25 pm
Quote from Hugh on April 28, 2021, 2:32 pm
Quote from Hugh on April 28, 2021, 10:36 am
Quote from tobega on April 28, 2021, 6:47 am

On the subject of type system for a language capable of hosting a D (and also for Tailspin, of course), we have observed that Tuples must be structurally typed, i.e. the attributes they contain define them as the product type of those attributes.

As a counterpoint to a previous thread here, I propose that Tuples be THE way to create product types.

The latest insight (or train-wreck) that I had, is that we should let attributes define types, i.e. instead of saying that an attribute has a type, we say that an attribute is a type. I think this fits very nicely with the natural join and that we take the position that things with the same name are the same kind of things. It also fits in with a good practice to create specific types for specific things, even if in Java it is a bit of a pain to e.g. create a SupplierName class that simply wraps a String.

So we would declare that there is a type called PNAME of the base type string, and the type called SNAME of the base type string, and you just use them as attributes in the Tuples, the type and the attribute have the same name.

Obviously you cannot assign an SNAME value to a PNAME attribute without casting it. But you could e.g. have a COMPANY_NAME and have SNAME be of the type COMPANY_NAME, which would enable assigning between the two.

So, comments? Good idea? Insane idea?

I've seen other replies.  It doesn't look like a good idea to me but in any case clarification is needed.  Please give examples of type definitions for, e.g., SNAME and PNAME, preferably using TD-like syntax.  I assume you imagine a relation type definition to be like TD's but with just attribute type names as heading components: REL{SNO, SNAME, CITY

What do you think a value of an attribute type looks like?  Please give a literal denoting the supplier name Smith.

What are the implications for the relational RENAME operator?

Hugh

P.S.  Perhaps more appropriate, what about EXTEND?  In particular, I have a query that involves extension with concatenation of FirstName and LastName (with a blank in between).  How is that done?

Answering once for both of your posts.

I wouldn't necessarily change anything from the TD syntax, if that is the syntax you want, so

TUPLE { S# S#, SNAME NAME, STATUS INTEGER, CITY CHAR }

TUPLE { S# S#('S1'), SNAME NAME('Smith'), STATUS 20, CITY 'London' }

But what would happen is that we would now automatically also have the types SNAME, STATUS and CITY defined, and the "real" type of the CITY attribute would be CITY, but it could be assigned values of the representation type CHAR.

I believe I answered the RENAME case in my reply to Darren.

As for EXTEND, I suppose that would work similarly in that you would have to explicitly assert the conversion to appropriate types. So FirstName is what? CHAR? And LastName might also be CHAR. So let's look at TD syntax:

EXTEND a ADD (CHAR(FirstName)+' '+CHAR(LastName) AS FullName)

If FullName has not been previously defined, there will now be a type "FullName CHAR", and any attribute FullName can be assumed to be of type FullName, without needing specification. But it wouldn't necessarily hurt to respecify "FullName CHAR". However, trying to specify e.g. "FullName INT" would not be allowed anywhere, you would have to forego that option.

Oh, I didn't see this one until after I had responded to Dave Voorhis.  So I didn't misunderstand but you have now clarified.  The question now concerns the scope of this defined-on-the-fly type FullName.  It has to be local to the expression in which it is defined, otherwise I would object strongly.  And if it is local to the expression, I can't see much point.

Hugh

There wouldn't be much point in having it just local to the expression. I propose that the specific type definitions for each attribute that Dave provided are a "best practice" or at least a "good practice", so we can just let them be automatically defined from the attribute definition. This would lead the developer in the right direction for the price of a slight inconvenience on the rare (?) occasion when you would have wanted an attribute with the same name but of a different type.

Thank you for confirming the pointlessness of a type definition local to the expression in which it is defined.  In that case, this particular aspect of your idea bothers me greatly.

First, it gets into the dodgy area of expression evaluation having side-effects.  That alone is sufficient grounds for rejection afaiacs.

Secondly, what happens if "exp1 AS FullName" and "exp2 AS FullName" specifications appear in the same overall relation expression, where exp1 and exp2 are of different types?

Thirdly, what happens if "exp2 AS FullName" appears in a subsequent statement (perhaps a year or so later, if you are really thinking of global scope)?

It is unthinkable, to my mind, that somebody innocently entering an ad hoc query might invalidate somebody else's ad hoc query in this way.

Sorry if points like this have already been raised and possibly addressed.  I know there has been a lot of correspondence but my brief glances have given me the impression that your idea has received some interest and even sympathy in some quarters.  So I might be missing something.

Hugh

Hugh

Would something like this be more palatable?

DICTIONARY;
  Customer_ID CHAR;
  Customer_Phone INT;
  Invoice_Number INT;
  Invoice_Date Date;
  Amount RATIONAL;
END DICTIONARY;

VAR Customers REAL RELATION {Customer_ID, Customer_Phone} KEY {Customer_ID};
VAR Invoices REAL RELATION {Invoice_Number, Invoice_Date, Customer_ID, Amount} KEY {Invoice_Number};

CONSTRAINT Invoices_Customers_FK  Invoices {Customer_ID} ⊆ Customers {Customer_ID};

The effect of the DICTIONARY block is to define a data dictionary of attributes, which implicitly creates the following types...

TYPE Customer_ID POSSREP {Value CHAR};
TYPE Customer_Phone POSSREP {Value INT}; 
TYPE Invoice_Number POSSREP {Value INT}; 
TYPE Invoice_Date POSSREP {Value Date}; 
TYPE Amount POSSREP {Value RATIONAL};

...and allows declarations like

VAR Invoices REAL RELATION {Invoice_Number, Invoice_Date, Customer_ID, Amount} KEY {Invoice_Number};

...to be shorthand for:

VAR Invoices REAL RELATION {Invoice_Number Invoice_Number, Invoice_Date Invoice_Date, Customer_ID Customer_ID, Amount Amount} KEY {Invoice_Number};

This allows the compiler to check that we're not going to accidentally multiply a Customer_Phone by an Invoice_Number, etc.
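For readers more at home in a general-purpose language, here is a minimal Python analogy of that check. The class names are hypothetical stand-ins for the generated types, and Python reports the problem at run time rather than at compile time, so this only illustrates the idea of distinct wrapper types over the same INT representation:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerPhone:
    value: int  # plays the role of the single possrep component "Value"


@dataclass(frozen=True)
class InvoiceNumber:
    value: int


def try_multiply(a, b):
    """Return a * b, or None when the operand types do not support it."""
    try:
        return a * b
    except TypeError:
        return None


phone = CustomerPhone(5551234)
invoice = InvoiceNumber(33)

rejected = try_multiply(phone, invoice)  # no * is defined between the wrappers
explicit = phone.value * invoice.value   # possible, but only by unwrapping on purpose
```

The wrappers also never compare equal to one another, even when they wrap the same integer.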

Ideally, any declaration of the form <identifier> <type> should be replaceable with <dictionary_name>, e.g. this...

VAR Invoice_Number INIT(33);

...is the same as this:

VAR Invoice_Number Invoice_Number INIT(33);

I imagine it would be an entirely optional feature; no existing Tutorial D code would be broken by adding the facility, and anyone who doesn't want to use DICTIONARY and the <dictionary_name> shorthands for <identifier> <type> could ignore them.

This is a bit too much to digest.  It would be better not to include all these shorthands to begin with (such as omitting the type on a VAR declaration to make it default to the variable name (or are we omitting the variable name and making it default to the type name?)).

Actually, the whole reason for having the DICTIONARY ... END DICTIONARY declaration and being able to use <name> <type> --> <dictionary name> shorthands is because the shorthands are desirable. Of course, you can do everything in Tutorial D without the DICTIONARY ... END DICTIONARY declaration or being able to use <name> <type> --> <dictionary name> shorthands...

Or, the <name> <type> --> <dictionary name> shorthands could be removed and just have DICTIONARY, but it would mean a lot of unnecessarily repetitive declarations like:

VAR Invoices REAL RELATION {Invoice_Number Invoice_Number, Invoice_Date Invoice_Date, Customer_ID Customer_ID, Amount Amount} KEY {Invoice_Number};

Anyway, it doesn't seem anything like Tobega's idea.  Is it solving or addressing the same problem(s) as Tobega's?

It is notionally addressing the same problem as Tobega's, but taking a different approach that also addresses a desire for data dictionaries, mentioned elsewhere in this thread.

I conclude from the examples that dictionary element names are independent from attributes of headings.  That's okay, but you do call a dictionary a dictionary of attributes whereas VAR Invoice_Number Invoice_Number INIT(33) shows Invoice_Number being used for something other than an attribute.

Dictionary element names define identifier type pairs. They could be restricted to attributes of headings, but it might seem arbitrarily restrictive to allow a dictionary name to be used in place of identifier type in some places but not others.

Also, it seems that I can assign an integer to an Invoice_Number variable, and I can compare an Invoice_Number value with an integer, but you don't mention assigning/comparing between Invoice_Number and Amount variables and values.  I believe you don't intend those to be legal.  What about arithmetic operations?  Can I subtract an Amount from an Invoice_Number?  Can I concatenate a FirstName with a blank and a LastName?

Assigning an integer to an Invoice_Number variable was a careless mistake. It should have been...

VAR Invoice_Number INIT(Invoice_Number(33));

...which is equivalent to:

VAR Invoice_Number Invoice_Number INIT(Invoice_Number(33));

Am I right in assuming there is no effect on RENAME and EXTEND as presently defined in TD?

It has no effect on RENAME, EXTEND, or anything else, except where <name> <type> can currently be used to declare a variable, parameter, or heading attribute (have I missed anything?) you could also use <dictionary_name> instead of <name> <type> -- assuming there is a dictionary entry of the form <dictionary_name> <type>, which automatically creates TYPE <dictionary_name> POSSREP {Value <type>} -- so that using <dictionary_name> in a variable, parameter, or heading attribute declaration instead of <name> <type> is shorthand for <dictionary_name> <dictionary_name>.

E.g., given:

DICTIONARY;
  X CHAR;
END DICTIONARY;

The following will be implicitly created:

TYPE X POSSREP {Value CHAR}

[Addendum: I wonder if it might be useful to be able to optionally specify in the DICTIONARY section those elements that are not to be wrapped and are to be defined as the specified type. E.g., something like 'Customer_ID INT UNWRAP' means that rather than automatically creating TYPE Customer_ID POSSREP {Value INT} and declaring attributes/variables/parameters named Customer_ID as type Customer_ID, they'd be declared to be type INT.]

And a declaration like...

VAR X;

...is shorthand for:

VAR X X;

In short, it provides the "classic" features of a data dictionary, with added type safety.

Quote from Hugh on May 1, 2021, 3:16 pm
Quote from Dave Voorhis on April 30, 2021, 2:44 pm

Thinking a bit more on 'DICTIONARY', etc., maybe it would be desirable to be able to do this:

DICTIONARY;
  Customer_ID CHAR;
  Customer_Phone, Customer_Phone2 INT;
  Invoice_Number INT;
  Invoice_Date Date;
  Amount RATIONAL;
END DICTIONARY;

So you can say this...

VAR Customers REAL RELATION {Customer_ID, Customer_Phone, Customer_Phone2} KEY {Customer_ID};

...which is shorthand for:

VAR Customers REAL RELATION {Customer_ID Customer_ID, Customer_Phone Customer_Phone, Customer_Phone2 Customer_Phone} KEY {Customer_ID};

Yet another shorthand given in advance of acceptance of the base idea.  We have to be sure that works before considering this addition.

I assume you mean that multiple element names on the same dictionary element are just synonyms.  Right?  If so, I'm reminded that synonyms sometimes give rise to problems, so I think this one might need more thought.

If the DICTIONARY idea is acceptable, it's arguably necessary rather than being an (optional?) addition.

I would definitely not describe multiple element names on the same dictionary element as synonyms. Multiple element names on the same dictionary element define distinct dictionary elements of the same type.

It allows you to declare multiple attributes in a relvar to have the same type using DICTIONARY entries to specify them, which you otherwise couldn't do using DICTIONARY entries.

I misunderstood your use of multiple element names on the same dictionary element because your example gives two distinct dictionary elements using the same underlying type CHAR.

Given a DICTIONARY like...

DICTIONARY;
  Customer_ID, Customer_ID2 INT;
  Invoice_Number INT;
END DICTIONARY;

...it specifies that Customer_ID and Customer_ID2 are both of type Customer_ID which wraps an INTEGER, and are type compatible.

Customer_ID and Invoice_Number are distinct types, both of which wrap an INTEGER. They are not type compatible.
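A small Python sketch may make the rule concrete. It is an analogy under stated assumptions, not an implementation of the proposal: build_dictionary and its entry format are hypothetical, and each entry generates one wrapper type, named after the entry's first name and shared by all of its names:

```python
from dataclasses import make_dataclass


def build_dictionary(entries):
    """Map every element name to the wrapper type of its dictionary entry."""
    name_to_type = {}
    for names, base in entries:
        # One new type per entry, named after the first (primary) name.
        wrapper = make_dataclass(names[0], [("value", base)], frozen=True)
        for name in names:
            if name in name_to_type:
                raise ValueError(f"duplicate dictionary element name: {name}")
            name_to_type[name] = wrapper
    return name_to_type


types = build_dictionary([
    (["Customer_ID", "Customer_ID2"], int),
    (["Invoice_Number"], int),
])

same_type = types["Customer_ID"] is types["Customer_ID2"]           # type compatible
distinct_type = types["Customer_ID"] is not types["Invoice_Number"]  # not compatible
```

Both generated types wrap an int, yet values of one never compare equal to values of the other.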

I wouldn't object to any genuine shorthand, though I might have an opinion on its value.  It's difficult for me to judge the wisdom of this one because (a) I don't have a full understanding of the problem it seeks to address (saying it's the same as Tobega's doesn't help me),

The fundamental problem it seeks to address is avoiding type compatibility which accidentally results in error -- things like inadvertently JOINing an invoice ID and a product ID, because they're both named ID and have the same INTEGER type.

Apparently, this sort of thing is quite common in the SQL world, particularly when working with large and relatively unfamiliar schemas, like those in commercial bought-in products.
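To illustrate the accidental JOIN, here is a hedged Python sketch. The natural_join helper, the dict-based tuples and the InvoiceID/ProductID classes are all assumptions made for the example. With plain integers the join silently matches an invoice to a product; with distinct wrapper types the same join is rejected:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InvoiceID:
    value: int


@dataclass(frozen=True)
class ProductID:
    value: int


def natural_join(left, right):
    """Join two lists of dict-tuples on their common attribute names."""
    result = []
    for lt in left:
        for rt in right:
            common = lt.keys() & rt.keys()
            for name in common:
                # Common attributes must have the same type in both operands.
                if type(lt[name]) is not type(rt[name]):
                    raise TypeError(f"attribute {name} has different types in the operands")
            if all(lt[name] == rt[name] for name in common):
                result.append({**lt, **rt})
    return result


# Both ID attributes are plain integers: the join "works", probably by accident.
plain_invoices = [{"ID": 1, "Total": 10}]
plain_products = [{"ID": 1, "Name": "Nut"}]
accidental = natural_join(plain_invoices, plain_products)

# Distinct wrapper types: the same join is rejected.
typed_invoices = [{"ID": InvoiceID(1), "Total": 10}]
typed_products = [{"ID": ProductID(1), "Name": "Nut"}]
try:
    natural_join(typed_invoices, typed_products)
    typed_join_rejected = False
except TypeError:
    typed_join_rejected = True
```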

and (b) the extent to which you are addressing it isn't clear either.

If the DICTIONARY facility is used, each new entry is a unique type (unless explicitly declared otherwise.) Thus, inadvertent type compatibility issues are virtually eliminated. There is also value in having a data dictionary that identifies every data element / attribute, but that's a separate benefit.

You provide a shorthand for defining types that have a single possrep with a single possrep component, without defining any operators in addition to those systematically implied  by the type definition (such as THE_Value(Customer_Id)).  That tells me that solutions for the perceived problem are already available in TD as defined.  Do dictionaries offer any additional advantages?

Yes.

It uses an explicit data dictionary, which is of value for clearly identifying every possible data element / attribute.

It reduces verbosity, and simplifies gaining safety.

And, yes, you can do everything in Tutorial D as currently defined without the DICTIONARY facility. In fact, I often use it that way in Rel. But it's verbose, and you don't get the benefits of an explicit data dictionary.

Regarding the perceived problem, is it really just concerned with ill-advised comparisons, especially those that can arise "by accident", being implicitly involved in operations such as join?

That, and ill-advised mathematical operations on numbers, and ill-advised concatenation of unrelated (i.e., different type) strings, and so on.

In other words, it gives you all the benefits you gain from type safety in general. It's simply a means to encourage type safety where, arguably, type safety should be encouraged. We really shouldn't be treating a Customer ID and a Product ID as the same type, because they're not the same type, even though it's reasonable for both to be based on an integer or a string.

You might want to outlaw taking averages of part numbers too,

Yes, and my approach implicitly outlaws it.

but if anybody really wants to go out of their way to do that they probably do have some good reason!

If they have a good reason to do it, they can. This isn't allowed:

AVG(Customers, Customer_ID)

But this is allowed:

AVG(Customers, THE_Value(Customer_ID))

It has the added benefit of making it explicit what you're doing.
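The same distinction can be sketched in Python, as an analogy only. CustomerID and avg are hypothetical names, and reading the .value field stands in for THE_Value. Averaging the wrapped values fails, while averaging the explicitly unwrapped values succeeds:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerID:
    value: int


def avg(values):
    """Arithmetic mean; requires values that support + and /."""
    values = list(values)
    return sum(values) / len(values)


ids = [CustomerID(10), CustomerID(20), CustomerID(30)]

try:
    avg(ids)  # no arithmetic is defined on CustomerID
    direct_ok = True
except TypeError:
    direct_ok = False

unwrapped_avg = avg(c.value for c in ids)  # explicit unwrapping
```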

Thanks for all the clarification, with answers as I was expecting.  Just wanted to make sure.

Regarding Customer_ID, Customer_ID2 INT, it seems that the first name is special, and this is an additional shorthand for:

Customer_ID INT;
Customer_ID2 Customer_ID;

assuming that the first statement really does take effect before the second.

The first name is (intentionally) also the name of the generated type. The second (and subsequent) names in the same element definition allow subsequent shorthand attribute/variable/parameter definitions with those names, but do not define new types because they are all the type of the first name.

So...

DICTIONARY;
  Customer_ID, Customer_ID2, Customer_Identifier, Customer_ID3 INT;
END DICTIONARY;

...defines a type...

TYPE Customer_ID POSSREP {Value INT};

...and allows definitions like...

VAR Customer_ID;
VAR Customer_ID2 INIT(Customer_ID(222));
VAR MyVar REAL RELATION {Customer_Identifier, Customer_ID3, Address CHAR} KEY {Customer_Identifier};

...which is shorthand for:

VAR Customer_ID Customer_ID;
VAR Customer_ID2 Customer_ID INIT(Customer_ID(222));
VAR MyVar REAL RELATION {Customer_Identifier Customer_ID, Customer_ID3 Customer_ID, Address CHAR} KEY {Customer_Identifier};

If a dictionary entry gives rise to an error, is the whole Dictionary statement rejected?

Yes.

I note that a Dictionary statement isn't named, so I assume that you envisage no additional catalog entries and that the types defined just appear in the catalog in the usual way.  I.e., a dictionary isn't a persistent object itself.

A dictionary isn't named because there's at most one per database, but it might be of practical value to provide a set of ALTER DICTIONARY ... commands to add new, delete unused, or even (maybe) change type of dictionary entries.

Since a dictionary entry isn't just a type -- it also specifies possible attribute names of a specified type -- those would need to exist in the catalog in exactly the same way types, relvar definitions, and other useful metadata are kept in the catalog.

So the first name in a dictionary entry is a type name but subsequent ones are not.

Yes. But the type name isn't particularly important, and (ideally) for most purposes can be ignored, because you think in terms of the data element defined by the dictionary rather than the type name.

In other words, I'd rather think of it as, "The names in a dictionary entry define possible attributes, variables and parameters all of the same new type, which wraps the specified existing type. (Oh, and by the way, the name of the new type is the same as the first name in the dictionary entry.)"

I think you need a word for what they are and perhaps some clearer syntax, such as AlsoUsedFor(n1, n2, ...).  Just a mild suggestion.

I'm not sure the special syntax is necessary, since from the user's point of view it almost never matters, and when it does, the first name is also the type name.

Maybe call the first name in an entry the primary dictionary element name and the additional names are additional dictionary element names.

Anyway, yes, now I see that a catalog extension is needed.  I expect that would be relvar sys.Dictionary in Rel.

Yes, probably sys.Dictionary for the primary name and associated metadata and sys.DictionaryAdditionalNames for additional names. Perhaps something like:

VAR sys.Dictionary REAL RELATION {PrimaryName CHAR, DeclaredType CHAR, Unwrapped BOOLEAN} KEY {PrimaryName};
VAR sys.DictionaryAdditionalNames REAL RELATION {PrimaryName CHAR, AdditionalName CHAR} KEY {PrimaryName, AdditionalName};
CONSTRAINT sys.DictionaryAdditionalNames_Dictionary_FK  sys.DictionaryAdditionalNames {PrimaryName} ⊆ sys.Dictionary {PrimaryName};

Or, alternatively (and perhaps controversially):

VAR sys.Dictionary REAL RELATION {Name CHAR, DeclaredType CHAR, Unwrapped BOOLEAN, AdditionalNames RELATION {Name CHAR}} KEY {Name}; 
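The two catalog shapes carry the same information. As a rough Python sketch (sets of tuples standing in for relvars, sample rows invented for illustration, helper names hypothetical), the foreign-key constraint on the two-relvar form is a subset test, and the relation-valued form can be derived from the flat one by grouping:

```python
# Flat form: (PrimaryName, DeclaredType, Unwrapped) and (PrimaryName, AdditionalName)
dictionary = {
    ("Customer_ID", "INT", False),
    ("Invoice_Number", "INT", False),
}
additional_names = {
    ("Customer_ID", "Customer_ID2"),
    ("Customer_ID", "Customer_Identifier"),
}


def fk_holds(additional, primary):
    """Every PrimaryName in the additional names must exist in the dictionary."""
    return {p for p, _ in additional} <= {p for p, _, _ in primary}


def nest(primary, additional):
    """Derive the single-relvar form with a set-valued AdditionalNames attribute."""
    return {
        name: {
            "DeclaredType": declared,
            "Unwrapped": unwrapped,
            "AdditionalNames": {a for p, a in additional if p == name},
        }
        for name, declared, unwrapped in primary
    }


constraint_ok = fk_holds(additional_names, dictionary)
nested = nest(dictionary, additional_names)
```

In the nested form the foreign-key constraint disappears, because an additional name cannot exist without its primary entry.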

 

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org