The Forum for Discussion about The Third Manifesto and Related Matters

Dictionaries

Quote from Dave Voorhis on May 7, 2021, 11:04 am
Quote from dandl on May 7, 2021, 10:41 am
Quote from Dave Voorhis on May 7, 2021, 8:20 am
Quote from dandl on May 7, 2021, 12:16 am
Quote from Hugh on May 6, 2021, 11:06 am

I refer to Dave Voorhis’s “dictionary” proposal that has been advanced and discussed under the rubric Tuples FTW.

In brief, he proposes a shorthand for defining a set of user-defined types that each have just a single possrep and a single possrep component.  In each case the possrep name is the same as the type name and the possrep component name is Value.  The purpose is much the same as that of standard SQL’s so-called distinct types: to provide a way of avoiding traps arising from inappropriate comparisons, especially ones that are implicit in operations such as joins on relations.

On the face of it, isn't this just a type alias? Much like typedef in C? I'm a big fan, I use(d) it a lot.

No, it's not a type alias.

If you write typedef int qty; in C, the type names qty and int become interchangeable; they are the same type.

If you write...

DICTIONARY;
  QTY INT;
END DICTIONARY;

...per my proposal, INT and QTY are not interchangeable. Instead, a new type QTY has been implicitly created with a single POSSREP component named Value of type INT.
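
For readers more at home outside Tutorial D, the alias-versus-distinct-type difference can be sketched in Python. This is only an analogy, not part of the proposal: the names QtyAlias and the_value are hypothetical, and the QTY class is an assumed stand-in for TYPE QTY POSSREP { Value INT }.

```python
from dataclasses import dataclass

# A plain alias, comparable to C's typedef: QtyAlias and int are the same type.
QtyAlias = int


@dataclass(frozen=True)
class QTY:
    """Hypothetical stand-in for TYPE QTY POSSREP { Value INT }."""
    Value: int

    def __post_init__(self):
        # Accept plain ints only (bool is a subclass of int, so check exactly).
        if type(self.Value) is not int:
            raise TypeError("QTY.Value must be an int")


def the_value(q: QTY) -> int:
    """Rough analogue of THE_Value(...): explicit unwrapping to the base type."""
    return q.Value


alias_qty = QtyAlias(5)  # just the int 5
real_qty = QTY(5)        # selector-style invocation

print(alias_qty == 5)            # the alias is interchangeable with int
print(real_qty == 5)             # the distinct type is not an int
print(the_value(real_qty) == 5)  # explicit conversion is still available
```

The alias compares equal to a bare integer, while the wrapped value does so only after it is explicitly unwrapped.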

So this is a new blind type, with no operations, but the means to explicitly convert to and from INT?

I'm not sure what a "blind type" is, but the TTM writings don't specify how the selector operator and THE_Value(...) operator come into being. They might be manually authored somewhere, or they might be automatically generated. They certainly can be automatically generated and Rel does so.

Sorry: it's much the same as what is now known as an 'opaque type'. And yes, it's close to an SQL DISTINCT type. A type with no visible internals or operations, with the exception of equality and casts.

How does that help? If for example, QTY is intended to be treated arithmetically (by referring to QTY.Value) and STATUS is not, how do you enforce that?

It's inherent in the fact that they are distinct types. The intent is not to prevent the user from, say, comparing THE_Value(myQTY) with THE_Value(mySTATUS) if that is desired. The intent is to prevent accidentally comparing/multiplying/adding/joining/whatevering myQTY with mySTATUS because both were declared to be INT.

It's a weak protection, particularly compared to Haskell. In practice the code is going to be littered with mentions of THE_Value(), with no way to tell which applies to which. Instead of some_qty + some_length you'll just have THE_Value(some_qty) + THE_Value(some_length). I don't like it.

Take a look at some SQL implementations' "distinct types". E.g., https://www.ibm.com/docs/en/informix-servers/12.10?topic=types-distinct-data. Same idea.

But tying it into something called a 'dictionary', I'm not so sure I like that. What (extra) problem does that solve?

Imagine we define a dictionary and use it like this:

DICTIONARY;
  Customer_ID INT;
  Product_ID INT;
  Quantity, OldQuantity INT;
END DICTIONARY;

VAR Customers REAL RELATION {Customer_ID} KEY {Customer_ID};
VAR Products REAL RELATION {Product_ID} KEY {Product_ID};
VAR CustomerProducts REAL RELATION {Customer_ID, Product_ID, Quantity, OldQuantity} KEY {Customer_ID, Product_ID};

As used above, the DICTIONARY provides a single point of definition (and thus a single source of truth) for all attribute names/types used in the database, and ensures that each dictionary entry is a distinct type from the others and a distinct type from INT, whilst specifying that Quantity and OldQuantity are the same type (both are type Quantity.)
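
The "single point of definition" idea can be sketched in Python as a rough analogy. Everything here is an assumption made for illustration: the helper names dictionary and heading are hypothetical, and the first name in each entry is taken to name the new type, as in the Quantity, OldQuantity entry above.

```python
from dataclasses import make_dataclass


def dictionary(entries):
    """Build one distinct wrapper type per entry.

    `entries` maps a tuple of names to a base type. The first name becomes the
    type name; every name in the tuple defaults to that type.
    """
    types, defaults = {}, {}
    for names, base in entries.items():
        new_type = make_dataclass(names[0], [("Value", base)], frozen=True)
        types[names[0]] = new_type
        for name in names:
            defaults[name] = new_type
    return types, defaults


types, defaults = dictionary({
    ("Customer_ID",): int,
    ("Product_ID",): int,
    ("Quantity", "OldQuantity"): int,
})


def heading(*attribute_names):
    """Look up each attribute's type by name alone, as in REAL RELATION {Customer_ID}."""
    return {name: defaults[name] for name in attribute_names}


customer_products = heading("Customer_ID", "Product_ID", "Quantity", "OldQuantity")
```

In this sketch Quantity and OldQuantity resolve to the same type, while Customer_ID and Product_ID resolve to different types even though both wrap int.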

Now that's the problem. You can't define a language feature in terms of something that is not a language feature. Are you saying that (a) a dictionary is a (named?) collection of defined/declared types (b) a database (per TTM) must/should declare compliance with a particular dictionary?

It's defined entirely in terms of existing language features, and its use is entirely optional -- all existing language features continue to work as before. You can use DICTIONARY if you wish, not use it at all if you wish, or mix use of DICTIONARY entries and the current <identifier> <type> definitions for attributes, variables, and parameters.

No, you defined it in terms of the database and that was my sole objection. If you define it in terms of language features so that it can be used in a database then no problem.

Andl - A New Database Language - andl.org
Quote from Dave Voorhis on May 6, 2021, 1:08 pm
Quote from Hugh on May 6, 2021, 11:06 am

I refer to Dave Voorhis’s “dictionary” proposal that has been advanced and discussed under the rubric Tuples FTW.

In brief, he proposes a shorthand for defining a set of user-defined types that each have just a single possrep and a single possrep component.  In each case the possrep name is the same as the type name and the possrep component name is Value.  The purpose is much the same as that of standard SQL’s so-called distinct types: to provide a way of avoiding traps arising from inappropriate comparisons, especially ones that are implicit in operations such as joins on relations.

Here, according to my understanding, is an example, showing how it might be implemented in Rel:

Ex. 1

Dictionary;
SNO CHAR;
PNO CHAR;

End Dictionary;

As far as the type definitions are concerned this is equivalent to

Ex. 2

TYPE SNO POSSREP { Value CHAR };
TYPE PNO POSSREP { Value CHAR };

but there are two more points to the proposal.  Calling those types dictionary types and their defining statements dictionary elements:

  1. The definition of a variable with declared type a dictionary type and name the name of that type need not repeat the name. So VAR SNO; is short for VAR SNO SNO;

The same applies to declarations of objects such as attributes, parameters, and possrep components:

Ex. 3

VAR SP BASE REL{SNO, PNO, QTY INT} KEY{SNO, PNO};

is short for

VAR SP BASE REL{SNO SNO, PNO PNO, QTY INT} KEY{SNO, PNO};

Ex. 4

OPERATOR CheckSNO {SNO} RETURNS BOOLEAN;
RETURN Left(THE_Value(SNO),1) = 'S';
END OPERATOR;

is short for

OPERATOR CheckSNO {SNO SNO} RETURNS BOOLEAN;
RETURN Left(THE_Value(SNO),1) = 'S';
END OPERATOR;

Ex. 5

Assume TypeX is a dictionary type.  Then

TYPE TwinX POSSREP { TypeX, TypeX2 TypeX};

is short for

TYPE TwinX POSSREP { TypeX TypeX, TypeX2 TypeX};

  2. Provision is made for the same default type to be used for more than one object name by allowing several names to be specified in the same dictionary element.

Ex. 6

Dictionary;
SNO, SNO1, SNO2 CHAR;
PNO, PNO1, PNO2 CHAR;

End Dictionary;

The defined types are just SNO and PNO as in Ex 1, but the declared type names for objects named SNO1 or SNO2 can be omitted in the same manner as for objects named SNO.  A similar observation applies to objects named PNO1 or PNO2.

Questions:

I know that some of these may have been answered already (after I had first drafted this posting) but I decided to leave them in for completeness.

Assume the dictionary is as defined in Ex. 6.

Q1:      Regarding Ex. 4, can the RETURNS clause be omitted such that the operator’s declared type name defaults to the operator’s name?

No. At least it's not part of what I propose.

Q2:      Are the equivalences given in Ex. 3-5 correct?  E.g., is this a legal relvar definition?

VAR S BASE REL{SNO SNO, NAME CHAR, CITY CHAR} KEY{SNO};

Yes.

Q3:      Is this a legal relvar definition?

VAR X BASE REL{SNO INT} KEY{SNO};

Yes. Defining a dictionary entry SNO does not obligate that the only SNO be the dictionary definition. It's a distinct type from the dictionary definition's type, so type safety is preserved.

Q4:      Given relvar R of heading {A INT}, is this expression legal:

R RENAME {A AS SNO}

Yes. SNO is not forced to be globally unique(ly typed.)

Q5:      Given relvar R of heading {A INT}, is this expression legal:

EXTEND R : {SNO := A + 1}

Yes. Again, SNO is not forced to be globally unique.

Q6:      The database is to be extended with some new relvars and it is desired to use new dictionary types for some of the attributes.  How can these additional dictionary types be added?  (I assume use of INSERT on the relevant catalog relvar would be one not-very-convenient method.  Right?)

There would be some (implementation-dependent, perhaps) mechanisms to ALTER DICTIONARY ADD <entry>, ALTER DICTIONARY DELETE <name>, etc.

Q7:      Possibly a stupid question.  Couldn’t system-defined scalar types be assumed to be dictionary types too?  For example, couldn’t this be legal?

VAR CHAR;

If not, why not?  In fact, why can’t all scalar types be dictionary types?  I realise that would need a different way of specifying additional names as in Ex. 6.  I also realise that expressions such as Value(Value(CHAR)), and so on ad infinitum would then be legal!

Which is shorthand for VAR CHAR CHAR ?

I suppose it could, but it seems a curious (ab)use of the (intent of the) dictionary mechanism. Primitive built-in types are notionally special in most languages and I'd be inclined to preserve that here, if only to avoid readability issues from (mis)using it, along with practical implementation parsing issues from INT/INTEGER, CHAR/CHARACTER, RAT/RATIONAL, BOOL/BOOLEAN being reserved words. (I don't recall whether they're still reserved words in Rel or not.)

Thanks Dave.  Your answers are as expected.  I've seen the subsequent discussions with Erwin and dandl.

I can't see anything wrong with the proposal but I have to say I don't feel at all enthusiastic about it and it's certainly not to my personal taste.  Providing a shorthand for defining types that have a single possrep with a single component doesn't seem to be of much benefit.  I'm more disturbed by allowing the type to be omitted in variable/attribute/parameter/possrep component definitions.  It seems just as likely to lead to some confusion as to be of any real benefit, considering that solutions to the problem at hand are already fairly easily available in TD and Rel.

I'm now wondering why Tobega advanced that "type attribute" idea in the first place.  What perceived deficiency in TTM or TD was it supposed to address?

Hugh

 

Coauthor of The Third Manifesto and related books.
Quote from Hugh on May 7, 2021, 2:27 pm
Quote from Dave Voorhis on May 6, 2021, 1:08 pm
Quote from Hugh on May 6, 2021, 11:06 am

I refer to Dave Voorhis’s “dictionary” proposal that has been advanced and discussed under the rubric Tuples FTW.

In brief, he proposes a shorthand for defining a set of user-defined types that each have just a single possrep and a single possrep component.  In each case the possrep name is the same as the type name and the possrep component name is Value.  The purpose is much the same as that of standard SQL’s so-called distinct types: to provide a way of avoiding traps arising from inappropriate comparisons, especially ones that are implicit in operations such as joins on relations.

Here, according to my understanding, is an example, showing how it might be implemented in Rel:

Ex. 1

Dictionary;
SNO CHAR;
PNO CHAR;

End Dictionary;

As far as the type definitions are concerned this is equivalent to

Ex. 2

TYPE SNO POSSREP { Value CHAR };
TYPE PNO POSSREP { Value CHAR };

but there are two more points to the proposal.  Calling those types dictionary types and their defining statements dictionary elements:

  1. The definition of a variable with declared type a dictionary type and name the name of that type need not repeat the name. So VAR SNO; is short for VAR SNO SNO;

The same applies to declarations of objects such as attributes, parameters, and possrep components:

Ex. 3

VAR SP BASE REL{SNO, PNO, QTY INT} KEY{SNO, PNO};

is short for

VAR SP BASE REL{SNO SNO, PNO PNO, QTY INT} KEY{SNO, PNO};

Ex. 4

OPERATOR CheckSNO {SNO} RETURNS BOOLEAN;
RETURN Left(THE_Value(SNO),1) = 'S';
END OPERATOR;

is short for

OPERATOR CheckSNO {SNO SNO} RETURNS BOOLEAN;
RETURN Left(THE_Value(SNO),1) = 'S';
END OPERATOR;

Ex. 5

Assume TypeX is a dictionary type.  Then

TYPE TwinX POSSREP { TypeX, TypeX2 TypeX};

is short for

TYPE TwinX POSSREP { TypeX TypeX, TypeX2 TypeX};

  2. Provision is made for the same default type to be used for more than one object name by allowing several names to be specified in the same dictionary element.

Ex. 6

Dictionary;
SNO, SNO1, SNO2 CHAR;
PNO, PNO1, PNO2 CHAR;

End Dictionary;

The defined types are just SNO and PNO as in Ex 1, but the declared type names for objects named SNO1 or SNO2 can be omitted in the same manner as for objects named SNO.  A similar observation applies to objects named PNO1 or PNO2.

Questions:

I know that some of these may have been answered already (after I had first drafted this posting) but I decided to leave them in for completeness.

Assume the dictionary is as defined in Ex. 6.

Q1:      Regarding Ex. 4, can the RETURNS clause be omitted such that the operator’s declared type name defaults to the operator’s name?

No. At least it's not part of what I propose.

Q2:      Are the equivalences given in Ex. 3-5 correct?  E.g., is this a legal relvar definition?

VAR S BASE REL{SNO SNO, NAME CHAR, CITY CHAR} KEY{SNO};

Yes.

Q3:      Is this a legal relvar definition?

VAR X BASE REL{SNO INT} KEY{SNO};

Yes. Defining a dictionary entry SNO does not obligate that the only SNO be the dictionary definition. It's a distinct type from the dictionary definition's type, so type safety is preserved.

Q4:      Given relvar R of heading {A INT}, is this expression legal:

R RENAME {A AS SNO}

Yes. SNO is not forced to be globally unique(ly typed.)

Q5:      Given relvar R of heading {A INT}, is this expression legal:

EXTEND R : {SNO := A + 1}

Yes. Again, SNO is not forced to be globally unique.

Q6:      The database is to be extended with some new relvars and it is desired to use new dictionary types for some of the attributes.  How can these additional dictionary types be added?  (I assume use of INSERT on the relevant catalog relvar would be one not-very-convenient method.  Right?)

There would be some (implementation-dependent, perhaps) mechanisms to ALTER DICTIONARY ADD <entry>, ALTER DICTIONARY DELETE <name>, etc.

Q7:      Possibly a stupid question.  Couldn’t system-defined scalar types be assumed to be dictionary types too?  For example, couldn’t this be legal?

VAR CHAR;

If not, why not?  In fact, why can’t all scalar types be dictionary types?  I realise that would need a different way of specifying additional names as in Ex. 6.  I also realise that expressions such as Value(Value(CHAR)), and so on ad infinitum would then be legal!

Which is shorthand for VAR CHAR CHAR ?

I suppose it could, but it seems a curious (ab)use of the (intent of the) dictionary mechanism. Primitive built-in types are notionally special in most languages and I'd be inclined to preserve that here, if only to avoid readability issues from (mis)using it, along with practical implementation parsing issues from INT/INTEGER, CHAR/CHARACTER, RAT/RATIONAL, BOOL/BOOLEAN being reserved words. (I don't recall whether they're still reserved words in Rel or not.)

Thanks Dave.  Your answers are as expected.  I've seen the subsequent discussions with Erwin and dandl.

I can't see anything wrong with the proposal but I have to say I don't feel at all enthusiastic about it and it's certainly not to my personal taste.  Providing a shorthand for defining types that have a single possrep with a single component doesn't seem to be of much benefit.  I'm more disturbed by allowing the type to be omitted in variable/attribute/parameter/possrep component definitions.  It seems just as likely to lead to some confusion as to be of any real benefit, considering that solutions to the problem at hand are already fairly easily available in TD and Rel.

I'm now wondering why Tobega advanced that "type attribute" idea in the first place.  What perceived deficiency in TTM or TD was it supposed to address?

Hugh

 

I can't speak for Tobega, but the goal of my DICTIONARY proposal is to make it easy to create schemas with distinct attribute types where they should be distinct, the same attribute types where they should be the same, and all attribute types distinct from the built-in types even if they're using the built-in types.

Arguably, for proper type safety, that's what we should always do.  Defining attributes like Customer_ID and Product_ID and Quantity all as INTEGER is abominable -- though in the real world of SQL, commonplace -- and DICTIONARY is intended to make it easy to stop doing it badly and easy to start doing it right. (The same rationale presumably applies to distinct types in some SQL implementations.)

I presume Tobega (though, again, I can't speak for him) intended the same thing, but approaches it a different way.

But, yes, you can do exactly the same thing without DICTIONARY. Simply define and use the approach where I've shown what DICTIONARY is shorthand for. That's what I tend to do, anyway.

The deficiency in Tutorial D is that it's rather laborious to do it the right way and all too easy to do it the wrong way (just use INT and CHAR and BOOL and RATIONAL for everything), so  it would be nice -- pedagogically, polemically (kind of...), and practically -- to have a construct (DICTIONARY) that strongly encourages (but doesn't require) you to do it the right way all the time.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from dandl on May 7, 2021, 2:18 pm
Quote from Dave Voorhis on May 7, 2021, 11:04 am
Quote from dandl on May 7, 2021, 10:41 am
Quote from Dave Voorhis on May 7, 2021, 8:20 am
Quote from dandl on May 7, 2021, 12:16 am
Quote from Hugh on May 6, 2021, 11:06 am

I refer to Dave Voorhis’s “dictionary” proposal that has been advanced and discussed under the rubric Tuples FTW.

In brief, he proposes a shorthand for defining a set of user-defined types that each have just a single possrep and a single possrep component.  In each case the possrep name is the same as the type name and the possrep component name is Value.  The purpose is much the same as that of standard SQL’s so-called distinct types: to provide a way of avoiding traps arising from inappropriate comparisons, especially ones that are implicit in operations such as joins on relations.

On the face of it, isn't this just a type alias? Much like typedef in C? I'm a big fan, I use(d) it a lot.

No, it's not a type alias.

If you write typedef int qty; in C, the type names qty and int become interchangeable; they are the same type.

If you write...

DICTIONARY;
  QTY INT;
END DICTIONARY;

...per my proposal, INT and QTY are not interchangeable. Instead, a new type QTY has been implicitly created with a single POSSREP component named Value of type INT.

So this is a new blind type, with no operations, but the means to explicitly convert to and from INT?

I'm not sure what a "blind type" is, but the TTM writings don't specify how the selector operator and THE_Value(...) operator come into being. They might be manually authored somewhere, or they might be automatically generated. They certainly can be automatically generated and Rel does so.

Sorry: it's much the same as what is now known as an 'opaque type'. And yes, it's close to an SQL DISTINCT type. A type with no visible internals or operations, with the exception of equality and casts.

It's not an opaque type; it's transparent. It's just the standard Tutorial D approach to a composite type, with a single element.

How does that help? If for example, QTY is intended to be treated arithmetically (by referring to QTY.Value) and STATUS is not, how do you enforce that?

It's inherent in the fact that they are distinct types. The intent is not to prevent the user from, say, comparing THE_Value(myQTY) with THE_Value(mySTATUS) if that is desired. The intent is to prevent accidentally comparing/multiplying/adding/joining/whatevering myQTY with mySTATUS because both were declared to be INT.

It's a weak protection, particularly compared to Haskell. In practice the code is going to be littered with mentions of THE_Value(), with no way to tell which applies to which. Instead of some_qty + some_length you'll just have THE_Value(some_qty) + THE_Value(some_length). I don't like it.

It's not Haskell, it's Tutorial D.

If you frequently add QTY and LENGTH, maybe they should be the same type. If not, why not define an operator to handle addition:

OPERATOR OP_ADD(QTY, LENGTH) RETURNS QTY;
  RETURN QTY(THE_Value(QTY) + THE_Value(LENGTH));
END OPERATOR;

That's how you override '+' in Rel. Now you can say some_qty + some_length.

If you infrequently add QTY and LENGTH, then I'd rather see THE_Value(some_qty) + THE_Value(some_length), because it tells me exactly what's going on in that (hopefully) rather exceptional and unusual (and perhaps questionable) case.
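
The trade-off described above can be sketched in Python as a rough analogy to the OP_ADD override. The classes QTY and LENGTH below are assumed stand-ins for the Tutorial D types, and defining `__add__` only on QTY is an illustrative choice that mirrors the one-directional OP_ADD(QTY, LENGTH) signature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LENGTH:
    Value: int


@dataclass(frozen=True)
class QTY:
    Value: int

    def __add__(self, other):
        # Only the explicitly sanctioned combination QTY + LENGTH is defined.
        if isinstance(other, LENGTH):
            return QTY(self.Value + other.Value)
        return NotImplemented


some_qty, some_length = QTY(3), LENGTH(4)

# The frequent case: the overloaded operator hides the unwrapping.
total = some_qty + some_length

# The infrequent case: unwrap explicitly so the mixing is visible at the call site.
explicit_total = some_qty.Value + some_length.Value

# Nothing was defined for LENGTH + QTY, so that combination is rejected.
try:
    some_length + some_qty
    reversed_ok = True
except TypeError:
    reversed_ok = False
```

Any combination that was not deliberately defined, such as adding a QTY to a bare integer, still fails, which is the protection the distinct types are meant to give.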

Take a look at some SQL implementations' "distinct types". E.g., https://www.ibm.com/docs/en/informix-servers/12.10?topic=types-distinct-data. Same idea.

But tying it into something called a 'dictionary', I'm not so sure I like that. What (extra) problem does that solve?

Imagine we define a dictionary and use it like this:

DICTIONARY;
  Customer_ID INT;
  Product_ID INT;
  Quantity, OldQuantity INT;
END DICTIONARY;

VAR Customers REAL RELATION {Customer_ID} KEY {Customer_ID};
VAR Products REAL RELATION {Product_ID} KEY {Product_ID};
VAR CustomerProducts REAL RELATION {Customer_ID, Product_ID, Quantity, OldQuantity} KEY {Customer_ID, Product_ID};

As used above, the DICTIONARY provides a single point of definition (and thus a single source of truth) for all attribute names/types used in the database, and ensures that each dictionary entry is a distinct type from the others and a distinct type from INT, whilst specifying that Quantity and OldQuantity are the same type (both are type Quantity.)

Now that's the problem. You can't define a language feature in terms of something that is not a language feature. Are you saying that (a) a dictionary is a (named?) collection of defined/declared types (b) a database (per TTM) must/should declare compliance with a particular dictionary?

It's defined entirely in terms of existing language features, and its use is entirely optional -- all existing language features continue to work as before. You can use DICTIONARY if you wish, not use it at all if you wish, or mix use of DICTIONARY entries and the current <identifier> <type> definitions for attributes, variables, and parameters.

No, you defined it in terms of the database and that was my sole objection. If you define it in terms of language features so that it can be used in a database then no problem.

I don't know what that means.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on May 7, 2021, 2:51 pm
Quote from Hugh on May 7, 2021, 2:27 pm
Quote from Dave Voorhis on May 6, 2021, 1:08 pm
Quote from Hugh on May 6, 2021, 11:06 am

I refer to Dave Voorhis’s “dictionary” proposal that has been advanced and discussed under the rubric Tuples FTW.

In brief, he proposes a shorthand for defining a set of user-defined types that each have just a single possrep and a single possrep component.  In each case the possrep name is the same as the type name and the possrep component name is Value.  The purpose is much the same as that of standard SQL’s so-called distinct types: to provide a way of avoiding traps arising from inappropriate comparisons, especially ones that are implicit in operations such as joins on relations.

Here, according to my understanding, is an example, showing how it might be implemented in Rel:

Ex. 1

Dictionary;
SNO CHAR;
PNO CHAR;

End Dictionary;

As far as the type definitions are concerned this is equivalent to

Ex. 2

TYPE SNO POSSREP { Value CHAR };
TYPE PNO POSSREP { Value CHAR };

but there are two more points to the proposal.  Calling those types dictionary types and their defining statements dictionary elements:

  1. The definition of a variable with declared type a dictionary type and name the name of that type need not repeat the name. So VAR SNO; is short for VAR SNO SNO;

The same applies to declarations of objects such as attributes, parameters, and possrep components:

Ex. 3

VAR SP BASE REL{SNO, PNO, QTY INT} KEY{SNO, PNO};

is short for

VAR SP BASE REL{SNO SNO, PNO PNO, QTY INT} KEY{SNO, PNO};

Ex. 4

OPERATOR CheckSNO {SNO} RETURNS BOOLEAN;
RETURN Left(THE_Value(SNO),1) = 'S';
END OPERATOR;

is short for

OPERATOR CheckSNO {SNO SNO} RETURNS BOOLEAN;
RETURN Left(THE_Value(SNO),1) = 'S';
END OPERATOR;

Ex. 5

Assume TypeX is a dictionary type.  Then

TYPE TwinX POSSREP { TypeX, TypeX2 TypeX};

is short for

TYPE TwinX POSSREP { TypeX TypeX, TypeX2 TypeX};

  2. Provision is made for the same default type to be used for more than one object name by allowing several names to be specified in the same dictionary element.

Ex. 6

Dictionary;
SNO, SNO1, SNO2 CHAR;
PNO, PNO1, PNO2 CHAR;

End Dictionary;

The defined types are just SNO and PNO as in Ex 1, but the declared type names for objects named SNO1 or SNO2 can be omitted in the same manner as for objects named SNO.  A similar observation applies to objects named PNO1 or PNO2.

Questions:

I know that some of these may have been answered already (after I had first drafted this posting) but I decided to leave them in for completeness.

Assume the dictionary is as defined in Ex. 6.

Q1:      Regarding Ex. 4, can the RETURNS clause be omitted such that the operator’s declared type name defaults to the operator’s name?

No. At least it's not part of what I propose.

Q2:      Are the equivalences given in Ex. 3-5 correct?  E.g., is this a legal relvar definition?

VAR S BASE REL{SNO SNO, NAME CHAR, CITY CHAR} KEY{SNO};

Yes.

Q3:      Is this a legal relvar definition?

VAR X BASE REL{SNO INT} KEY{SNO};

Yes. Defining a dictionary entry SNO does not obligate that the only SNO be the dictionary definition. It's a distinct type from the dictionary definition's type, so type safety is preserved.

Q4:      Given relvar R of heading {A INT}, is this expression legal:

R RENAME {A AS SNO}

Yes. SNO is not forced to be globally unique(ly typed.)

Q5:      Given relvar R of heading {A INT}, is this expression legal:

EXTEND R : {SNO := A + 1}

Yes. Again, SNO is not forced to be globally unique.

Q6:      The database is to be extended with some new relvars and it is desired to use new dictionary types for some of the attributes.  How can these additional dictionary types be added?  (I assume use of INSERT on the relevant catalog relvar would be one not-very-convenient method.  Right?)

There would be some (implementation-dependent, perhaps) mechanisms to ALTER DICTIONARY ADD <entry>, ALTER DICTIONARY DELETE <name>, etc.

Q7:      Possibly a stupid question.  Couldn’t system-defined scalar types be assumed to be dictionary types too?  For example, couldn’t this be legal?

VAR CHAR;

If not, why not?  In fact, why can’t all scalar types be dictionary types?  I realise that would need a different way of specifying additional names as in Ex. 6.  I also realise that expressions such as Value(Value(CHAR)), and so on ad infinitum would then be legal!

Which is shorthand for VAR CHAR CHAR ?

I suppose it could, but it seems a curious (ab)use of the (intent of the) dictionary mechanism. Primitive built-in types are notionally special in most languages and I'd be inclined to preserve that here, if only to avoid readability issues from (mis)using it, along with practical implementation parsing issues from INT/INTEGER, CHAR/CHARACTER, RAT/RATIONAL, BOOL/BOOLEAN being reserved words. (I don't recall whether they're still reserved words in Rel or not.)

Thanks Dave.  Your answers are as expected.  I've seen the subsequent discussions with Erwin and dandl.

I can't see anything wrong with the proposal but I have to say I don't feel at all enthusiastic about it and it's certainly not to my personal taste.  Providing a shorthand for defining types that have a single possrep with a single component doesn't seem to be of much benefit.  I'm more disturbed by allowing the type to be omitted in variable/attribute/parameter/possrep component definitions.  It seems just as likely to lead to some confusion as to be of any real benefit, considering that solutions to the problem at hand are already fairly easily available in TD and Rel.

I'm now wondering why Tobega advanced that "type attribute" idea in the first place.  What perceived deficiency in TTM or TD was it supposed to address?

Hugh

 

I can't speak for Tobega, but the goal of my DICTIONARY proposal is to make it easy to create schemas with distinct attribute types where they should be distinct, the same attribute types where they should be the same, and all attribute types distinct from the built-in types even if they're using the built-in types.

Arguably, for proper type safety, that's what we should always do.  Defining attributes like Customer_ID and Product_ID and Quantity all as INTEGER is abominable -- though in the real world of SQL, commonplace -- and DICTIONARY is intended to make it easy to stop doing it badly and easy to start doing it right. (The same rationale presumably applies to distinct types in some SQL implementations.)

I presume Tobega (though, again, I can't speak for him) intended the same thing, but approaches it a different way.

But, yes, you can do exactly the same thing without DICTIONARY. Simply define and use the approach where I've shown what DICTIONARY is shorthand for. That's what I tend to do, anyway.

The deficiency in Tutorial D is that it's rather laborious to do it the right way and all too easy to do it the wrong way (just use INT and CHAR and BOOL and RATIONAL for everything), so  it would be nice -- pedagogically, polemically (kind of...), and practically -- to have a construct (DICTIONARY) that strongly encourages (but doesn't require) you to do it the right way all the time.

Exactly this. You SHOULD define a distinct type for each kind of thing. A width is not a height is not an x-coordinate even though all are ints. Every so often in some bug retrospective we will conclude "we should have defined distinct types for this". But it is laborious to do and developers are lazy, it's hard enough to get them to use type declarations and tests, even when we know there are long-term benefits. Nobody uses add-on tools for contracts or formal verification since it doesn't immediately contribute to getting the code in.

My idea was to make it happen automatically (and I will probably experiment with it in Tailspin), so that it gets done. I think the DICTIONARY is a fine way to encourage it without breaking anything, and I think the shorthand declarations might be the little carrot that encourages you even further to do the right thing.
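To make the width/height point concrete, here is a minimal sketch in Tutorial D of the longhand that a dictionary would abbreviate. The type names WIDTH and HEIGHT and the variables w and h are illustrative assumptions, not taken from anyone's schema, and the `//` comment syntax assumes Rel.

```tutoriald
// Longhand for what a dictionary might declare as "WIDTH INT; HEIGHT INT;".
// Both types wrap INTEGER, yet they are distinct from INTEGER and from each other.
TYPE WIDTH POSSREP {Value INTEGER};
TYPE HEIGHT POSSREP {Value INTEGER};

// Hypothetical variables, initialised via the selectors WIDTH(...) and HEIGHT(...).
VAR w WIDTH INIT(WIDTH(3));
VAR h HEIGHT INIT(HEIGHT(3));

// w = h    should be rejected by the type checker: WIDTH and HEIGHT are distinct types.
// w + 1    should likewise be rejected unless someone defines "+" for WIDTH.

// Unwrapping has to be written out, so mixing the two is a visible decision:
THE_Value(w) * THE_Value(h)
```

The last expression is the only place where the two quantities meet, and it says so explicitly.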

Quote from Dave Voorhis on May 6, 2021, 4:40 pm
Quote from Erwin on May 6, 2021, 4:26 pm
Quote from Dave Voorhis on May 6, 2021, 3:26 pm
Quote from Erwin on May 6, 2021, 2:53 pm

When I first saw the 'dictionary' proposal, my first instinctive reaction was "that's SIRA_PRISE's ATTRIBUTE relvar" ( https://shark.armchair.mb.ca/~erwin/doc0105/public/be/SIRAPRISE/client/NAMES.RELVARNAMES.html#ATTRIBUTE , and its relvar predicate in particular).  It seems like it was intended to be yet something else.  The purpose of ATTRIBUTE was to have some sort of facility that, at least in the base relvars, prevents using the same (attribute) name for differently-typed things, causing/contributing to the "natural join is a disaster waiting to happen" mindset.  That facility is very much "dictionary-like", which might have caused my confusion.  The purpose here seems to be to offer a facility that helps the programmer avoid repetition of words in the case where he'd want the same name for a variable/attribute/... somewhere in the program and the type of that thing.  I guess you'll run into that use case a lot if you're toying around.  I suspect it to be less so if what you're writing is supposed to be industrial-strength.  And I believe dictionaries of business data elements are far more useful to have and exploit than dictionaries of program data types.

The goal here is to be able to define a (say) 'CustomerID' attribute name and type, base it on an appropriate existing type like INT, and ensure that every appearance of CustomerID has the same (or a specified alternative) name and type (which facilitates proper use of natural JOIN), whilst a notionally similar 'ProductID' name/type can also be based on an appropriate existing type like INT but is guaranteed to be a distinct type from CustomerID despite both being based on INT.

It's somewhat similar to 'distinct types' found in some SQL implementations. See, for example, https://www.ibm.com/docs/en/db2-warehouse?topic=types-user-defined-distinct.

OK.  So the 'type' aspect is about a shorthand for declaring one-possrep, one-component types ?

The 'type' aspect is about specially identifying (via a dictionary element) the representation for attributes/variables/parameters of a given name (or set of names), whilst ensuring that it's a unique type shared by the attributes/variables/parameters of that special identification.

The one-possrep, one-component type-shorthand is merely a means to implement it.

There's also an 'attribute' aspect, as per 'define a ... attribute name and type'.  Per your examples however, there's also a 'variables' aspect because you used the idea in variable declarations.  And then there's the 'dictionary' aspect which is about bundling all such declarations together in some place (and presumably be able to "import" them in some dead-easy way).

Are you sure it's a good idea to tack all of those distinct aspects onto one single language construct ?

Yes. That's what a data dictionary is for, isn't it?

I'll give it one last try.  Regard your dictionary "value" (or "dictionary value") as a relation, then write down the predicate for it (i.e. write down the predicate for a relvar that holds your "dictionary value" inside the catalog).

Is that predicate such that it obviously warrants all the distinct types of "usage" you have come forward with in your examples ?  Is that predicate disjunctive ?  If so, I suggest you really are talking of a number of distinct types of dictionary (or perhaps identical types of dictionary but they distinguish themselves from one another by what is their area of application) with that number equal to the number of disjuncts in your predicate, and I'd feel much more comfortable with the idea if the syntax explicitates (or how do you say that) that distinction.

Author of SIRA_PRISE
Quote from Erwin on May 7, 2021, 5:18 pm
Quote from Dave Voorhis on May 6, 2021, 4:40 pm
Quote from Erwin on May 6, 2021, 4:26 pm
Quote from Dave Voorhis on May 6, 2021, 3:26 pm
Quote from Erwin on May 6, 2021, 2:53 pm

When I first saw the 'dictionary' proposal, my first instinctive reaction was "that's SIRA_PRISE's ATTRIBUTE relvar" ( https://shark.armchair.mb.ca/~erwin/doc0105/public/be/SIRAPRISE/client/NAMES.RELVARNAMES.html#ATTRIBUTE , and its relvar predicate in particular).  It seems like it was intended to be yet something else.  The purpose of ATTRIBUTE was to have some sort of facility that, at least in the base relvars, prevents using the same (attribute) name for differently-typed things, causing/contributing to the "natural join is a disaster waiting to happen" mindset.  That facility is very much "dictionary-like", which might have caused my confusion.  The purpose here seems to be to offer a facility that helps the programmer avoid repetition of words in the case where he'd want the same name for a variable/attribute/... somewhere in the program and the type of that thing.  I guess you'll run into that use case a lot if you're toying around.  I suspect it to be less so if what you're writing is supposed to be industrial-strength.  And I believe dictionaries of business data elements are far more useful to have and exploit than dictionaries of program data types.

The goal here is to be able to define a (say) 'CustomerID' attribute name and type, base it on an appropriate existing type like INT, and ensure that every appearance of CustomerID has the same (or a specified alternative) name and type (which facilitates proper use of natural JOIN), whilst a notionally similar 'ProductID' name/type can also be based on an appropriate existing type like INT but is guaranteed to be a distinct type from CustomerID despite both being based on INT.

It's somewhat similar to 'distinct types' found in some SQL implementations. See, for example, https://www.ibm.com/docs/en/db2-warehouse?topic=types-user-defined-distinct.

OK.  So the 'type' aspect is about a shorthand for declaring one-possrep, one-component types ?

The 'type' aspect is about specially identifying (via a dictionary element) the representation for attributes/variables/parameters of a given name (or set of names), whilst ensuring that it's a unique type shared by the attributes/variables/parameters of that special identification.

The one-possrep, one-component type-shorthand is merely a means to implement it.

There's also an 'attribute' aspect, as per 'define a ... attribute name and type'.  Per your examples however, there's also a 'variables' aspect because you used the idea in variable declarations.  And then there's the 'dictionary' aspect which is about bundling all such declarations together in some place (and presumably be able to "import" them in some dead-easy way).

Are you sure it's a good idea to tack all of those distinct aspects onto one single language construct ?

Yes. That's what a data dictionary is for, isn't it?

I'll give it one last try.  Regard your dictionary "value" (or "dictionary value") as a relation, then write down the predicate for it (i.e. write down the predicate for a relvar that holds your "dictionary value" inside the catalog).

Is that predicate such that it obviously warrants all the distinct types of "usage" you have come forward with in your examples ?  Is that predicate disjunctive ?  If so, I suggest you really are talking of a number of distinct types of dictionary (or perhaps identical types of dictionary but they distinguish themselves from one another by what is their area of application) with that number equal to the number of disjuncts in your predicate, and I'd feel much more comfortable with the idea if the syntax explicitates (or how do you say that) that distinction.

Assume the dictionary catalog relvar is:

VAR sys.Dictionary REAL RELATION {Name CHAR, DeclaredType CHAR, Unwrapped BOOLEAN, AdditionalNames RELATION {Name CHAR}} KEY {Name};

The predicate is, "An attribute, variable, or parameter declared solely by Name or one of the AdditionalNames will have a type that wraps DeclaredType unless Unwrapped is true, in which case it has type DeclaredType."
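As an illustration of that predicate, here is a sketch of the value sys.Dictionary might hold for a dictionary declaring "SNO, SNO1, SNO2 CHAR;" and "PNO, PNO1, PNO2 CHAR;". The tuples are assumed for illustration only, and the `//` comments assume Rel's comment syntax.

```tutoriald
// Hypothetical value of sys.Dictionary. Unwrapped is FALSE in both tuples,
// so anything declared solely as SNO, SNO1 or SNO2 gets a type that wraps CHAR,
// and likewise for PNO, PNO1 and PNO2.
RELATION {
  TUPLE {Name 'SNO', DeclaredType 'CHAR', Unwrapped FALSE,
         AdditionalNames RELATION {TUPLE {Name 'SNO1'}, TUPLE {Name 'SNO2'}}},
  TUPLE {Name 'PNO', DeclaredType 'CHAR', Unwrapped FALSE,
         AdditionalNames RELATION {TUPLE {Name 'PNO1'}, TUPLE {Name 'PNO2'}}}
}
```

Each tuple reads as one instantiation of the predicate; a tuple with Unwrapped TRUE would instead say that the named objects have type DeclaredType itself.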

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

I'm now wondering why Tobega advanced that "type attribute" idea in the first place.  What perceived deficiency in TTM or TD was it supposed to address?

I agree. What problem are we trying to solve, and what would a really good solution look like?

If the aim is merely to show intended usage in the database, then type aliases are enough. Since we also want types with components (e.g. Point(x,y)), something that closely resembles a type with a single component might do the job. But although we now know that ID and QTY and STATUS are different, we don't know what purpose they serve or what operations are allowed on them. We might suspect that LOW_SCORE and HIGH_SCORE are related, while LOW_TEMP and LOW_LEVEL and LOW_BALANCE probably are not, but the compiler can't help us.

None of this helps with avoiding clangers like SUM(LOW_TEMP) or AVERAGE(CLIENT_ID) or ORDER BY BITMAP_BLOB or (QTY+STATUS-ACCOUNT_BALANCE). To do that we need a type system that allows fine-grained types with operations drawn from those available to the underlying base type. It's not inheritance, it's more like the Haskell system. You need a way to say that ID is an integer with comparison but no arithmetic operators, while STATUS can be averaged but not summed and QTY cannot be added to SHIP_WEIGHT.

And that's before we even get started on units of measure.

Andl - A New Database Language - andl.org
Quote from dandl on May 8, 2021, 2:20 am

I'm now wondering why Tobega advanced that "type attribute" idea in the first place.  What perceived deficiency in TTM or TD was it supposed to address?

I agree. What problem are we trying to solve, and what would a really good solution look like?

If the aim is merely to show intended usage in the database, then type aliases are enough. Since we also want types with components (e.g. Point(x,y)), something that closely resembles a type with a single component might do the job. But although we now know that ID and QTY and STATUS are different, we don't know what purpose they serve or what operations are allowed on them. We might suspect that LOW_SCORE and HIGH_SCORE are related, while LOW_TEMP and LOW_LEVEL and LOW_BALANCE probably are not, but the compiler can't help us.

None of this helps with avoiding clangers like SUM(LOW_TEMP) or AVERAGE(CLIENT_ID) or ORDER BY BITMAP_BLOB or (QTY+STATUS-ACCOUNT_BALANCE). To do that we need a type system that allows fine-grained types with operations drawn from those available to the underlying base type. It's not inheritance, it's more like the Haskell system. You need a way to say that ID is an integer with comparison but no arithmetic operators, while STATUS can be averaged but not summed and QTY cannot be added to SHIP_WEIGHT.

And that's before we even get started on units of measure.

DICTIONARY is precisely about avoiding clangers like SUM(LOW_TEMP) or AVERAGE(CLIENT_ID) or ORDER BY BITMAP_BLOB or (QTY+STATUS-ACCOUNT_BALANCE).

That's exactly what it prevents, assuming LOW_TEMP and CLIENT_ID and BITMAP_BLOB and QTY and STATUS and ACCOUNT_BALANCE have been defined in a DICTIONARY.

In fact, you don't even need DICTIONARY. It's just sugar. You could explicitly define DICTIONARY-equivalent types, per my examples showing what the DICTIONARY shorthand is longhand for.

That certainly prevents you from writing AVG(R, CLIENT_ID), at least by default.

It doesn't prevent you from writing AVG(R, THE_Value(CLIENT_ID)), if that's really what you want. But it makes the developer be explicit about it, and it's much less likely to happen carelessly and unintentionally than if you defined CLIENT_ID to be of type INTEGER.

And if you really, really want to be able to say AVG(R, CLIENT_ID), then you can define (at least in Rel) your own AVG that works on type CLIENT_ID. The same goes for other operators, too. Remember that the Tutorial D type system is open -- new operators can be added that reference any type at any time, including (in Rel, at least) symbolic operators like +, -, /, *, etc.

As such, "a type system that allows fine-grained types with operations drawn from those available to the underlying base type" is exactly what it is. You can either define new operators as needed, or access the underlying "base" type via THE_Value(...), whilst "avoiding clangers like SUM(LOW_TEMP) or AVERAGE(CLIENT_ID) or ORDER BY BITMAP_BLOB or (QTY+STATUS-ACCOUNT_BALANCE)."

That's exactly what my DICTIONARY proposal is about. It's not that you can't do all of it in Tutorial D as it stands -- you most certainly can.

DICTIONARY is about making it syntactically easier.
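A minimal sketch of the longhand, for anyone who wants to try it. The relvar R and its BALANCE attribute are assumptions made up for the example; only CLIENT_ID, AVG and THE_Value come from the discussion above, and the `//` comments assume Rel.

```tutoriald
// Longhand for a hypothetical dictionary element "CLIENT_ID INT;".
TYPE CLIENT_ID POSSREP {Value INTEGER};

// Hypothetical relvar; the attribute CLIENT_ID has the wrapper type CLIENT_ID.
VAR R REAL RELATION {CLIENT_ID CLIENT_ID, BALANCE INTEGER} KEY {CLIENT_ID};

// AVG(R, CLIENT_ID)   should fail type checking by default,
//                     because the attribute is not a numeric type.

// Still possible, but only by unwrapping explicitly:
AVG(R, THE_Value(CLIENT_ID))
```

The careless form is rejected and the deliberate form stands out in a code review, which is the whole point.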

 

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on May 7, 2021, 2:51 pm
Quote from Hugh on May 7, 2021, 2:27 pm
Quote from Dave Voorhis on May 6, 2021, 1:08 pm
Quote from Hugh on May 6, 2021, 11:06 am

I refer to Dave Voorhis’s “dictionary” proposal that has been advanced and discussed under the rubric Tuples FTW.

In brief, he proposes a shorthand for defining a set of user-defined types that each have just a single possrep and a single possrep component.  In each case the possrep name is the same as the type name and the possrep component name is Value.  The purpose is much the same as that of standard SQL’s so-called distinct types: to provide a way of avoiding traps arising from inappropriate comparisons, especially ones that are implicit in operations such as joins on relations.

Here, according to my understanding, is an example, showing how it might be implemented in Rel:

Ex. 1

Dictionary;
SNO CHAR;
PNO CHAR;

End Dictionary;

As far as the type definitions are concerned this is equivalent to

Ex. 2

TYPE SNO POSSREP { Value CHAR };
TYPE PNO POSSREP { Value CHAR };

but there are two more points to the proposal.  Calling those types dictionary types and their defining statements dictionary elements:

  1. The definition of a variable with declared type a dictionary type and name the name of that type need not repeat the name. So VAR SNO; is short for VAR SNO SNO;

The same applies to declarations of objects such as attributes, parameters, and possrep components:

Ex. 3

VAR SP BASE REL{SNO, PNO, QTY INT} KEY{SNO, PNO};

is short for

VAR SP BASE REL{SNO SNO, PNO PNO, QTY INT}
KEY{SNO, PNO};

Ex. 4

OPERATOR CheckSNO {SNO} RETURNS BOOLEAN;

RETURN Left(THE_Value(SNO),1) = 'S';

END OPERATOR;

is short for

OPERATOR CheckSNO {SNO SNO} RETURNS BOOLEAN;

RETURN Left(THE_Value(SNO),1) = 'S';

END OPERATOR;

Ex. 5

Assume TypeX is a dictionary type.  Then

TYPE TwinX POSSREP { TypeX, TypeX2 TypeX};

is short for

TYPE TwinX POSSREP { TypeX TypeX, TypeX2 TypeX};

  2. Provision is made for the same default type to be used for more than one object name by allowing several names to be specified in the same dictionary element.

Ex. 6

Dictionary;
SNO, SNO1, SNO2 CHAR;
PNO, PNO1, PNO2 CHAR;

End Dictionary;

The defined types are just SNO and PNO as in Ex. 1, but the declared type names for objects named SNO1 or SNO2 can be omitted in the same manner as for objects named SNO.  A similar observation applies to objects named PNO1 or PNO2.

Questions:

I know that some of these may have been answered already (after I had first drafted this posting) but I decided to leave them in for completeness.

Assume the dictionary is as defined in Ex. 6.

Q1:      Regarding Ex. 4, can the RETURNS clause be omitted such that the operator’s declared type name defaults to the operator’s name?

No. At least it's not part of what I propose.

Q2:      Are the equivalences given in Ex. 3-5 correct?  E.g., is this a legal relvar definition?

VAR S BASE REL{SNO SNO, NAME CHAR, CITY CHAR} KEY{SNO};

Yes.

Q3:      Is this a legal relvar definition?

VAR X BASE REL{SNO INT} KEY{SNO};

Yes. Defining a dictionary entry SNO does not oblige every SNO to use the dictionary definition. Here SNO has type INT, which is distinct from the dictionary definition's type, so type safety is preserved.

Q4:      Given relvar R of heading {A INT}, is this expression legal:

R RENAME {A AS SNO}

Yes. SNO is not forced to be globally unique(ly typed.)

Q5:      Given relvar R of heading {A INT}, is this expression legal:

EXTEND R : {SNO := A + 1}

Yes. Again, SNO is not forced to be globally unique.

Q6:      The database is to be extended with some new relvars and it is desired to use new dictionary types for some of the attributes.  How can these additional dictionary types be added?  (I assume use of INSERT on the relevant catalog relvar would be one not-very-convenient method.  Right?)

There would be some (implementation-dependent, perhaps) mechanisms to ALTER DICTIONARY ADD <entry>, ALTER DICTIONARY DELETE <name>, etc.

Q7:      Possibly a stupid question.  Couldn’t system-defined scalar types be assumed to be dictionary types too?  For example, couldn’t this be legal?

VAR CHAR;

If not, why not?  In fact, why can’t all scalar types be dictionary types?  I realise that would need a different way of specifying additional names as in Ex. 6.  I also realise that expressions such as Value(Value(CHAR)), and so on ad infinitum would then be legal!

Which is shorthand for VAR CHAR CHAR ?

I suppose it could, but it seems a curious (ab)use of the (intent of the) dictionary mechanism. Primitive built-in types are notionally special in most languages and I'd be inclined to preserve that here, if only to avoid readability issues from (mis)using it, along with practical implementation parsing issues from INT/INTEGER, CHAR/CHARACTER, RAT/RATIONAL, BOOL/BOOLEAN being reserved words. (I don't recall whether they're still reserved words in Rel or not.)

Thanks Dave.  Your answers are as expected.  I've seen the subsequent discussions with Erwin and dandl.

I can't see anything wrong with the proposal but I have to say I don't feel at all enthusiastic about it and it's certainly not to my personal taste.  Providing a shorthand for defining types that have a single possrep with a single component doesn't seem to be of much benefit.  I'm more disturbed by allowing the type to be omitted in variable/attribute/parameter/possrep component definitions.  It seems just as likely to lead to some confusion as to be of any real benefit, considering that solutions to the problem at hand are already fairly easily available in TD and Rel.

I'm now wondering why Tobega advanced that "type attribute" idea in the first place.  What perceived deficiency in TTM or TD was it supposed to address?

Hugh

I can't speak for Tobega, but the goal of my DICTIONARY proposal is to make it easy to create schemas with distinct attribute types where they should be distinct, the same attribute types where they should be the same, and all attribute types distinct from the built-in types even if they're using the built-in types.

Arguably, for proper type safety, that's what we should always do.  Defining attributes like Customer_ID and Product_ID and Quantity all as INTEGER is abominable -- though in the real world of SQL, commonplace -- and DICTIONARY is intended to make it easy to stop doing it badly and easy to start doing it right. (The same rationale presumably applies to distinct types in some SQL implementations.)

I presume Tobega (though, again, I can't speak for him) intended the same thing, but approaches it a different way.

But, yes, you can do exactly the same thing without DICTIONARY. Simply define and use the approach where I've shown what DICTIONARY is shorthand for. That's what I tend to do, anyway.

The deficiency in Tutorial D is that it's rather laborious to do it the right way and all too easy to do it the wrong way (just use INT and CHAR and BOOL and RATIONAL for everything), so it would be nice -- pedagogically, polemically (kind of...), and practically -- to have a construct (DICTIONARY) that strongly encourages (but doesn't require) you to do it the right way all the time.

Yes, I know about the motivation but I don't think you've justified the additional feature: omission of type name in declarations that use a dictionary type.  That's the bit I'm most bothered about and I don't see what it's got to do with the main problem.

Btw, I note that when dictionary type values are displayed in Rel, the underlying value v will appear wrapped, as typename(v).  In BS12 we probably would have extended our existing support for specifying default display formats to allow the display format for THE_Value(v) to be used instead.

Hugh

Coauthor of The Third Manifesto and related books.