The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

Dictionaries

Quote from Hugh on May 8, 2021, 11:05 am
Quote from Dave Voorhis on May 7, 2021, 2:51 pm
Quote from Hugh on May 7, 2021, 2:27 pm
Quote from Dave Voorhis on May 6, 2021, 1:08 pm
Quote from Hugh on May 6, 2021, 11:06 am

I refer to Dave Voorhis’s “dictionary” proposal that has been advanced and discussed under the rubric Tuples FTW.

In brief, he proposes a shorthand for defining a set of user-defined types that each have just a single possrep and a single possrep component.  In each case the possrep name is the same as the type name and the possrep component name is Value.  The purpose is much the same as that of standard SQL’s so-called distinct types: to provide a way of avoiding traps arising from inappropriate comparisons, especially ones that are implicit in operations such as joins on relations.

Here, according to my understanding, is an example, showing how it might be implemented in Rel:

Ex. 1

Dictionary;
SNO CHAR;
PNO CHAR;

End Dictionary;

As far as the type definitions are concerned this is equivalent to

Ex. 2

TYPE SNO POSSREP { Value CHAR };
TYPE PNO POSSREP { Value CHAR };

but there are two more points to the proposal.  Calling those types dictionary types and their defining statements dictionary elements:

  1. The definition of a variable with declared type a dictionary type and name the name of that type need not repeat the name. So VAR SNO; is short for VAR SNO SNO;

The same applies to declarations of objects such as attributes, parameters, and possrep components:

Ex. 3

VAR SP BASE REL{SNO, PNO, QTY INT} KEY{SNO, PNO};

VAR SP BASE REL{SNO SNO, PNO PNO, QTY INT}
KEY{SNO, PNO};

Ex. 4

OPERATOR CheckSNO {SNO} RETURNS BOOLEAN;

RETURN Left(THE_Value(SNO),1) = 'S';

END OPERATOR;

OPERATOR CheckSNO {SNO SNO} RETURNS BOOLEAN;

RETURN Left(THE_Value(SNO),1) = 'S';

END OPERATOR;

Ex. 5

Assume TypeX is a dictionary type.  Then

TYPE TwinX POSSREP { TypeX, TypeX2 TypeX};

TYPE TwinX POSSREP { TypeX TypeX, TypeX2 TypeX};

  1. Provision is made for the same default type to be used for more than one object name by allowing several names to be specified in the same dictionary element.

Ex. 6

Dictionary;
SNO, SNO1, SNO2 CHAR;
PNO, PNO1, PNO2 CHAR;

End Dictionary;

The defined types are just SNO and PNO as in Ex 1, but the declared type names for objects named SNO1 or SNO2 can be omitted in the same manner as for objects named SNO.  A similar observation applies to objects name PNO1 or PNO2.

Questions:

I know that some of these may have been answered already (after I had first drafted this posting) but I decided to leave them in for completeness.

Assume the dictionary is as defined in Ex. 6.

Q1:      Regarding Ex. 4, can the RETURNS clause be omitted such that the operator’s declared type name defaults to the operator’s name?

No. At least it's not part of what I propose.

Q2:      Are the equivalencen given Ex 3-5 correct?  E.g, this a legal relvar definition?

VAR S BASE REL{SNO SNO, NAME CHAR, CITY CHAR} KEY{SNO};

Yes.

Q3:      Is this a legal relvar definition?

VAR X BASE REL{SNO INT} KEY{SNO};

Yes. Defining a dictionary entry SNO does not obligate that the only SNO be the dictionary definition. It's a distinct type from the dictionary definition's type, so type safety is preserved.

Q4:      Given relvar R of heading {A INT}, is this expression legal:

R RENAME {A AS SNO}

Yes. SNO is not forced to be globally unique(ly typed.)

Q5:      Given relvar R of heading {A INT}, is this expression legal:

EXTEND R : {SNO := A + 1}

Yes. Again, SNO is not forced to be globally unique.

Q6:      The database is to be extended with some new relvars and it is desired to use new dictionary types for some of the attributes.  How can these additional dictionary types be added?  (I assume use of INSERT on the relevant catalog relvar would be one not-very-convenient method.  Right?)

There would be some (implementation-dependent, perhaps) mechanisms to ALTER DICTIONARY ADD <entry>, ALTER DICTIONARY DELETE <name>, etc.

Q7:      Possibly a stupid question.  Couldn’t system-defined scalar types be assumed to be dictionary types too?  For example, couldn’t this be legal?

VAR CHAR;

If not, why not?  In fact, why can’t all scalar types be dictionary types?  I realise that would need a different way of specifying additional names as in Ex. 6.  I also realise that expressions such as Value(Value(CHAR)), and so on ad infinitum would then be legal!

Which is shorthand for VAR CHAR CHAR ?

I suppose it could, but it seems a curious (ab)use of the (intent of the) dictionary mechanism. Primitive built-in types are notionally special in most languages and I'd be inclined to preserve that here, if only to avoid readability issues from (mis)using it, along with practical implementation parsing issues from INT/INTEGER, CHAR/CHARACTER, RAT/RATIONAL, BOOL/BOOLEAN being reserved words. (I don't recall whether they're still reserved words in Rel or not.)

Thanks Dave.  Your answers are as expected.  I've seen the subsequent discussions with Erwin and dandl.

I can't see anything wrong with the proposal but I have to say I don't feel at all enthusiastic about it and it's certainly not to my personal taste.  Providing a shorthand for defining types that have a single possrep with a single component doesn't seem to be of much benefit.  I'm more disturbed by allowing the type to be omitted in variable/attribute/parameter/possrep component definitions.  It seems just as likely to lead to some confusion as to be of any real benefit, considering that solutions to the problem at hand are already fairly easily available in TD and Rel.

I'm now wondering why Tobega advanced that "type attribute" idea in the first place.  What perceived deficiency in TTM or TD was it supposed to address?

Hugh

 

I can't speak for Tobega, but the goal of my DICTIONARY proposal is to make it easy to create to create schemas with distinct attribute types where they should be distinct, the same attribute types where they should be the same, and all attribute types distinct from the built-in types even if they're using the built-in types.

Arguably, for proper type safety, that's what we should always do.  Defining attributes like Customer_ID and Product_ID and Quantity all as INTEGER is abominable -- though in the real world of SQL, commonplace -- and DICTIONARY is intended to make it easy to stop doing it badly and easy to start doing it right. (The same rationale presumably applies to distinct types in some SQL implementations.)

I presume Tobega (though, again, I can't speak for him) intended the same thing, but approaches it a different way.

But, yes, you can do exactly the same thing without DICTIONARY. Simply define and use the approach where I've shown what DICTIONARY is shorthand for. That's what I tend to do, anyway.

The deficiency in Tutorial D is that it's rather laborious to do it the right way and all too easy to do it the wrong way (just use INT and CHAR and BOOL and RATIONAL for everything), so  it would be nice -- pedagogically, polemically (kind of...), and practically -- to have a construct (DICTIONARY) that strongly encourages (but doesn't require) you to do it the right way all the time.

Yes, I know about the motivation but I don't think you've justified the additional feature: omission of type name in declarations that use a dictionary type.  That's the bit I'm most bothered about and I don't see what it's got to do with the main problem.

It falls naturally out of having the DICTIONARY -- why would you redundantly specify the type again in a heading or variable/parameter declaration when you've already specified it once in the dictionary?

It's part of the reason for having the DICTIONARY -- so you can specify desired attributes just by name, rather than name-and-type. For a large schema, it can become quite tedious to specify the CUSTOMER_ID CUSTOMER_ID (<identifier> <type>) every time you declare a CUSTOMER_ID attribute, given CUSTOMER_ID is always going to be (or should be!) of type CUSTOMER_ID.

Likewise, for a large program, it can become quite tedious to specify CUSTOMER_ID CUSTOMER_ID every time you declare a CUSTOMER_ID variable or parameter, given it's always going to be (or should be!) of type CUSTOMER_ID.

And it discourages the user from taking the lazy way out and declaring CUSTOMER_ID as CHAR wherever CUSTOMER_ID appears. It discourages the user from making the mistake of declaring CUSTOMER_ID as CUSTOMER_ID in this relvar and INT in that relvar (or as CHAR in this relvar and INT in that relvar.)

Obviously, you can simply explicitly create a CUSTOMER_ID type and specify it wherever you declare a CUSTOMER_ID variable, parameter or attribute, but DICTIONARY makes it easier to write and get right.

Btw, I note that when dictionary type values are displayed in Rel, the underlying value v will appear wrapped, as typename(v).  In BS12 we probably would have extended our existing support for specifying default display formats to allow the display format for THE_Value(v) to be used instead.

My presumption was that within the RelTutorial D environment, any interaction is mainly development activity, so it's helpful from a debugging/development point of view for values to be clearly and unambiguously displayed in a fashion that also conveniently allows "round-tripping." In other words, so that you can copy any emitted value at the output back into the input, and have it be correctly parsed as the same value. It's handy for development and debugging.

If you're using Rel as a DBMS or application development environment, then for user displays it's straightforward to emit values with custom formatting when needed -- e.g., obtain THE_Value(someType) and display it.

What I dislike, particularly in debugging environments, is for every value to be emitted as a bare string with type information either completely elided, or at best somewhere else so you have to go looking for it. I like the fact that Rel emits CUSTOMER_ID(33) as CUSTOMER_ID(33) rather than 33, particularly as in Tutorial D a literal of type CUSTOMER_ID is specified as CUSTOMER_ID(33) and not 33.

Though admittedly, this is a bit of a marmite feature. Folks I've showed it to are rarely neutral; they either love it or hate it.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on May 8, 2021, 8:14 am
Quote from dandl on May 8, 2021, 2:20 am

I'm now wondering why Tobega advanced that "type attribute" idea in the first place.  What perceived deficiency in TTM or TD was it supposed to address?

I agree. What problem are we trying to solve, and what would a really good solution look like?

If the aim is merely to show intended usage in the database, then type aliases are enough. Since we also want types with components eg Point(x,y)), something that closely resembles a type with a single component might do the job. But although we now know that ID and QTY and STATUS are different, we don't know what purpose they serve or what operations are allowed on them. We might suspect that LOW_SCORE and HIGH_SCORE are related, while LOW_TEMP and LOW_LEVEL and LOW_BALANCE probably are not, but the compiler can't help us.

None of this helps with avoiding clangers like SUM(LOW_TEMP) or AVERAGE(CLIENT_ID) or ORDER BY BITMAP_BLOB or (QTY+STATUS-ACCOUNT_BALANCE). To do that we need a type system that allows fine-grained types with operations drawn from those available to the underlying base type. It's not inheritance, it's more like the Haskell system. You need a way to say that ID is an integer with comparison but no arithmetic operators, while STATUS can be averaged but not summed and QTY cannot be added to SHIP_WEIGHT.

And that's before we even get started on units of measure.

DICTIONARY is precisely about avoiding clangers like SUM(LOW_TEMP) or AVERAGE(CLIENT_ID) or ORDER BY BITMAP_BLOB or (QTY+STATUS-ACCOUNT_BALANCE).

That's exactly what it prevents, assuming LOW_TEMP and CLIENT_ID and BITMAP_BLOB and QTY and STATUS and ACCOUNT_BALANCE have been defined in a DICTIONARY.

In fact, you don't even need DICTIONARY. It's just sugar. You could explicitly define DICTIONARY-equivalent types, per my examples showing what the DICTIONARY shorthand is longhand for.

That certainly prevents you from writing AVG(R, CLIENT_ID), at least by default.

It doesn't prevent you from writing AVG(R, THE_Value(CLIENT_ID)), if that's really what you want. But it makes the developer be explicit about it, and it's much less likely to happen carelessly and unintentionally than if you defined CLIENT_ID to be of type INTEGER.

So it does nothing. You declare types for each attribute, and because those types have no operators (except the default equality) you cannot use them for anything. But you can cast values to the underlying type, and then there are no restrictions at all, you can use them however you wish. The code will now be littered with casting cruft but will be no more safe than it was before.

If that's not so, please show me the code.

And if you really, really want to be able to say AVG(R, CLIENT_ID), then you can define (at least in Rel) your own AVG that works on type CLIENT_ID. The same goes for other operators, too. Remember that the Tutorial D type system is open -- new operators can be added that reference any type at any time, including (in Rel, at least) symbolic operators like +, -, /, *, etc.

As such, "a type system that allows fine-grained types with operations drawn from those available to the underlying base type" is exactly what it is. You can either define new operators as needed, or access the underlying "base" type via THE_Value(...), whilst "avoiding clangers like SUM(LOW_TEMP) or AVERAGE(CLIENT_ID) or ORDER BY BITMAP_BLOB or (QTY+STATUS-ACCOUNT_BALANCE)."

The examples I gave were intended to show the need to discriminate: this operator applies to this type but not to that one. Your cast operation rides roughshod over any attempt to discriminate.

That's exactly what my DICTIONARY proposal is about. It's not that you can't do all of it in Tutorial D as it stands -- you most certainly can.

DICTIONARY is about making it syntactically easier.

Without addressing the real problem or getting any real help from the compiler.

 

Andl - A New Database Language - andl.org
Quote from dandl on May 9, 2021, 8:51 am
Quote from Dave Voorhis on May 8, 2021, 8:14 am
Quote from dandl on May 8, 2021, 2:20 am

I'm now wondering why Tobega advanced that "type attribute" idea in the first place.  What perceived deficiency in TTM or TD was it supposed to address?

I agree. What problem are we trying to solve, and what would a really good solution look like?

If the aim is merely to show intended usage in the database, then type aliases are enough. Since we also want types with components eg Point(x,y)), something that closely resembles a type with a single component might do the job. But although we now know that ID and QTY and STATUS are different, we don't know what purpose they serve or what operations are allowed on them. We might suspect that LOW_SCORE and HIGH_SCORE are related, while LOW_TEMP and LOW_LEVEL and LOW_BALANCE probably are not, but the compiler can't help us.

None of this helps with avoiding clangers like SUM(LOW_TEMP) or AVERAGE(CLIENT_ID) or ORDER BY BITMAP_BLOB or (QTY+STATUS-ACCOUNT_BALANCE). To do that we need a type system that allows fine-grained types with operations drawn from those available to the underlying base type. It's not inheritance, it's more like the Haskell system. You need a way to say that ID is an integer with comparison but no arithmetic operators, while STATUS can be averaged but not summed and QTY cannot be added to SHIP_WEIGHT.

And that's before we even get started on units of measure.

DICTIONARY is precisely about avoiding clangers like SUM(LOW_TEMP) or AVERAGE(CLIENT_ID) or ORDER BY BITMAP_BLOB or (QTY+STATUS-ACCOUNT_BALANCE).

That's exactly what it prevents, assuming LOW_TEMP and CLIENT_ID and BITMAP_BLOB and QTY and STATUS and ACCOUNT_BALANCE have been defined in a DICTIONARY.

In fact, you don't even need DICTIONARY. It's just sugar. You could explicitly define DICTIONARY-equivalent types, per my examples showing what the DICTIONARY shorthand is longhand for.

That certainly prevents you from writing AVG(R, CLIENT_ID), at least by default.

It doesn't prevent you from writing AVG(R, THE_Value(CLIENT_ID)), if that's really what you want. But it makes the developer be explicit about it, and it's much less likely to happen carelessly and unintentionally than if you defined CLIENT_ID to be of type INTEGER.

So it does nothing. You declare types for each attribute, and because those types have no operators (except the default equality) you cannot use them for anything. But you can cast values to the underlying type, and then there are no restrictions at all, you can use them however you wish. The code will now be littered with casting cruft but will be no more safe than it was before.

If that's not so, please show me the code.

I'm not going to paste a whole schema, but note that most distinct types are isolated, "meeting" with other distinct types at limited points. We don't want to add QTY to PRODUCT_NUMBER, or JOIN PRODUCT_NUMBER to CUSTOMER_NUMBER. That fact can be effectively used, effectively encouraged (even more so with DICTIONARY) and results in a relatively small need to create new operators to operate on distinct types. Those tend to be neatly encapsulated in operators which start by copying relevant values from THE_Value(type1) and THE_Value(type2) -- or just THE_Value(sometype) -- to local variables, manipulate them, and return a result.

But that happens relatively rarely. In practice, the code doesn't become littered with THE_Value(...) invocations, because operators can be added as needed and each typically has only a few.

And if you really, really want to be able to say AVG(R, CLIENT_ID), then you can define (at least in Rel) your own AVG that works on type CLIENT_ID. The same goes for other operators, too. Remember that the Tutorial D type system is open -- new operators can be added that reference any type at any time, including (in Rel, at least) symbolic operators like +, -, /, *, etc.

As such, "a type system that allows fine-grained types with operations drawn from those available to the underlying base type" is exactly what it is. You can either define new operators as needed, or access the underlying "base" type via THE_Value(...), whilst "avoiding clangers like SUM(LOW_TEMP) or AVERAGE(CLIENT_ID) or ORDER BY BITMAP_BLOB or (QTY+STATUS-ACCOUNT_BALANCE)."

The examples I gave were intended to show the need to discriminate: this operator applies to this type but not to that one. Your cast operation rides roughshod over any attempt to discriminate.

I'm not sure what you mean by "cast operation", because there isn't any casting per se. If you mean the THE_Value(something) operators, they no more "ride roughshod over any attempt to discriminate" than, say, creating class instances in C# that have some getters to obtain component values. That doesn't "ride roughshod" over any attempt to discriminate the class instances from other class instances, does it?

That's exactly what my DICTIONARY proposal is about. It's not that you can't do all of it in Tutorial D as it stands -- you most certainly can.

DICTIONARY is about making it syntactically easier.

Without addressing the real problem or getting any real help from the compiler.

It's addressing the real problem, which is lazily defining almost all attributes (and variables and parameters) as INT, BOOL, RAT and CHAR, leading to inadvertently joining, adding, averaging, summing, whatevering them when you shouldn't.

It's getting exactly the help from the compiler that you want, because the compiler:

  • Defines new types to wrap INT, BOOL, RAT, CHAR (or directly use, or wrap, your own user-defined types if you like) via DICTIONARY.
  • Lets you use DICTIONARY names in attribute/variable/parameter declarations, which improves brevity and reduces the likelihood of making mistakes.
  • Makes it harder to inadvertently join, add, average, sum, etc., attributes/variables/parameters when you shouldn't do so, because type safety.
  • Obligates that it be explicit and obvious when you do intentionally want to join (if they're differently-typed), add, average, sum, or whatever attributes/variables/parameters.

That said, imagine a typical type like INT wrapped in a typical type like QUANTITY. In other words:

TYPE QUANTITY POSSREP {Value INT};

It might be manually defined, or defined via DICTIONARY; doesn't matter.

For brevity's sake if nothing else, it might be nice -- given two or more attributes/variables/parameters of type QUANTITY, to be able to perform INTEGER operations on them without having to say THE_Value(QUANTITY) + THE_Value(QUANTITY_ORDERED), so that you could just say QUANTITY + QUANTITY_ORDERED -- but it would only be nice. It's certainly not necessary, and arguably there's value in being explicit about what's going on, and there are decisions that would have to be made about whether INTEGER could partake equally or not. E.g., is QUANTITY + QUANTITY_ORDERED + 3 allowed? Is QUANTITY + QUANTITY_ORDERED + QUANTITY_BACKORDERED allowed if QUANTITY_BACKORDERED is defined to be INTEGER? Etc.

If allowed at all, it should probably only be allowable in some explicitly designated scope. Probably better not to allow it, and stick to requiring use of THE_Value(...)

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org