Should a list of values be a table or something else?

#1 · October 22, 2018, 4:54 am

Assume a database design with a lot of fields that are of type 'list of values'. They might be "yes,no,maybe", "urgent,important,normal,low", "$,$$,$$$,$$$$,$$$$$", etc. Assume values are never deleted and rarely if ever added. Assume the values are self-explaining, they do not need a description. Make whatever other assumptions you like.

The question is: do such values always justify a table for each, or is there some particular criterion on which it is better to treat them as an enumerated type?

Andl - A New Database Language - andl.org

#2 · October 22, 2018, 10:35 am

My general rules, to the extent that they're rules rather than case-by-case considerations, are:

If the database is shared or likely to be shared by multiple applications -- particularly multiple applications written in multiple languages -- then "list of values" or "static lookups" or whatever are implemented as tables.
If the database is not shared or not likely to be shared by multiple applications -- such as a persistence layer for a single application -- and the "list of values" is likely to be tightly coupled to application code, then I'll use an enumerated type.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#3 · October 23, 2018, 10:33 am

This is my first post to the new forum. Fingers crossed ...

In Rel, when your "table" approach seems best, I often use a view--a virtual relvar defined on a relation literal (so not really a variable in any sense!), combined with a foreign key constraint defined using IS_EMPTY(foo NOT MATCHING bar). In some cases the values are n-tuples (n>1), making this approach especially pleasing (to me, at least).

Coauthor of The Third Manifesto and related books.

#4 · October 23, 2018, 10:48 am

1) Most of your examples are a pretty poor case of "self-explaining" because it makes assumptions about which particular natural language is spoken by the entire world.
2) That notwithstanding, enumerated type seems a better fit to me than defining a relvar for it (virtual and read-only or not). On count of point 1), in any multilingual setting 1 table won't be enough because you need a second holding the I18N/L10N labels. Furthermore, if you make it a relvar, what are you going to use as the type of the attribute ? What are you going to do when the need for / usefulness of an operator operating on your values becomes clear ? As long as you can be certain that you'll never ever need anything else than just value equality, you're OK. When can you know that for sure ? And furthermore, TTM has not really been designed so [conforming languages have the property] that the relvar name of your relvar can be used in any place where the name of your enumerated type could be used.

Author of SIRA_PRISE

#5 · October 23, 2018, 11:26 pm

Quote from Erwin on October 23, 2018, 10:48 am

1) Most of your examples are a pretty poor case of "self-explaining" because it makes assumptions about which particular natural language is spoken by the entire world.

Not really. I assume there is a totally separate translation phase. In which case it may be easier to identify what to translate if the relevant terms are part of a type definition than if they're just data in the database.

Your other comments are reasonable, but duck the question: what in particular would tip the balance towards using a table?

Andl - A New Database Language - andl.org

#6 · October 24, 2018, 5:27 am

Quote from dandl on October 22, 2018, 4:54 am

Assume a database design with a lot of fields that are of type 'list of values'. They might be "yes,no,maybe", "urgent,important,normal,low", "$,$$,$$$,$$$$,$$$$$", etc. Assume values are never deleted and rarely if ever added. Assume the values are self-explaining, they do not need a description. Make whatever other assumptions you like.

The question is: do such values always justify a table for each, or is there some particular criterion on which it is better to treat them as an enumerated type?

There is certainly a particular criterion. The criterion is whether or not any individual values are effectively program source code. If at any time, in any code whether some type definition or any application procedural code, the code specifically makes reference to individual values of the set, meaning that any values correspond to specific program logic, then it should be an enumerated type. Whereas, if there is no code that ever makes reference to individual values, and every value is treated the same by code, then the options are effectively like user data like names or street addresses etc. So, things like status codes hard-coded to drive logic, those should be an enumerated type, whereas options that only have meaning to users should be data. And if you have cases that might cross boundaries such as hard-coded status codes with names that users see, the codes should be an enumeration type, and the labels can be data which matches those to user-defined labels. Its really as simple as that. Very clear cut. Any questions?

#7 · October 24, 2018, 7:12 am

Quote from Darren Duncan on October 24, 2018, 5:27 am

There is certainly a particular criterion. The criterion is whether or not any individual values are effectively program source code. If at any time, in any code whether some type definition or any application procedural code, the code specifically makes reference to individual values of the set, meaning that any values correspond to specific program logic, then it should be an enumerated type.

It's a valid point, but is is enough? Are you saying that we must never use an enumerated type for the allowable values unless we use those values in source code?

Andl - A New Database Language - andl.org

#8 · October 24, 2018, 3:17 pm

Quote from dandl on October 24, 2018, 7:12 am

Quote from Darren Duncan on October 24, 2018, 5:27 am

There is certainly a particular criterion. The criterion is whether or not any individual values are effectively program source code. If at any time, in any code whether some type definition or any application procedural code, the code specifically makes reference to individual values of the set, meaning that any values correspond to specific program logic, then it should be an enumerated type.

It's a valid point, but is is enough? Are you saying that we must never use an enumerated type for the allowable values unless we use those values in source code?

I'm saying that any enumeration which source code depends on for its logic should be canonically defined in its source code and it shouldn't be possible to do anything in the database that would disrupt that logic. The other way, defining enumerations that the code doesn't reference, is fine, especially if you conceptualize that users would never update that list themselves, and that updating the options is conceptually a source code change.

#9 · October 25, 2018, 11:48 am

A set of allowable values for an attribute should be declared as an enumerated type if (1) there is more than one relvar with an attribute that allows that particular set of allowable values, and (2) members of that set of allowable values will not be compared to variables or attributes declared with a type that is a superset of that set of values.

The values of an enumerated type are not a subset of the possrep component's type, and therefore values of an enumerated type cannot be compared to values of the type of the possrep component.

Brian

#10 · October 25, 2018, 4:54 pm

In response to Brian, I would make an allowable exception in the case where the possible values constitute a key of one relvar and a foreign key in all the others. I find that especially pleasant when I like the values to be tuples whose attributes can then be mentioned in invocations of rel ops.

Hugh

Coauthor of The Third Manifesto and related books.

The Forum for Discussion about The Third Manifesto and Related Matters

Should a list of values be a table or something else?