The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

7NF and ℵ

One of the disadvantages of 6NF is the need to replicate the attributes of the central relation across all the others.  If the central relation's candidate key (the whole tuple) is lengthy or contains large attribute values, the cost in storage can be large.  In 7NF (which I propose here), we add an synthetic attribute named ℵ that is also a candidate key for the central relation.  By doing this, we can make ℵ an attribute, and the candidate key, for all the peripheral relations, so that they are all of degree exactly 2, affording a considerable simplification.

Another advantage to 7NF is that in 6NF k + 1 relations are required to represent a 1NF relation, where k is the rank of the original relation minus the rank of the central relation.  But in 7NF, if the values of ℵ are made unique across the whole database, then distinct central relations may share 7NF peripheral relations if they contain the same attribute.  Thus a single peripheral relation with heading (ℵ, PURCHASE_PRICE) can be shared by many central relations: a product relation, an invoice relation, an asset relation, and so on.

The type of ℵ is arbitrary as long as its value space is sufficiently large.  It can be a serially assigned integer; a random integer or string with a sufficiently large number of bits, such as a UUID; or (worst case) the concatenation of the relation name with the string image of all the attributes in the central relation, in which case the improvement is merely theoretical, as the storage consumption may well be worse than before.

Of course, if a DBMS restricted all public relvars to 6NF, it would be possible to implement them with 7NF under the (erm) table.  But since that isn't likely to be the case, 7NF is like the other NFs, a design principle.

Comments?

I'm surprised that these days "the cost in storage" is any concern at all.

we add an synthetic attribute named ℵ that is also a candidate key for the central relation.

So in at least the central relation, there is duplicated information, with the usual opportunities for anomalies. Or do you propose that once an ℵ is allocated to a (central relation's) tuple, there's no need to change it even if the 'business key's attributes change? Then you have to distinguish UPDATEs (which don't change the ℵ) vs DELETE/INSERT which will need to cascade any new ℵ to the peripheral relations.

But in 7NF, if the values of ℵ are made unique across the whole database, then distinct central relations may share 7NF peripheral relations if they contain the same attribute.

I'd like to see the characteristic predicate for a relvar with an ℵ and a MONEY that variously holds PURCHASE_PRICE, COST_PRICE, EXTENDED_PRICE, TAX_AMOUNT, .... What's its INCLUSION DEPENDENCY (aka FOREIGN KEY constraint) going to look like? There's nothing in the ℵ tells which central relation it belongs to. What if we want two MONEYs peripheral to the same central tuple?

And I see no grounds for calling this "7NF": it's not beyond 6NF; as soon as you have Foreign Keys, you can potentially use a synthetic Primary Persistent Key. Every darn application I've been working with recently does this already. Bane of my life. Dratted ORM (done badly, according to Dave).

Quote from AntC on September 8, 2019, 10:29 pm

I'm surprised that these days "the cost in storage" is any concern at all.

It's true we have Big Storage nowadays, but Big Data is outrunning it.  We had a zettabyte (10²¹ bytes) of storage in use in 2012 and 18 times that now, with 175 zettabytes expected by 2025.

 

Or do you propose that once an ℵ is allocated to a (central relation's) tuple, there's no need to change it even if the 'business key's attributes change? Then you have to distinguish UPDATEs (which don't change the ℵ) vs DELETE/INSERT which will need to cascade any new ℵ to the peripheral relations.

"Exactly so!" quoth he.

But in 7NF, if the values of ℵ are made unique across the whole database, then distinct central relations may share 7NF peripheral relations if they contain the same attribute.

I'd like to see the characteristic predicate for a relvar with an ℵ and a MONEY that variously holds PURCHASE_PRICE, COST_PRICE, EXTENDED_PRICE, TAX_AMOUNT, .... What's its INCLUSION DEPENDENCY (aka FOREIGN KEY constraint) going to look like? There's nothing in the ℵ tells which central relation it belongs to. What if we want two MONEYs peripheral to the same central tuple?

I understand 6NF to be about attributes, not domains; you have a peripheral relation for each non-key attribute of the original relation, and it's irrelevant if they all have the same domain, all have different domains, or in between.  What if we want two MONEYs in the same non-6NF relation?  We give them different names, and just so here.

So here we don't merge any of these, we only merge PURCHASE_PRICE in one original (non-7NF) relation with PURCHASE_PRICE attributes in the other original relations that contain it.

 

we only merge PURCHASE_PRICE in one original (non-7NF) relation with PURCHASE_PRICE attributes in the other original relations that contain it.

You're not answering the point about the Foreign Key constraint:

  • I'm inserting a TUPLE{ ℵ PKEY,  MONEY PURCHASE_PRICE} into a peripheral relvar.
  • That's a peripheral for several "original" relvars.
  • I need to validate that the PKEY exists already (or as part of this multiple update) in the PURCHASE_ORDER_DETAIL relvar.
  • How does the constraint checker know which relvar to go looking in?
  • Or do you suggest it looks in every relvar for which PURCHASE_PRICE is peripheral, and be happy if it finds that just anywhere?
  • (That last bullet was heavy with sarcasm, I probably need to point out.)
  • This is a issue about the information structure of the schema. There's a separate issue about the performance/indexing of the peripheral relvar, we don't want to rob Peter to pay Paul -- i.e. win on disk space but lose on processing for a bigger number of index entries.
  • Does mashing several different PURCHASE_PRICEs into one peripheral relvar actually save anything at all? (It cuts down on the number of relvars/names; is that any kind of saving?) What's wrong with a separate relvar for each distinct 'role' of PURCHASE_PRICE?
  • Oh, I seem to have reinvented 6NF (with surrogate keys).

 

Quote from AntC on September 9, 2019, 6:54 am
  • Does mashing several different PURCHASE_PRICEs into one peripheral relvar actually save anything at all? (It cuts down on the number of relvars/names; is that any kind of saving?)

Perhaps not, and there are practical difficulties, as you point out.  But merging tables was just an afterthought on my part: the essence here is ℵ.

  • Oh, I seem to have reinvented 6NF (with surrogate keys).

I think the use of ℵ is different enough from 6NF as usually defined to deserve a new name, but I won't argue that.

Another afterthought:  if all ℵ attributes throughout the database have a common domain ℸ (what you called PKEY above), then it is possible to treat the database as a graph database (in the modern sense, not Codd's sense of a Codasyl network database).  Each node in a graph DB has a type which prescribes that the node has zero or more required properties (the attributes of the central relation) and zero or more optional properties (defined in peripheral relations), where a property is a pair <name, value> and value may be of type ℸ as well as any other type.  For this purpose we relax the idea that the non-ℵ attributes of the central relation form a candidate key.

Graph databases in turn are isomorphic to RDF triple stores (with a slight limitation due to static typing), so we now have an embedding of both models in the relational model; we can, for example, write a RM database that accepts GraphQL or Sparql queries on databases of the right overall shape.

(I'm prepared to have this idea shot down too.)