The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

Object-Relational hybrids and `id` as Primary key

Page 1 of 2Next

I've just been answering a SO question on schema design -- which was a quite interesting real-life example in its own right, but -- the questioner seemed to have got off-track from the outset because:

They're using 'eloquent'/Laravel, which seems to be some Object-Relational hybrid, with the usual feeble justification for calling it "relational". It by default allocates a Primary key to every table/object, which is a biginteger auto-incrementing, named id. (Thus of course natural joins immediately don't work.)

It seems you can override that default behaviour see `#Primary Keys` here. But the ability to override is not well-advertised. (There's a few other SO questions telling you how; but even some comments on them saying you should follow Laravel conventions or you'll get into trouble.)

Very occasionally in designing schemas I've used an auto-incrementing int as Primary key. Never have I wanted to call it id. (More usually, I've wanted the key to be a hash of some auto-incrementing system value -- hashed so I get a good spread across the indexes, rather than every concurrent transaction hammering at the same block.)

What I'd expect from a helpful schema design tool is say: you haven't declared a (Primary) Key here; that means I'll allocate an int called id; and that's almost certainly what you don't want; are you sure?

 

It was a common convention in Ruby on Rails to have an auto-incrementing primary key called 'something_id', but I don't remember it as a cause of problems. The core might be transactions like sale or invoice, with the id field used to link to customers and products. It was preferred to use id fields for linking rather than customer account no because of the problems if a customer was given two accounts.

Granted one should not use the same name to mean different things in different tables if one ever wants to do a natural join. But apart from that, what's the beef? Give the id fields unique names and we're done.

Andl - A New Database Language - andl.org

I see 2 main beefs.

One was using the same field name "id" for holding data with different meanings, so yes natural joins don't work.

The other one is auto-incrementing fields, which have a host of problems of their own for example you have much more complicated logic of having to insert a record and then only after that do you know what its primary key is, especially if you have other records that need to point to it.  And also its non-repeatable if you need to rollback and try again etc.  And the predictability of these is potentially a security issue.

I figure that if one needs to use serially generated ids, the better way is that the id generator is not tied to the tablevar definition and is explicitly invoked first as its own operation to get the id to use, and then as a separate operation you insert the record such that the insertion uses the generated number as static data like any other.

Quote from Darren Duncan on February 14, 2023, 5:41 am

...

I figure that if one needs to use serially generated ids, the better way is that the id generator is not tied to the tablevar definition ...

As an afterthought, it might be Laravel is generating GUID's from the global DBMS. But yeah, there's the same difficulties with reproducibility/rollback, as you mention.

At least, its convention could be that the field name is <table_name>_id in snake_case -- which is what it recommends using on other tables that foreign-key to the id.

Oh but see '#Timestamps' at that same link: "By default, Eloquent expects created_at and updated_at columns to exist " -- that is, same-named in every table. (Again you can override that default.) It really is hostile to Natural Join.

Quote from dandl on February 14, 2023, 5:32 am

... It was preferred to use id fields for linking rather than customer account no because of the problems if a customer was given two accounts.

That sounds like the shibboleth of wanting to make customer numbers meaningful --  like using the first few letters of their name. (And then they go changing their name.) Don't do that -- that is, don't make business keys meaningful; generate a separate namedex lookup for searching -- not that fancy searching is problematic these days.

So yes there's legit reasons for wanting a meaningless/might as well be auto-gen'd key -- but not for making that the default design. (And in that case, per Darren's point about rollback, I'd generate that and pre-fill it on the screen upon going into the 'New customer' setup. If the user ends up cancelling, we've an unused number, so what?) Still I wouldn't generate a key field only after the transaction had posted an insert.

 

And about the created/updated-at fields, that brings up another thing. I feel a best default database design is that tablevars are insert-only and represent an event or snapshot, which AFAIK is actually the case with most real-world tablevars anyway, rather than representing mutable entities. With this design, there is no concept of updated-at, and you don't need separate auditing triggers and tablevars etc in order to have universal auditing, because the canonical tablevars are self-auditing by nature of being insert-only.

Quote from AntC on February 14, 2023, 6:40 am
Quote from dandl on February 14, 2023, 5:32 am

... It was preferred to use id fields for linking rather than customer account no because of the problems if a customer was given two accounts.

That sounds like the shibboleth of wanting to make customer numbers meaningful --  like using the first few letters of their name. (And then they go changing their name.) Don't do that -- that is, don't make business keys meaningful; generate a separate namedex lookup for searching -- not that fancy searching is problematic these days.

So yes there's legit reasons for wanting a meaningless/might as well be auto-gen'd key -- but not for making that the default design. (And in that case, per Darren's point about rollback, I'd generate that and pre-fill it on the screen upon going into the 'New customer' setup. If the user ends up cancelling, we've an unused number, so what?) Still I wouldn't generate a key field only after the transaction had posted an insert.

 

Not at all, but the example that really taught us was UR numbers in hospitals. The UR number is where the physical record is filed and has real meaning, but patients are often given a new UR number when they rock up in the middle of the night or give a different spelling of the name, or the wrong/duplicate UR number when it's actually a relative with the same name. You need a real unique identifier (a GUID if you like) to link together all the UR numbers. Messy.

Andl - A New Database Language - andl.org
Quote from AntC on February 14, 2023, 6:40 am
Quote from dandl on February 14, 2023, 5:32 am

... It was preferred to use id fields for linking rather than customer account no because of the problems if a customer was given two accounts.

That sounds like the shibboleth of wanting to make customer numbers meaningful --  like using the first few letters of their name. (And then they go changing their name.) Don't do that -- that is, don't make business keys meaningful; generate a separate namedex lookup for searching -- not that fancy searching is problematic these days.

So yes there's legit reasons for wanting a meaningless/might as well be auto-gen'd key -- but not for making that the default design. (And in that case, per Darren's point about rollback, I'd generate that and pre-fill it on the screen upon going into the 'New customer' setup. If the user ends up cancelling, we've an unused number, so what?) Still I wouldn't generate a key field only after the transaction had posted an insert.

 

It's sounding more and more like a retake on Rails.

Andl - A New Database Language - andl.org
Quote from dandl on February 14, 2023, 9:47 am

 

... when they rock up in the middle of the night or give a different spelling of the name, or the wrong/duplicate UR number when it's actually a relative with the same name.

Heh, heh. One of the projects I was dragged into -- and I'll have to kill you if I tell too much -- was tracing crims (not) paying fines. They'd do anything to have their 'career' at one end of the country not connected to their new identity at the other. The police/court systems weren't nationally co-ordinated, so it fell to a central government long after the court case to try to connect everything up. Yes there was a messy process to try to link them. But occasionally they linked too many. (Father and son, same named, who occasionally did live at the same address.) So you couldn't blat all the data into one id/there was an even messier process to partially unlink.

It's sounding more and more like a retake on Rails.

Is this because they're all trying to graft OO on top of Relational, and doing it badly? Under OO everything is an Object, so must have a (GU)ID. There might be arbitrarily complex data items 'belonging to' that object -- IOW this is a hierarchical data structure.

So an Invoice 'belongs to' a customer; an Invoice line 'belongs to' an Invoice; a Product sold 'belongs to' an Invoice line. Except a Product also 'belongs to' a warehouse/sitting on the shelves; and 'belongs to' a Purchase line to restock it. Etc. I probably mean an instance of Product/there's no instance in two places at the same time, but we're not going to individually barcode each can of beans as it goes through the warehouse.

So does OO have a concept of disparate Objects linking in multiple roles to multiple other Objects? How does that go with encapsulation? The linking is not a message passed at a point in time, but a permanent connection.

Now you're starting to sound like IMS.😀

Andl - A New Database Language - andl.org
Page 1 of 2Next