The Forum for Discussion about The Third Manifesto and Related Matters


"The Programmer As Navigator"

I've been trying to understand Bachman's Turing Award lecture, where he gives a list of ways to navigate through a Codasyl database. Here it is:

1. He [the programmer] can start at the beginning of the database, or at any known record, and sequentially access the "next" record in the database until he reaches a record of interest or reaches the end.

2. He can enter the database with a database key that provides direct access to the physical location of a record. (A database key is the permanent virtual memory address assigned to a record at the time that it was created.)

3. He can enter the database in accordance with the value of a primary data key. (Either the indexed sequential or randomized access techniques will yield the same result.)

4. He can enter the database with a secondary data key value and sequentially access all records having that particular data value for the field.

5. He can start from the owner of a set and sequentially access all the member records. (This is equivalent to converting a primary data key into a secondary data key.)

6. He can start with any member record of a set and access either the next or prior member of that set.

7. He can start from any member of a set and access the owner of the set, thus converting a secondary data key into a primary data key.

Now methods 2, 5, 6, and 7 are clear enough.  But 1, 3, and 4 run up against (as far as I can tell) his description of what a database is as distinct from a single file:  "In a database, it is common to have several or many different kinds of records. For an example, in a personnel database there might be employee records, department records, skill records, deduction records, work history records, and education records. Each type of record has its own unique primary data key, and all of its other fields are potential secondary data keys."  So:

  • In the sequential access of method 1, what order is implied?  Are the records retrieved in order of insertion, or in some deterministic but unknown order, or in an order that can vary from run to run?  And are they segregated by record type, or must the programmer be prepared to handle a mixture of record types?
  • Similarly, in methods 3 and 4, how is it known what record type the keys apply to?  The phrase "enter the database in accordance with the value of a primary data key" gives no indication.  If the primary key values of different record types are the same, presumably this does not mean that all of them are retrieved.

Can anyone with an understanding of the Codasyl model make sense of this?  Thanks.

In reply to johnwcowan's post of November 15, 2019, 11:05 pm:

Yes. I recognize it very well from what I remember of IDMS DML, although I don't really know how closely IDMS DML complies with the official CODASYL definitions.

  1. is the process of a "sweep": you go through all the records in a particular area of the database in physical storage order. You can find out which record type you have just retrieved by looking at the RECORD_NAME field of the SUBSCHEMA_CTRL communication block and then branching on that field with a case construct. Alternatively, you can ask the system to retrieve only the records of a particular record type.
  2. is record access using a dbkey. An IDMS dbkey is a unique identifier of a record within the whole database. In my day it was a 4-byte value, with the first three bytes holding the "page number within database" and the last byte holding the "record sequence number within page" (deleting the record with sequence number 2 would not cause record number 3 to become number 2, though). It was bad practice to rely on this mechanism too heavily, but sometimes you needed it to "restore database currency" (operations like "OBTAIN NEXT ... WITHIN <area>" were always relative to the "current record of ...").
  3. is the process of accessing, say, a customer record using a customer number value as the key. If the physical organization was hash-based, it would be OBTAIN CALC REC_CUST; if it was based on an index key, it would be OBTAIN REC_CUST USING <index key field in program>. IDMS also let the DBA allow duplicates for a calc key; these could be accessed one by one using OBTAIN DUPLICATE after a successful OBTAIN CALC.
  4. Point 5 (note: out of order) is the process of first making, say, a given customer (CUST_REC) "current" in your run-unit and then accessing all of that particular customer's orders, or addresses, or contracts, or anything else connected to the CUST_REC via a SET. That is, something like OBTAIN NEXT <member_rec_name> WITHIN <set-name>. Sets could also be multi-member; you could then request OBTAIN NEXT WITHIN <set-name>, and you would first have to inspect the RECORD_NAME field to know which record type the DBMS had returned to you.
  5. Point 4 (also out of order) appears to refer to the fact that sets could be ordered, with the set's ordering key then being that "secondary key value", so that, say, all orders of a given customer could easily be accessed in order-date order. You could then also "position" your run-unit somewhere in the middle of that set and thus process, say, only the orders from this-or-that particular date onward (of this particular customer).
  6. is just the facility you need to "loop through the members of a set" as described in 4/5. I don't see why it's a bullet in its own right here.
  7. OBTAIN OWNER is the facility that was used mostly when processing bill-of-materials (BOM) structures. From a given part, you could access, say, PART_CONTAINMENT records, typically using methods 4/5/6 (say, OBTAIN FIRST PART_CONTAINMENT WITHIN CONTAINS_PART), but if you wanted to get to the actual data of the contained part, you'd have to follow the "owner" link of the second set between the two record types, say, OBTAIN OWNER WITHIN CONTAINED_BY_PART.
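The dbkey layout described in item 2 comes down to a little bit arithmetic. The Python below is a minimal sketch assuming exactly that split (a 4-byte value: three bytes of page number, one byte of in-page sequence number); actual IDMS releases may have divided the bits differently, and `make_dbkey`/`split_dbkey` are hypothetical helpers, not part of any IDMS interface.

```python
PAGE_BITS = 24  # assumed: first three bytes = page number within the database
LINE_BITS = 8   # assumed: last byte = record sequence number within the page


def make_dbkey(page: int, line: int) -> int:
    """Pack a page number and in-page sequence number into one 4-byte dbkey."""
    if not 0 <= page < (1 << PAGE_BITS):
        raise ValueError("page number does not fit in three bytes")
    if not 0 <= line < (1 << LINE_BITS):
        raise ValueError("sequence number does not fit in one byte")
    return (page << LINE_BITS) | line


def split_dbkey(dbkey: int) -> tuple[int, int]:
    """Unpack a dbkey into (page, sequence number)."""
    return dbkey >> LINE_BITS, dbkey & ((1 << LINE_BITS) - 1)


key = make_dbkey(1001, 3)
print(hex(key))                # 0x3e903
print(key.to_bytes(4, "big"))  # b'\x00\x03\xe9\x03'
print(split_dbkey(key))        # (1001, 3)

# A page modelled as a slot table: deleting slot 2 does not renumber slot 3.
page_1001 = {1: "REC_CUST 17", 2: "REC_ORDER 5", 3: "REC_ORDER 6"}
del page_1001[2]
print(page_1001[split_dbkey(key)[1]])  # REC_ORDER 6
```

The last lines mimic the remark about deletions: sequence numbers act as slot identifiers rather than positions, so removing slot 2 leaves slot 3, and every dbkey that points at it, unchanged.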
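Bachman's methods 1 and 3, and the question of how a key is tied to a record type, can be mimicked with a toy in-memory model. This is only an illustration under stated assumptions: `Record`, `AREA`, `sweep`, `build_calc_index` and `obtain_calc` are hypothetical names, a Python dict stands in for CALC hashing, and `record_name` plays the role of the RECORD_NAME field. Because the calc index is built per record type, the program always says which type it is looking for, much as OBTAIN CALC REC_CUST names REC_CUST.

```python
from dataclasses import dataclass


@dataclass
class Record:
    record_name: str  # plays the role of RECORD_NAME in SUBSCHEMA_CTRL
    data: dict


# Hypothetical area: records in physical storage order, record types mixed.
AREA = [
    Record("REC_CUST", {"cust_no": 17, "name": "Acme"}),
    Record("REC_ORDER", {"order_no": 5}),
    Record("REC_CUST", {"cust_no": 42, "name": "Globex"}),
    Record("REC_ORDER", {"order_no": 6}),
    Record("REC_CUST", {"cust_no": 42, "name": "Globex East"}),  # duplicate calc key
]


def sweep(area, record_name=None):
    """Method 1: visit records in physical order, optionally only one record type."""
    for rec in area:
        if record_name is None or rec.record_name == record_name:
            yield rec


def describe(rec):
    """The 'case construct' a program needs when a sweep returns mixed types."""
    if rec.record_name == "REC_CUST":
        return f"customer {rec.data['cust_no']}"
    if rec.record_name == "REC_ORDER":
        return f"order {rec.data['order_no']}"
    return f"unexpected {rec.record_name}"


def build_calc_index(area, record_name, key_field):
    """Stand-in for CALC hashing: one table per record type, duplicates kept in order."""
    index = {}
    for rec in sweep(area, record_name):
        index.setdefault(rec.data[key_field], []).append(rec)
    return index


def obtain_calc(index, key):
    """Method 3: the first next() is like OBTAIN CALC, later ones like OBTAIN DUPLICATE."""
    return iter(index.get(key, []))


print([describe(r) for r in sweep(AREA)])
# ['customer 17', 'order 5', 'customer 42', 'order 6', 'customer 42']
print([r.data["cust_no"] for r in sweep(AREA, "REC_CUST")])  # [17, 42, 42]

cust_calc = build_calc_index(AREA, "REC_CUST", "cust_no")
hits = obtain_calc(cust_calc, 42)
print(next(hits).data["name"])  # Globex
print(next(hits).data["name"])  # Globex East
print(next(hits, None))         # None: no more duplicates
```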
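Bachman's methods 4 through 7 (owner-to-member access, an ordering key used as the "secondary key", next/prior navigation, and OBTAIN OWNER, including the BOM case) can be sketched the same way. Again this is a hypothetical toy: `OrderedSet` keeps each owner's members in a sorted Python list rather than the next/prior/owner pointer chains a CODASYL DBMS stores in the records themselves, and all record identifiers are made up.

```python
from bisect import bisect_left, insort


class OrderedSet:
    """Hypothetical stand-in for one named set type, sorted on its ordering key."""

    def __init__(self, name):
        self.name = name
        self.members_of = {}  # owner id -> sorted list of (ordering key, member id)
        self.owner_of = {}    # member id -> owner id

    def connect(self, owner_id, member_id, key):
        """Insert a member into its owner's occurrence, keeping key order."""
        insort(self.members_of.setdefault(owner_id, []), (key, member_id))
        self.owner_of[member_id] = owner_id

    def members(self, owner_id, start_key=None):
        """Methods 5 and 4: all members of one owner, optionally from a key onward."""
        entries = self.members_of.get(owner_id, [])
        # (start_key,) sorts just before any (start_key, member id) entry.
        start = 0 if start_key is None else bisect_left(entries, (start_key,))
        return [member_id for _, member_id in entries[start:]]

    def neighbour(self, member_id, step):
        """Method 6: next (step=+1) or prior (step=-1) member, or None at either end."""
        ids = self.members(self.owner_of[member_id])
        i = ids.index(member_id) + step
        return ids[i] if 0 <= i < len(ids) else None

    def owner(self, member_id):
        """Method 7: the owner of a member (OBTAIN OWNER)."""
        return self.owner_of[member_id]


# Orders of customer C17, ordered on order date (the "secondary key").
cust_orders = OrderedSet("CUST_ORDERS")
cust_orders.connect("C17", "O5", "2019-11-01")
cust_orders.connect("C17", "O6", "2019-11-03")
cust_orders.connect("C17", "O7", "2019-10-20")

print(cust_orders.members("C17"))                # ['O7', 'O5', 'O6']
print(cust_orders.members("C17", "2019-11-01"))  # ['O5', 'O6']
print(cust_orders.neighbour("O5", +1), cust_orders.neighbour("O5", -1))  # O6 O7
print(cust_orders.owner("O6"))                   # C17

# Bill of materials: link records PC1 and PC2 are members of two sets at once.
contains = OrderedSet("CONTAINS_PART")          # owner: the assembly
contained_by = OrderedSet("CONTAINED_BY_PART")  # owner: the component part
contains.connect("BIKE", "PC1", 1)
contained_by.connect("WHEEL", "PC1", 1)
contains.connect("BIKE", "PC2", 2)
contained_by.connect("FRAME", "PC2", 1)

# Walk one set to the link records, then take the owner in the other set.
components = [contained_by.owner(pc) for pc in contains.members("BIKE")]
print(components)  # ['WHEEL', 'FRAME']
```

Positioning with `start_key` corresponds to entering an ordered set partway through, for example processing only the orders from a given date onward.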

In the case of IDMS, the questions he asks at the end were answered by design choices made by its authors.