The Forum for Discussion about The Third Manifesto and Related Matters


Some issues with TTM

Quote from dandl on May 9, 2020, 1:53 pm

It's only a D in the same sense that you also might "have a D" if you write SQL to only use "SELECT DISTINCT."

Even you know that isn't true. SQL has no type system remotely satisfying the requirements of D.

In other words, maybe it meets the pro/pre-scriptions in some legal sense, but it entirely misses the overall point: to be a better language for data management. Surely the fact that you have to rely on code generation or reflection -- or much manual circumlocution -- is enough to preclude RM Pre 26 compliance, even if everything else legally gets a pass?

It meets the prescriptions (of D) in the full sense. The developers of the world have already given us better languages for most aspects of data management, the only bit actually missing is a usable tuple type to allow a natural implementation of the RA. This shows how to solve that problem. The rest is free.

No, you don't have to rely on code generation or reflection, those are just aids that might help things along. The generics in C# do most of the heavy lifting.

In case you were wondering, here is some working code. I think you'll be able to work out what it does without my help. Obviously a real application would have a comprehensive library of tuple type definitions, so the ones shown here would not be needed.

    WriteLine(Supplier.S
      .Select(t => t.Status == 30)
      .Rename<TupS,TupSX>()
      .Project<TupSX,Tup1>());

public class TupSX : TupleBase { public readonly static string[] Heading = { "SNo", "SName", "Status", "Supplier City" }; }
public class Tup1 : TupleBase { public readonly static string[] Heading = { "Supplier City" }; }

Personally I think the world has made the right choice, by concentrating on general purpose languages and adding ever more powerful features on which a D can be built, if needed.

Dynamic tuple/relation types are straightforward. They're inside every existing D implementation.

The trick is a static tuple/relation type that obeys TTM semantics.

Also, where have you declared the types of SNo, SName, Status, and Supplier City?

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on May 9, 2020, 2:22 pm
Quote from dandl on May 9, 2020, 1:53 pm

... SQL has no type system remotely satisfying the requirements of D.

In other words, maybe it meets the pro/pre-scriptions in some legal sense, but it entirely misses the overall point: to be a better language for data management. Surely the fact that you have to rely on code generation or reflection -- or much manual circumlocution -- is enough to preclude RM Pre 26 compliance, even if everything else legally gets a pass?

Personally I think the world has made the right choice, by concentrating on general purpose languages and adding ever more powerful features on which a D can be built, if needed.

Dynamic tuple/relation types are straightforward. They're inside every existing D implementation.

The trick is a static tuple/relation type that obeys TTM semantics.

Also, where have you declared the types of SNo, SName, Status, and Supplier City?

I would love to see a fully-worked-through type system in which:

  • Type SNo is a first-class type <A, T>, such as <SNo, Int> (with the Int being the PhysRep), the SNo appearing only at the (static) type level, to distinguish it from <PNo, Int> -- which also has PhysRep bare Int. By 'first-class' I mean there can be variables/functions/arguments of that type, not merely that the type can appear inside TUP{ }, REL{ }. Also, although the PhysRep is Int, the programmer can't (normally/easily) do arithmetic on SNos.
  • Type SName is similarly a first-class <SName, CHAR>, PhysRep CHAR.
  • Status is an enumeration whose PhysRep is Int, but only the values 10, 20, 30, etc. are accessible (probably with meaningful names whose alphabetical order need not match the order of magnitude of the underlying Int).
  • Supplier City has PhysRep CHAR; is a restricted set of values; but not an enumeration because the possible values change over time; shares the same set of values as Parts City; but is statically type-distinct from Parts City, so there's no 'accidental' JOIN ON CITY with S JOIN P.
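To make the Status and City bullets a little more concrete, here is a rough C# approximation. It is a sketch under stated assumptions, not a full answer to the wishlist. The names `Poor`/`Fair`/`Good`, the `Cities` registry and its city list are hypothetical. C# enums are also not closed sets, so the restriction to 10, 20, 30 has to be checked explicitly.

```csharp
using System;
using System.Collections.Generic;

// Status: PhysRep is int; only 10, 20, 30 have names. The names are meaningful,
// but their alphabetical order (Fair, Good, Poor) differs from their magnitude.
public enum Status { Poor = 10, Fair = 20, Good = 30 }

public static class StatusRep
{
    // C# enums are not closed: (Status)25 compiles, so check explicitly.
    public static Status FromInt(int value) =>
        Enum.IsDefined(typeof(Status), value)
            ? (Status)value
            : throw new ArgumentOutOfRangeException(nameof(value), value, "Not a Status value");
}

// Hypothetical shared set of valid city names. It can change at run time,
// which is why City is not modelled as an enum.
public static class Cities
{
    private static readonly HashSet<string> Valid = new() { "London", "Paris", "Oslo", "Athens" };
    public static void Add(string name) => Valid.Add(name);
    public static bool IsValid(string name) => Valid.Contains(name);
}

// Two distinct nominal types over the same value set and the same PhysRep (string).
// There is no == between SupplierCity and PartCity, so comparing them does not compile.
// Caveat: default(SupplierCity) skips the constructor check, a known limitation of C# structs.
public readonly record struct SupplierCity
{
    public string Name { get; }
    public SupplierCity(string name) =>
        Name = Cities.IsValid(name) ? name : throw new ArgumentException($"Unknown city: {name}");
}

public readonly record struct PartCity
{
    public string Name { get; }
    public PartCity(string name) =>
        Name = Cities.IsValid(name) ? name : throw new ArgumentException($"Unknown city: {name}");
}
```

With these definitions, `S JOIN P` on a city attribute is only possible if both sides use the same city type, which is the 'no accidental join' property the bullet asks for.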

I'm guessing that for the PhysReps, the familiar range of system-supplied types is adequate; as it is in fact in Haskell, despite all the fancy type-superstructures and polymorphism it supports. So chiefly the 'type system' is a veneer to make sure you don't mis-treat values of the same PhysRep as 'same meaning' when they aren't. So I'm agreeing with (some of) TTM's objectives, but reacting against what seems to me excessive circumlocution/a great deal more than veneer.

Then I'm not saying such a system is TTM-ish. I'm not saying it can be supported by any current/common language. I'm not even saying it's coherent and consistent -- I don't think I'll know until there's plenty of flesh on the bones.

I've seen each of the bits of what I've described in various languages; but not all of them at the same time in one language. Perhaps putting them all together leads to incoherence? Then how/why?

Personally I think the world has made the right choice, by concentrating on general purpose languages and adding ever more powerful features on which a D can be built, if needed.

Dynamic tuple/relation types are straightforward. They're inside every existing D implementation.

The trick is a static tuple/relation type that obeys TTM semantics.

There is nothing dynamic going on here. All these type declarations are static, instantiated at compile time as generics. All the type usage is fully type checked, both in the editor and at compile time. All Pre semantics are obeyed. The only thing missing here is detecting errors across headings, or between headings and declared types.

Also, where have you declared the types of SNo, SName, Status, and Supplier City?

They were in an earlier post. Here is some complete code. It's still got rough edges, but it's getting better.

As you can see the executable code and data are nice and compact and easy to read. The class definitions are verbose. They look a lot like ordinary POCOs, with the types declared in the ctor and getters. These are the only bits that might be generated.

[edit: oops. I pasted in the wrong code.]

// the query
      WriteLine(Supplier.P
        .Rename<TupP, TupPcolour>()
        .Select(t => t.Colour == "Red")
        .Project<TupPcolour, TupPPno>()
        .Join<TupPPno, TupSP, TupPjoin>(Supplier.SP)
        .Format());

// the data

  public class Supplier {
    public static RelP P = RelP.Create<RelP>(
      new List<TupP> {
        new TupP( "P1", "Nut",   "Red",   12.0m,"London" ),
        new TupP( "P2", "Bolt",  "Green", 17.0m,"Paris"  ),
        new TupP( "P3", "Screw", "Blue",  17.0m,"Oslo"   ),
        new TupP( "P4", "Screw", "Red",   14.0m,"London" ),
        new TupP( "P5", "Cam",   "Blue",  12.0m,"Paris"  ),
        new TupP( "P6", "Cog",   "Red",   19.0m,"London" ),
      });

    public static RelSP SP = RelSP.Create<RelSP>(
      new List<TupSP> {
        new TupSP( "S1", "P1", 300 ),
        new TupSP( "S1", "P2", 200 ),
        new TupSP( "S1", "P3", 400 ),
        new TupSP( "S1", "P4", 200 ),
        new TupSP( "S1", "P5", 100 ),
        new TupSP( "S1", "P6", 100 ),
        new TupSP( "S2", "P1", 300 ),
        new TupSP( "S2", "P2", 400 ),
        new TupSP( "S3", "P2", 200 ),
        new TupSP( "S4", "P2", 200 ),
        new TupSP( "S4", "P4", 300 ),
        new TupSP( "S4", "P5", 400 ),
      });
  }

// the class definitions

  public class TupPcolour : TupleBase {
    public readonly static string[] Heading = { "PNo", "PName", "Colour", "Weight", "City" };
    public string Colour { get { return (string)Values[2]; } }
  }

  public class TupPPno : TupleBase {
    public readonly static string[] Heading = { "PNo", "PName" };
  }

  public class TupPjoin : TupleBase {
    public readonly static string[] Heading = { "PNo", "PName", "SNo", "Qty" };
  }

  public class RelP : RelationBase<TupP> { }
  public class TupP : TupleBase {
    public readonly static string[] Heading = { "PNo", "PName", "Color", "Weight", "City" };
    public string Pno { get { return (string)_values[0]; } }
    public string Pname { get { return (string)_values[1]; } }
    public string Color { get { return (string)_values[2]; } }
    public decimal Weight { get { return (decimal)_values[3]; } }
    public string City { get { return (string)_values[4]; } }
    public TupP(string pno, string pname, string color, decimal weight, string city) :
      base(new object[] { pno, pname, color, weight, city }) {
    }
  }

  public class RelSP : RelationBase<TupSP> { }
  public class TupSP : TupleBase {
    public readonly static string[] Heading = { "SNo", "PNo", "Qty" };
    public string Sno { get { return (string)_values[0]; } }
    public string Pno { get { return (string)_values[1]; } }
    public int Qty { get { return (int)_values[2]; } }
    public TupSP(string sno, string pno, int qty) :
      base(new object[] { sno, pno, qty }) {
    }
  }

 

Andl - A New Database Language - andl.org
Quote from AntC on May 10, 2020, 12:21 am
Quote from Dave Voorhis on May 9, 2020, 2:22 pm
Quote from dandl on May 9, 2020, 1:53 pm

... SQL has no type system remotely satisfying the requirements of D.

In other words, maybe it meets the pro/pre-scriptions in some legal sense, but it entirely misses the overall point: to be a better language for data management. Surely the fact that you have to rely on code generation or reflection -- or much manual circumlocution -- is enough to preclude RM Pre 26 compliance, even if everything else legally gets a pass?

Personally I think the world has made the right choice, by concentrating on general purpose languages and adding ever more powerful features on which a D can be built, if needed.

Dynamic tuple/relation types are straightforward. They're inside every existing D implementation.

The trick is a static tuple/relation type that obeys TTM semantics.

Also, where have you declared the types of SNo, SName, Status, and Supplier City?

I would love to see a fully-worked-through type system in which:

  • Type SNo is a first-class type <A, T>, such as <SNo, Int> (with the Int being the PhysRep), the SNo appearing only at the (static) type level, to distinguish it from <PNo, Int> -- which also has PhysRep bare Int. By 'first-class' I mean there can be variables/functions/arguments of that type, not merely that the type can appear inside TUP{ }, REL{ }. Also, although the PhysRep is Int, the programmer can't (normally/easily) do arithmetic on SNos.

Could you please clarify: in TTM that A is an attribute name. Surely you don't want attribute name as part of the type?

  • Type SName is similarly a first-class <SName, CHAR>, PhysRep CHAR.
  • Status is an enumeration whose PhysRep is Int, but only the values 10, 20, 30, etc. are accessible (probably with meaningful names whose alphabetical order need not match the order of magnitude of the underlying Int).
  • Supplier City has PhysRep CHAR; is a restricted set of values; but not an enumeration because the possible values change over time; shares the same set of values as Parts City; but is statically type-distinct from Parts City, so there's no 'accidental' JOIN ON CITY with S JOIN P.

I'm guessing that for the PhysReps, the familiar range of system-supplied types is adequate; as it is in fact in Haskell, despite all the fancy type-superstructures and polymorphism it supports. So chiefly the 'type system' is a veneer to make sure you don't mis-treat values of the same PhysRep as 'same meaning' when they aren't. So I'm agreeing with (some of) TTM's objectives, but reacting against what seems to me excessive circumlocution/a great deal more than veneer.

IMO there are 9 base types: bool, int, real, decimal, text, time, binary, enum, struct. The first 7 have obvious low-level representations (binary is a string of bytes rather than characters). Enum is a short string, but can be compressed to a small integer that is looked up in a table. Struct is a UDT, recursively composed of any of the 9 base types.

With these base types you can store (almost) anything. In particular you can do units of measure or currency nicely as a struct of real and enum. There are situations where it would be nice to have union types, and for TTM you need tuple types, and these do need to be stored differently from the others. So maybe there are 10 or 11, but no more.
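As a small illustration of 'a struct of real and enum', here is a C# sketch of a length quantity. The `LengthUnit` enum and the rule that only like units can be added are assumptions made for the example. A fuller design might convert between units instead of refusing to add them.

```csharp
using System;

// Hypothetical unit enumeration for the example.
public enum LengthUnit { Metre, Foot }

// A unit-of-measure value: a real (double) paired with an enum.
public readonly record struct Length(double Value, LengthUnit Unit)
{
    // Adding lengths is only allowed when the units match.
    public static Length operator +(Length a, Length b) =>
        a.Unit == b.Unit
            ? new Length(a.Value + b.Value, a.Unit)
            : throw new InvalidOperationException($"Cannot add {b.Unit} to {a.Unit}");
}
```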

On top of this you need user-defined type names to distinguish types that are physically the same but need to be kept distinct to a greater or lesser degree, because they are intended to be used differently. Think: email type, URL type, a survey int type with a range of 0-9.
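For the 'survey int type with a range of 0-9', one common C# approximation is a single-field struct that checks its range when it is constructed. `SurveyScore` is a hypothetical name. C# has no built-in constrained-subtype mechanism, so this check happens at run time rather than compile time.

```csharp
using System;

// PhysRep is a bare int, but SurveyScore is a distinct type limited to 0..9.
// default(SurveyScore) is 0, which happens to be in range.
public readonly record struct SurveyScore
{
    public int Value { get; }

    public SurveyScore(int value) =>
        Value = value is >= 0 and <= 9
            ? value
            : throw new ArgumentOutOfRangeException(nameof(value), value, "Survey scores run from 0 to 9");

    // Getting back to the physical representation must be explicit.
    public static explicit operator int(SurveyScore s) => s.Value;
}
```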

The major compiled languages support most of these but not (a) a usable enum (b) any kind of type aliasing (c) union types (d) tuple types (but you can fake it).

Then I'm not saying such a system is TTM-ish. I'm not saying it can be supported by any current/common language. I'm not even saying it's coherent and consistent -- I don't think I'll know until there's plenty of flesh on the bones.

I've seen each of the bits of what I've described in various languages; but not all of them at the same time in one language. Perhaps putting them all together leads to incoherence? Then how/why?

At least the target is not vast. All those types are known quantities, but languages to support them are in short supply.

Quote from dandl on May 10, 2020, 1:09 am

The trick is a static tuple/relation type that obeys TTM semantics.

There is nothing dynamic going on here.

The contents of the Heading array can only be referenced dynamically, yes?

Why are there two separate but parallel naming schemes, one defined by a string array -- Heading -- which is notionally static but can only be referenced dynamically, and one defined statically by read-only properties?

Why use read-only properties at all, given that tuples are immutable?

Can't you just define a tuple type like this?

public class TupP : TupleBase {
    public readonly string Pno;
    public readonly string Pname;
    public readonly string Color;
    public readonly decimal Weight;
    public readonly string City;
    public TupP(string pno, string pname, string color, decimal weight, string city) {
       Pno = pno; Pname = pname; Color = color; Weight = weight; City = city;
    }
}

All these type declarations are static, instantiated at compile time as generics. All the type usage is fully type checked, both in the editor and at compile time. All Pre semantics are obeyed. The only thing missing here is detecting errors across headings, or between headings and declared types.

If you dispense with the Heading array, doesn't that limitation go away?

And what obligates Heading to be there? Is it defined in TupleBase as protected? If so, what happens if you forget to assign it in TupleBase derivations?

Also, where have you declared the types of SNo, SName, Status, and Supplier City?

They were in an earlier post. Here is some complete code. It's still got rough edges, but it's getting better.

As you can see the executable code and data are nice and compact and easy to read. The class definitions are verbose. They look a lot like ordinary POCOs, with the types declared in the ctor and getters. These are the only bits that might be generated.

Yes, the class definitions seem unduly complex when ideally you should be able to just declare a tuple value type, per my example above.

Quote from dandl on May 10, 2020, 5:31 am
Quote from AntC on May 10, 2020, 12:21 am
Quote from Dave Voorhis on May 9, 2020, 2:22 pm
Quote from dandl on May 9, 2020, 1:53 pm

... SQL has no type system remotely satisfying the requirements of D.

In other words, maybe it meets the pro/pre-scriptions in some legal sense, but it entirely misses the overall point: to be a better language for data management. Surely the fact that you have to rely on code generation or reflection -- or much manual circumlocution -- is enough to preclude RM Pre 26 compliance, even if everything else legally gets a pass?

Personally I think the world has made the right choice, by concentrating on general purpose languages and adding ever more powerful features on which a D can be built, if needed.

Dynamic tuple/relation types are straightforward. They're inside every existing D implementation.

The trick is a static tuple/relation type that obeys TTM semantics.

Also, where have you declared the types of SNo, SName, Status, and Supplier City?

I would love to see a fully-worked-through type system in which:

  • Type SNo is a first-class type <A, T>, such as <SNo, Int> (with the Int being the PhysRep), the SNo appearing only at the (static) type level, to distinguish it from <PNo, Int> -- which also has PhysRep bare Int. By 'first-class' I mean there can be variables/functions/arguments of that type, not merely that the type can appear inside TUP{ }, REL{ }. Also, although the PhysRep is Int, the programmer can't (normally/easily) do arithmetic on SNos.

Could you please clarify: in TTM that A is an attribute name. Surely you don't want attribute name as part of the type?

I'm going to violently disagree with your catalogue of types below, so we may be talking at cross purposes; but yes, I definitely want the Attribute name as part of the type structure -- I'll leave moot whether it's "part of the type". An Attribute name is not (a component of) a value-inhabited type. An Attribute name 'applied' to a value-inhabited type by pairing, as in <A, T>, is perhaps better thought of as a type generator in the same sense as TUP{ }.

I want REL{TUP{ SNo 3, PNo 3, Qty 3 }} to have a PhysRep as a vector of three Ints, each of value 3, each type-distinct, such that it's a type error to compare them or to do arithmetic between them. In that case the technical term for the Attribute name is 'phantom type'.
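C# has no type-level strings, but the phantom-type idea can be approximated with empty marker classes that are only ever used as generic arguments. In the sketch below, `Attr<TName>` and the marker classes are hypothetical names. Each attribute value is physically a single int, yet `Attr<SNo>` and `Attr<PNo>` are different types.

```csharp
using System;

// Marker types: never instantiated; they exist only to name attributes at the type level.
public sealed class SNo { private SNo() { } }
public sealed class PNo { private PNo() { } }
public sealed class Qty { private Qty() { } }

// An attribute value: a single int field, tagged with a phantom type parameter.
// The tag adds no storage, and no arithmetic operators are defined.
public readonly record struct Attr<TName>(int Value)
{
    public override string ToString() => $"{typeof(TName).Name} {Value}";
}
```

With these definitions, `new Attr<SNo>(3) == new Attr<PNo>(3)` is rejected at compile time, and so is `new Attr<SNo>(3) + new Attr<SNo>(3)`. By contrast, `new Attr<SNo>(3) == new Attr<SNo>(3)` compiles and is true.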

  • Type SName is similarly a first-class <SName, CHAR>, PhysRep CHAR.
  • Status is an enumeration whose PhysRep is Int, but only the values 10, 20, 30, etc. are accessible (probably with meaningful names whose alphabetical order need not match the order of magnitude of the underlying Int).
  • Supplier City has PhysRep CHAR; is a restricted set of values; but not an enumeration because the possible values change over time; shares the same set of values as Parts City; but is statically type-distinct from Parts City, so there's no 'accidental' JOIN ON CITY with S JOIN P.

I'm guessing that for the PhysReps, the familiar range of system-supplied types is adequate; as it is in fact in Haskell, despite all the fancy type-superstructures and polymorphism it supports. So chiefly the 'type system' is a veneer to make sure you don't mis-treat values of the same PhysRep as 'same meaning' when they aren't. So I'm agreeing with (some of) TTM's objectives, but reacting against what seems to me excessive circumlocution/a great deal more than veneer.

IMO there are 9 base types: bool, int, real, decimal, text, time, binary, enum, struct. The first 7 have obvious low-level representations (binary is a string of bytes rather than characters). Enum is a short string, but can be compressed to a small integer that is looked up in a table.

I don't think there's any sense in which enum is a 'base type'. It's implemented as one of the other types (Int or Text); it's the language semantics'/compiler's job to map between its appearance in the language and its PhysRep. Essentially by the same mechanism. It might be implemented as a renaming of the base type. [That concept isn't specific to Haskell, it's borrowed from ML; sorry I can't find a better reference at short notice.] Note type renaming is a distinct concept from 'type alias', which I've used before: an alias is just a shorthand/can be completely replaced by the full form; a renaming gives a distinct name (therefore a distinct type in a nominative system), but the same set of values with the same PhysRep, and the same implementation of operations.
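The alias/renaming distinction shows up even in C#, although C# has no direct equivalent of a Haskell `newtype`. In the sketch below, all names are hypothetical. `SupplierNumber` is a `using` alias and is fully interchangeable with `int`. `SNo` and `PNo` are single-field structs that approximate renamings: they keep the same PhysRep but are distinct nominal types. One caveat, unlike a renaming as described above: these structs do not automatically share the operations of `int`, so any operations have to be written out by hand.

```csharp
using System;
// A type alias: SupplierNumber is merely another spelling of int.
using SupplierNumber = System.Int32;

// Approximate renamings: each is physically one int, but a distinct type.
public readonly record struct SNo(int Value);
public readonly record struct PNo(int Value);

public static class Renaming
{
    // Any int is accepted here, because the alias is not a new type.
    public static SupplierNumber NextAlias(SupplierNumber n) => n + 1;

    // Only an SNo is accepted here; passing a PNo or a bare int does not compile.
    public static SNo NextSNo(SNo n) => new SNo(n.Value + 1);
}
```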

Struct is a UDT, recursively composed of any of the 9 base types.

struct is different in nature. You've left out: function types; phantom types; renamed types; parametric polymorphic types; overloaded/constrained types. struct and all those others are different in nature because they're composed from or built over the base types.

With these base types you can store (almost) anything. In particular you can do units of measure or currency nicely as a struct of real and enum. There are situations where it would be nice to have union types, and for TTM you need tuple types, and these do need to be stored differently from the others. So maybe there are 10 or 11, but no more.

On top of this you need user-defined type names to distinguish types that are physically the same but need to be kept distinct to a greater or lesser degree, because they are intended to be used differently. Think: email type, URL type, a survey int type with a range of 0-9.

You might be trying to say type renamings.

The major compiled languages support most of these but not (a) a usable enum (b) any kind of type aliasing (c) union types (d) tuple types (but you can fake it).

Haskell has enums; both type aliases and type renaming; tagged unions. Hugs/Trex extension has tuple types; ghc/PolyKinds extension (and friends) supports faking tuple types, with a type-level String/name phantom type.

Then I'm not saying such a system is TTM-ish. I'm not saying it can be supported by any current/common language. I'm not even saying it's coherent and consistent -- I don't think I'll know until there's plenty of flesh on the bones.

I've seen each of the bits of what I've described in various languages; but not all of them at the same time in one language. Perhaps putting them all together leads to incoherence? Then how/why?

At least the target is not vast. All those types are known quantities, but languages to support them are in short supply.

 

Quote from Dave Voorhis on May 10, 2020, 9:03 am
Quote from dandl on May 10, 2020, 1:09 am

The trick is a static tuple/relation type that obeys TTM semantics.

There is nothing dynamic going on here.

The contents of the Heading array can only be referenced dynamically, yes?

Why are there two separate but parallel naming schemes, one defined by a string array -- Heading -- which is notionally static but can only be referenced dynamically, and one defined statically by read-only properties?

The point of this entire exercise is to demonstrate that we don't need a new D, a modern GP language can do it all provided it has (a) good support for value types (b) a good enough tuple type. I think that point is made.

So now we're arguing over how best to do it, which is just fine. I made some choices in the details of how to do this implementation because I wanted to avoid reflection. I wanted to focus on the static view, what the compiler or compiler extension sees, and so the machinery is all very transparent. The point is, all you need for a tuple type is 3 things:

  1. A heading (set of strings)
  2. Typed getters and ctor, matching the heading
  3. Machinery for generic RA algorithms to get attribute values.

The mechanism I chose does that. There are other ways, some might even be better.
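The three ingredients can be illustrated with a minimal C# sketch. The real base class is not shown in the thread, so the shape of `TupleBase` here (a protected `_values` array, an indexer, value equality) is an assumption modelled on the class definitions posted earlier. `RA.Column` is a hypothetical stand-in for the generic RA machinery.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed base class: attribute values stored positionally, in Heading order.
public abstract class TupleBase
{
    protected readonly object[] _values;
    protected TupleBase(object[] values) => _values = values;

    // (3) Positional access that generic RA code can use.
    public object this[int index] => _values[index];

    // Value equality, so that a HashSet of tuples behaves like a relation body.
    public override bool Equals(object obj) =>
        obj is TupleBase other && other.GetType() == GetType() && _values.SequenceEqual(other._values);

    public override int GetHashCode()
    {
        var hash = new HashCode();
        foreach (var v in _values) hash.Add(v);
        return hash.ToHashCode();
    }
}

public sealed class TupSP : TupleBase
{
    // (1) The heading.
    public static readonly string[] Heading = { "SNo", "PNo", "Qty" };

    // (2) Typed getters and constructor matching the heading.
    public string Sno => (string)_values[0];
    public string Pno => (string)_values[1];
    public int Qty => (int)_values[2];
    public TupSP(string sno, string pno, int qty) : base(new object[] { sno, pno, qty }) { }
}

public static class RA
{
    // Generic helper: find an attribute's position via the heading, then read it from each tuple.
    public static IEnumerable<object> Column<T>(IEnumerable<T> body, string[] heading, string name)
        where T : TupleBase
    {
        int i = Array.IndexOf(heading, name);
        if (i < 0) throw new ArgumentException($"{name} is not in the heading");
        return body.Select(t => t[i]);
    }
}
```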

Why use read-only properties at all, given that tuples are immutable?

TTM requires getters (give or take that bad wording in RM Pre 6). Why not?

Can't you just define a tuple type like this?

public class TupP : TupleBase {
    public readonly string Pno;
    public readonly string Pname;
    public readonly string Color;
    public readonly decimal Weight;
    public readonly string City;
    public TupP(string pno, string pname, string color, decimal weight, string city) {
       Pno = pno; Pname = pname; Color = color; Weight = weight; City = city;
    }
}

Possibly. Now you need reflection to construct the heading, and you need more reflection to do the generic getting of values. Could have its advantages, wasn't my choice.
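Here is a rough sketch of what 'you need reflection' means for a field-based tuple type like the one proposed above. `Reflect.HeadingOf` and `Reflect.GetValue` are hypothetical helpers. They are built on the standard `System.Reflection` calls `Type.GetFields`, `Type.GetField` and `FieldInfo.GetValue`. `GetFields` does not guarantee any field order, so the names are sorted; because a heading is a set, sorting loses nothing.

```csharp
using System;
using System.Linq;
using System.Reflection;

// A field-based tuple type: no separate Heading array.
public sealed class TupSP
{
    public readonly string Sno;
    public readonly string Pno;
    public readonly int Qty;
    public TupSP(string sno, string pno, int qty) { Sno = sno; Pno = pno; Qty = qty; }
}

public static class Reflect
{
    private const BindingFlags PublicInstance = BindingFlags.Public | BindingFlags.Instance;

    // Derive the heading from the public instance fields.
    // GetFields has no guaranteed order, so sort the names.
    public static string[] HeadingOf<T>() =>
        typeof(T).GetFields(PublicInstance)
                 .Select(f => f.Name)
                 .OrderBy(n => n, StringComparer.Ordinal)
                 .ToArray();

    // Read one attribute value by name. The FieldInfo lookups could be cached per type.
    public static object GetValue<T>(T tuple, string name)
    {
        FieldInfo field = typeof(T).GetField(name, PublicInstance)
            ?? throw new ArgumentException($"{typeof(T).Name} has no attribute {name}");
        return field.GetValue(tuple);
    }
}
```

The trade-off is visible in the code. The duplicated Heading array disappears, but the cost moves to run-time reflection, and a misspelled attribute name is only caught when `GetValue` runs.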

All these type declarations are static, instantiated at compile time as generics. All the type usage is fully type checked, both in the editor and at compile time. All Pre semantics are obeyed. The only thing missing here is detecting errors across headings, or between headings and declared types.

If you dispense with the Heading array, doesn't that limitation go away?

And what obligates Heading to be there? Is it defined in TupleBase as protected? If so, what happens if you forget to assign it in TupleBase derivations?

No, it's not protected; in fact it's not defined in the base at all, due to a limitation of C# (static members can't be overridden, so each derived class has to declare its own). It's now a static property of the derived tuple class. What actually happens if you don't set it is that it fails pretty early at runtime.
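Later versions of C# narrow this gap. Since C# 11 (.NET 7), an interface can declare `static abstract` members, so a generic constraint can require every tuple type to supply a heading. A missing heading then becomes a compile-time error rather than a run-time failure. The sketch below uses the hypothetical names `IHasHeading` and `Headings.Same`. This feature postdates the discussion, so the sketch is not the code being discussed.

```csharp
using System;
using System.Linq;

// Requires C# 11 or later. Every implementing class must supply a static Heading;
// a class that omits it does not compile.
public interface IHasHeading
{
    static abstract string[] Heading { get; }
}

public sealed class TupSP : IHasHeading
{
    public static string[] Heading { get; } = new[] { "SNo", "PNo", "Qty" };
}

public sealed class TupPS : IHasHeading
{
    public static string[] Heading { get; } = new[] { "PNo", "SNo", "Qty" };
}

public sealed class TupP : IHasHeading
{
    public static string[] Heading { get; } = new[] { "PNo", "PName" };
}

public static class Headings
{
    // Generic code reads T.Heading directly: no reflection and no instance needed.
    // Headings are compared as sets, so attribute order doesn't matter.
    public static bool Same<T1, T2>() where T1 : IHasHeading where T2 : IHasHeading =>
        T1.Heading.ToHashSet().SetEquals(T2.Heading);
}
```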

Also, where have you declared the types of SNo, SName, Status, and Supplier City?

They were in an earlier post. Here is some complete code. It's still got rough edges, but it's getting better.

As you can see the executable code and data are nice and compact and easy to read. The class definitions are verbose. They look a lot like ordinary POCOs, with the types declared in the ctor and getters. These are the only bits that might be generated.

Yes, the class definitions seem unduly complex when ideally you should be able to just declare a tuple value type, per my example above.

My aim was to show that it is at least possible. To my mind the next step is to define what you would like the code to look like, which involves trade-offs between a number of factors. Somewhat to my surprise it turns out that expressing a series of RA operations as a series of type conversions works quite well. Who knew that a rename or projection is just a conversion from this type to that type? It doesn't say that in TTM, and it's a long way from SQL.
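The idea of an RA operation as a type conversion can be sketched generically. This sketch requires C# 11 or later, and `ITupleType`, `Conversions` and the record types are hypothetical. Each tuple type gets a static heading and a static factory. Rename then reuses the values position by position, which assumes the renamed type lists its attributes in the same order. Project maps attributes by name and collects the results into a set, so duplicate tuples disappear.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Requires C# 11 or later (static abstract interface members).
public interface ITupleType<TSelf> where TSelf : ITupleType<TSelf>
{
    static abstract string[] Heading { get; }
    static abstract TSelf FromValues(object[] values);
    object[] Values { get; }
}

public sealed record TupSP(string SNo, string PNo, int Qty) : ITupleType<TupSP>
{
    public static string[] Heading => new[] { "SNo", "PNo", "Qty" };
    public static TupSP FromValues(object[] v) => new((string)v[0], (string)v[1], (int)v[2]);
    public object[] Values => new object[] { SNo, PNo, Qty };
}

// SP with SNo renamed to Supplier and PNo renamed to Part.
public sealed record TupSPRenamed(string Supplier, string Part, int Qty) : ITupleType<TupSPRenamed>
{
    public static string[] Heading => new[] { "Supplier", "Part", "Qty" };
    public static TupSPRenamed FromValues(object[] v) => new((string)v[0], (string)v[1], (int)v[2]);
    public object[] Values => new object[] { Supplier, Part, Qty };
}

public sealed record TupPNo(string PNo) : ITupleType<TupPNo>
{
    public static string[] Heading => new[] { "PNo" };
    public static TupPNo FromValues(object[] v) => new((string)v[0]);
    public object[] Values => new object[] { PNo };
}

public static class Conversions
{
    // Rename: the values stay in place; only the type (heading) changes.
    public static TTo Rename<TFrom, TTo>(TFrom t)
        where TFrom : ITupleType<TFrom> where TTo : ITupleType<TTo> =>
        TFrom.Heading.Length == TTo.Heading.Length
            ? TTo.FromValues(t.Values)
            : throw new InvalidOperationException("Rename must preserve the degree");

    // Project: pick the target's attributes by name, then drop duplicate tuples.
    public static HashSet<TTo> Project<TFrom, TTo>(IEnumerable<TFrom> body)
        where TFrom : ITupleType<TFrom> where TTo : ITupleType<TTo>
    {
        int[] map = TTo.Heading.Select(n => Array.IndexOf(TFrom.Heading, n)).ToArray();
        if (map.Any(i => i < 0))
            throw new InvalidOperationException("Target heading is not a subset of the source heading");
        return body.Select(t => TTo.FromValues(map.Select(i => t.Values[i]).ToArray())).ToHashSet();
    }
}
```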

The downside is the need to define way more types than any of us are used to. Select and Union are OK, but all the rest need their own type declarations, and they're quite labour-intensive. The only extension I see needed is one to analyse RA operations and generate the types, and I'm not that sure how that would work.

BTW this is for C#. I think Java generics are not quite as good, but it should be equally do-able. But maybe there are other languages (FP, anyone?) which would make it easier.

Quote from dandl on May 11, 2020, 8:57 am
Quote from Dave Voorhis on May 10, 2020, 9:03 am
Quote from dandl on May 10, 2020, 1:09 am

The trick is a static tuple/relation type that obeys TTM semantics.

There is nothing dynamic going on here.

The contents of the Heading array can only be referenced dynamically, yes?

Why are there two separate but parallel naming schemes, one defined by a string array -- Heading -- which is notionally static but can only be referenced dynamically, and one defined statically by read-only properties?

The point of this entire exercise is to demonstrate that we don't need a new D, a modern GP language can do it all provided it has (a) good support for value types (b) a good enough tuple type. I think that point is made.

I think that point was made when the first relational core library was implemented in a released D. Closed source, that would have been an early C#-based implementation -- Alphora D4 -- or perhaps RAQUEL from even earlier, if you consider RAQUEL to be a D. The first open source example would probably be Rel, with its Java-based library. Unless DuroDBMS preceded it?

I'm not sure of the chronology here.

So now we're arguing over how best to do it, which is just fine.

Not arguing, just an academic discussion.

To me it seems that there are various ways of achieving the same end, i.e., a TTM-compliant library. In the usual general-purpose languages, it will inevitably involve some mix of static compiler-based code validation and dynamic run-time (though as early as possible) code validation. The only conceptual difference between various implementation strategies is how much can be static compiler-based validation vs dynamic run-time validation, and of course what things specifically get generated (if necessary) and/or validated.

But all along I've found your examples to have what seems like more code than is needed -- like the Heading array(s) -- hence my asking.

I made some choices in the details of how to do this implementation because I wanted to avoid reflection. I wanted to focus on the static view, what the compiler or compiler extension sees, and so the machinery is all very transparent. The point is, all you need for a tuple type is 3 things:

  1. A heading (set of strings)
  2. Typed getters and ctor, matching the heading
  3. Machinery for generic RA algorithms to get attribute values.

The mechanism I chose does that. There are other ways, some might even be better.

Why use read-only properties at all, given that tuples are immutable?

TTM requires getters (give or take that bad wording in RM Pre 6). Why not?

I guess... Though using a property so that the syntax for accessing an attribute x of tuple t is t.x, when without the property the syntax would also be t.x, seems rather redundant.

Though I have issues with C# properties in general. As a mechanism to invisibly convert badly-written legacy code with direct assignment and retrieval of public member variables that should be private?

Fine.

As a general way of expressing getters and setters without getter/setter syntax?

Questionable.

Can't you just define a tuple type like this?

public class TupP : TupleBase {
    public readonly string Pno;
    public readonly string Pname;
    public readonly string Color;
    public readonly decimal Weight;
    public readonly string City;
    public TupP(string pno, string pname, string color, decimal weight, string city) {
       Pno = pno; Pname = pname; Color = color; Weight = weight; City = city;
    }
}

Possibly. Now you need reflection to construct the heading, and you need more reflection to do the generic getting of values. Could have its advantages, wasn't my choice.

All these type declarations are static, instantiated at compile time as generics. All the type usage is fully type checked, both in the editor and at compile time. All Pre semantics are obeyed. The only thing missing here is detecting errors across headings, or between headings and declared types.

If you dispense with the Heading array, doesn't that limitation go away?

And what obligates Heading to be there? Is it defined in TupleBase as protected? If so, what happens if you forget to assign it in TupleBase derivations?

No, it's not protected; in fact it's not defined in the base at all, due to a limitation of C# (static members can't be overridden, so each derived class has to declare its own). It's now a static property of the derived tuple class. What actually happens if you don't set it is that it fails pretty early at runtime.

Also, where have you declared the types of SNo, SName, Status, and Supplier City?

They were in an earlier post. Here is some complete code. It's still got rough edges, but it's getting better.

As you can see the executable code and data are nice and compact and easy to read. The class definitions are verbose. They look a lot like ordinary POCOs, with the types declared in the ctor and getters. These are the only bits that might be generated.

Yes, the class definitions seem unduly complex when ideally you should be able to just declare a tuple value type, per my example above.

My aim was to show that it is at least possible. To my mind the next step is to define what you would like the code to look like, which involves trade-offs between a number of factors. Somewhat to my surprise it turns out that expressing a series of RA operations as a series of type conversions works quite well. Who knew that a rename or projection is just a conversion from this type to that type? It doesn't say that in TTM, and it's a long way from SQL.

I think it's fairly self-evident and unsurprising. Given VAR S REAL RELATION {S#, SNAME, STATUS, CITY} KEY {S#}, and a projection S {STATUS, CITY}, what could the type of the result be except that which is represented by heading {STATUS, CITY}, which is a distinct type from that represented by heading {S#, SNAME, STATUS, CITY}?

Indeed, the static type checking machinery in Rel is based on this. It statically computes the declared type of every expression and subexpression based on the statically computable operand and return types for every operator.

This is a common approach for handling type checking in programming languages in general. I.e., there are two streams of expression evaluation: one to compute types (typically statically, during compilation, or at least prior to execution) and one to evaluate the values (dynamically, at run-time.)
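To illustrate the two streams, here is a toy C# type checker (all names hypothetical, and far simpler than Rel's) that computes the heading of each relational expression from its operands' headings alone, rejecting a bad projection without touching any data:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A toy relational expression tree, used only to show the "type stream".
public abstract record RelExpr { }
public record RelVar(string Name, string[] Heading) : RelExpr;
public record Project(RelExpr Operand, string[] Attributes) : RelExpr;
public record Join(RelExpr Left, RelExpr Right) : RelExpr;

public static class TypeChecker
{
    // Derives each expression's heading from its operands' headings; no tuples are read.
    public static SortedSet<string> HeadingOf(RelExpr e) => e switch
    {
        RelVar v => new SortedSet<string>(v.Heading),
        Project p => CheckProject(HeadingOf(p.Operand), p.Attributes),
        Join j => Union(HeadingOf(j.Left), HeadingOf(j.Right)),
        _ => throw new ArgumentException("unknown expression")
    };

    // A projection's heading is the listed attributes, which must all exist in the operand.
    static SortedSet<string> CheckProject(SortedSet<string> operand, string[] attrs)
    {
        var missing = attrs.Where(a => !operand.Contains(a)).ToList();
        if (missing.Count > 0)
            throw new InvalidOperationException("no such attribute(s): " + string.Join(", ", missing));
        return new SortedSet<string>(attrs);
    }

    // A join's heading is the union of its operands' headings.
    static SortedSet<string> Union(SortedSet<string> a, SortedSet<string> b)
    {
        var result = new SortedSet<string>(a);
        result.UnionWith(b);
        return result;
    }
}
```

Attribute types are left out to keep it short; a real checker would also require attributes common to both sides of a JOIN to have the same type.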

Yes, it's very different from SQL, which is declarative and notionally a calculus with no operators or operands in the conventional imperative sense. Its type system, such as it is, reflects that.

At least, that's the case with SQL's query language. The usual procedural extensions -- PL/SQL and the like -- are, of course, conventional imperative languages.

The downside is the need to define way more types than any of us are used to. Select and Union are OK, because their result has the same type as their operand, but all the rest need their own type declarations, and they're quite labour-intensive. The only extension I see needed is one to analyse RA operations and generate the types, and I'm not that sure how that would work.
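A sketch of why restriction costs nothing while projection does, assuming a hypothetical `RelTuple` base with an `object[]` of values and a per-class heading. The `Project` extension, its heading argument and its factory argument are illustrations, not the API of the library discussed in this thread:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal tuple: a per-class heading plus values in heading order.
public abstract class RelTuple
{
    public object[] Values { get; }
    protected RelTuple(object[] values) => Values = values;
    public abstract string[] Heading { get; }
    // Generic access by attribute name (throws if the name is not in the heading).
    public object this[string name] => Values[Array.IndexOf(Heading, name)];
}

public sealed class TupSP : RelTuple
{
    static readonly string[] H = { "SNo", "PNo", "Qty" };
    public override string[] Heading => H;
    public TupSP(string sno, string pno, int qty) : base(new object[] { sno, pno, qty }) { }
    public int Qty => (int)Values[2];
}

// The extra type that projection on {PNo, Qty} forces us to declare.
public sealed class TupPQ : RelTuple
{
    static readonly string[] H = { "PNo", "Qty" };
    public override string[] Heading => H;
    public TupPQ(object[] values) : base(values) { }
    public string PNo => (string)Values[0];
    public int Qty => (int)Values[1];
}

// Compares value arrays element by element, so duplicate tuples can be removed.
sealed class ValuesComparer : IEqualityComparer<object[]>
{
    public bool Equals(object[] a, object[] b) => a.SequenceEqual(b);
    public int GetHashCode(object[] v) => v.Aggregate(17, (h, x) => h * 31 + (x?.GetHashCode() ?? 0));
}

public static class RA
{
    // Restriction needs nothing new: LINQ's Where maps IEnumerable<T> to IEnumerable<T>.
    // Projection changes the heading, so the caller must name the target type; the
    // heading and factory arguments stand in for what a compiler could supply.
    public static List<TTo> Project<TFrom, TTo>(this IEnumerable<TFrom> r, string[] heading, Func<object[], TTo> make)
        where TFrom : RelTuple
    {
        var seen = new HashSet<object[]>(new ValuesComparer());
        var result = new List<TTo>();
        foreach (var t in r)
        {
            var values = heading.Select(name => t[name]).ToArray();
            if (seen.Add(values)) result.Add(make(values));   // a relation has no duplicates
        }
        return result;
    }
}
```

For example, given `new TupSP("S1", "P1", 300)`, `new TupSP("S2", "P1", 300)` and `new TupSP("S1", "P2", 200)`, `Where(t => t.Qty >= 300)` still yields `TupSP` tuples, whereas `Project(new[] { "PNo", "Qty" }, v => new TupPQ(v))` needs `TupPQ` and yields two tuples, because the two (P1, 300) rows collapse into one.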

BTW this is for C#. I think Java generics are not quite as good, but it should be equally doable. But maybe there are other languages (FP, anyone?) which would make it easier.

I think from what you've shown, Java would be identical. I haven't seen anything that would be precluded or complicated by Java's type erasure semantics, though I haven't looked that closely. It tends to come up more often with Java containers in particular than Java generics in general, which are quite capable.

The real problem, as you've noted, is the difficulty of having to create types or resort to greater -- for lack of a better word -- dynamic-ness. That's what precludes practically using the usual popular general-purpose languages in the spirit of a D, even if they meet the letter of a D. Inevitably, it either means giving up some D compliance, or wrapping the library in another language to achieve better usability -- at least from a pure programming point of view -- even if it means giving up much of the conventional popular general-purpose language's environment.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

Could you please clarify: in TTM that A is an attribute name. Surely you don't want attribute name as part of the type?

I'm going to violently disagree with your catalogue of types below, so we're maybe talking at cross purposes; but yes definitely I want Attribute name as part of the type structure -- I'll leave moot whether "part of the type". An Attribute name is not a value-inhabited (component of) a type. An Attribute name 'applied' to a value-inhabited type by pairing as in <A, T> is perhaps better thought of as a type generator in the same sense as TUP{ }.

I want REL{TUP{ SNo 3, PNo 3, Qty 3 }} to have a PhysRep as a vector of three Ints, each of value 3, each type-distinct such that it's a type error to compare them or try to do arithmetic between them. Then the technical term for the Attribute name is 'Phantom type'.
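In C#, which has no type-level strings, the nearest sketch of that idea uses empty marker classes as the phantom parameter. `Attr<TName>`, `SNo`, `PNo` and `Qty` are hypothetical names; at run time each value is still a single `int`:

```csharp
using System;

// Empty marker classes: never instantiated, used only as type arguments.
public sealed class SNo { private SNo() { } }
public sealed class PNo { private PNo() { } }
public sealed class Qty { private Qty() { } }

// PhysRep is an int; TName is a phantom type naming the attribute.
public readonly struct Attr<TName> : IEquatable<Attr<TName>>
{
    public int Value { get; }
    public Attr(int value) => Value = value;

    public bool Equals(Attr<TName> other) => Value == other.Value;
    public override bool Equals(object obj) => obj is Attr<TName> a && Equals(a);
    public override int GetHashCode() => Value.GetHashCode();

    // Defined only for two values with the same phantom type.
    public static bool operator ==(Attr<TName> a, Attr<TName> b) => a.Equals(b);
    public static bool operator !=(Attr<TName> a, Attr<TName> b) => !a.Equals(b);
}

// new Attr<SNo>(3) == new Attr<Qty>(3)   -> compile error: no == for Attr<SNo> and Attr<Qty>
// new Attr<SNo>(3) + new Attr<SNo>(3)    -> compile error: no arithmetic is defined at all
// Caveat: Equals(object) still compiles across phantom types; it just returns false.
```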

Maybe that works, but what about the attributes Qty, Max-qty, MOQ-qty, previous-qty, etc. Are they each an individual type? If not, what are they?

  • Type SName is similarly a first-class <SName, CHAR>, PhysRep CHAR.
  • Status is an enumeration, whose PhysRep is Int but only values 10, 20, 30, etc. are accessible (probably with meaningful names, whose alphabetical order need not match the order of magnitude of the underlying Int).
  • Supplier City has PhysRep CHAR; is a restricted set of values; but not an enumeration because the possible values change over time; shares the same set of values as Parts City; but is statically type-distinct from Parts City, so there's no 'accidental' JOIN ON CITY with S JOIN P.

I'm guessing that for the PhysReps, the familiar range of system-supplied types is adequate; as it is in fact in Haskell, despite all the fancy type-superstructures and polymorphism it supports. So chiefly the 'type system' is a veneer to make sure you don't mis-treat values of the same PhysRep as 'same meaning' when they aren't. So I'm agreeing with (some of) TTM's objectives, but reacting against what seems to me excessive circumlocution/a great deal more than veneer.

IMO there are 9 base types: bool, int, real, decimal, text, time, binary, enum, struct. The first 7 have obvious low-level representations (binary is a string of bytes rather than characters). Enum is a short string, but can be compressed to a small integer lookup on a table.

I don't think there's any sense in which enum is a 'base type'. It's implemented as one of the other types (Int or Text); it's the language semantics'/compiler's job to map between appearance in the language and PhysRep. Essentially by the same mechanism. It might be implemented as a renaming of the base type. [That concept isn't specific to Haskell, it's borrowed from ML; sorry I can't find a better reference at short notice.] Note type renaming is a distinct concept from 'type alias', which I've been using before: an alias is just a shorthand/can be completely replaced by the full form; a renaming gives a distinct name (therefore distinct type in a nominative system), but the same set of values with the same PhysRep, and the same implementation of operations.
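C# happens to have both concepts, which makes the distinction easy to see. A using-alias is only a second spelling of `decimal`, whereas the hypothetical `Price` wrapper below is a renaming in the sense above: same representation, distinct type. Unlike Haskell's `newtype` deriving, C# makes you forward each operation by hand:

```csharp
using Weight = System.Decimal; // alias: just another name for decimal

// Renaming: same PhysRep as decimal, but a distinct type.
public readonly struct Price
{
    public decimal Value { get; }
    public Price(decimal value) => Value = value;
    // The "same implementation of operations" has to be forwarded explicitly.
    public static Price operator +(Price a, Price b) => new Price(a.Value + b.Value);
}

public static class AliasDemo
{
    public static decimal AddAlias(Weight w, decimal d) => w + d;  // fine: Weight is decimal
    public static Price AddPrices(Price a, Price b) => a + b;      // only Price + Price exists
    // AddPrices(new Price(1m), 2m) does not compile: decimal does not convert to Price.
}
```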

Read that as 'base storage type' if you prefer. The reason enum is a type is that you can store an enum as a tiny int, but only if you know the range of values in advance. It converts to text, and it is ordered by its position in the list. It should be a language type error to compare it to something that is not a member of the type. That is enough, IMO.
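A small C# sketch of those properties, using a hypothetical `Rating` enum deliberately declared out of alphabetical order: one-byte storage, conversion to text, ordering by position, and rejection of anything that is not a declared member:

```csharp
using System;

// Stored as one byte; ordered by declaration position, not alphabetically by name.
public enum Rating : byte { Poor, Fair, Good, Excellent }

public static class EnumDemo
{
    public static string ToText(Rating r) => r.ToString();
    public static bool Better(Rating a, Rating b) => a > b;   // compares positions

    // Enum.TryParse also accepts numeric strings such as "7", hence the IsDefined check.
    public static bool TryFromText(string s, out Rating r) =>
        Enum.TryParse(s, out r) && Enum.IsDefined(typeof(Rating), r);

    // Better(Rating.Good, 2) does not compile: only the literal 0 converts to an enum implicitly.
}
```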

Struct is a UDT, recursively composed of any of the 9 base types.

struct is different in nature. You've left out: function types; phantom types; renamed types; parametric polymorphic types; overloaded/constrained types. struct and all those others are different in nature because they're composed from or built over the base types.

Again, struct is a storage type. The storage is precisely the aggregate of its components, or something more compact if the storage engine so chooses.

Function types are not easily seen as storage types. Worth further debate. The rest are most certainly not storage types. They are the layer a language imposes over the top.

With these base types you can store (almost) anything. In particular you can do units of measure or currency nicely as a struct of real and enum. There are situations where it would be nice to have union types, and for TTM you need tuple types, and they do need to store differently from the others. So maybe there are 10 or 11. But no more.
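For instance, a currency value might be sketched in C# as below. `Money` and `Currency` are hypothetical, and the amount uses `decimal` rather than the `real` mentioned above, since binary floating point is a poor fit for money:

```csharp
using System;

public enum Currency : byte { AUD, CAD, EUR, USD }

// Stored as the aggregate of its parts: an amount and a currency code.
public readonly struct Money
{
    public decimal Amount { get; }
    public Currency Currency { get; }
    public Money(decimal amount, Currency currency) { Amount = amount; Currency = currency; }

    // Addition only makes sense within one currency; mixing is rejected at run time.
    public static Money operator +(Money a, Money b)
    {
        if (a.Currency != b.Currency)
            throw new InvalidOperationException($"cannot add {a.Currency} to {b.Currency}");
        return new Money(a.Amount + b.Amount, a.Currency);
    }
}
```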

On top of this you need user-defined type names to distinguish types that are physically the same but need to be kept distinct to a greater or lesser degree, since they are intended to be used differently. Think: email type, URL type, a survey int type with a range of 0-9.

You might be trying to say type renamings.

No, that isn't enough. A URL is text for storage purposes, but only some text strings are valid URLs. Again, a layer above storage has to deal with that somehow. One possible path:

  • the database knows about storage types, no more
  • the database commits to a language (a D)
  • the catalog records type definitions as predefined patterns (email, URL, JPG, currency, etc)
    • each based on a storage type
    • these are type constraints as per TTM, written in D code
  • programs written in D can use those type definitions natively
  • others can use the values but only relative to the storage types (which they will see as native)
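A rough C# sketch of the "storage type plus constraint" idea, using the survey and URL examples. The type names are hypothetical, and the URL rule (absolute, http or https) is an assumption chosen for illustration; the public `Value`/`Text` properties are the storage-type view other programs would see:

```csharp
using System;

// Storage type: int. Constraint: 0..9 (the survey example).
public readonly struct SurveyScore
{
    public int Value { get; }
    public SurveyScore(int value)
    {
        if (value < 0 || value > 9)
            throw new ArgumentOutOfRangeException(nameof(value), "survey scores are 0-9");
        Value = value;
    }
}

// Storage type: text. Constraint: an absolute http or https URL.
// A class rather than a struct, so there is no default(Url) with null Text.
public sealed class Url
{
    public string Text { get; }
    public Url(string text)
    {
        bool ok = Uri.TryCreate(text, UriKind.Absolute, out Uri u)
                  && (u.Scheme == Uri.UriSchemeHttp || u.Scheme == Uri.UriSchemeHttps);
        if (!ok) throw new ArgumentException("not an absolute http(s) URL", nameof(text));
        Text = text;
    }
}
```

Checking the scheme matters: on some platforms `Uri.TryCreate` treats a rooted path such as `/tmp/x` as an absolute `file:` URI.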

The major compiled languages support most of these but not (a) a usable enum (b) any kind of type aliasing (c) union types (d) tuple types (but you can fake it).

Haskell has enums; both type aliases and type renaming; tagged unions. Hugs/Trex extension has tuple types; ghc/PolyKinds extension (and friends) supports faking tuple types, with a type-level String/name phantom type.

See above. If Haskell is the D for the database it gets all types natively; if not, it gets storage types natively with a best-effort fit.

Then I'm not saying such a system is TTM-ish. I'm not saying it can be supported by any current/common language. I'm not even saying it's coherent and consistent -- I don't think I'll know until there's plenty of flesh on the bones.

I've seen each of the bits of what I've described in various languages; but not all of them at the same time in one language. Perhaps putting them all together leads to incoherence? Then how/why?

At least the target is not vast. All those types are known quantities, but languages to support them are in short supply.

Andl - A New Database Language - andl.org

I made some choices in the details of how to do this implementation because I wanted to avoid reflection. I wanted to focus on the static view, what the compiler or compiler extension sees, and so the machinery is all very transparent. The point is, all you need for a tuple type is 3 things:

  1. A heading (set of strings)
  2. Typed getters and ctor, matching the heading
  3. Machinery for generic RA algorithms to get attribute values.
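A minimal C# sketch of those three things, assuming a hypothetical `TupleBase` that stores values in an `object[]`. It exposes the heading through an abstract instance property backed by a per-class static array, which is one way round the no-virtual-statics limitation; the code posted earlier in this thread uses a public static field instead:

```csharp
using System;
using System.Linq;

// Hypothetical base: values stored positionally so generic RA code can read
// them without reflection (item 3).
public abstract class TupleBase
{
    protected readonly object[] values;
    protected TupleBase(object[] values) => this.values = values;

    public abstract string[] Heading { get; }                      // item 1

    public object GetValue(string name)
    {
        int i = Array.IndexOf(Heading, name);
        if (i < 0) throw new ArgumentException($"no attribute {name}");
        return values[i];
    }
}

// One concrete tuple type: heading, typed ctor and typed getters (item 2),
// all in the same attribute order.
public sealed class TupS : TupleBase
{
    static readonly string[] heading = { "SNo", "SName", "Status", "City" };
    public override string[] Heading => heading;

    public TupS(string sno, string sname, int status, string city)
        : base(new object[] { sno, sname, status, city }) { }

    public string SNo => (string)values[0];
    public string SName => (string)values[1];
    public int Status => (int)values[2];
    public string City => (string)values[3];
}

public static class TupleOps
{
    // Works for any tuple type, using only the heading and GetValue.
    public static string Show(TupleBase t) =>
        "{" + string.Join(", ", t.Heading.Select(h => $"{h}: {t.GetValue(h)}")) + "}";
}
```

Application code uses the typed getters (`s.Status`), while generic operators only ever see `Heading` and `GetValue`.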

The mechanism I chose does that. There are other ways, some might even be better.

Why use read-only properties at all, given that tuples are immutable?

TTM requires getters (give or take that bad wording in RM Pre 6). Why not?

I guess... Though the use of a property to allow the syntax of accessing an attribute x of tuple t to be t.x, when without the property the syntax would also be t.x, seems rather redundant.

It's a getter property because (a) this is idiomatic in C# and (b) I chose to store the data as an array of objects (object[]), to avoid reflection.

Though I have issues with C# properties in general. As a mechanism to invisibly convert badly-written legacy code with direct assignment and retrieval of public member variables that should be private?

Fine.

As a general way of expressing getters and setters without getter/setter syntax?

Questionable.

This is getter/setter syntax in C#. I dislike the JavaBeans convention, and I note that the authors of JEP 359 agree with me (https://openjdk.java.net/jeps/359):

  • A private final field for each component of the state description;

  • A public read accessor method for each component of the state description, with the same name and type as the component;

That precisely describes a C# property.
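To make the comparison concrete: a get-only auto-property compiles to a private read-only backing field plus a public read accessor with the property's name and type. C# 9 positional records, which postdate this thread, go further and generate the constructor, init-only properties and value-based equality. The `Part` types here are only illustrative:

```csharp
// Hand-written: each get-only auto-property gets a compiler-generated
// private readonly backing field and a public getter.
public sealed class PartHandWritten
{
    public string PNo { get; }
    public decimal Weight { get; }
    public PartHandWritten(string pno, decimal weight) { PNo = pno; Weight = weight; }
}

// C# 9 positional record: constructor, properties and value equality from one line.
public sealed record Part(string PNo, decimal Weight);
```

The practical difference for tuples is equality: two `Part` values with the same components compare equal, while two `PartHandWritten` instances compare by reference unless equality is written by hand.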

 

Yes, the class definitions seem unduly complex when ideally you should be able to just declare a tuple value type, per my example above.

Yes, just like a value type declared in Java without Record, or a collection before there were generics. Help from the compiler makes a big difference.

I should have mentioned before that while it's technically possible to avoid having to declare a heading by using reflection (a) reflection in C# cannot get the order of the fields (b) reflection at field level is a serious performance hit, whereas copying object values around is very fast.
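For comparison, a sketch of the reflection route using a hypothetical `ReflectedHeading` helper. Since `Type.GetProperties` is documented not to return members in any particular order, the sketch sorts the names: fine for a heading, which is a set, but no longer aligned with the constructor's parameter order. Reading values through `PropertyInfo.GetValue` on every access is the per-field cost mentioned above:

```csharp
using System;
using System.Linq;
using System.Reflection;

public sealed class TupP
{
    public string PNo { get; }
    public string City { get; }
    public decimal Weight { get; }
    public TupP(string pno, string city, decimal weight) { PNo = pno; City = city; Weight = weight; }
}

public static class ReflectedHeading
{
    // Deterministic heading despite unspecified reflection order: sort ordinally.
    public static string[] Of(Type t) =>
        t.GetProperties(BindingFlags.Public | BindingFlags.Instance)
         .Select(p => p.Name)
         .OrderBy(n => n, StringComparer.Ordinal)
         .ToArray();

    // Each value read is a reflective property lookup and invocation.
    public static object[] ValuesOf(object tuple, string[] heading) =>
        heading.Select(n => tuple.GetType().GetProperty(n).GetValue(tuple)).ToArray();
}
```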

My aim was to show that it is at least possible. To my mind the next step is to define what you would like the code to look like, which involves trade-offs between a number of factors. Somewhat to my surprise it turns out that expressing a series of RA operations as a series of type conversions works quite well. Who knew that a rename or projection is just a conversion from this type to that type? It doesn't say that in TTM, and it's a long way from SQL.

I think it's fairly self-evident and unsurprising. Given VAR S REAL RELATION {S#, SNAME, STATUS, CITY} KEY {S#}, and a projection S {STATUS, CITY}, what could the type of the result be except that which is represented by heading {STATUS, CITY}, which is a distinct type from that represented by heading {S#, SNAME, STATUS, CITY}?

Indeed, the static type checking machinery in Rel is based on this. It statically computes the declared type of every expression and subexpression based on the statically computable operand and return types for every operator.

Of course you know that. It just isn't a widely known fact that RA project is equivalent to a type coercion.
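Rename can be sketched the same way. In this C# sketch (hypothetical names, reusing the `TupS`/`TupSX` headings from the example earlier in the thread), renaming City to Supplier City reuses the values unchanged and only changes the static type, and hence the heading:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public abstract class TupleBase
{
    public object[] Values { get; }
    protected TupleBase(object[] values) => Values = values;
    public abstract string[] Heading { get; }
}

public sealed class TupS : TupleBase
{
    static readonly string[] H = { "SNo", "SName", "Status", "City" };
    public override string[] Heading => H;
    public TupS(object[] values) : base(values) { }
}

// Same attribute types in the same positions; only the last attribute is renamed.
public sealed class TupSX : TupleBase
{
    static readonly string[] H = { "SNo", "SName", "Status", "Supplier City" };
    public override string[] Heading => H;
    public TupSX(object[] values) : base(values) { }
    public string SupplierCity => (string)Values[3];
}

public static class Conversions
{
    // Rename as a pure type conversion: the value array is shared (safe only
    // because tuples never mutate it). The factory stands in for construction
    // a compiler could generate.
    public static IEnumerable<TTo> Rename<TFrom, TTo>(this IEnumerable<TFrom> r, Func<object[], TTo> make)
        where TFrom : TupleBase where TTo : TupleBase
        => r.Select(t =>
        {
            TTo u = make(t.Values);
            if (u.Heading.Length != t.Heading.Length)
                throw new InvalidOperationException("rename must keep the degree");
            return u;
        });
}
```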

The real problem, as you've noted, is the difficulty of having to create types or resort to greater -- for lack of a better word -- dynamic-ness. That's what precludes practically using the usual popular general-purpose languages in the spirit of a D, even if they meet the letter of a D. Inevitably, it either means giving up some D compliance, or wrapping the library in another language to achieve better usability -- at least from a pure programming point of view -- even if it means giving up much of the conventional popular general-purpose language's environment.

At the risk of repeating myself, the question at issue is: what would it take for a GP language to qualify as a fully compliant D? The answer is clearly: not a whole lot. We can get most of the way using just standard features like generics and reflection, but to go the whole way? Roughly the equivalent of adding anonymous types to C# in support of LINQ, or adding Record to Java. All you need is two things:

  • a way to declare a tuple class, with the heading and field accessing machinery generated by the compiler
  • a way to annotate an RA operation with the desired heading, and have the compiler either pull in a known tuple type or generate one to order.

As compared to creating a new full stack language from scratch, that's at least 2-3 orders of magnitude less effort.
