The Forum for Discussion about The Third Manifesto and Related Matters


Some issues with TTM

PreviousPage 4 of 4
Quote from dandl on May 12, 2020, 5:08 am

I made some choices in the details of how to do this implementation because I wanted to avoid reflection. I wanted to focus on the static view, what the compiler or compiler extension sees, and so the machinery is all very transparent. The point is, all you need for a tuple type is 3 things:

  1. A heading (set of strings)
  2. Typed getters and ctor, matching the heading
  3. Machinery for generic RA algorithms to get attribute values.

The mechanism I chose does that. There are other ways, some might even be better.
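For comparison, the three ingredients above can be sketched in Java (assuming Java 16 or later). The sketch follows the design described in this thread: values stored as an array of object, a fixed heading, typed accessors for application code, and untyped by-name access for generic RA code. The class `SupplierTuple`, its `HEADING` list and its `get(String)` method are hypothetical names chosen for illustration. Java has no properties, so the typed getters are plain methods.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical tuple class: heading, typed constructor/getters, generic access.
final class SupplierTuple {
    // 1. The heading: attribute names, in a fixed order that indexes 'values'.
    static final List<String> HEADING = List.of("SNO", "SNAME", "STATUS", "CITY");

    // Storage as an array of Object, so generic RA code needs no reflection.
    private final Object[] values;

    // 2. Typed constructor matching the heading.
    SupplierTuple(String sno, String sname, int status, String city) {
        this.values = new Object[] { sno, sname, status, city };
    }

    // 2. Typed getters matching the heading; the cast recovers the static type.
    String sno()   { return (String) values[0]; }
    String sname() { return (String) values[1]; }
    int status()   { return (Integer) values[2]; }
    String city()  { return (String) values[3]; }

    // 3. Machinery for generic RA algorithms: untyped access by attribute name.
    Object get(String attribute) {
        int i = HEADING.indexOf(attribute);
        if (i < 0) throw new IllegalArgumentException("no attribute " + attribute);
        return values[i];
    }

    // Value semantics: tuples with equal attribute values are equal.
    @Override public boolean equals(Object o) {
        return o instanceof SupplierTuple other && Arrays.equals(values, other.values);
    }

    @Override public int hashCode() { return Arrays.hashCode(values); }
}
```

A generic operator such as projection would loop over a target heading calling `get(name)`, while application code sticks to the typed methods.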

Why use read-only properties at all, given that tuples are immutable?

TTM requires getters (give or take that bad wording in RM Pre 6). Why not?

I guess... Though the use of a property to allow the syntax of accessing an attribute x of tuple t to be t.x, when without the property the syntax would be t.x, seems rather redundant.

It's a getter property because (a) this is idiomatic in C# and (b) I chose to store the data as an array of object, to avoid reflection.

Though I have issues with C# properties in general. As a mechanism to invisibly convert badly-written legacy code with direct assignment and retrieval of public member variables that should be private?

Fine.

As a general way of expressing getters and setters without getter/setter syntax?

Questionable.

This is getter/setter syntax in C#. I dislike the JavaBeans convention, and I note that the authors of JEP 359 agree with me (https://openjdk.java.net/jeps/359):

 

  • A private final field for each component of the state description;

  • A public read accessor method for each component of the state description, with the same name and type as the component;

That precisely describes a C# property.
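The JEP 359 description maps onto a one-line Java declaration. A minimal sketch, assuming Java 16 or later (where records are a standard feature); the record `Supplier` is illustrative. The compiler generates a private final field and a public accessor with the same name and type as each component, along with the canonical constructor and value-based `equals` and `hashCode`.

```java
// Each component yields a private final field plus a public accessor of the
// same name and type: sno(), sname(), status(), city(). No getXxx() prefix.
record Supplier(String sno, String sname, int status, String city) { }

class RecordDemo {
    public static void main(String[] args) {
        Supplier s = new Supplier("S1", "Smith", 20, "London");
        System.out.println(s.city());                                             // prints London
        System.out.println(s.equals(new Supplier("S1", "Smith", 20, "London")));   // prints true
    }
}
```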

I appreciate the use of actual getters and using a method-invocation syntax on them, e.g., x.getLastname() or x.lastname() or whatever. What I find a tad baffling is the C# property approach of defining a read-only property for lastname so that you can write x.lastname, which is exactly the same thing you would write without having the property at all. So what does the property gain you except a small amount of additional verbosity?

Yes, the class definitions seem unduly complex when ideally you should be able to just declare a tuple value type, per my example above.

Yes, just like a value type declared in Java without Record, or a collection before there was generics. Help from the compiler makes a big difference.

I should have mentioned before that while it's technically possible to avoid having to declare a heading by using reflection, (a) reflection in C# cannot reliably get the declaration order of the fields and (b) reflection at field level is a serious performance hit, whereas copying object values around is very fast.

For things like this, reflection normally gets used in a setup stage, but after using it when (say) retrieving the first record of n records, you cache the field references so that subsequent operations on n - 1 records no longer use reflection or at least don't do reflective look-ups.
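A minimal Java sketch of that setup-then-reuse pattern (the classes `SupplierRow`, `CityStatusRow` and `CachedCopier` are hypothetical): all reflective look-ups happen once, when the copier is constructed, and each row then uses only the cached `Field` objects. Java has the same ordering problem mentioned above for C#, since `Class.getDeclaredFields()` returns fields in no particular order, so the attribute order here comes from an explicit heading rather than from reflection.

```java
import java.lang.reflect.Field;
import java.util.List;

// Hypothetical source and target tuple classes with plain fields.
class SupplierRow {
    String sno; String city; int status;
    SupplierRow(String sno, String city, int status) {
        this.sno = sno; this.city = city; this.status = status;
    }
}

class CityStatusRow { String city; int status; }

final class CachedCopier {
    private final Field[] from;
    private final Field[] to;

    // Setup phase: look up each Field once, by name, in heading order.
    CachedCopier(Class<?> source, Class<?> target, List<String> heading)
            throws NoSuchFieldException {
        from = new Field[heading.size()];
        to = new Field[heading.size()];
        for (int i = 0; i < heading.size(); i++) {
            from[i] = source.getDeclaredField(heading.get(i));
            to[i] = target.getDeclaredField(heading.get(i));
            from[i].setAccessible(true);   // needed only if the fields are private
            to[i].setAccessible(true);
        }
    }

    // Per-row phase: one reflective get and one set per attribute, no look-ups.
    void copy(Object src, Object dst) throws IllegalAccessException {
        for (int i = 0; i < from.length; i++) {
            to[i].set(dst, from[i].get(src));   // int values are boxed, then unboxed
        }
    }
}
```

The per-row loop still makes a reflective `get` and `set` for every attribute, which is the cost the performance question here is about.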

Have you compared performance on various sizes of record set using a reflective approach vs a non-reflective approach?

My aim was to show that it is at least possible. To my mind the next step is to define what you would like the code to look like, which involves trade-offs between a number of factors. Somewhat to my surprise, it turns out that expressing a series of RA operations as a series of type conversions works quite well. Who knew that a rename or projection is just a conversion from this type to that type? It doesn't say that in TTM, and it's a long way from SQL.

I think it's fairly self-evident and unsurprising. Given VAR S REAL RELATION {S#, SNAME, STATUS, CITY} KEY {S#}, and a projection S {STATUS, CITY}, what could the type of the result be except that which is represented by heading {STATUS, CITY}, which is a distinct type from that represented by heading {S#, SNAME, STATUS, CITY}?
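The same idea expressed in Java, as a sketch assuming Java 16 or later (the record names `S`, `StatusCity` and `SupplierTown` are hypothetical): the projection `S {STATUS, CITY}` becomes a conversion from each `S` tuple to a distinct type whose components are exactly STATUS and CITY, and a rename is a conversion to a type whose components differ only in name. Because each result type is fixed by the target record, the compiler checks every step statically.

```java
import java.util.Set;
import java.util.stream.Collectors;

// Tuple types as records: each record's component list plays the role of a heading.
record S(String sno, String sname, int status, String city) { }
record StatusCity(int status, String city) { }                              // heading {STATUS, CITY}
record SupplierTown(String sno, String sname, int status, String town) { }  // CITY renamed to TOWN

final class Conversions {
    // S {STATUS, CITY}: convert each tuple to the narrower type. Collecting into
    // a Set removes duplicates, because records compare by value.
    static Set<StatusCity> projectStatusCity(Set<S> rel) {
        return rel.stream()
                  .map(t -> new StatusCity(t.status(), t.city()))
                  .collect(Collectors.toSet());
    }

    // S RENAME {CITY AS TOWN}: the same values under a different heading,
    // hence a different type.
    static Set<SupplierTown> renameCityToTown(Set<S> rel) {
        return rel.stream()
                  .map(t -> new SupplierTown(t.sno(), t.sname(), t.status(), t.city()))
                  .collect(Collectors.toSet());
    }
}
```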

Indeed, the static type checking machinery in Rel is based on this. It statically computes the declared type of every expression and subexpression based on the statically computable operand and return types for every operator.

Of course you know that. It just isn't a widely known fact that RA project is equivalent to a type coercion.

I suppose...

The real problem, as you've noted, is the difficulty of having to create types or resort to greater -- for lack of a better word -- dynamic-ness. That's what precludes practically using the usual popular general-purpose languages in the spirit of a D, even if they meet the letter of a D. Inevitably, it either means giving up some D compliance, or wrapping the library in another language to achieve better usability -- at least from a pure programming point of view -- even if it means giving up much of the conventional popular general-purpose language's environment.

At the risk of repeating myself, the question at issue is: what would it take for a GP language to qualify as a fully compliant D? The answer is clearly: not a whole lot. We can get most of the way using just standard features like generics and reflection, but to go the whole way? Roughly the equivalent of adding anonymous classes to C# in support of LINQ, or adding Record to Java. All you need is two things:

  • a way to declare a tuple class, with the heading and field accessing machinery generated by the compiler
  • a way to annotate an RA operation with the desired heading, and have the compiler either pull in a known tuple type or generate one to order.
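Java records already provide something close to the first bullet: the compiler generates the heading, and `Class.getRecordComponents()` returns the components in declaration order. The second bullet can be approximated without compiler help by passing the desired result type as a `Class` token. The sketch below is hypothetical (`Project.project` is not an existing library method; Java 16 or later assumed): it reads the target heading from the target record, matches attributes by name against the source record's accessors, and invokes the canonical constructor, doing the reflective look-ups once per call rather than once per tuple.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.lang.reflect.RecordComponent;
import java.util.HashSet;
import java.util.Set;

record S(String sno, String sname, int status, String city) { }
record StatusCity(int status, String city) { }

final class Project {
    // Hypothetical generic projection: the desired heading is given by the target
    // record type; attributes are matched by name against the source record.
    static <R extends Record, T extends Record> Set<T> project(
            Set<R> rel, Class<R> source, Class<T> target) {
        try {
            // Setup, once per call: target heading, source accessors, canonical ctor.
            RecordComponent[] heading = target.getRecordComponents();
            Method[] accessors = new Method[heading.length];
            Class<?>[] types = new Class<?>[heading.length];
            for (int i = 0; i < heading.length; i++) {
                types[i] = heading[i].getType();
                accessors[i] = source.getMethod(heading[i].getName());
            }
            Constructor<T> ctor = target.getDeclaredConstructor(types);

            // Per tuple: read each attribute, build the target tuple; the Set dedupes.
            Set<T> result = new HashSet<>();
            for (R tuple : rel) {
                Object[] args = new Object[accessors.length];
                for (int i = 0; i < accessors.length; i++) {
                    args[i] = accessors[i].invoke(tuple);
                }
                result.add(ctor.newInstance(args));
            }
            return result;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot project onto " + target.getName(), e);
        }
    }
}
```

The compiler still cannot check that the target heading is a subset of the source heading; a mismatch only surfaces at run time, which is exactly the gap the second bullet asks the compiler to close.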

As compared to creating a new full stack language from scratch, that's at least 2-3 orders of magnitude less effort.

All fine if you (a) like the general purpose language; and (b) only need a query engine without a transactional storage engine.

The latter can certainly be added to/as a library.

The former represents a more complex issue. If you like the usual gaggle of general purpose programming languages, then D-ifying one or more of them is a solution.

However, if you see D -- or for that matter, any non-mainstream or new language -- as an opportunity to extend the state of the art and create something better, then merely D-ifying C# or Java or Python or C++ isn't good enough. Then it's not just about implementing the relational model and adhering to the pre/pro-scriptions, it's about RM Pre 26 and your fundamental belief that C# and Java and Python and C++ all violate it.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on May 12, 2020, 10:39 am
Quote from dandl on May 12, 2020, 5:08 am

I made some choices in the details of how to do this implementation because I wanted to avoid reflection. I wanted to focus on the static view, what the compiler or compiler extension sees, and so the machinery is all very transparent. The point is, all you need for a tuple type is 3 things:

  1. A heading (set of strings)
  2. Typed getters and ctor, matching the heading
  3. Machinery for generic RA algorithms to get attribute values.

The mechanism I chose does that. There are other ways, some might even be better.

Why use read-only properties at all, given that tuples are immutable?

TTM requires getters (give or take that bad wording in RM Pre 6). Why not?

I guess... Though the use of a property to allow the syntax of accessing an attribute x of tuple t to be t.x, when without the property the syntax would be t.x, seems rather redundant.

It's a getter property because (a) this is idiomatic in C# and (b) I chose to store the data as an array of object, to avoid reflection.

Though I have issues with C# properties in general. As a mechanism to invisibly convert badly-written legacy code with direct assignment and retrieval of public member variables that should be private?

Fine.

As a general way of expressing getters and setters without getter/setter syntax?

Questionable.

This is getter/setter syntax in C#. I dislike the JavaBeans convention, and I note that the authors of JEP 359 agree with me (https://openjdk.java.net/jeps/359):

 

  • A private final field for each component of the state description;

  • A public read accessor method for each component of the state description, with the same name and type as the component;

That precisely describes a C# property.

I appreciate the use of actual getters and using a method-invocation syntax on them, e.g., x.getLastname() or x.lastname() or whatever. What I find a tad baffling is the C# property approach of defining a read-only property for lastname so that you can write x.lastname, which is exactly the same thing you would write without having the property at all. So what does the property gain you except a small amount of additional verbosity?

Quite a lot actually. I really dislike the added verbosity of method call syntax, especially the getXxx() of Beans, but calls rather than direct access to fields give you a lot more control over access, inheritance, associated computations. I find it perfectly natural, and the Beans approach distasteful. Seems the authors of JEP 359 are with me.

Yes, the class definitions seem unduly complex when ideally you should be able to just declare a tuple value type, per my example above.

Yes, just like a value type declared in Java without Record, or a collection before there was generics. Help from the compiler makes a big difference.

I should have mentioned before that while it's technically possible to avoid having to declare a heading by using reflection, (a) reflection in C# cannot reliably get the declaration order of the fields and (b) reflection at field level is a serious performance hit, whereas copying object values around is very fast.

For things like this, reflection normally gets used in a setup stage, but after using it when (say) retrieving the first record of n records, you cache the field references so that subsequent operations on n - 1 records no longer use reflection or at least don't do reflective look-ups.

Have you compared performance on various sizes of record set using a reflective approach vs a non-reflective approach?

No, and I don't plan to. I am willing to accept reflection in the setup phase (but mindful that this is all 'unsafe', in the sense that the compiler has no idea what you're doing). I am reluctant to use reflection to move a value from tuple A to tuple B (two calls per value, per row, per RA step) when there are good alternatives. I would expect a speed penalty on the order of 10x.

The real problem, as you've noted, is the difficulty of having to create types or resort to greater -- for lack of a better word -- dynamic-ness. That's what precludes practically using the usual popular general-purpose languages in the spirit of a D, even if they meet the letter of a D. Inevitably, it either means giving up some D compliance, or wrapping the library in another language to achieve better usability -- at least from a pure programming point of view -- even if it means giving up much of the conventional popular general-purpose language's environment.

At the risk of repeating myself, the question at issue is: what would it take for a GP language to qualify as a fully compliant D? The answer is clearly: not a whole lot. We can get most of the way using just standard features like generics and reflection, but to go the whole way? Roughly the equivalent of adding anonymous classes to C# in support of LINQ, or adding Record to Java. All you need is two things:

  • a way to declare a tuple class, with the heading and field accessing machinery generated by the compiler
  • a way to annotate an RA operation with the desired heading, and have the compiler either pull in a known tuple type or generate one to order.

As compared to creating a new full stack language from scratch, that's at least 2-3 orders of magnitude less effort.

All fine if you (a) like the general purpose language; and (b) only need a query engine without a transactional storage engine.

The latter can certainly be added to/as a library.

I've already said the main area of application is where SQL is not available or can't do what you want. No, I'm not a fan of the GP language that does everything; I've long argued for using DSLs wherever possible. But if you're going to have a DSL you need (a) tools to make creating a DSL easy and (b) a really clear idea of how it talks to its host. TTM/D as it stands does neither: it's way too big and complex to knock off a quick DSL in a couple of days, and it discourages being seen as a sub-language. If a DSL to replace SQL queries is what you need, then TTM may lead you well astray. The prescriptions for that would be quite a bit different.

The former represents a more complex issue. If you like the usual gaggle of general purpose programming languages, then D-ifying one or more of them is a solution.

However, if you see D -- or for that matter, any non-mainstream or new language -- as an opportunity to extend the state of the art and create something better, then merely D-ifying C# or Java or Python or C++ isn't good enough. Then it's not just about implementing the relational model and adhering to the pre/pro-scriptions, it's about RM Pre 26 and your fundamental belief that C# and Java and Python and C++ all violate it.

If you take a holistic view of TTM/D, what it argues for is an industrial strength GP language built around a model of value types. In that context, the tuple type feature and the RA feature are relatively minor players. Yes, setting out to add D-ness to an OO language is really an ambit play: I've shown that it's possible, not that it's a good idea.

It would be better to start with a strong FP language, already built on value types, that already has a strong community, a good infrastructure and thereby a claim to RM Pre 26. Then add a tuple type to that, and you're perhaps on the way to a solid D. I've been pondering whether it's time to learn F# (or Scala or whatever), and that might be a good reason.

Andl - A New Database Language - andl.org
Quote from dandl on May 12, 2020, 11:21 am
Quote from Dave Voorhis on May 12, 2020, 10:39 am
Quote from dandl on May 12, 2020, 5:08 am

I made some choices in the details of how to do this implementation because I wanted to avoid reflection. I wanted to focus on the static view, what the compiler or compiler extension sees, and so the machinery is all very transparent. The point is, all you need for a tuple type is 3 things:

  1. A heading (set of strings)
  2. Typed getters and ctor, matching the heading
  3. Machinery for generic RA algorithms to get attribute values.

The mechanism I chose does that. There are other ways, some might even be better.

Why use read-only properties at all, given that tuples are immutable?

TTM requires getters (give or take that bad wording in RM Pre 6). Why not?

I guess... Though the use of a property to allow the syntax of accessing an attribute x of tuple t to be t.x, when without the property the syntax would be t.x, seems rather redundant.

It's a getter property because (a) this is idiomatic in C# and (b) I chose to store the data as an array of object, to avoid reflection.

Though I have issues with C# properties in general. As a mechanism to invisibly convert badly-written legacy code with direct assignment and retrieval of public member variables that should be private?

Fine.

As a general way of expressing getters and setters without getter/setter syntax?

Questionable.

This is getter/setter syntax in C#. I dislike the JavaBeans convention, and I note that the authors of JEP 359 agree with me (https://openjdk.java.net/jeps/359):

 

  • A private final field for each component of the state description;

  • A public read accessor method for each component of the state description, with the same name and type as the component;

That precisely describes a C# property.

I appreciate the use of actual getters and using a method-invocation syntax on them, e.g., x.getLastname() or x.lastname() or whatever. What I find a tad baffling is the C# property approach of defining a read-only property for lastname so that you can write x.lastname, which is exactly the same thing you would write without having the property at all. So what does the property gain you except a small amount of additional verbosity?

Quite a lot actually. I really dislike the added verbosity of method call syntax, especially the getXxx() of Beans, but calls rather than direct access to fields give you a lot more control over access, inheritance, associated computations.

Whether getters are verbose or not is a separate issue. What I'm referring to is the use of empty read properties to give you exactly the same syntax and semantics as not using a property at all. A neat thing about properties is that you can use public const member variables freely, and as soon as you need to filter or alter their values at point of read, then you can change the member variable to private const and wrap it in a property and the only code that changes is the referenced class.

In other words, you can start with no properties at all -- just read const member variables (with properties you can, if you want to, even *shudder* directly access writable member variables) -- and add properties only as needed.

That's how I thought properties were meant to work.
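Java has no properties, so the migration described here (start with a public field, wrap it later) cannot be done there without touching callers. A hypothetical two-version sketch, with illustrative class names `TupleV1` and `TupleV2`:

```java
// Version 1: the attribute is exposed as a public final field; callers write t.city.
final class TupleV1 {
    public final String city;
    TupleV1(String city) { this.city = city; }
}

// Version 2: the value now needs filtering at the point of read (here, trimming
// stray whitespace). Java needs a method for that, so every caller must change
// from t.city to t.city(); a C# property would keep the caller's source text the same.
final class TupleV2 {
    private final String city;
    TupleV2(String city) { this.city = city; }
    public String city() { return city.strip(); }
}
```

This is arguably why Java conventions favour declaring trivial accessors from the start: the call syntax is fixed before anyone knows whether the accessor will ever do real work.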

I had a colleague at my previous job who did the same thing as you: created gaggles of otherwise empty properties by default. I raised the same issue with him and only got a grumpy "it's the way you're supposed to do it!" which I didn't find very clarifying.

I find it perfectly natural, and the Beans approach distasteful. Seems the authors of JEP 359 are with me.

Java records auto-generate getters, which is quite different from properties (at least syntactically), per my explanation above.

Yes, the class definitions seem unduly complex when ideally you should be able to just declare a tuple value type, per my example above.

Yes, just like a value type declared in Java without Record, or a collection before there was generics. Help from the compiler makes a big difference.

I should have mentioned before that while it's technically possible to avoid having to declare a heading by using reflection, (a) reflection in C# cannot reliably get the declaration order of the fields and (b) reflection at field level is a serious performance hit, whereas copying object values around is very fast.

For things like this, reflection normally gets used in a setup stage, but after using it when (say) retrieving the first record of n records, you cache the field references so that subsequent operations on n - 1 records no longer use reflection or at least don't do reflective look-ups.

Have you compared performance on various sizes of record set using a reflective approach vs a non-reflective approach?

No, and I don't plan to. I am willing to accept reflection in the setup phase (but mindful that this is all 'unsafe', in the sense that the compiler has no idea what you're doing). I am reluctant to use reflection to move a value from tuple A to tuple B (two calls per value, per row, per RA step) when there are good alternatives. I would expect a speed penalty on the order of 10x.

It involves a level of indirection even after you've obtained the field reference, but I'm not sure it's that slow. Maybe it is. I'll have to try it in Java and C# and do some comparisons.
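A rough starting point for such a comparison in Java. This is a sketch only: `FromRow`, `ToRow` and `CopyTiming` are hypothetical, and a harness such as JMH would give far more trustworthy numbers than `System.nanoTime()` loops. It copies two attributes per row, once by direct assignment and once through cached `Field` objects, repeating a few rounds so the JIT has a chance to warm up. No results are claimed here; they will vary by JVM version and hardware.

```java
import java.lang.reflect.Field;

// Hypothetical row types with the same two attributes.
class FromRow {
    String city; int status;
    FromRow(String city, int status) { this.city = city; this.status = status; }
}

class ToRow { String city; int status; }

final class CopyTiming {
    // Direct copy: plain field assignments, as in a non-reflective design.
    static void direct(FromRow[] in, ToRow[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i].city = in[i].city;
            out[i].status = in[i].status;
        }
    }

    // Reflective copy: Field objects looked up beforehand, then get/set per value.
    static void reflective(FromRow[] in, ToRow[] out, Field[] from, Field[] to)
            throws IllegalAccessException {
        for (int i = 0; i < in.length; i++) {
            for (int f = 0; f < from.length; f++) {
                to[f].set(out[i], from[f].get(in[i]));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int n = 1_000_000;
        FromRow[] in = new FromRow[n];
        ToRow[] out = new ToRow[n];
        for (int i = 0; i < n; i++) {
            in[i] = new FromRow("City" + (i % 100), i);
            out[i] = new ToRow();
        }
        Field[] from = { FromRow.class.getDeclaredField("city"), FromRow.class.getDeclaredField("status") };
        Field[] to = { ToRow.class.getDeclaredField("city"), ToRow.class.getDeclaredField("status") };

        for (int round = 0; round < 5; round++) {   // repeat so the JIT can warm up
            long t0 = System.nanoTime();
            direct(in, out);
            long t1 = System.nanoTime();
            reflective(in, out, from, to);
            long t2 = System.nanoTime();
            System.out.printf("round %d: direct %d ms, reflective %d ms%n",
                    round, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        }
    }
}
```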

The real problem, as you've noted, is the difficulty of having to create types or resort to greater -- for lack of a better word -- dynamic-ness. That's what precludes practically using the usual popular general-purpose languages in the spirit of a D, even if they meet the letter of a D. Inevitably, it either means giving up some D compliance, or wrapping the library in another language to achieve better usability -- at least from a pure programming point of view -- even if it means giving up much of the conventional popular general-purpose language's environment.

At the risk of repeating myself, the question at issue is: what would it take for a GP language to qualify as a fully compliant D? The answer is clearly: not a whole lot. We can get most of the way using just standard features like generics and reflection, but to go the whole way? Roughly the equivalent of adding anonymous classes to C# in support of LINQ, or adding Record to Java. All you need is two things:

  • a way to declare a tuple class, with the heading and field accessing machinery generated by the compiler
  • a way to annotate an RA operation with the desired heading, and have the compiler either pull in a known tuple type or generate one to order.

As compared to creating a new full stack language from scratch, that's at least 2-3 orders of magnitude less effort.

All fine if you (a) like the general purpose language; and (b) only need a query engine without a transactional storage engine.

The latter can certainly be added to/as a library.

I've already said the main area of application is where SQL is not available or can't do what you want. No, I'm not a fan of the GP language that does everything; I've long argued for using DSLs wherever possible. But if you're going to have a DSL you need (a) tools to make creating a DSL easy and (b) a really clear idea of how it talks to its host. TTM/D as it stands does neither: it's way too big and complex to knock off a quick DSL in a couple of days, and it discourages being seen as a sub-language. If a DSL to replace SQL queries is what you need, then TTM may lead you well astray. The prescriptions for that would be quite a bit different.

It's an interesting use case where you need something like SQL queries, don't have SQL, and it's all read-only or at least non-transactional non-concurrent writes. I'm not sure TTM is necessarily the answer, but I'm not sure the relational model in general is the right answer, either.

The former represents a more complex issue. If you like the usual gaggle of general purpose programming languages, then D-ifying one or more of them is a solution.

However, if you see D -- or for that matter, any non-mainstream or new language -- as an opportunity to extend the state of the art and create something better, then merely D-ifying C# or Java or Python or C++ isn't good enough. Then it's not just about implementing the relational model and adhering to the pre/pro-scriptions, it's about RM Pre 26 and your fundamental belief that C# and Java and Python and C++ all violate it.

If you take a holistic view of TTM/D, what it argues for is an industrial strength GP language built around a model of value types. In that context, the tuple type feature and the RA feature are relatively minor players. Yes, setting out to add D-ness to an OO language is really an ambit play: I've shown that it's possible, not that it's a good idea.

It would be better to start with a strong FP language, already built on value types, that already has a strong community, a good infrastructure and thereby a claim to RM Pre 26. Then add a tuple type to that, and you're perhaps on the way to a solid D. I've been pondering whether it's time to learn F# (or Scala or whatever), and that might be a good reason.

The answer to the question you haven't asked is unquestionably Haskell.

Or Lisp. :-)

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

Whether getters are verbose or not is a separate issue. What I'm referring to is the use of empty read properties to give you exactly the same syntax and semantics as not using a property at all. A neat thing about properties is that you can use public const member variables freely, and as soon as you need to filter or alter their values at point of read, then you can change the member variable to private const and wrap it in a property and the only code that changes is the referenced class.

In other words, you can start with no properties at all -- just read const member variables (with properties you can, if you want to, even *shudder* directly access writable member variables) -- and add properties only as needed.

That's how I thought properties were meant to work.

You lost me. Yes, to a large extent C# properties support the same caller syntax as non-private fields, but otherwise both the syntax and semantics are different. If you start out with fields and change them to properties then of course you change the API, so the caller has to know and be recompiled. If you are using reflection, that has to change too.

For me, the reason to use properties was of course the ability to store data as object for generic access but expose it as a typed value. Good fit.

I had a colleague at my previous job who did the same thing as you: created gaggles of otherwise empty properties by default. I raised the same issue with him and only got a grumpy "it's the way you're supposed to do it!" which I didn't find very clarifying.

It's a good question and deserved a better answer. I think you asked the wrong person.

My take is to use:

  • non-private fields only for simple value types, with the implicit guarantee nothing will ever change, no exceptions, no code will ever execute
  • properties for all accessible values, with the implicit guarantee that the behaviour is field-like (low cost, barring exceptions what you write is what you read)
  • methods for anything else (possible high cost operation, possible state change beyond the obvious, possible behaviour dependent on state).

Java has no way to distinguish those last two other than a naming convention.
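The three categories above can be sketched in Java, where only naming separates the second from the third. The class `Measurement` and the `computeMean` naming are hypothetical, chosen only to illustrate the convention.

```java
import java.util.List;

final class Measurement {
    // Category 1: a simple immutable value exposed directly as a public final field.
    public final String unit;

    private final List<Double> samples;

    Measurement(String unit, List<Double> samples) {
        this.unit = unit;
        this.samples = List.copyOf(samples);   // defensive, immutable copy
    }

    // Category 2: field-like accessor, cheap and side-effect free
    // (the role a C# property would play). A noun-style name signals that.
    public int count() { return samples.size(); }

    // Category 3: potentially costly work, signalled only by a verb-style name.
    public double computeMean() {
        if (samples.isEmpty()) return Double.NaN;
        double sum = 0;
        for (double s : samples) sum += s;
        return sum / samples.size();
    }
}
```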

I find it perfectly natural, and the Beans approach distasteful. Seems the authors of JEP 359 are with me.

Java records auto-generate getters, which is quite different from properties (at least syntactically), per my explanation above.

Java limitation. If Java had properties, I'm sure Records would auto-generate them.

Have you compared performance on various sizes of record set using a reflective approach vs a non-reflective approach?

No, and I don't plan to. I am willing to accept reflection in the setup phase (but mindful that this is all 'unsafe', in the sense that the compiler has no idea what you're doing). I am reluctant to use reflection to move a value from tuple A to tuple B (two calls per value, per row, per RA step) when there are good alternatives. I would expect a speed penalty on the order of 10x.

It involves a level of indirection even after you've obtained the field reference, but I'm not sure it's that slow. Maybe it is. I'll have to try it in Java and C# and do some comparisons.

The C# code for moving objects around is pure assignment. To move between typed fields you need a FieldInfo.GetValue() and a FieldInfo.SetValue() call. I checked the source code: there are a few lines of wrapper code before it disappears into the CLR. I'm not familiar enough with it to navigate that, but I think it will be all native C++ code from there down. Best of luck!

The real problem, as you've noted, is the difficulty of having to create types or resort to greater -- for lack of a better word -- dynamic-ness. That's what precludes practically using the usual popular general-purpose languages in the spirit of a D, even if they meet the letter of a D. Inevitably, it either means giving up some D compliance, or wrapping the library in another language to achieve better usability -- at least from a pure programming point of view -- even if it means giving up much of the conventional popular general-purpose language's environment.

At the risk of repeating myself, the question at issue is: what would it take for a GP language to qualify as a fully compliant D? The answer is clearly: not a whole lot. We can get most of the way using just standard features like generics and reflection, but to go the whole way? Roughly the equivalent of adding anonymous classes to C# in support of LINQ, or adding Record to Java. All you need is two things:

  • a way to declare a tuple class, with the heading and field accessing machinery generated by the compiler
  • a way to annotate an RA operation with the desired heading, and have the compiler either pull in a known tuple type or generate one to order.

As compared to creating a new full stack language from scratch, that's at least 2-3 orders of magnitude less effort.

All fine if you (a) like the general purpose language; and (b) only need a query engine without a transactional storage engine.

The latter can certainly be added to/as a library.

I've already said the main area of application is where SQL is not available or can't do what you want. No, I'm not a fan of the GP language that does everything; I've long argued for using DSLs wherever possible. But if you're going to have a DSL you need (a) tools to make creating a DSL easy and (b) a really clear idea of how it talks to its host. TTM/D as it stands does neither: it's way too big and complex to knock off a quick DSL in a couple of days, and it discourages being seen as a sub-language. If a DSL to replace SQL queries is what you need, then TTM may lead you well astray. The prescriptions for that would be quite a bit different.

It's an interesting use case where you need something like SQL queries, don't have SQL, and it's all read-only or at least non-transactional non-concurrent writes. I'm not sure TTM is necessarily the answer, but I'm not sure the relational model in general is the right answer, either.

As I said, I think the RA is a good choice when:

  1. You already have a bunch of flat tables (XLS, CSV, Web API) and no SQL
  2. You have a query too hard for SQL, e.g. an aggregation using a statistical function, or a computation/aggregation on a user-defined type (say a vector or a matrix or one with units).
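A minimal Java sketch touching both cases, assuming Java 16 or later: a flat table arrives as CSV lines (no SQL engine), is parsed into a tuple type, and is grouped by city with a median aggregate, which is not among SQL's basic aggregates such as SUM and AVG. The `Row` record, the `CsvSummarize` class and the sample columns are hypothetical; the parser handles only simple unquoted fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical tuple type for one CSV line: sno, city, status.
record Row(String sno, String city, int status) { }

final class CsvSummarize {
    // Parse simple comma-separated lines (no quoting or escaping handled).
    static List<Row> parse(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split(",");
            rows.add(new Row(f[0].trim(), f[1].trim(), Integer.parseInt(f[2].trim())));
        }
        return rows;
    }

    // Median of a non-empty list: middle value, or mean of the two middle values.
    static double median(List<Integer> values) {
        List<Integer> sorted = values.stream().sorted().toList();
        int n = sorted.size();
        return n % 2 == 1 ? sorted.get(n / 2)
                          : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    // Roughly a SUMMARIZE ... BY {CITY}: group statuses by city, then take the median.
    static Map<String, Double> medianStatusByCity(List<Row> rows) {
        Map<String, List<Integer>> groups = rows.stream().collect(
                Collectors.groupingBy(Row::city, TreeMap::new,
                        Collectors.mapping(Row::status, Collectors.toList())));
        Map<String, Double> result = new TreeMap<>();
        groups.forEach((city, statuses) -> result.put(city, median(statuses)));
        return result;
    }
}
```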

The former represents a more complex issue. If you like the usual gaggle of general purpose programming languages, then D-ifying one or more of them is a solution.

However, if you see D -- or for that matter, any non-mainstream or new language -- as an opportunity to extend the state of the art and create something better, then merely D-ifying C# or Java or Python or C++ isn't good enough. Then it's not just about implementing the relational model and adhering to the pre/pro-scriptions, it's about RM Pre 26 and your fundamental belief that C# and Java and Python and C++ all violate it.

If you take a holistic view of TTM/D, what it argues for is an industrial strength GP language built around a model of value types. In that context, the tuple type feature and the RA feature are relatively minor players. Yes, setting out to add D-ness to an OO language is really an ambit play: I've shown that it's possible, not that it's a good idea.

It would be better to start with a strong FP language, already built on value types, that already has a strong community, a good infrastructure and thereby a claim to RM Pre 26. Then add a tuple type to that, and you're perhaps on the way to a solid D. I've been pondering whether it's time to learn F# (or Scala or whatever), and that might be a good reason.

The answer to the question you haven't asked is unquestionably Haskell.

Or Lisp. :-)

Not really. They don't adequately meet RM Pre 26 and OO Pre 3, at least in terms of a sufficient community of users and infrastructure.

The obvious choices are:

  1. JVM: Scala, Kotlin
  2. C/C++ relatives: Rust, Dlang, Go
  3. CLR: F#
Andl - A New Database Language - andl.org