The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

NoSQL and SQL and 'relational' -- a mess of disinformation

Quote from dandl on February 1, 2021, 10:14 am
Quote from Dave Voorhis on January 31, 2021, 3:58 pm
Quote from dandl on January 31, 2021, 9:59 am
Quote from Dave Voorhis on January 30, 2021, 2:46 pm
Quote from dandl on January 30, 2021, 1:31 pm

No, that isn't it. If it were possible to get 90% of the basic CRUD stuff generated automatically, it's really easy to add in a programming language for the other 10%. Why can't we get to 90%?

The 90% -- which is probably actually 80%, per the Pareto principle -- is easy in everything. It doesn't matter whether it's auto-generated or easy-to-write, it takes little time and effort.

It's the other 10% - 20% that takes all the time and effort, making whatever focus there is on trivial CRUD stuff -- whether generated automatically or not -- largely irrelevant.

So take a specific example: Chinook or NorthWind, they're both about a dozen tables. Assume we have a data dictionary of display field names and types for each, some really basic field validation, a list of data entry forms and a list of reports. How do you go about creating that, and how long does it take?

A dozen tables is roughly an afternoon's work in pure code if you've got reasonable UI libraries.

The usual Java UI libraries -- swing or SWT -- make that easy, because they have layout managers that do all the pixel-nudging for you. You specify (via Java code) that you want this bunch of widgets in a grid inside another grid inside a set of columns or whatever, and it lays them out sensibly for you. You can specify explicit pixel positions of you're masochistic, but you try to avoid that.

So take me through that.

  • What specs are you using (other than the SQL DDL)?

It depends on the project. It might be nothing more than SQL DDL, or it might be my notes from a meeting with the client, or a mobile phone pic of the whiteboard from a meeting with a client, or the client's notes for me, or some pre-existing software, or some mock-up by a design person, or an old-skool pixel-by-pixel-layout sketch on graph paper from an old-skool designer, or I'm left to do whatever I think is appropriate, or you-name-it.

But somewhere you will need field specific info such as friendly display name, display formatting, simple validation?

Depends on the requirements. Sometimes display names, display formatting and validation are really important and they wind up as arguments to the form-field constructors.

Sometimes the only validation is a few CHECK constraints in the database, and the database field names and default formatting are fine.

  • What libraries are those?

Most of the stuff I do these days (that has any front-end at all) is Java, using either Swing (https://en.wikipedia.org/wiki/Swing_(Java)) for pure desktop stuff or RAP/RWT (which is essentially SWT) for Web front-ends (https://www.eclipse.org/swt/ and https://wiki.eclipse.org/RAP/RWT) or cross-platform desktop.

I've used Swing and SWT (Knime uses both!), and they are verbose.  So you have pre-written templates, and some kind of meta data at the field level?

It's that plus I have a gaggle of pre-written code -- libraries that get used, and previously-written code as starting points that gets copy-n-pasted from project to project -- for creation of (usually) database-driven forms and ancillary infrastructure. In its simplest form, it's driven by database metadata and I just instantiate a home-made TableEditor with database connection and table name parameters, and it gives the user a table grid with usual facilities.

But isn't an automation of that process exactly what I've been proposing? Templates, metadata, generated code?

Yes, but it's code driven, not table driven. Making it table-driven like a 1980's CASE tool only hinders development -- except when it occasionally (if ever) hits the sweet spot of doing exactly what we need.

Usually, it doesn't.

Though when that sort of default edit-a-table functionality is sufficient, increasingly it's typical to just set up the database and hand it over to the user to edit via pgAdmin4 or Microsoft Access via linked tables, along with a bunch of predefined queries (as *.sql scripts) for producing reports.

There was a time when users wouldn't put up with that -- too hard, too technical -- but now average IT ability is far stronger, so being pointed at a database admin tool (or MS Access) is fine. For some, it's just like the other eight or so (or whatever) raw databases they're already editing to do their job, so no problem.

Ugh! Yes, I'm assuming a user who will adapt to the software and not write big cheques for fancy customisation. But no, I'm assuming existing CRUD editors are not enough, we want the next level up. But we don't want to use a visual editor like Access, and we want to target other runtime environments.

Modern database-admin-tool CRUD editors are pretty good. So is MS Access. So is my bag of code-driven Java goodies, which I keep thinking I ought to clean up and put on GitHub, but then I get busy with work or hobbies or DIY and I don't.

Some day, I'll do it. I was looking at it just last night in writing my last post, and thought "hey, this is pretty good stuff; others should get to use it," but... Time, and distractions.

  • How many lines of code do you write in your 4 hours?

I haven't measured it that way. The trivial stuff I can hammer out at an alarming rate, for some undefined value of "alarming."

I haven't measured a lines-per-hour as such, because the easy stuff is negligible time and effort compared to the hard(er) stuff. It's the hard(er) specific, custom requirements that take time, effort, and attention.

But in terms of code volume, there's ideally one line per database table field unless I'm lucky enough to be able to use TableEditor (see above), then it's one line per table, plus whatever application scaffolding is needed. That can often be mostly copied from a previous project and modified. Realistically, it's one line per table for some tables, one line per database table field for other tables, and n lines for a bunch of customisations because requirements, plus fairly minimal scaffolding overhead.

So you like to do it with code, but that's personal preference. There is no reason you couldn't be writing one line of metadata per field, table, report, etc, is there? Or is there?

By "one line of metadata", you mean entering it into a table?  And having it emit code somewhere?

If so, then I wind up with a rigid tool that almost certainly won't do what I want on the next project even if it does exactly what I need on this project, because it's not emitting exactly the code I could have written in the first place...

That means I'm better off not maintaining an inflexible, non-compose-able, un-programmable, non-extensible code generator, and instead focusing on making my native code libraries better.

The customisations take the bulk of time, effort, and lines.

[And BTW: if it's just a series of API calls, why does it need the power of a GP language instead of just some tables to fill in?]

Composability, modifiability, customisability, and the fact that it can all be automated as desired.

(That last bit needs emphasis: as desired is as I desire, not as some tool developer desires.)

I.e., the reasons we use code for anything.

Sometimes I can't simply use the database table metadata to auto-generate a form, but I can use a loop to emit a gaggle of field definitions from two or three lines of code. Sometimes I need to customise the form fields in unanticipated ways, like all the previous projects could use default system colours but this one needs colours specified and we'll never need that facility again. Or whatever.

If I'm relying on table-based definitions, I'm either forced to adhere to whatever the table-driven-code-generator facility does, or I have to modify the table-driven-code-generator facility every time I need to meet a new kind of requirement. In the long run, that turns into more work to maintain because I'm maintaining application code and libraries and a table-driven code generator, compared to defining interfaces with code (maintaining just libraries and application code.)

In the 1980's, I developed a tool that worked that way, essentially, though it used a meta-language instead of tables, but the idea was similar. It generated code.  We replaced it with a pure-target-code solution written in C++. The slight increase in verbosity (and, for lack of a better term, C++ishness) was offset by a significant increase in flexibility, programmability, extensibility, and customisability.

So that's the part I'm trying to get a handle on. Yes, I know very well how hard it is to create and maintain tools, that's what I do.

My experience suggests that the best tool always winds up being the target high-level language, whether it's SQL or Java/C#.

Much as I don't like to admit it, that probably includes the front-end's JavaScript/HTML/CSS, too, though at least for typical CRUD-oriented business apps I've been able to replace it with pure Java code and haven't encountered any serious limitations. Yet.

I tend to write helpers and subclasses to embody typical look-and-feel, so if the desired new form layouts fit within those, it becomes no more than invoking a bunch of constructors with text prompts and defaults.

Though there's always bits where somebody wants something out-of-band, e.g., "Everything is fine as it is [i.e., simple and trivial to code], but when you click that radio button, we want this to move over there and that to disappear and this to appear and a thing to slide down from the top to fill in the calculation from that over here and make this move down and disable all these menu items and enable those menu items and allow the user to right-click to get a popup but only if the value in that field is filled in. There are some more rules but I'll have Sam in Accounting send you those on Tuesday morning so you'll have time to implement them all before we deploy Tuesday afternoon."

There's always one of those.

Perhaps, but not this time. We've quoted $400 for this job (4 hours, $100/hr) and every variation is $10K, if they want to pay it. They won't.

Reports, for me at least, tend to be emit-a-CSV-file or emit-an-Excel-spreadsheet rather than layout-based. I haven't had to do fiddly text-layout on reports in years, but I imagine it would still be tedious and slow for custom formatted layouts. For the usual columnar stuff, emit-a-CSV-file or emit-an-Excel-file seems to be fine, since reporting is increasingly a matter of generating data to be ingested by other systems -- and only casually eyeballed by humans in between systems, if at all -- rather than generating formatted text to be printed and read.

Not this time. CSV is good, but it has to let you choose the fields, selection criteria and joins. Also sub-totaling, because Excel is really bad at that.

The old days of formatted-for-printing reports with detail lines interspersed with subtotal and total lines seem to be going away, even though at least the data is easy to generate -- just union the detail CSV records with the summary CSV records. But increasingly I'm asked for the detail and the totals to be separate -- usually ingested by distinct systems rather than read by humans -- which makes report development easy.

Easy enough to parametrise, too. It's just forms to get parameter values and parametric queries to generate them.

The tedious and slow bit tends to be the business logic behind the forms and reports, which is code-writing same as it ever was.

Not this time. Every quote starts at $10K, so they won't ask for many of those.

Probably the wrong term, I didn't mean a declarative language, I meant no code,  just data in tables. And nothing that pretends to be code either.

I'm not sure that saves much if anything. Is filling in a bunch of tables with text using pgadmin4 or whatever that much easier and faster than writing the same content via lean code, even in a relatively verbose language like C# or Java?

And what about when I have ten or twenty or fifty rows with almost the same thing but for some changing value. I would far rather implement that with .ForEach(...) or a loop than pound out ten/twenty/fifty almost-identical-except-for-some-changing-value rows in the database, ugh.

Though I could write a query to do it. Except... If it's ten/twenty/fifty rows and I need to change them after I've generated them, now I've got to delete them and regenerate them and... Oh, but they've been customised...

I don't really care what tool we use to generate the data, as long as we recognise it's data, not code. I'm putting forward the hypothesis that you can generate filing cabinet window-on-date apps from data without writing application code. If not, I would like to know what falsifies the hypothesis.

Customisation. There is always customisation, sometimes big, sometimes small. Otherwise, if they just need to edit/search/enter/update data in tables, throw pgAdmin4 or MS Access (perhaps with the Wizard-generated forms) at the client and let them edit the data.

It sounds like you're looking for old-skool CASE tools, which were (notoriously) table-driven and which still seem to be around. Even just yesterday, I got a pseudo-salespitch on Quora from https://parametricsystems.com.au, which makes something called ENIGMA that seems like a fill-in-the-tables CASE tool. It apparently generates code.

So really this is the nub of the problem. We can all see the template/metadata/generation part of the solution solving 80% of the problem. What we can't see is a way to leverage that but leave room for the bits that don't quite fit.

Let's say an application that genuinely solves a reasonable customer problem is complex. Within the essential complexity of the solution there is routinely:

  • 80% of regular complexity
  • 20% of irregular complexity.

The regular complexity is seductively easy, because we can easily capture the regularities in tables, metadata, templates, code generation and so on. We think we're nearly done. We're not.

The irregular complexity is intractably hard. We can:

  1. Add features to our 'regularity' solution. That's hard work, and at the end of the day it comes to resemble code. Then it's just another language and most likely it won't get used.
  2. Find ways to plug in bits of custom code. Learning about all those special plug in places is hard, and mostly likely too hard.
  3. Go back to code first, but retain bits of the 'regularity' solution to reduce code volume. This is state of the art, this is you.

So the heart of the problem as I first stated it is that we just don't know how to integrate a regularity-based system to handle regular complexity with a code-based system to handle irregular complexity, other than method (c). And even if we know parts of the answer, and try to move up a level, it rapidly gets so (accidentally) complex we give up and go back to method (c).

Yes, (c) seems -- for all its limitations and flaws -- to invariably work better than the alternatives when we take all the factors into account.

That makes sense, because that's what code is for.

Attempts to replace code with non-code, but still do all the things that makes code powerful is, I think, doomed. That's because code is the most efficient, effective, powerful way to express the essential complexity of code. Yes, it's difficult and time-consuming and complex, but overall, the only approaches that would be less difficult, time-consuming, and complex would be everything else.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from tobega on February 1, 2021, 10:36 am

I think if you could just specify a database table and have everything generated from that, fine

As soon as you have to add any other data to control how things are generated, you have essentially created a programming language. One that I'm not good at and only use a few days a year at best, and it generally isn't a very capable language either.

But if everything just follows from the table definition, why not just hand the users SQLdeveloper or some such tool to just look directly at the data? Even let them write a little SQL.

Increasingly, I'm seeing users do exactly that.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on February 1, 2021, 11:44 am
Quote from dandl on February 1, 2021, 10:14 am
Quote from Dave Voorhis on January 31, 2021, 3:58 pm

Most of the stuff I do these days (that has any front-end at all) is Java, using either Swing (https://en.wikipedia.org/wiki/Swing_(Java)) for pure desktop stuff or RAP/RWT (which is essentially SWT) for Web front-ends (https://www.eclipse.org/swt/ and https://wiki.eclipse.org/RAP/RWT) or cross-platform desktop.

I've used Swing and SWT (Knime uses both!), and they are verbose.  So you have pre-written templates, and some kind of meta data at the field level?

Sorry, I missed this one in my previous response.

I wouldn't even go as far as describing it as pre-written templates. It's code from previous projects. I just copy it and alter it to suit. At some point I'll refactor it into something more intelligently reusable.

Sometimes there is validation/description/prompt/whatever metadata -- usually specified as field constructor arguments -- sometimes I just present a table browser. Depends on the requirements.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

So really this is the nub of the problem. We can all see the template/metadata/generation part of the solution solving 80% of the problem. What we can't see is a way to leverage that but leave room for the bits that don't quite fit.

Let's say an application that genuinely solves a reasonable customer problem is complex. Within the essential complexity of the solution there is routinely:

  • 80% of regular complexity
  • 20% of irregular complexity.

The regular complexity is seductively easy, because we can easily capture the regularities in tables, metadata, templates, code generation and so on. We think we're nearly done. We're not.

The irregular complexity is intractably hard. We can:

  1. Add features to our 'regularity' solution. That's hard work, and at the end of the day it comes to resemble code. Then it's just another language and most likely it won't get used.
  2. Find ways to plug in bits of custom code. Learning about all those special plug in places is hard, and mostly likely too hard.
  3. Go back to code first, but retain bits of the 'regularity' solution to reduce code volume. This is state of the art, this is you.

So the heart of the problem as I first stated it is that we just don't know how to integrate a regularity-based system to handle regular complexity with a code-based system to handle irregular complexity, other than method (c). And even if we know parts of the answer, and try to move up a level, it rapidly gets so (accidentally) complex we give up and go back to method (c).

Yes, (c) seems -- for all its limitations and flaws -- to invariably work better than the alternatives when we take all the factors into account.

That makes sense, because that's what code is for.

Attempts to replace code with non-code, but still do all the things that makes code powerful is, I think, doomed. That's because code is the most efficient, effective, powerful way to express the essential complexity of code. Yes, it's difficult and time-consuming and complex, but overall, the only approaches that would be less difficult, time-consuming, and complex would be everything else.

It's all code (or data). That's not the issue. It's regularity that makes the difference.

Pegasus is a PEG parser that emits C#. The generated code is striking in its regularity. You get to include snippets at various points, but the generated code is regular.

It's easy to generate UI code, or data access code, or reporting code from metadata that is regular. If you have a table of UI fields with a label, column name, display format, field validation, etc, etc in a perfectly regular structure you can generate regular code (or interpret that table directly). Tools like pgAdmin do exactly that: they rely on the regularity of the DDL. One complicated bit of code to handle one field, then just repeat for each field in the table. Done. There is code in the tool, but there is no code specific to the application: not needed.

But irregularity is the killer for this approach. Something really simple like: this field or that field can be blank, but not both. Or: this date must not be a Tuesday. Or: this number must show in red if it's more than this value. Like you said: must have for this job, don't care for that job, irregular.

So: currently the only way that consistently works is method (c). Method (a) is plain wrong, but there might be a way to do method (b) if and only if we could use metadata for the regular parts and code for the irregular parts (like Pegasus).

Do you know any products that have seriously tried to do that (and not drifted off into method a)?

 

Andl - A New Database Language - andl.org
Quote from dandl on February 1, 2021, 1:31 pm

So really this is the nub of the problem. We can all see the template/metadata/generation part of the solution solving 80% of the problem. What we can't see is a way to leverage that but leave room for the bits that don't quite fit.

Let's say an application that genuinely solves a reasonable customer problem is complex. Within the essential complexity of the solution there is routinely:

  • 80% of regular complexity
  • 20% of irregular complexity.

The regular complexity is seductively easy, because we can easily capture the regularities in tables, metadata, templates, code generation and so on. We think we're nearly done. We're not.

The irregular complexity is intractably hard. We can:

  1. Add features to our 'regularity' solution. That's hard work, and at the end of the day it comes to resemble code. Then it's just another language and most likely it won't get used.
  2. Find ways to plug in bits of custom code. Learning about all those special plug in places is hard, and mostly likely too hard.
  3. Go back to code first, but retain bits of the 'regularity' solution to reduce code volume. This is state of the art, this is you.

So the heart of the problem as I first stated it is that we just don't know how to integrate a regularity-based system to handle regular complexity with a code-based system to handle irregular complexity, other than method (c). And even if we know parts of the answer, and try to move up a level, it rapidly gets so (accidentally) complex we give up and go back to method (c).

Yes, (c) seems -- for all its limitations and flaws -- to invariably work better than the alternatives when we take all the factors into account.

That makes sense, because that's what code is for.

Attempts to replace code with non-code, but still do all the things that makes code powerful is, I think, doomed. That's because code is the most efficient, effective, powerful way to express the essential complexity of code. Yes, it's difficult and time-consuming and complex, but overall, the only approaches that would be less difficult, time-consuming, and complex would be everything else.

It's all code (or data). That's not the issue. It's regularity that makes the difference.

Pegasus is a PEG parser that emits C#. The generated code is striking in its regularity. You get to include snippets at various points, but the generated code is regular.

It's easy to generate UI code, or data access code, or reporting code from metadata that is regular. If you have a table of UI fields with a label, column name, display format, field validation, etc, etc in a perfectly regular structure you can generate regular code (or interpret that table directly). Tools like pgAdmin do exactly that: they rely on the regularity of the DDL. One complicated bit of code to handle one field, then just repeat for each field in the table. Done. There is code in the tool, but there is no code specific to the application: not needed.

But irregularity is the killer for this approach. Something really simple like: this field or that field can be blank, but not both. Or: this date must not be a Tuesday. Or: this number must show in red if it's more than this value. Like you said: must have for this job, don't care for that job, irregular.

So: currently the only way that consistently works is method (c). Method (a) is plain wrong, but there might be a way to do method (b) if and only if we could use metadata for the regular parts and code for the irregular parts (like Pegasus).

Do you know any products that have seriously tried to do that (and not drifted off into method a)?

 

Sorry, I must be missing something here. Pegasus is a parser generator. It appears to let you create languages.

Don't we already have languages that allow us to effectively create CRUD applications, like C# and Java, without the -- as we understand all too well -- undesirable tradeoffs that come from creating new languages?

Before, I thought you were trying to make CRUD application development easier by making it table-driven, like we did with various CASE tools in the 1980's (and, as I mentioned, some still seem to be doing.)

Now, you're suggesting a standalone language?

What does the new language do that C# and Java don't do?


Edit: I re-read it, and I guess what you're asking is, is it possible to combine some non-programming configuration code plus conventional target-language programming?

That sounds like the unpleasantness of mixed XML/JSON/whatever + native code we find in so many ORMs, dependency injection frameworks, style sheets, UI layers (e.g., JavaFX's FXML and WPF's XAML), configuration systems, and so on.

I don't think they're a good thing1. As is typical of such things, they're yet another language to learn, which also takes away programmability, compose-ability, flexibility, and integrate-ability with the target language.

In general:

Don't give me a new language unless it's a whole new end-to-end ecosystem that replaces C# or Java, with clear, strong benefits for embracing it. Like, say, Go or Kotlin; not like yet another dodgy ORM-QL or XML/JSON-fest internal sublanguage or anything like it.

Instead, give me better native libraries that I can program, customise, compose, and integrate with the rest of my application code. Make it easier for me to write and integrate code in external languages like SQL and HTML/CSS/JavaScript, and stop trying to hide them and write them for me.

--

1 Though maybe run-time configuration is ok. Except... I'm going to try making a library to define run-time configurations in pure Java, for all the reasons that code is good, and having the configuration logic compile and evaluate them at run-time.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on February 1, 2021, 1:38 pm
Quote from dandl on February 1, 2021, 1:31 pm

So really this is the nub of the problem. We can all see the template/metadata/generation part of the solution solving 80% of the problem. What we can't see is a way to leverage that but leave room for the bits that don't quite fit.

Let's say an application that genuinely solves a reasonable customer problem is complex. Within the essential complexity of the solution there is routinely:

  • 80% of regular complexity
  • 20% of irregular complexity.

The regular complexity is seductively easy, because we can easily capture the regularities in tables, metadata, templates, code generation and so on. We think we're nearly done. We're not.

The irregular complexity is intractably hard. We can:

  1. Add features to our 'regularity' solution. That's hard work, and at the end of the day it comes to resemble code. Then it's just another language and most likely it won't get used.
  2. Find ways to plug in bits of custom code. Learning about all those special plug in places is hard, and mostly likely too hard.
  3. Go back to code first, but retain bits of the 'regularity' solution to reduce code volume. This is state of the art, this is you.

So the heart of the problem as I first stated it is that we just don't know how to integrate a regularity-based system to handle regular complexity with a code-based system to handle irregular complexity, other than method (c). And even if we know parts of the answer, and try to move up a level, it rapidly gets so (accidentally) complex we give up and go back to method (c).

Yes, (c) seems -- for all its limitations and flaws -- to invariably work better than the alternatives when we take all the factors into account.

That makes sense, because that's what code is for.

Attempts to replace code with non-code, but still do all the things that makes code powerful is, I think, doomed. That's because code is the most efficient, effective, powerful way to express the essential complexity of code. Yes, it's difficult and time-consuming and complex, but overall, the only approaches that would be less difficult, time-consuming, and complex would be everything else.

It's all code (or data). That's not the issue. It's regularity that makes the difference.

Pegasus is a PEG parser that emits C#. The generated code is striking in its regularity. You get to include snippets at various points, but the generated code is regular.

It's easy to generate UI code, or data access code, or reporting code from metadata that is regular. If you have a table of UI fields with a label, column name, display format, field validation, etc, etc in a perfectly regular structure you can generate regular code (or interpret that table directly). Tools like pgAdmin do exactly that: they rely on the regularity of the DDL. One complicated bit of code to handle one field, then just repeat for each field in the table. Done. There is code in the tool, but there is no code specific to the application: not needed.

But irregularity is the killer for this approach. Something really simple like: this field or that field can be blank, but not both. Or: this date must not be a Tuesday. Or: this number must show in red if it's more than this value. Like you said: must have for this job, don't care for that job, irregular.

So: currently the only way that consistently works is method (c). Method (a) is plain wrong, but there might be a way to do method (b) if and only if we could use metadata for the regular parts and code for the irregular parts (like Pegasus).

Do you know any products that have seriously tried to do that (and not drifted off into method a)?

 

Sorry, I must be missing something here. Pegasus is a parser generator. It appears to let you create languages.

Don't we already have languages that allow us to effectively create CRUD applications, like C# and Java, without the -- as we understand all too well -- undesirable tradeoffs that come from creating new languages?

Compiler-comiler tools like yacc, lex and Pegasus are not programming languages. In each case they construct a data model that conforms to a well-specified formal structure (LR, regex, PEG, etc) and then generate code to compare incoming source code to that model. The code is highly repetitive and not suitable for editing. As well, they insert snippets of user-written code to add extra checks and/or do useful work. It is perfectly possible to write lexers and parsers by hand, but you get better results using these tools. I know, I've done it both ways.

SQL is not a programming language. It is arguably a specification for a data structure that will be interpreted by the RDBMS query optimiser and query executor. It has an embedded expression evaluator, but the rest is just data.

Xml, JSON and Yaml are not programming languages. Each is a language for specifying structured data.

Before, I thought you were trying to make CRUD application development easier by making it table-driven, like we did with various CASE tools in the 1980's (and, as I mentioned, some still seem to be doing.)

Now, you're suggesting a standalone language?

What does the new language do that C# and Java don't do?

Nothing. The whole idea is that it does much less. The proposition is that:

  • There are benefits from using the least powerful available language for each identifiable situation (Pegasus rather than hand-coded PEG parser)
  • 80% of all applications can be completely specified by a suitable constructed data model (not necessarily tables)
  • The other 20% is easily handled by code
  • We currently don't know any good way to put them together (but we know lots of bad ways).

Edit: I re-read it, and I guess what you're asking is, is it possible to combine some non-programming configuration code plus conventional target-language programming?

That sounds like the unpleasantness of mixed XML/JSON/whatever + native code we find in so many ORMs, dependency injection frameworks, style sheets, UI layers (e.g., JavaFX's FXML and WPF's XAML), configuration systems, and so on.

I don't think they're a good thing1. As is typical of such things, they're yet another language to learn, which also takes away programmability, compose-ability, flexibility, and integrate-ability with the target language.

Data structuring languages are intrinsically easy to learn, because they do so little. If you already know the LR, SLR, LL, PEG, regex parser models, writing the grammar in one of those is easy.

In general:

Don't give me a new language unless it's a whole new end-to-end ecosystem that replaces C# or Java, with clear, strong benefits for embracing it. Like, say, Go or Kotlin; not like yet another dodgy ORM-QL or XML/JSON-fest internal sublanguage or anything like it.

No, we're not going there. That's Model A, and 'here there be dragons'. This is all about Model B: data model + code snippets.

Instead, give me better native libraries that I can program, customise, compose, and integrate with the rest of my application code. Make it easier for me to write and integrate code in external languages like SQL and HTML/CSS/JavaScript, and stop trying to hide them and write them for me.

--

1 Though maybe run-time configuration is ok. Except... I'm going to try making a library to define run-time configurations in pure Java, for all the reasons that code is good, and having the configuration logic compile and evaluate them at run-time.

Actually, this isn't for you. If this idea worked, it would take work away from you because maybe building a data model is an analyst job, not a programmer job. Maybe you would only get to write the snippets.

But please note: pgAdmin is absolutely not the tool for constructing an application data model. You need a new language because you need to write the model in text, supported by a brilliant dedicated editor. That way you can cut and paste, clone, search and replace just like you do for code. But it's not code, it's data, and the editor makes sure it's always valid.

So:

  • What is the data model that completely specifies 80% of a NorthWinds/Chinook application? (*)
  • What are the snippets needed for the other 20%?
  • How can they be put together?

(*) It's likely that, as with SQL, the data model can be extended with expression evaluation, but that's phase II.

Andl - A New Database Language - andl.org
Quote from dandl on February 2, 2021, 5:49 am
Quote from Dave Voorhis on February 1, 2021, 1:38 pm
Quote from dandl on February 1, 2021, 1:31 pm

So really this is the nub of the problem. We can all see the template/metadata/generation part of the solution solving 80% of the problem. What we can't see is a way to leverage that but leave room for the bits that don't quite fit.

Let's say an application that genuinely solves a reasonable customer problem is complex. Within the essential complexity of the solution there is routinely:

  • 80% of regular complexity
  • 20% of irregular complexity.

The regular complexity is seductively easy, because we can easily capture the regularities in tables, metadata, templates, code generation and so on. We think we're nearly done. We're not.

The irregular complexity is intractably hard. We can:

  1. Add features to our 'regularity' solution. That's hard work, and at the end of the day it comes to resemble code. Then it's just another language and most likely it won't get used.
  2. Find ways to plug in bits of custom code. Learning about all those special plug in places is hard, and mostly likely too hard.
  3. Go back to code first, but retain bits of the 'regularity' solution to reduce code volume. This is state of the art, this is you.

So the heart of the problem as I first stated it is that we just don't know how to integrate a regularity-based system to handle regular complexity with a code-based system to handle irregular complexity, other than method (c). And even if we know parts of the answer, and try to move up a level, it rapidly gets so (accidentally) complex we give up and go back to method (c).

Yes, (c) seems -- for all its limitations and flaws -- to invariably work better than the alternatives when we take all the factors into account.

That makes sense, because that's what code is for.

Attempts to replace code with non-code, but still do all the things that makes code powerful is, I think, doomed. That's because code is the most efficient, effective, powerful way to express the essential complexity of code. Yes, it's difficult and time-consuming and complex, but overall, the only approaches that would be less difficult, time-consuming, and complex would be everything else.

It's all code (or data). That's not the issue. It's regularity that makes the difference.

Pegasus is a PEG parser that emits C#. The generated code is striking in its regularity. You get to include snippets at various points, but the generated code is regular.

It's easy to generate UI code, or data access code, or reporting code from metadata that is regular. If you have a table of UI fields with a label, column name, display format, field validation, etc, etc in a perfectly regular structure you can generate regular code (or interpret that table directly). Tools like pgAdmin do exactly that: they rely on the regularity of the DDL. One complicated bit of code to handle one field, then just repeat for each field in the table. Done. There is code in the tool, but there is no code specific to the application: not needed.

But irregularity is the killer for this approach. Something really simple like: this field or that field can be blank, but not both. Or: this date must not be a Tuesday. Or: this number must show in red if it's more than this value. Like you said: must have for this job, don't care for that job, irregular.

So: currently the only way that consistently works is method (c). Method (a) is plain wrong, but there might be a way to do method (b) if and only if we could use metadata for the regular parts and code for the irregular parts (like Pegasus).

Do you know any products that have seriously tried to do that (and not drifted off into method a)?

 

Sorry, I must be missing something here. Pegasus is a parser generator. It appears to let you create languages.

Don't we already have languages that allow us to effectively create CRUD applications, like C# and Java, without the -- as we understand all too well -- undesirable tradeoffs that come from creating new languages?

Compiler-comiler tools like yacc, lex and Pegasus are not programming languages. In each case they construct a data model that conforms to a well-specified formal structure (LR, regex, PEG, etc) and then generate code to compare incoming source code to that model. The code is highly repetitive and not suitable for editing. As well, they insert snippets of user-written code to add extra checks and/or do useful work. It is perfectly possible to write lexers and parsers by hand, but you get better results using these tools. I know, I've done it both ways.

I've used a fair amount of yacc and lex in the day, and javacc a few times, tried earlier versions of antlr but didn't like them. I'd say the weakness here is that you end up putting code mixed into the DSL where the IDE doesn't recognize it as code.

I am currently using ANTLR4 to parse Tailspin, which I think actually mostly solved the interfacing problem by letting me implement parse-tree visitors and solved the utility problem by trying to do what I mean even when it's theoretically non-deterministic but practically usually works out.

Even so, I've invested a significant amount of time first learning the tool, then dealing with various quirks that arise, especially from the lexer being independent of the parser. Unfortunately I am not sure that has given me any useful knowledge about parsing, other than how to use the particular tool. I am now of the opinion that a simple recursive-descent parser would have taken less time overall and been more flexible and, as far as I've been able to determine, most language implementors seem to agree. The one saving grace is that the parser syntax gives me a good overview of, well, the syntax.

Of course, if the code has to be maintained by someone else in future, they could just get right onto working with a plain code parser, while they would have to expend significant effort to just understand enough about how the chosen parser/compiler tool works to work with it.

That said, regex is now probably useful enough and ubiquitous enough that it can almost be counted as basic programming knowledge. It is hugely advantageous that it is used the same from within any programming language, with PCRE being the de-facto standard.

SQL is not a programming language. It is arguably a specification for a data structure that will be interpreted by the RDBMS query optimiser and query executor. It has an embedded expression evaluator, but the rest is just data.

Xml, JSON and Yaml are not programming languages. Each is a language for specifying structured data.

Well, xml, json and yaml are strictly speaking just serialization formats. But as soon as you create a schema with elements and attributes that have some meaning, you have created a language. Arguably it's a programming language because it's used to program a system and most likely it's on it's way to fulfilling Greenspun's tenth rule.

SQL with recursive CTEs is actually Turing complete, so how does it differ from other programming languages?

Before, I thought you were trying to make CRUD application development easier by making it table-driven, like we did with various CASE tools in the 1980's (and, as I mentioned, some still seem to be doing.)

Now, you're suggesting a standalone language?

What does the new language do that C# and Java don't do?

Nothing. The whole idea is that it does much less. The proposition is that:

  • There are benefits from using the least powerful available language for each identifiable situation (Pegasus rather than hand-coded PEG parser)
  • 80% of all applications can be completely specified by a suitable constructed data model (not necessarily tables)
  • The other 20% is easily handled by code
  • We currently don't know any good way to put them together (but we know lots of bad ways).

Ah, yes, I understand the vision. The trick is indeed to figure out how to put it together. How is it expressed in a way that becomes composable with the host language?

In Tailspin, I have added syntax to parse string data into structured data, https://github.com/tobega/tailspin-v0/blob/master/TailspinReference.md#composer

My current work with relations is to first introduce a relation value data structure that is useful on its own, but I intend it to be the way to interface with database products too, https://github.com/tobega/tailspin-v0/blob/master/TailspinReference.md#relations

I haven't gotten around to UI-toolkits yet, but it's entirely possible to have language constructs that draw up a user interface in some representation, the latest of these efforts perhaps being https://flutter.dev/ It remains to be seen how great it is to debug the generated html pages in the web case.

What you are proposing becomes even more difficult because you are both starting and ending outside the host language. That seems to indicate that for it to be feasible at all, you should probably start with a construct defined in the host language and propagate it both ways. Preferably without the middle construct being too much of a language in its own right, nor too restrictive in how to compose it with custom code.

Quote from dandl on February 2, 2021, 5:49 am
Quote from Dave Voorhis on February 1, 2021, 1:38 pm
Quote from dandl on February 1, 2021, 1:31 pm

So really this is the nub of the problem. We can all see the template/metadata/generation part of the solution solving 80% of the problem. What we can't see is a way to leverage that but leave room for the bits that don't quite fit.

Let's say an application that genuinely solves a reasonable customer problem is complex. Within the essential complexity of the solution there is routinely:

  • 80% of regular complexity
  • 20% of irregular complexity.

The regular complexity is seductively easy, because we can easily capture the regularities in tables, metadata, templates, code generation and so on. We think we're nearly done. We're not.

The irregular complexity is intractably hard. We can:

  1. Add features to our 'regularity' solution. That's hard work, and at the end of the day it comes to resemble code. Then it's just another language and most likely it won't get used.
  2. Find ways to plug in bits of custom code. Learning about all those special plug in places is hard, and mostly likely too hard.
  3. Go back to code first, but retain bits of the 'regularity' solution to reduce code volume. This is state of the art, this is you.

So the heart of the problem as I first stated it is that we just don't know how to integrate a regularity-based system to handle regular complexity with a code-based system to handle irregular complexity, other than method (c). And even if we know parts of the answer, and try to move up a level, it rapidly gets so (accidentally) complex we give up and go back to method (c).

Yes, (c) seems -- for all its limitations and flaws -- to invariably work better than the alternatives when we take all the factors into account.

That makes sense, because that's what code is for.

Attempts to replace code with non-code, but still do all the things that makes code powerful is, I think, doomed. That's because code is the most efficient, effective, powerful way to express the essential complexity of code. Yes, it's difficult and time-consuming and complex, but overall, the only approaches that would be less difficult, time-consuming, and complex would be everything else.

It's all code (or data). That's not the issue. It's regularity that makes the difference.

Pegasus is a PEG parser that emits C#. The generated code is striking in its regularity. You get to include snippets at various points, but the generated code is regular.

It's easy to generate UI code, or data access code, or reporting code from metadata that is regular. If you have a table of UI fields with a label, column name, display format, field validation, etc, etc in a perfectly regular structure you can generate regular code (or interpret that table directly). Tools like pgAdmin do exactly that: they rely on the regularity of the DDL. One complicated bit of code to handle one field, then just repeat for each field in the table. Done. There is code in the tool, but there is no code specific to the application: not needed.

But irregularity is the killer for this approach. Something really simple like: this field or that field can be blank, but not both. Or: this date must not be a Tuesday. Or: this number must show in red if it's more than this value. Like you said: must have for this job, don't care for that job, irregular.

So: currently the only way that consistently works is method (c). Method (a) is plain wrong, but there might be a way to do method (b) if and only if we could use metadata for the regular parts and code for the irregular parts (like Pegasus).

Do you know any products that have seriously tried to do that (and not drifted off into method a)?

 

Sorry, I must be missing something here. Pegasus is a parser generator. It appears to let you create languages.

Don't we already have languages that allow us to effectively create CRUD applications, like C# and Java, without the -- as we understand all too well -- undesirable tradeoffs that come from creating new languages?

Compiler-comiler tools like yacc, lex and Pegasus are not programming languages. In each case they construct a data model that conforms to a well-specified formal structure (LR, regex, PEG, etc) and then generate code to compare incoming source code to that model. The code is highly repetitive and not suitable for editing. As well, they insert snippets of user-written code to add extra checks and/or do useful work. It is perfectly possible to write lexers and parsers by hand, but you get better results using these tools. I know, I've done it both ways.

SQL is not a programming language. It is arguably a specification for a data structure that will be interpreted by the RDBMS query optimiser and query executor. It has an embedded expression evaluator, but the rest is just data.

Xml, JSON and Yaml are not programming languages. Each is a language for specifying structured data.

Exactly the problem.

We've arbitrarily divided languages into "not programming language" / "data language" vs "programming language", when we should have listened to the LISP (and Forth, and later, Haskell and other) folks who went down this path before us and realised -- or knew to begin with -- that every non-programmable language is a lesser thing than an equivalent programming language. Thus, LISPers don't use (for example) a separate data or configuration language, they use LISP.

They extend the base language, rather than trying to generate it or replace it, which only creates a rigidity, inflexibility, and impedance mismatch between the host language and the sublanguage.

I have no problem with XML/YAML/JSON/whatever if I never have to see it. If it lives purely at the serialisation/deserialisation level, then that's fine -- I'm no more likely to encounter it than the raw TCP/IP packets travelling over my WiFi and Ethernet network.

The moment I'm expected to intelligently manipulate them, consider their structure -- either directly or indirectly -- give me a full-fledged programming language to do it.

There are few things worse to my mind than having to edit thousands of lines of data language -- knowing that it could be represented in a few lines of programming language -- because some other developer decided to make things "easier" for me.

Before, I thought you were trying to make CRUD application development easier by making it table-driven, like we did with various CASE tools in the 1980's (and, as I mentioned, some still seem to be doing.)

Now, you're suggesting a standalone language?

What does the new language do that C# and Java don't do?

Nothing. The whole idea is that it does much less. The proposition is that:

  • There are benefits from using the least powerful available language for each identifiable situation (Pegasus rather than hand-coded PEG parser)
  • 80% of all applications can be completely specified by a suitable constructed data model (not necessarily tables)
  • The other 20% is easily handled by code
  • We currently don't know any good way to put them together (but we know lots of bad ways).

As I've noted, these days I see a lot of:

  1. A database developer sets up a database with a full set of constraints. If the requirements are minimal, the user is handed pgAdmin or SQLDeveloper, some training if they haven't used pgAdmin or SQLDeveloper before (though they usually have), maybe are given a bunch of pre-written *.sql scripts in a folder, and it's job done. There's your database, call us if you need anything.
  2. As above, but with a thin user-friendly skin written in Python DJango (it auto-generates some simple but usable table editing stuff, mainly intended for admin use) or using (picking a typical example at random) Java Spring with Vaadin (if a bit more customisation is needed), or any of dozens of similar toolsets (including my own, if it's me.) That's a bit more work than handing over the database and pgAdmin, but it offers a slightly friendlier face and potentially more functionality.
  3. A thing that would have been a CRUD front-end a few years ago now has a front-end that is nothing like CRUD -- maybe it's a graph where you move the lines, maybe it's Google Maps where you stick pushpins, maybe it's a mobile app that tracks something you do -- and there just isn't a traditional table interface because in production it's all software talking to other software. Humans just monitor it with their Grafana dashboards and access the database via pgAdmin or SQLDeveloper if something goes wrong.

Edit: I re-read it, and I guess what you're asking is, is it possible to combine some non-programming configuration code plus conventional target-language programming?

That sounds like the unpleasantness of mixed XML/JSON/whatever + native code we find in so many ORMs, dependency injection frameworks, style sheets, UI layers (e.g., JavaFX's FXML and WPF's XAML), configuration systems, and so on.

I don't think they're a good thing1. As is typical of such things, they're yet another language to learn, which also takes away programmability, compose-ability, flexibility, and integrate-ability with the target language.

Data structuring languages are intrinsically easy to learn, because they do so little. If you already know the LR, SLR, LL, PEG, regex parser models, writing the grammar in one of those is easy.

They're easy to learn but hard to use, not because they're complicated, but because they're too simplistic. Inevitably, you want to (say) create a loop to generate these 100 nearly-identical-but-for-this records but... No loop.

So you write code in a more powerful programming language to generate code in the less powerful data language, which should tell you that the code you wrote to generate the code you need should have been all the code you need.

In general:

Don't give me a new language unless it's a whole new end-to-end ecosystem that replaces C# or Java, with clear, strong benefits for embracing it. Like, say, Go or Kotlin; not like yet another dodgy ORM-QL or XML/JSON-fest internal sublanguage or anything like it.

No, we're not going there. That's Model A, and 'here there be dragons'. This is all about Model B: data model + code snippets.

Instead, give me better native libraries that I can program, customise, compose, and integrate with the rest of my application code. Make it easier for me to write and integrate code in external languages like SQL and HTML/CSS/JavaScript, and stop trying to hide them and write them for me.

--

1 Though maybe run-time configuration is ok. Except... I'm going to try making a library to define run-time configurations in pure Java, for all the reasons that code is good, and having the configuration logic compile and evaluate them at run-time.

Actually, this isn't for you. If this idea worked, it would take work away from you because maybe building a data model is an analyst job, not a programmer job. Maybe you would only get to write the snippets.

But please note: pgAdmin is absolutely not the tool for constructing an application data model. You need a new language because you need to write the model in text, supported by a brilliant dedicated editor. That way you can cut and paste, clone, search and replace just like you do for code. But it's not code, it's data, and the editor makes sure it's always valid.

What analyst jobs?

Those are mostly gone now. There are just developer jobs with different (or all -- i.e., "full stack") core skills. You write the model in SQL using SQL Developer or pgAdmin or whatever, the result is a database, hand over the credentials. Job done, and it's simpler than the Excel spreadsheet that did it before.

That is what's happening in the corporate world these days.

So:

  • What is the data model that completely specifies 80% of a NorthWinds/Chinook application? (*)
  • What are the snippets needed for the other 20%?
  • How can they be put together?

(*) It's likely that, as with SQL, the data model can be extended with expression evaluation, but that's phase II.

I think we've seen variations on that going back to the 1970's, and everything since has made stabs at the same idea, whether dBase x, PCFile, MS Access, IBM Query, spreadsheets (yes, they are), and every time -- aside from spreadsheets, from which I think we have a lot to learn a lot about agility and ease of use in an end-user data tool (and, yes, I'm still working on a spreadsheet/datasheet tool) -- we wind up either handing over the database (e.g., write some SQL and give 'em pgAdmin) or code is the right way to do it.

Sounds like you're trying to create something in between -- like dBase x, PCFile, MS Access, IBM Query, you-name-it -- and whilst I can't fault the desire ("this should be easy!") I think it suffers the same problems that propel developers to repeatedly attempt to create general-purpose no-code and low-code systems, which every time turn out to make the essential complexity of programming harder rather than easier.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

Compiler-comiler tools like yacc, lex and Pegasus are not programming languages. In each case they construct a data model that conforms to a well-specified formal structure (LR, regex, PEG, etc) and then generate code to compare incoming source code to that model. The code is highly repetitive and not suitable for editing. As well, they insert snippets of user-written code to add extra checks and/or do useful work. It is perfectly possible to write lexers and parsers by hand, but you get better results using these tools. I know, I've done it both ways.

SQL is not a programming language. It is arguably a specification for a data structure that will be interpreted by the RDBMS query optimiser and query executor. It has an embedded expression evaluator, but the rest is just data.

Xml, JSON and Yaml are not programming languages. Each is a language for specifying structured data.

Exactly the problem.

We've arbitrarily divided languages into "not programming language" / "data language" vs "programming language", when we should have listened to the LISP (and Forth, and later, Haskell and other) folks who went down this path before us and realised -- or knew to begin with -- that every non-programmable language is a lesser thing than an equivalent programming language. Thus, LISPers don't use (for example) a separate data or configuration language, they use LISP.

Not arbitrarily, very precisely. A Turing complete language can do anything, but it cannot be proved to complete or be correct, and if it expresses an algorithm it cannot (in general) be proven to do so, or be optimised to do it better. SQL exists because it cannot do everything, but it can be proved to complete, and it can be optimised. These are strong benefits you lose when you insist on introducing arbitrary code.

A data model language can be proved to produce precisely that data model and no other. And the Lisp guys were wrong. If the language chain of HTML/CSS/JS/JSON/XML/SQL teaches us anything, it's that one language does not rule them all.

They extend the base language, rather than trying to generate it or replace it, which only creates a rigidity, inflexibility, and impedance mismatch between the host language and the sublanguage.

I have no problem with XML/YAML/JSON/whatever if I never have to see it. If it lives purely at the serialisation/deserialisation level, then that's fine -- I'm no more likely to encounter it than the raw TCP/IP packets travelling over my WiFi and Ethernet network.

The moment I'm expected to intelligently manipulate them, consider their structure -- either directly or indirectly -- give me a full-fledged programming language to do it.

There are few things worse to my mind than having to edit thousands of lines of data language -- knowing that it could be represented in a few lines of programming language -- because some other developer decided to make things "easier" for me.

I understand perfectly that you are 100% code-centric, so we are not talking about tools for you to use. This is about tools (like pgAdmin) that can be used by non-programmers to design, maintain and extend window-on-data applications.

The proposition is that:

  • There are benefits from using the least powerful available language for each identifiable situation (Pegasus rather than hand-coded PEG parser)
  • 80% of all applications can be completely specified by a suitable constructed data model (not necessarily tables)
  • The other 20% is easily handled by code
  • We currently don't know any good way to put them together (but we know lots of bad ways).

As I've noted, these days I see a lot of:

  1. A database developer sets up a database with a full set of constraints. If the requirements are minimal, the user is handed pgAdmin or SQLDeveloper, some training if they haven't used pgAdmin or SQLDeveloper before (though they usually have), maybe are given a bunch of pre-written *.sql scripts in a folder, and it's job done. There's your database, call us if you need anything.

The tools I'm proposing can do better than that.

  1. As above, but with a thin user-friendly skin written in Python DJango (it auto-generates some simple but usable table editing stuff, mainly intended for admin use) or using (picking a typical example at random) Java Spring with Vaadin (if a bit more customisation is needed), or any of dozens of similar toolsets (including my own, if it's me.) That's a bit more work than handing over the database and pgAdmin, but it offers a slightly friendlier face and potentially more functionality.

Getting closer.

  1. A thing that would have been a CRUD front-end a few years ago now has a front-end that is nothing like CRUD -- maybe it's a graph where you move the lines, maybe it's Google Maps where you stick pushpins, maybe it's a mobile app that tracks something you do -- and there just isn't a traditional table interface because in production it's all software talking to other software. Humans just monitor it with their Grafana dashboards and access the database via pgAdmin or SQLDeveloper if something goes wrong.

Not going there.

Data structuring languages are intrinsically easy to learn, because they do so little. If you already know the LR, SLR, LL, PEG, regex parser models, writing the grammar in one of those is easy.

They're easy to learn but hard to use, not because they're complicated, but because they're too simplistic. Inevitably, you want to (say) create a loop to generate these 100 nearly-identical-but-for-this records but... No loop.

So you write code in a more powerful programming language to generate code in the less powerful data language, which should tell you that the code you wrote to generate the code you need should have been all the code you need.

Only if you're code-centric. The better way (usually) is data-oriented tools: text editor, things like sed/grep/awk, table-editors. Improve the tool so people not as smart as you can understand and maintain the system, don't fail over to code.

But please note: pgAdmin is absolutely not the tool for constructing an application data model. You need a new language because you need to write the model in text, supported by a brilliant dedicated editor. That way you can cut and paste, clone, search and replace just like you do for code. But it's not code, it's data, and the editor makes sure it's always valid.

What analyst jobs?

Those are mostly gone now. There are just developer jobs with different (or all -- i.e., "full stack") core skills. You write the model in SQL using SQL Developer or pgAdmin or whatever, the result is a database, hand over the credentials. Job done, and it's simpler than the Excel spreadsheet that did it before.

That is what's happening in the corporate world these days.

Could be, but that's not my proposition. There are non-programmers out there who can contribute to the design, maintenance and extension of application programs, This is about better tools for them.

So:

  • What is the data model that completely specifies 80% of a NorthWinds/Chinook application? (*)
  • What are the snippets needed for the other 20%?
  • How can they be put together?

(*) It's likely that, as with SQL, the data model can be extended with expression evaluation, but that's phase II.

I think we've seen variations on that going back to the 1970's, and everything since has made stabs at the same idea, whether dBase x, PCFile, MS Access, IBM Query, spreadsheets (yes, they are), and every time -- aside from spreadsheets, from which I think we have a lot to learn a lot about agility and ease of use in an end-user data tool (and, yes, I'm still working on a spreadsheet/datasheet tool) -- we wind up either handing over the database (e.g., write some SQL and give 'em pgAdmin) or code is the right way to do it.

Actually, no. I've used those and many others, and they're a mish-mash of ideas. Back in the dBase days there were no languages of substance on PCs (that's why I created Powerflex), but now that's a solved problem. Data storage is a solved problem. There is almost nothing to learn from attempts going back more than about 20 years, because this is not the problem they (we) were trying to solve.

Sounds like you're trying to create something in between -- like dBase x, PCFile, MS Access, IBM Query, you-name-it -- and whilst I can't fault the desire ("this should be easy!") I think it suffers the same problems that propel developers to repeatedly attempt to create general-purpose no-code and low-code systems, which every time turn out to make the essential complexity of programming harder rather than easier.

So which of my 3 questions is hard to answer?

  • A data model for 80% of the app?
  • Code snippets for 20% in your favorite language?
  • A way to put them together?

 

Andl - A New Database Language - andl.org

Compiler-comiler tools like yacc, lex and Pegasus are not programming languages. In each case they construct a data model that conforms to a well-specified formal structure (LR, regex, PEG, etc) and then generate code to compare incoming source code to that model. The code is highly repetitive and not suitable for editing. As well, they insert snippets of user-written code to add extra checks and/or do useful work. It is perfectly possible to write lexers and parsers by hand, but you get better results using these tools. I know, I've done it both ways.

I've used a fair amount of yacc and lex in the day, and javacc a few times, tried earlier versions of antlr but didn't like them. I'd say the weakness here is that you end up putting code mixed into the DSL where the IDE doesn't recognize it as code.

I am currently using ANTLR4 to parse Tailspin, which I think actually mostly solved the interfacing problem by letting me implement parse-tree visitors and solved the utility problem by trying to do what I mean even when it's theoretically non-deterministic but practically usually works out.

Even so, I've invested a significant amount of time first learning the tool, then dealing with various quirks that arise, especially from the lexer being independent of the parser. Unfortunately I am not sure that has given me any useful knowledge about parsing, other than how to use the particular tool. I am now of the opinion that a simple recursive-descent parser would have taken less time overall and been more flexible and, as far as I've been able to determine, most language implementors seem to agree. The one saving grace is that the parser syntax gives me a good overview of, well, the syntax.

I'm a language implementor and I don't agree. Most modern/interesting languages cannot be parsed by LL(1) or LL(k) parsers, because they require unlimited backtracking. Case in point:

ident[ident[ident[ident(ident(arg,arg),arg,arg)]]]=1

We don't know it's an assignment statement until we see the '='. I know how to solve this problem, but it takes extra work. Ditto for the lexer: simple regex is never enough. Antlr (which I have used since it was called PCCTS) can handle it, but it's complicated and it's effectively non-free as there is limited help online.

The classic solution has been LR grammars, but they're harder to write and really hard to debug. I wrote Andl using recursive descent, and then rewrote it in a fraction of the time using a PEG grammar.

Of course, if the code has to be maintained by someone else in future, they could just get right onto working with a plain code parser, while they would have to expend significant effort to just understand enough about how the chosen parser/compiler tool works to work with it.

That said, regex is now probably useful enough and ubiquitous enough that it can almost be counted as basic programming knowledge. It is hugely advantageous that it is used the same from within any programming language, with PCRE being the de-facto standard.

SQL is not a programming language. It is arguably a specification for a data structure that will be interpreted by the RDBMS query optimiser and query executor. It has an embedded expression evaluator, but the rest is just data.

Xml, JSON and Yaml are not programming languages. Each is a language for specifying structured data.

Well, xml, json and yaml are strictly speaking just serialization formats. But as soon as you create a schema with elements and attributes that have some meaning, you have created a language. Arguably it's a programming language because it's used to program a system and most likely it's on it's way to fulfilling Greenspun's tenth rule.

Greenspun was wrong (and it was a humorous offhand remark). Sufficiently big programs do almost always contain one or more sub-languages, but they're almost never related to Lisp. The commonest ones are expression languages (like calc) and stack-based languages (like Forth).

SQL with recursive CTEs is actually Turing complete, so how does it differ from other programming languages?

Recursive CTEs use fixed point recursion, and so are not Turing complete. However, they can be made so in combination with windowing, as has been proved. The technique involved is not practical, and by SQL I was referring to the subset of SQL DQL/DML in wide-spread use across all platforms.

Before, I thought you were trying to make CRUD application development easier by making it table-driven, like we did with various CASE tools in the 1980's (and, as I mentioned, some still seem to be doing.)

Now, you're suggesting a standalone language?

What does the new language do that C# and Java don't do?

Nothing. The whole idea is that it does much less. The proposition is that:

  • There are benefits from using the least powerful available language for each identifiable situation (Pegasus rather than hand-coded PEG parser)
  • 80% of all applications can be completely specified by a suitable constructed data model (not necessarily tables)
  • The other 20% is easily handled by code
  • We currently don't know any good way to put them together (but we know lots of bad ways).

Ah, yes, I understand the vision. The trick is indeed to figure out how to put it together. How is it expressed in a way that becomes composable with the host language?

In Tailspin, I have added syntax to parse string data into structured data, https://github.com/tobega/tailspin-v0/blob/master/TailspinReference.md#composer

My current work with relations is to first introduce a relation value data structure that is useful on its own, but I intend it to be the way to interface with database products too, https://github.com/tobega/tailspin-v0/blob/master/TailspinReference.md#relations

I haven't gotten around to UI-toolkits yet, but it's entirely possible to have language constructs that draw up a user interface in some representation, the latest of these efforts perhaps being https://flutter.dev/ It remains to be seen how great it is to debug the generated html pages in the web case.

What you are proposing becomes even more difficult because you are both starting and ending outside the host language. That seems to indicate that for it to be feasible at all, you should probably start with a construct defined in the host language and propagate it both ways. Preferably without the middle construct being too much of a language in its own right, nor too restrictive in how to compose it with custom code.

My intention is that the 80% part is executable in isolation, and can be readily ported across a wide range of technologies and platforms. The exact same data model will run on desktop or the Web, on Java or C# foundations, on any SQL or noSQL database. There is no defined host language.

The 20% snippet language either has to match the underlying technology or be transpiled. Look at Unity for an example of how brilliantly that works.

Andl - A New Database Language - andl.org