The Forum for Discussion about The Third Manifesto and Related Matters


ANSWERED: Aesthetics of collection delimiters - braces vs brackets

Quote from Darren Duncan on April 9, 2021, 9:55 am

As you may have guessed, I'm talking about Muldis Object Notation.

Here is the current version with embedded examples: https://github.com/muldis/Muldis_Object_Notation/blob/master/spec/Muldis_Object_Notation_Syntax_Plain_Text.md

Here is an example relation literal with the current syntax:

(Relation<-{
    (name : "Jane Ives", birth_date : 0Lci@y1971|m11|d06,
        phone_numbers : (Set<-{"+1.4045552995", "+1.7705557572"})),
    (name : "Layla Miller", birth_date : 0Lci@y1995|m08|d27,
        phone_numbers : (Set<-{})),
    (name : "岩倉 玲音", birth_date : 0Lci@y1984|m07|d06,
        phone_numbers : (Set<-{"+81.9072391679"})),
})

Here is an example of what that would look like with the alternate bracketing:

(Relation<-[
    (name : "Jane Ives", birth_date : 0Lci@y1971|m11|d06,
        phone_numbers : (Set<-["+1.4045552995", "+1.7705557572"])),
    (name : "Layla Miller", birth_date : 0Lci@y1995|m08|d27,
        phone_numbers : (Set<-[])),
    (name : "岩倉 玲音", birth_date : 0Lci@y1984|m07|d06,
        phone_numbers : (Set<-["+81.9072391679"])),
])

And here are before+after examples of an interval literal, which would swap the other way:

[2.7..-9.3]
{2.7..-9.3}

There are more examples at the above URL.  This is still a work in progress in other respects.

Why not:

name,birth_date,phone_numbers[]

Jane Ives,1971/11/6,[+1.4045552995,+1.7705557572]

Layla Miller,1995/8/27,,

岩倉 玲音,1984/7/6,[81.9072391679]

Much easier to read, dead easy to parse. Or check out YAML.

Andl - A New Database Language - andl.org
Quote from dandl on April 9, 2021, 3:22 pm

Why not:

name,birth_date,phone_numbers[]

Jane Ives,1971/11/6,[+1.4045552995,+1.7705557572]

Layla Miller,1995/8/27,,

岩倉 玲音,1984/7/6,[81.9072391679]

Much easier to read, dead easy to parse. Or check out YAML.

One of my key design goals is to reasonably minimize the number of core syntactic elements while remaining strongly typed, getting most of the actual complexity through composition, balancing a reasonable level of terseness so it's easy enough for developers to write out artifacts by hand like source code, and keeping the parsing requirements reasonably simple.

One key design decision is that the syntax is designed to be extensible, and it uses a reasonable minimum of distinct elements and characters of its own, leaving the most available for others to define meanings for in their own superset grammar.  The analogy is that my MUON is to JSON what any superset could be to JavaScript the full language.  And likewise, actually using a superset is not required.

Your example is basically a CSV, and the logic to parse it would actually be considerably more complicated, requiring guesswork to derive the meaning that my version gives explicitly.
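To make the guesswork concrete, here is a small Python sketch of a naive type-inferring CSV reader (the inference heuristic is hypothetical, not from any real parser): it happily reads the phone number as a floating-point number and silently discards the leading +, a choice an explicitly typed literal never has to make.

```python
import csv
import io

def guess(field: str):
    """A naive type-inference heuristic, as a CSV consumer might apply."""
    try:
        return float(field)  # "+1.4045552995" parses cleanly as a float!
    except ValueError:
        return field         # fall back to keeping the raw string

row = next(csv.reader(io.StringIO("Jane Ives,1971/11/6,+1.4045552995")))
guessed = [guess(f) for f in row]
# The date stays a string (the "/" defeats float()), but the phone
# number has silently become the number 1.4045552995.
```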

My version supports nested relations or other collections within relations to arbitrary depths, the nested plain set being an example.

I have distinct temporal type possreps for durations and instants, and both support arbitrary levels of precision or known/unknown elements, which is why the instant literal in mine had the format of name-value pairs.

I have no possrep for phone numbers and just happened to use a character string to represent one as an example, and the + is part of the standard international syntax.

One aspect where I say the spec is a work in progress is that I'm just in the process of refactoring the actual grammar definition, which currently has many repeated, nearly identical syntax definitions for collections.

Once the current round is done, the physical grammar will have just 4 generic collection syntactic elements (Tuple, Pair, Lot, and Interval), written using the following forms:

  • () - an empty Tuple
  • (this) - expression grouping parenthesis
  • (this : that) - a Pair
  • (asset1, asset2, name3: asset3, name4: asset4, ...) - a nonempty Tuple
  • {elem1, elem2, elem3: count3, elem4: count4, ...} - a Lot
  • [lower .. higher] - an Interval

These collectively use all 3 bracketing characters for distinct meanings.

A Lot is an ordered collection that may contain duplicates.  It is usually only used directly to represent source code; wrapping it in a Pair specifying a cast, such as (Relation:{...}), indicates that evaluating that code maps it to a relation.  Each "elem" is a tuple, and each "count", if given, is squashed to 1.
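As a rough host-language illustration (the Python class names below are my own invention; only the bracketed syntax in the comments comes from the spec), the four elements might be modeled like this:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class MTuple:            # MUON: () or (asset1, name2 : asset2, ...)
    positional: tuple = ()
    named: dict = field(default_factory=dict)

@dataclass
class Pair:              # MUON: (this : that)
    this: Any
    that: Any

@dataclass
class Lot:               # MUON: {elem1, elem2, elem3 : count3, ...}
    members: tuple = ()  # each member is an (elem, count) pair

@dataclass
class Interval:          # MUON: [lower .. higher]
    lower: Any
    higher: Any

# (Relation:{...}) as a cast: a Pair mapping a Lot of tuples to a relation.
jane = MTuple(named={"name": "Jane Ives"})
relation_ish = Pair("Relation", Lot(((jane, 1),)))
```

This sketches only the data shapes, not the grammar itself.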

Quote from Dave Voorhis on April 9, 2021, 2:43 pm
Quote from Darren Duncan on April 9, 2021, 11:39 am

 

As for what I'm doing taking a lot of time and moving slowly, I don't see how this is much worse than how things are going among others in this forum.

After decades of talk there still isn't a widely used Industrial D from anyone else, is there?

In terms of database languages, there isn't a widely used anything-that-isn't-SQL.

Even the most widely used NoSQL systems like MongoDB and Lucene are barely a drop in the bucket compared to SQL DBMSs.

"This [notation] is for public use." Currently it's a public of 1. Users of those NoSQL systems at least number in the tens of thousands. How are you going to get to user number 2? However different is what you're doing, I don't think technical points will persuade anyone. A new interchange format needs a compelling use case.

JSON supports expressing the schema as well as the 'content'. It'll be good enough. It has more than 1 public.

"widely used Industrial D"

You are aiming to produce a widely used Industrial something?

Quote from AntC on April 9, 2021, 9:58 pm

"This [notation] is for public use." Currently it's a public of 1. Users of those NoSQL systems at least number in the tens of thousands. How are you going to get to user number 2? However different is what you're doing, I don't think technical points will persuade anyone. A new interchange format needs a compelling use case.

JSON supports expressing the schema as well as the 'content'. It'll be good enough. It has more than 1 public.

"widely used Industrial D"

You are aiming to produce a widely used Industrial something?

I am indeed aiming to produce a widely used Industrial something.  This isn't all just meant to be some academic project.

Every tool with more than one user starts with one or zero; your stating 1 is not new information.

Arguing that JSON is good enough is like arguing that SQL etc. are good enough and there is no point in making a D language or other alternative for industry.  Do you believe the latter?

Quote from Darren Duncan on April 9, 2021, 9:12 pm
Quote from dandl on April 9, 2021, 3:22 pm

Why not:

name,birth_date,phone_numbers[]

Jane Ives,1971/11/6,[+1.4045552995,+1.7705557572]

Layla Miller,1995/8/27,,

岩倉 玲音,1984/7/6,[81.9072391679]

Much easier to read, dead easy to parse. Or check out YAML.

One of my key design goals is to reasonably minimize the number of core syntactic elements while remaining strongly typed, getting most of the actual complexity through composition, balancing a reasonable level of terseness so it's easy enough for developers to write out artifacts by hand like source code, and keeping the parsing requirements reasonably simple.

One key design decision is that the syntax is designed to be extensible, and it uses a reasonable minimum of distinct elements and characters of its own, leaving the most available for others to define meanings for in their own superset grammar.  The analogy is that my MUON is to JSON what any superset could be to JavaScript the full language.  And likewise, actually using a superset is not required.

Your example is basically a CSV, and the logic to parse it would actually be considerably more complicated, requiring guesswork to derive the meaning that my version gives explicitly.

My version supports nested relations or other collections within relations to arbitrary depths, the nested plain set being an example.

I have distinct temporal type possreps for durations and instants, and both support arbitrary levels of precision or known/unknown elements, which is why the instant literal in mine had the format of name-value pairs.

I have no possrep for phone numbers and just happened to use a character string to represent one as an example, and the + is part of the standard international syntax.

Then you fail to make yourself clear.

The ordinary purpose of a serialisation format is to allow data to be transferred faithfully, and JSON does that well. A separate question is how to transfer the schema for that data.

JSON is absolutely standardised. It has strings, numbers, true/false, null, objects, arrays. Some libraries impose limits (such as floating point numbers instead of arbitrary precision) and some have extensions (comments and dates). If both ends are using the same schema, and assuming a convention for dates, then AFAIK all kinds of business data can be transferred. There are probably some corner cases, but I can't think of any right now. I'm using it right now for nested relational data and it works just fine.

CSV files are sufficiently well-defined to be used to transfer relational data of scalar types, again given that the schema already exists at both ends. YAML does it all and more, but can get pretty complicated. And of course there is always XML. Why on earth would you need to add another one?

Communicating a schema, either as part of the data or separately or by implication, is a separate topic.

Andl - A New Database Language - andl.org
Quote from dandl on April 10, 2021, 12:27 am

JSON is absolutely standardised. It has strings, numbers, true/false, null, objects, arrays. Some libraries impose limits (such as floating point numbers instead of arbitrary precision) and some have extensions (comments and dates). If both ends are using the same schema, and assuming a convention for dates, then AFAIK all kinds of business data can be transferred. There are probably some corner cases, but I can't think of any right now. I'm using it right now for nested relational data and it works just fine.

Please have a look at this and share your thoughts:

http://seriot.ch/parsing_json.php - "Parsing JSON is a Minefield"

That is one of my sources explaining the non-standardization and issues with JSON.

Also https://github.com/nst/JSONTestSuite can be informative.
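One small Python illustration of the kind of divergence those sources catalogue: duplicate keys are silently resolved, overflowing numbers quietly become infinity, and arbitrary precision is kept only if you opt in.

```python
import json
from decimal import Decimal

# Duplicate keys: accepted without complaint, last one wins
# (RFC 8259 leaves this implementation-defined, so parsers differ).
assert json.loads('{"a": 1, "a": 2}') == {"a": 2}

# Number overflow: no error, the value silently becomes infinity.
assert json.loads('1e400') == float('inf')

# Precision: the default float parse rounds; opting into Decimal keeps it.
text = '2.00000000000000000001'
assert json.loads(text) == 2.0
assert json.loads(text, parse_float=Decimal) == Decimal(text)
```

Other implementations make different choices on each of these points, which is precisely the non-standardization being discussed.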

Here's one about YAML problems:

https://hitchdev.com/strictyaml/why/implicit-typing-removed/

Also, YAML, being a superset of JSON, inherits all of JSON's issues.

Quote from dandl on April 10, 2021, 12:27 am
Quote from Darren Duncan on April 9, 2021, 9:12 pm
Quote from dandl on April 9, 2021, 3:22 pm

Why not:

...

Much easier to read, dead easy to parse. Or check out YAML.

One of my key design goals is to reasonably minimize the number of core syntactic elements while remaining strongly typed, getting most of the actual complexity through composition, balancing a reasonable level of terseness so it's easy enough for developers to write out artifacts by hand like source code, and keeping the parsing requirements reasonably simple.

Huh? People are going to write your formats "by hand"? Won't the compiler or DBMS do that? One of the reasons JSON's limitations and verbosity don't really matter is that there are tools to write, read, and pretty-print it. I don't think I've ever written raw JSON.

One key design decision is that the syntax is designed to be extensible, and it uses a reasonable minimum of distinct elements and characters of its own, leaving the most available for others to define meanings for in their own superset grammar.  The analogy is that my MUON is to JSON what any superset could be to JavaScript the full language.  And likewise, actually using a superset is not required.

Your example is basically a CSV, and the logic to parse it would actually be considerably more complicated, requiring guesswork to derive the meaning that my version gives explicitly.

My version supports nested relations or other collections within relations to arbitrary depths, the nested plain set being an example.

I have distinct temporal type possreps for durations and instants, and both support arbitrary levels of precision or known/unknown elements, which is why the instant literal in mine had the format of name-value pairs.

I have no possrep for phone numbers and just happened to use a character string to represent one as an example, and the + is part of the standard international syntax.

The ordinary purpose of a serialisation format is to allow data to be transferred faithfully, and JSON does that well. A separate question is how to transfer the schema for that data.

Tackling the schema is the same question: TTM (and Codd's 12 Rules) require the schema be expressed (in the catalogue) as relation values. JSON can encode disparate data/multiple relations; just put the schema as a relation value in front of the content.

A slightly trickier area might be type definitions for scalars. But presuming type definitions use linear syntax, they can be expressed as relations. (I'm not saying that's an ergonomic or human-readable form, but those aren't desiderata merely for serialisation/interchange.)
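The suggestion above (put the schema as a relation value in front of the content) could be sketched in Python like this; the envelope keys and the type strings are hypothetical conventions invented for illustration, not from any spec.

```python
import json

# A hypothetical JSON envelope: the schema rides in front of the content
# as a relation value of (attribute, type) pairs.
envelope = {
    "schema": [
        {"attribute": "name", "type": "CHAR"},
        {"attribute": "birth_date", "type": "DATE"},
        {"attribute": "phone_numbers", "type": "RELATION{no CHAR}"},
    ],
    "content": [
        {"name": "Jane Ives", "birth_date": "1971-11-06",
         "phone_numbers": [{"no": "+1.4045552995"}, {"no": "+1.7705557572"}]},
    ],
}

wire = json.dumps(envelope)
received = json.loads(wire)

# The receiving end can check each tuple against the schema's heading.
heading = {a["attribute"] for a in received["schema"]}
assert all(set(row) == heading for row in received["content"])
```

Agreement on what the type strings mean is of course the part JSON itself does not supply.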

JSON is absolutely standardised. It has strings, numbers, true/false, null, objects, arrays. Some libraries impose limits (such as floating point numbers instead of arbitrary precision) and some have extensions (comments and dates). If both ends are using the same schema, and assuming a convention for dates, then AFAIK all kinds of business data can be transferred. There are probably some corner cases, but I can't think of any right now. I'm using it right now for nested relational data and it works just fine.

CSV files are sufficiently well-defined to be used to transfer relational data of scalar types, again given that the schema already exists at both ends. YAML does it all and more, but can get pretty complicated. And of course there is always XML. Why on earth would you need to add another one?

Communicating a schema, either as part of the data or separately or by implication is a separate topic.

(I agree with Darren that CSV is more problematic: how to represent relation-valued and tuple-valued attributes, for example?)

Nothing looks like superseding SQL, despite all its faults and all the NoSQL buzz. There's far less wrong with JSON. I see no compelling case for superseding. The case for an alternative interchange format needs both ends of the interchange to be totally chuffed off with their current technology; and both to change over at the same time. And this across a layer of technology that ought to be invisible. There would have to be some other strong motive for the change.

Quote from AntC on April 10, 2021, 5:00 am

Huh? People are going to write your formats "by hand"? Won't the compiler or DBMS do that? One of the reasons JSON's limitations and verbosity don't really matter is that there are tools to write, read, and pretty-print it. I don't think I've ever written raw JSON.

Tackling the schema is the same question: TTM (and Codd's 12 Rules) require the schema be expressed (in the catalogue) as relation values. JSON can encode disparate data/multiple relations; just put the schema as a relation value in front of the content.

A slightly trickier area might be type definitions for scalars. But presuming type definitions use linear syntax, they can be expressed as relations. (I'm not saying that's an ergonomic or human-readable form, but those aren't desiderata merely for serialisation/interchange.)

(I agree with Darren that CSV is more problematic: how to represent relation-valued and tuple-valued attributes, for example?)

Nothing looks like superseding SQL, despite all its faults and all the NoSQL buzz. There's far less wrong with JSON. I see no compelling case for superseding. The case for an alternative interchange format needs both ends of the interchange to be totally chuffed off with their current technology; and both to change over at the same time. And this across a layer of technology that ought to be invisible. There would have to be some other strong motive for the change.

I need to clarify something important.

Muldis Object Notation (MUON) is NOT JUST an interchange format.  Rather, that is one of multiple intended use cases.

It seems like a lot of the criticism I'm getting is based on the idea that serialization is the only thing it is for, and that's far from the case; I said that was only one usage.

MUON is ALSO the foundational syntax for a full programming language, Muldis Data Language (MDL), and like any typical programming language, it is intended to be commonly written by hand.

A key feature of MDL is that it is homoiconic, so all code is expressed in terms of value literals of appropriate data types that represent expressions or functions or whatever, and likewise you can generate, inspect, or modify this code-as-data in programs.  It is a generalization of a D language system catalog.

So every MUON artifact has a dual identity as a value literal of some type, and as executable code.

There is much precedent for this already; in fact, one of the main inspirations for Muldis Data Language is the common practice of specifying SQL or other code as an AST in some application language, composed of trees of arrays, dictionaries, and so on.

So yes, it is designed to be written by hand as much as one writes Java or whatever by hand.  But you can also generate it using code.
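The homoiconic code-as-data idea can be illustrated with a toy Python evaluator, in the spirit of the AST-as-trees-of-arrays-and-dictionaries practice mentioned above (the node shapes here are invented for illustration and are not MDL's actual representation).

```python
# A tiny homoiconic-style evaluator: "code" is just nested tuples,
# so a program can build, inspect, or rewrite it before running it.

def evaluate(node):
    if not isinstance(node, tuple):
        return node                      # a literal evaluates to itself
    op, *args = node
    vals = [evaluate(a) for a in args]
    if op == "add":
        return sum(vals)
    if op == "mul":
        out = 1
        for v in vals:
            out *= v
        return out
    raise ValueError(f"unknown op: {op}")

expr = ("add", 1, ("mul", 2, 3))         # a value literal that is also code

# Because code is data, rewriting it is ordinary data manipulation:
doubled = ("mul", 2, expr)
```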

One intended use case is that an application has an MDL data structure representing the details of a database query, update, or other remote procedure.  This data structure might have been written by hand by the application's programmer, or generated.  If the database is in a separate process elsewhere on the network, the data structure is likely represented as MUON while going over the wire; at the other end it becomes an MDL value again and is executed, with the result then coming back in the same form.

MUON is also intended for configuration files, much as JSON/YAML/etc are, and those are often written by hand too.

Quote from AntC on April 10, 2021, 5:00 am
Quote from dandl on April 10, 2021, 12:27 am
Quote from Darren Duncan on April 9, 2021, 9:12 pm
Quote from dandl on April 9, 2021, 3:22 pm

Why not:

...

Much easier to read, dead easy to parse. Or check out YAML.

One of my key design goals is to reasonably minimize the number of core syntactic elements while remaining strongly typed, getting most of the actual complexity through composition, balancing a reasonable level of terseness so it's easy enough for developers to write out artifacts by hand like source code, and keeping the parsing requirements reasonably simple.

Huh? People are going to write your formats "by hand"? Won't the compiler or DBMS do that? One of the reasons JSON's limitations and verbosity don't really matter is that there are tools to write, read, and pretty-print it. I don't think I've ever written raw JSON.

Which is what I was trying to get at. Is this about 'yet another JSON', or is there some other unspecified goal? You might write a bit by hand to try something out, but no self-respecting application or development environment is going to rely on hand-authoring data. [Unless the data is the code?]

One key design decision is that the syntax is designed to be extensible, and it uses a reasonable minimum of distinct elements and characters of its own, leaving the most available for others to define meanings for in their own superset grammar.  The analogy is that my MUON is to JSON what any superset could be to JavaScript the full language.  And likewise, actually using a superset is not required.

Your example is basically a CSV, and the logic to parse it would actually be considerably more complicated, requiring guesswork to derive the meaning that my version gives explicitly.

My version supports nested relations or other collections within relations to arbitrary depths, the nested plain set being an example.

I have distinct temporal type possreps for durations and instants, and both support arbitrary levels of precision or known/unknown elements, which is why the instant literal in mine had the format of name-value pairs.

I have no possrep for phone numbers and just happened to use a character string to represent one as an example, and the + is part of the standard international syntax.

The ordinary purpose of a serialisation format is to allow data to be transferred faithfully, and JSON does that well. A separate question is how to transfer the schema for that data.

Tackling the schema is the same question: TTM (and Codd's 12 Rules) require the schema be expressed (in the catalogue) as relation values. JSON can encode disparate data/multiple relations; just put the schema as a relation value in front of the content.

Sorry, I was unclear. It's obvious the schema can be serialised and transferred as JSON, but then there has to be agreement on meaning. You can serialise the DDL from SQL Server and send it over to Oracle, but it will be treated as just a chunk of data, not something special like a schema.

A slightly trickier area might be type definitions for scalars. But presuming type definitions use linear syntax, they can be expressed as relations. (I'm not saying that's an ergonomic or human-readable form, but those aren't desiderata merely for serialisation/interchange.)

You lost me. There are fewer than 10 predefined primitive scalar types; everything else is composed from them. Why is this a problem?

JSON is absolutely standardised. It has strings, numbers, true/false, null, objects, arrays. Some libraries impose limits (such as floating point numbers instead of arbitrary precision) and some have extensions (comments and dates). If both ends are using the same schema, and assuming a convention for dates, then AFAIK all kinds of business data can be transferred. There are probably some corner cases, but I can't think of any right now. I'm using it right now for nested relational data and it works just fine.

CSV files are sufficiently well-defined to be used to transfer relational data of scalar types, again given that the schema already exists at both ends. YAML does it all and more, but can get pretty complicated. And of course there is always XML. Why on earth would you need to add another one?

Communicating a schema, either as part of the data or separately or by implication is a separate topic.

(I agree with Darren that CSV is more problematic: how to represent relation-valued and tuple-valued attributes, for example?)

Nothing looks like superseding SQL, despite all its faults and all the NoSQL buzz. There's far less wrong with JSON. I see no compelling case for superseding. The case for an alternative interchange format needs both ends of the interchange to be totally chuffed off with their current technology; and both to change over at the same time. And this across a layer of technology that ought to be invisible. There would have to be some other strong motive for the change.

As Duncan points out, there are weaknesses in the JSON standard right now, but all perfectly fixable. I hope.

Andl - A New Database Language - andl.org