The Forum for Discussion about The Third Manifesto and Related Matters


TD data in JSON


Is there a standard mapping of TD-compliant RM data onto JSON?

The approach that first comes to my mind is to represent a relation as a JSON object with two names, "heading" and "data", where the value of the first is another JSON object mapping attribute names into attribute type names, and the value of the second is a JSON array of JSON objects that map attribute names into attribute values.  A tuple is represented similarly, except that the value of "data" is a single object rather than an array of objects. A third name "type" could be added to make the distinction between relations and tuples more explicit.
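
For example, a hypothetical relation with attributes S# and SNAME (names and types here are illustrative only) would look something like this under that scheme:

{"type": "relation",
 "heading": {"S#": "INTEGER", "SNAME": "CHAR"},
 "data": [{"S#": 1, "SNAME": "Smith"},
          {"S#": 2, "SNAME": "Jones"}]}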

As for scalars: TD integers, rationals, strings, and booleans become JSON numbers, numbers, strings, and booleans respectively.  TD integers and rationals too large or too precise to be represented as JSON numbers (which are 64-bit floats) can be transferred as strings, since the type of every datum is known and JSON's manifestly typed values are just a convenience.  Custom scalar values are JSON objects with two names, "name" for the string name of the scalar type, and "possrep" for an object that maps from named selector arguments to their values.  The value of "name" must be recognized by the client.  Semi-standardized types like timestamps may be represented by strings like "2019-06-24T00:47:16.107122953Z" or equivalently as:

{"name": "timestamp", "possrep: {"seconds": 1561351636, "nanoseconds": 107122953}}

IMAO, SQL's DATE, TIME, and TIMESTAMP types with or without timezones are all horrible: the only time-related type in a database should be a Posix timestamp (the number of seconds since the Posix Epoch, 1970-01-01T00:00:00 proleptic UTC, excluding all leap seconds) plus a count of nanoseconds relative to the second.  If you really care, replace "seconds" with "tai" and give the number of seconds since the Epoch including leap seconds.

Of course, JSON null is not used for anything!

Quote from johnwcowan on June 24, 2019, 2:43 pm

Is there a standard mapping of TD-compliant RM data onto JSON?

No. What would it be used for?

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on June 24, 2019, 2:45 pm
Quote from johnwcowan on June 24, 2019, 2:43 pm

Is there a standard mapping of TD-compliant RM data onto JSON?

No. What would it be used for?

Moving TD-compliant data between environments: between a database and a backup file, between two different databases on the same engine or on different engines (especially if they use incompatible storage conventions), between clients and servers (per the other thread), or from a TD-compliant database to a SQL database (within SQL's limited capabilities) or vice versa (provided the SQL data conforms to the RM).

Quote from johnwcowan on June 24, 2019, 2:49 pm
Quote from Dave Voorhis on June 24, 2019, 2:45 pm
Quote from johnwcowan on June 24, 2019, 2:43 pm

Is there a standard mapping of TD-compliant RM data onto JSON?

No. What would it be used for?

Moving TD-compliant data between environments: between a database and a backup file, between two different databases on the same engine or on different engines (especially if they use incompatible storage conventions), between clients and servers (per the other thread), or from a TD-compliant database to a SQL database (within SQL's limited capabilities) or vice versa (provided the SQL data conforms to the RM).

Rel database backups are simply Tutorial D scripts that re-create the original database. Likewise for copying data between databases -- simply emit the data as a Tutorial D script and run it.

The Rel front-end (aka DBrowser) supports reading SQL tables (and writing too, soon) from any JDBC-compliant SQL DBMS, and imports/exports CSV and Excel .xls/.xlsx files.

I have no objection to supporting JSON in principle -- it sounds technically straightforward and an afternoon or two of work, maybe -- but I'm having a hard time seeing why. It doesn't seem much like the context where JSON gets used all the time -- sending/receiving messages to/from the front-end/middleware layers in a Web application via a RESTful interface.

So I'm still not sure where JSON would fit in.

Quote from Dave Voorhis on June 24, 2019, 3:24 pm

I have no objection to supporting JSON in principle -- it sounds technically straightforward and an afternoon or two of work, maybe -- but I'm having a hard time seeing why.

It is the only publicly documented, widely implemented format I know of that is easily capable of storing TD data in its full generality, as I sketched out above.

It doesn't seem much like the context where JSON gets used all the time -- sending/receiving messages to/from the front-end/middleware layers in a Web application via a RESTful interface.

That's certainly one application, though not all Web uses are properly RESTful; often they are just SOAP-style endpoints that happen to speak JSON instead of XML.  But googling for json purpose -network -web -services pulls up many others.  JSON config files are all over the place nowadays.  Data scientists use files full of JSON values to communicate with each other, because their software often needs something with more structure than CSV can handle (it's not all flat arrays any more); indeed, Pandas now supports it natively.  It will be a cold day in Hell before WordPress has a plugin that allows WP sites to pull data from TD databases, but a JSON plugin is available and it doesn't care where the JSON comes from.  A file containing GDPR-vetted vendors is published as JSON.  One of my other interests at the moment is timezones, and there is public data showing the polygons (sometimes with holes) on the Earth's surface that correspond to each of the 400+ historical timezones; this data is in JSON.  And the list goes on, da-da da-da dah....

Everyone, I designed Muldis Object Notation (MUON) expressly for the purpose of mapping TTM/TD-compliant RM data between different systems without loss of accuracy.

See https://github.com/muldis/Muldis_Object_Notation for a completed (but subject to change) specification plus in-progress generators and parsers in multiple languages.

It has syntax specific to Tuples and Relations (by those names), among a couple dozen other types.

I plan for the first fully working parser/generator library to be written in Java 8+ and released on Maven, within the next month or three.

Quote from johnwcowan on June 24, 2019, 5:06 pm
Quote from Dave Voorhis on June 24, 2019, 3:24 pm

I have no objection to supporting JSON in principle -- it sounds technically straightforward and an afternoon or two of work, maybe -- but I'm having a hard time seeing why.

It is the only publicly documented, widely implemented format I know of that is easily capable of storing TD data in its full generality, as I sketched out above.

It doesn't seem much like the context where JSON gets used all the time -- sending/receiving messages to/from the front-end/middleware layers in a Web application via a RESTful interface.

That's certainly one application, though not all Web uses are properly RESTful; often they are just SOAP-style endpoints that happen to speak JSON instead of XML.  But googling for json purpose -network -web -services pulls up many others.  JSON config files are all over the place nowadays.  Data scientists use files full of JSON values to communicate with each other, because their software often needs something with more structure than CSV can handle (it's not all flat arrays any more); indeed, Pandas now supports it natively.  It will be a cold day in Hell before WordPress has a plugin that allows WP sites to pull data from TD databases, but a JSON plugin is available and it doesn't care where the JSON comes from.  A file containing GDPR-vetted vendors is published as JSON.  One of my other interests at the moment is timezones, and there is public data showing the polygons (sometimes with holes) on the Earth's surface that correspond to each of the 400+ historical timezones; this data is in JSON.  And the list goes on, da-da da-da dah....

I'm not clear what "storing TD data in its full generality" means. Do you mean translating Tutorial D syntax for a TUPLE or RELATION into some equivalent JSON syntax?

I should think XML would be as general as JSON, no?

It sounds like being able to import text files in JSON format into relvars, or being able to link "live" JSON sources to relvars, might be a useful add-on to Rel. As long as the JSON sources can be reasonably mapped to a tabular row/column structure, they can be visible in Rel as relvars.

It's debatable how much use this would be for the typical Rel user -- usually a student or researcher -- who typically uses it for classwork or research related to the relational model itself rather than data-crunching. But there are a few of us who use it for data crunching, so it may be worthwhile providing JSON-to-relvar mappings and JSON-export facilities.

For exporting data, I presume following some existing standards would make more sense than creating new ones?

Quote from Dave Voorhis on June 24, 2019, 6:46 pm

I'm not clear what "storing TD data in its full generality" means. Do you mean translating Tutorial D syntax for a TUPLE or RELATION into some equivalent JSON syntax?

Yes.

I should think XML would be as general as JSON, no?

Yes, but perhaps a little too general.  Plain Old JSON, provided it was rectangular ("mappable to a tabular ... structure") and didn't require any scalar types beyond the Big Four, could be Borged fairly directly into an existing relvar. With XML you'd have more configuration to do and less chance that some random XML file could be imported.
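
That is, a plain rectangular file like the following (a made-up example) could map straight onto a relvar, inferring the Big Four types from the JSON values themselves:

[{"S#": 1, "SNAME": "Smith"}, {"S#": 2, "SNAME": "Jones"}]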

It's debatable how much use this would be for the typical Rel user -- usually a student or researcher -- who typically uses it for classwork or research related to the relational model itself rather than data-crunching. But there are a few of us who use it for data crunching, so it may be worthwhile providing JSON-to-relvar mappings and JSON-export facilities.

Exactly.

For exporting data, I presume following some existing standards would make more sense than creating new ones?

I think so.  Both explicitly typed (per above) and implicitly (manifestly) typed versions should perhaps be available for export, as the latter will be more easily processed by other facilities.
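
For instance, the same hypothetical suppliers relation could be exported explicitly typed, per the scheme I sketched above:

{"type": "relation", "heading": {"S#": "INTEGER", "SNAME": "CHAR"}, "data": [{"S#": 1, "SNAME": "Smith"}]}

or manifestly typed, leaving the receiver to infer everything from the bare JSON values:

[{"S#": 1, "SNAME": "Smith"}]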

Quote from johnwcowan on June 24, 2019, 2:43 pm

Is there a standard mapping of TD-compliant RM data onto JSON?

I don't think so. I did one for Andl.

The approach that first comes to my mind is to represent a relation as a JSON object with two names, "heading" and "data", where the value of the first is another JSON object mapping attribute names into attribute type names, and the value of the second is a JSON array of JSON objects that map attribute names into attribute values.  A tuple is represented similarly, except that the value of "data" is a single object rather than an array of objects. A third name "type" could be added to make the distinction between relations and tuples more explicit.

No, this is not the right way. JSON is not a full encoding of data and schema, it's just the data. A tuple is an object, a POSSREP is an object, a relation is an array of objects. No need to get cute.
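
For example (attribute names made up): a tuple is just {"S#": 1, "SNAME": "Smith"}, a POSSREP is just {"seconds": 1561351636, "nanoseconds": 107122953}, and a relation is just:

[{"S#": 1, "SNAME": "Smith"},
 {"S#": 2, "SNAME": "Jones"}]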

As for scalars: TD integers, rationals, strings, and booleans become JSON numbers, numbers, strings, and booleans respectively.  TD integers and rationals too large or too precise to be represented as JSON numbers (which are 64-bit floats) can be transferred as strings, since the type of every datum is known and JSON's manifestly typed values are just a convenience.

No, JSON does not define a storage layout. JSON numbers are integers, fixed point or floating point values with any number of digits permitted. See json.org.

Custom scalar values are JSON objects with two names, "name" for the string name of the scalar type, and "possrep" for an object that maps from named selector arguments to their values.  The value of "name" must be recognized by the client.  Semi-standardized types like timestamps may be represented by strings like "2019-06-24T00:47:16.107122953Z" or equivalently as:

{"name": "timestamp", "possrep: {"seconds": 1561351636, "nanoseconds": 107122953}}

No, you're just complicating things. For date/times and any other microformats the value has to be in a JSON simple type (string or number) and the two parties have to agree on how to format and parse. ISO 8601 is good for dates, so the value is just "2019-06-24T00:47:16.107122953Z".

JSON is just a serialisation format. It's up to the user to ensure that the two parties are singing to the same schema.

Andl - A New Database Language - andl.org
Quote from dandl on June 24, 2019, 10:22 pm

No, this is not the right way. JSON is not a full encoding of data and schema, it's just the data. A tuple is an object, a POSSREP is an object, a relation is an array of objects. No need to get cute.

A relation containing a relation or tuple has to have some representation of it.  What representation do you use?

No, JSON does not define a storage layout. JSON numbers are integers, fixed point or floating point values with any number of digits permitted. See json.org.

That is not a standards document.  There are two standards-track documents describing JSON, ECMA-404 and RFC 8259.  The first agrees with you: "JSON is agnostic about the semantics of numbers".  The second, however, says in section 6:

This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision. A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available.

This is rather carefully worded not to actually forbid what ECMA-404 allows.  But since interoperability is what JSON is all about, restricting numbers to the range and precision of binary64 floats is a Good Thing.
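
Under that restriction, the RFC's own example 3.141592653589793238462643383279 would travel as a string in the explicitly typed scheme I sketched above, e.g. (hypothetical attribute name):

{"heading": {"PI": "RATIONAL"}, "data": [{"PI": "3.141592653589793238462643383279"}]}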

No, you're just complicating things. For date/times and any other microformats the value has to be in a JSON simple type (string or number) and the two parties have to agree on how to format and parse. ISO 8601 is good for dates, so the value is just "2019-06-24T00:47:16.107122953Z".

Two things I should like to know:

  1. What is the evidence for this "has to be"?
  2. Why do some people consider it civil behavior on this forum to contradict other posters in this peremptory fashion, rather than simply giving their own opinions, with or without supporting arguments?
