The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

Aggregation in TD and Rel

According to the TD spec I have, there are 5 aggregation functions for numbers: COUNT, SUM, AVG, MAX, MIN. Questions:

  • Is that it? Does Rel have more? Should there be?
  • Is it possible to aggregate user-defined types (such as DATE or POINT)?

 

Andl - A New Database Language - andl.org

It needs to be a commutative and associative operator, since the operands are unordered.  Looking at the list of reducible APL operators, you could add product for numbers and conjunction and disjunction for booleans.

In addition, here are some possibly useful derived functions: sum of squares, harmonic sum, geometric mean, harmonic mean, population and sample variance, population and sample standard deviation, and range (max - min).  The median and mode are also useful, but more-than-linear to compute.  Skewness and kurtosis are more remote possibilities.
Quote from dandl on March 28, 2020, 5:44 am

According to the TD spec I have, there are 5 aggregation functions for numbers: COUNT, SUM, AVG, MAX, MIN. Questions:

  • Is that it? Does Rel have more? Should there be?
  • Is it possible to aggregate user-defined types (such as DATE or POINT)?

 

It provides the ones defined in Tutorial D and no more.  These:

sys.OperatorsBuiltin WHERE STARTS_WITH(Name, "AGGREGATE")
Name
CHARACTER
Signature
CHARACTER
ReturnsType
CHARACTER
Definition
CHARACTER
AGGREGATE_AND AGGREGATE_AND(ARRAY TUPLE {}) BOOLEAN // Logical AND of r
AGGREGATE_AND(r ARRAY OF TUPLE {AGGREGAND BOOLEAN, AGGREGATION_SERIAL INT}) RETURNS BOOLEAN
AGGREGATE_SUM_RATIONAL AGGREGATE_SUM_RATIONAL(ARRAY TUPLE {}) RATIONAL // RATIONAL sum of r
AGGREGATE_SUM_RATIONAL(r ARRAY OF TUPLE {AGGREGAND RATIONAL, AGGREGATION_SERIAL INT}) RETURNS RATIONAL
AGGREGATE_OR AGGREGATE_OR(ARRAY TUPLE {}) BOOLEAN // Logical OR of r
AGGREGATE_OR(r ARRAY OF TUPLE {AGGREGAND BOOLEAN, AGGREGATION_SERIAL INT}) RETURNS BOOLEAN
AGGREGATE_UNION AGGREGATE_UNION(ARRAY TUPLE {}) RELATION {} // UNION of r
AGGREGATE_UNION(r ARRAY OF TUPLE {AGGREGAND RELATION {*}, AGGREGATION_SERIAL INT}) RETURNS RELATION {*}
AGGREGATE_D_UNION AGGREGATE_D_UNION(ARRAY TUPLE {}) RELATION {} // disjoint-UNION of r
AGGREGATE_DUNION(r ARRAY OF TUPLE {AGGREGAND RELATION {*}, AGGREGATION_SERIAL INT}) RETURNS RELATION {*}
AGGREGATE_EQUIV AGGREGATE_EQUIV(ARRAY TUPLE {}) BOOLEAN // Logical EQUIV (aka '=') of r
AGGREGATE_EQUIV(r ARRAY OF TUPLE {AGGREGAND BOOLEAN, AGGREGATION_SERIAL INT}) RETURNS BOOLEAN
AGGREGATE_MAX AGGREGATE_MAX(ARRAY TUPLE {}) RATIONAL // Maximum of r
AGGREGATE_MAX(r ARRAY OF TUPLE {AGGREGAND ALPHA, AGGREGATION_SERIAL INT}) RETURNS ALPHA
AGGREGATE_MIN AGGREGATE_MIN(ARRAY TUPLE {}) RATIONAL // Minimum of r
AGGREGATE_MIN(r ARRAY OF TUPLE {AGGREGAND ALPHA, AGGREGATION_SERIAL INT}) RETURNS ALPHA
AGGREGATE_XUNION AGGREGATE_XUNION(ARRAY TUPLE {}) RELATION {} // exclusive-UNION of r
AGGREGATE_XUNION(r ARRAY OF TUPLE {AGGREGAND RELATION {*}, AGGREGATION_SERIAL INT}) RETURNS RELATION {*}
AGGREGATE_INTERSECT AGGREGATE_INTERSECT(ARRAY TUPLE {}) RELATION {} // INTERSECT of r
AGGREGATE_INTERSECT(r ARRAY OF TUPLE {AGGREGAND RELATION {*}, AGGREGATION_SERIAL INT}) RETURNS RELATION {*}
AGGREGATE_AVG_INTEGER AGGREGATE_AVG_INTEGER(ARRAY TUPLE {}) RATIONAL // Arithmetic mean of r
AGGREGATE_AVG_INTEGER(r ARRAY OF TUPLE {AGGREGAND INT, AGGREGATION_SERIAL INT}) RETURNS RATIONAL
AGGREGATE_AVG_RATIONAL AGGREGATE_AVG_RATIONAL(ARRAY TUPLE {}) RATIONAL // Arithmetic mean of r
AGGREGATE_AVG_INTEGER(r ARRAY OF TUPLE {AGGREGAND RATIONAL, AGGREGATION_SERIAL INT}) RETURNS RATIONAL
AGGREGATE_SUM_INTEGER AGGREGATE_SUM_INTEGER(ARRAY TUPLE {}) INTEGER // INTEGER sum of r
AGGREGATE_SUM_INTEGER(r ARRAY OF TUPLE {AGGREGAND INT, AGGREGATION_SERIAL INT}) RETURNS INTEGER
AGGREGATE_XOR AGGREGATE_XOR(ARRAY TUPLE {}) BOOLEAN // Logical exclusive-OR of r
AGGREGATE_XOR(r ARRAY OF TUPLE {AGGREGAND BOOLEAN, AGGREGATION_SERIAL INT}) RETURNS BOOLEAN

It is possible to aggregate user-defined types, but that may involve some work to define what the aggregates mean, by either aggregating on some POSSREP component and/or providing -- where necessary -- the additional operators needed to support aggregating them. See https://reldb.org/c/wp-content/uploads/2016/06/User-Defined-Aggregate-Operators-in-Tutorial-D-and-Rel.pdf

I note in passing that the operators I use most often, in both Rel and SQL, are COUNT, MAX and MIN. Others barely get a look in.

 

 

 

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on March 28, 2020, 8:51 am
Quote from dandl on March 28, 2020, 5:44 am

According to the TD spec I have, there are 5 aggregation functions for numbers: COUNT, SUM, AVG, MAX, MIN. Questions:

  • Is that it? Does Rel have more? Should there be?
  • Is it possible to aggregate user-defined types (such as DATE or POINT)?

It provides the ones defined in Tutorial D and no more.  These:

Thanks Dave. I thought that was so. I'm doing aggregation for my Knime nodes, and just thought I'd check.

Andl has generalised aggregation with a fold() function, but that's beyond what I'm trying to do here.

It is possible to aggregate user-defined types, but that may involve some work to define what the aggregates mean, by either aggregating on some POSSREP component and/or providing -- where necessary -- the additional operators needed to support aggregating them. See https://reldb.org/c/wp-content/uploads/2016/06/User-Defined-Aggregate-Operators-in-Tutorial-D-and-Rel.pdf

I recall this. It's a really interesting approach, not one I've seen elsewhere. I'm tempted to use it here.

MAX and MIN for dates is the main one I think.

I note in passing that the operators I use most often, in both Rel and SQL, are COUNT, MAX and MIN. Others barely get a look in.

I use SUM a fair bit if there are money values in the table. Seems kind of reasonable.

Andl - A New Database Language - andl.org
Quote from dandl on March 28, 2020, 12:41 pm
Quote from Dave Voorhis on March 28, 2020, 8:51 am
Quote from dandl on March 28, 2020, 5:44 am

According to the TD spec I have, there are 5 aggregation functions for numbers: COUNT, SUM, AVG, MAX, MIN. Questions:

  • Is that it? Does Rel have more? Should there be?
  • Is it possible to aggregate user-defined types (such as DATE or POINT)?

It provides the ones defined in Tutorial D and no more.  These:

Thanks Dave. I thought that was so. I'm doing aggregation for my Knime nodes, and just thought I'd check.

Doesn't KNIME provide that sort of thing already?

In other words, can't the output of a relational pipeline be passed to whatever KNIME already provides for statistical and aggregate operations?

Andl has generalised aggregation with a fold() function, but that's beyond what I'm trying to do here.

It is possible to aggregate user-defined types, but that may involve some work to define what the aggregates mean, by either aggregating on some POSSREP component and/or providing -- where necessary -- the additional operators needed to support aggregating them. See https://reldb.org/c/wp-content/uploads/2016/06/User-Defined-Aggregate-Operators-in-Tutorial-D-and-Rel.pdf

I recall this. It's a really interesting approach, not one I've seen elsewhere. I'm tempted to use it here.

MAX and MIN for dates is the main one I think.

I note in passing that the operators I use most often, in both Rel and SQL, are COUNT, MAX and MIN. Others barely get a look in.

I use SUM a fair bit if there are money values in the table. Seems kind of reasonable.

Yes. If I had to rank them, COUNT, MAX and MIN get used constantly. SUM gets used sometimes. The rest, almost never.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

Thanks Dave. I thought that was so. I'm doing aggregation for my Knime nodes, and just thought I'd check.

Doesn't KNIME provide that sort of thing already?

In other words, can't the output of a relational pipeline be passed to whatever KNIME already provides for statistical and aggregate operations?

Knime overlaps quite a bit in various areas. My theme is to provide a consistent set of node operators that more or less cover the same territory as SQL, but from an RA perspective so no NULLs and no duplicate rows. They could be used as an education tool, or to replace existing SQL when the input is not an RDBMS. Hence a minimal set of aggregation functions, and not really any motivation to add more.

Andl - A New Database Language - andl.org