What is the purpose of SUMMARIZE PER?
Quote from johnwcowan on August 30, 2019, 12:58 pmI understand SUMMARIZE BY well enough; you are specifying attributes of a table to be summarized based on the unique values (taken jointly) of the remaining attributes, basically SQL summary expressions and GROUP BY (modulo nulls). If you have an employee table with department and salary, you can get total (say) salaries by department with
SELECT SUM(salary), department FROM employees GROUP BY department
. And I understand the mechanics of SUMMARIZE PER well enough from the description in the Big Book. But when do you want to use such an operator?
I understand SUMMARIZE BY well enough; you are specifying attributes of a table to be summarized based on the unique values (taken jointly) of the remaining attributes, basically SQL summary expressions and GROUP BY (modulo nulls). If you have an employee table with department and salary, you can get total (say) salaries by department with SELECT SUM(salary), department FROM employees GROUP BY department
. And I understand the mechanics of SUMMARIZE PER well enough from the description in the Big Book. But when do you want to use such an operator?
Quote from dandl on August 30, 2019, 2:03 pmI can't answer your question, but I can say I don't see much point in either. The compiler can easily infer what to do, given the use of any aggregating open expression. In Andl if I write:
SP .{ S#, P#, total := fold(+,QTY) } SP .{ S#, P#, total := sum(QTY) } // equivalent, if sum predefined employees .{ department, sum(salary) }Then the compiler has no problem figuring out what to do. Or am I missing something?
I can't answer your question, but I can say I don't see much point in either. The compiler can easily infer what to do, given the use of any aggregating open expression. In Andl if I write:
SP .{ S#, P#, total := fold(+,QTY) } SP .{ S#, P#, total := sum(QTY) } // equivalent, if sum predefined employees .{ department, sum(salary) }
Then the compiler has no problem figuring out what to do. Or am I missing something?
Quote from Dave Voorhis on August 30, 2019, 2:19 pmQuote from johnwcowan on August 30, 2019, 12:58 pmI understand SUMMARIZE BY well enough; you are specifying attributes of a table to be summarized based on the unique values (taken jointly) of the remaining attributes, basically SQL summary expressions and GROUP BY (modulo nulls). If you have an employee table with department and salary, you can get total (say) salaries by department with
SELECT SUM(salary), department FROM employees GROUP BY department
. And I understand the mechanics of SUMMARIZE PER well enough from the description in the Big Book. But when do you want to use such an operator?As shown below, use of PER allows you to summarise over values not found in the relation being summarised. Conversely, use of BY only summarises over values found in the relation being summarised.
SUMMARIZE SP BY {S#}: {SQTY := SUM(THE_QTY(QTY))}
S#
S#SQTY
INTEGERS#("S1") 1300 S#("S2") 700 S#("S3") 200 S#("S4") 900
SUMMARIZE SP PER (S {S#}): {SQTY := SUM(THE_QTY(QTY))}
S#
S#SQTY
INTEGERS#("S1") 1300 S#("S2") 700 S#("S3") 200 S#("S4") 900 S#("S5") 0
Quote from johnwcowan on August 30, 2019, 12:58 pmI understand SUMMARIZE BY well enough; you are specifying attributes of a table to be summarized based on the unique values (taken jointly) of the remaining attributes, basically SQL summary expressions and GROUP BY (modulo nulls). If you have an employee table with department and salary, you can get total (say) salaries by department with
SELECT SUM(salary), department FROM employees GROUP BY department
. And I understand the mechanics of SUMMARIZE PER well enough from the description in the Big Book. But when do you want to use such an operator?
As shown below, use of PER allows you to summarise over values not found in the relation being summarised. Conversely, use of BY only summarises over values found in the relation being summarised.
S# S# |
SQTY INTEGER |
---|---|
S#("S1") | 1300 |
S#("S2") | 700 |
S#("S3") | 200 |
S#("S4") | 900 |
SUMMARIZE SP PER (S {S#}): {SQTY := SUM(THE_QTY(QTY))}
S# S# |
SQTY INTEGER |
---|---|
S#("S1") | 1300 |
S#("S2") | 700 |
S#("S3") | 200 |
S#("S4") | 900 |
S#("S5") | 0 |
Quote from johnwcowan on August 30, 2019, 3:19 pmThanks, makes sense now. So it can't be reduced to a join, unfortunately, because a join will eliminate S5.
Thanks, makes sense now. So it can't be reduced to a join, unfortunately, because a join will eliminate S5.
Quote from dandl on September 1, 2019, 12:30 amCan't one easily be written in terms of the other? Is having two such similar operations really justified?
I know it can be done with image relations.
EXTEND S : { TOTQ := SUM ( ‼SP , QTY ) }
Can't one easily be written in terms of the other? Is having two such similar operations really justified?
I know it can be done with image relations.
EXTEND S : { TOTQ := SUM ( ‼SP , QTY ) }
Quote from dandl on September 1, 2019, 7:11 amAnd this is how it looks in Andl.
S .select{ S#, sqty := ( {{*}} join SP) .select{ fold(+,QTY) } }The term {{*}} is the current tuple expressed as a relation, so that it can take part in a join, and thus create an RVA, which can then be summed. It's like image relation, but more general. Then the final result (the sum) is 'lifted' out of its containing tuple as the result of the expression. I suspect similar code can be written in TD using GROUP and an attribute extractor.
As a side-note, this is a case where a full join is not needed and a simpler algorithm would suffice. If this were expressed in a suitable algebra I would expect it to be possible to make that transformation (discarding unneeded attributes). That's where I expect a formal treatment to pay for its keep.
And this is how it looks in Andl.
S .select{ S#, sqty := ( {{*}} join SP) .select{ fold(+,QTY) } }
The term {{*}} is the current tuple expressed as a relation, so that it can take part in a join, and thus create an RVA, which can then be summed. It's like image relation, but more general. Then the final result (the sum) is 'lifted' out of its containing tuple as the result of the expression. I suspect similar code can be written in TD using GROUP and an attribute extractor.
As a side-note, this is a case where a full join is not needed and a simpler algorithm would suffice. If this were expressed in a suitable algebra I would expect it to be possible to make that transformation (discarding unneeded attributes). That's where I expect a formal treatment to pay for its keep.
Quote from Dave Voorhis on September 1, 2019, 8:02 amQuote from dandl on September 1, 2019, 12:30 amCan't one easily be written in terms of the other? Is having two such similar operations really justified?
I know it can be done with image relations.
EXTEND S : { TOTQ := SUM ( SP , QTY ) }I vaguely recall past discussion where Hugh, perhaps acting in proxy for CJ Date, proposed dropping SUMMARIZE from Tutorial D in favour of image relations. I resisted. I appreciate the brevity of image relations, but I find SUMMARIZE more intuitive. It's true that the BY form can be written using PER -- SUMMARIZE r PER(r {A1, A2, ..., An}) ... is SUMMARIZE r BY {A1, A2, ..., An} ... -- but having both is, again, more intuitive.
Quote from dandl on September 1, 2019, 12:30 amCan't one easily be written in terms of the other? Is having two such similar operations really justified?
I know it can be done with image relations.
EXTEND S : { TOTQ := SUM ( SP , QTY ) }
I vaguely recall past discussion where Hugh, perhaps acting in proxy for CJ Date, proposed dropping SUMMARIZE from Tutorial D in favour of image relations. I resisted. I appreciate the brevity of image relations, but I find SUMMARIZE more intuitive. It's true that the BY form can be written using PER -- SUMMARIZE r PER(r {A1, A2, ..., An}) ... is SUMMARIZE r BY {A1, A2, ..., An} ... -- but having both is, again, more intuitive.
Quote from dandl on September 1, 2019, 1:43 pmJust for my curiosity, is it possible to write both BY and PER in Rel without using image relations? If so, how?
Just for my curiosity, is it possible to write both BY and PER in Rel without using image relations? If so, how?
Quote from Dave Voorhis on September 1, 2019, 1:54 pmQuote from dandl on September 1, 2019, 1:43 pmJust for my curiosity, is it possible to write both BY and PER in Rel without using image relations? If so, how?
Yes, it's documented in DTATRM, page 398 of the usual PDF. I've shown above how BY is shorthand for PER. In general, the following:
SUMMARIZE r1 PER (r2): {Z := agg(exp)}Is equivalent to:
(EXTEND r2: { Y := r1 JOIN REL {TUP {A A, B B, ..., C C}}, Z := agg(EXTEND Y: {X := exp} {X}, X) }) {ALL BUT Y}
Quote from dandl on September 1, 2019, 1:43 pmJust for my curiosity, is it possible to write both BY and PER in Rel without using image relations? If so, how?
Yes, it's documented in DTATRM, page 398 of the usual PDF. I've shown above how BY is shorthand for PER. In general, the following:
SUMMARIZE r1 PER (r2): {Z := agg(exp)}
Is equivalent to:
(EXTEND r2: { Y := r1 JOIN REL {TUP {A A, B B, ..., C C}}, Z := agg(EXTEND Y: {X := exp} {X}, X) }) {ALL BUT Y}
Quote from Dave Voorhis on September 1, 2019, 2:10 pmQuote from Dave Voorhis on September 1, 2019, 8:02 amI vaguely recall past discussion where Hugh, perhaps acting in proxy for CJ Date, proposed dropping SUMMARIZE from Tutorial D in favour of image relations.
Maybe I've misremembered, or maybe this is in addition, but Chapter 14 of DBE (page 259 of the usual PDF) argues in favour of image relations and against SUMMARIZE.
It also includes a better definition of SUMMARIZE on page 152. This:
SUMMARIZE r PER ( p ) : { X := SUM ( XX ) }Is this:
EXTEND ( p ) : { X := SUM ( ( ( r ) MATCHING RELATION { TUPLE { B1 B1 , B2 B2 , ... , Bm Bm } } ) { ALL BUT B1 , B2 , ... , Bm } , XX }
Quote from Dave Voorhis on September 1, 2019, 8:02 amI vaguely recall past discussion where Hugh, perhaps acting in proxy for CJ Date, proposed dropping SUMMARIZE from Tutorial D in favour of image relations.
Maybe I've misremembered, or maybe this is in addition, but Chapter 14 of DBE (page 259 of the usual PDF) argues in favour of image relations and against SUMMARIZE.
It also includes a better definition of SUMMARIZE on page 152. This:
SUMMARIZE r PER ( p ) : { X := SUM ( XX ) }
Is this:
EXTEND ( p ) : { X := SUM ( ( ( r ) MATCHING RELATION { TUPLE { B1 B1 , B2 B2 , ... , Bm Bm } } ) { ALL BUT B1 , B2 , ... , Bm } , XX }