The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

What is the purpose of SUMMARIZE PER?

12

I understand SUMMARIZE BY well enough; you are specifying attributes of a table to be summarized based on the unique values (taken jointly) of the remaining attributes, basically SQL summary expressions and GROUP BY (modulo nulls).  If you have an employee table with department and salary, you can get total (say) salaries by department with SELECT SUM(salary), department FROM employees GROUP BY department.  And I understand the mechanics of SUMMARIZE PER well enough from the description in the Big Book.  But when do you want to use such an operator?

I can't answer your question, but I can say I don't see much point in either. The compiler can easily infer what to do, given the use of any aggregating open expression. In Andl if I write:

SP .{ S#, P#, total := fold(+,QTY) }
SP .{ S#, P#, total := sum(QTY) }  // equivalent, if sum predefined
employees .{ department, sum(salary) }

Then the compiler has no problem figuring out what to do. Or am I missing something?

Andl - A New Database Language - andl.org
Quote from johnwcowan on August 30, 2019, 12:58 pm

I understand SUMMARIZE BY well enough; you are specifying attributes of a table to be summarized based on the unique values (taken jointly) of the remaining attributes, basically SQL summary expressions and GROUP BY (modulo nulls).  If you have an employee table with department and salary, you can get total (say) salaries by department with SELECT SUM(salary), department FROM employees GROUP BY department.  And I understand the mechanics of SUMMARIZE PER well enough from the description in the Big Book.  But when do you want to use such an operator?

As shown below, use of PER allows you to summarise over values not found in the relation being summarised. Conversely, use of BY only summarises over values found in the relation being summarised.

SUMMARIZE SP BY {S#}: {SQTY := SUM(THE_QTY(QTY))}
S#
S#
SQTY
INTEGER
S#("S1") 1300
S#("S2") 700
S#("S3") 200
S#("S4") 900

SUMMARIZE SP PER (S {S#}): {SQTY := SUM(THE_QTY(QTY))}
S#
S#
SQTY
INTEGER
S#("S1") 1300
S#("S2") 700
S#("S3") 200
S#("S4") 900
S#("S5") 0
I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

Thanks, makes sense now.  So it can't be reduced to a join, unfortunately, because a join will eliminate S5.

Can't one easily be written in terms of the other? Is having two such similar operations really justified?

I know it can be done with image relations.

EXTEND S : { TOTQ := SUM ( ‼SP , QTY ) }
Andl - A New Database Language - andl.org

And this is how it looks in Andl.

S .select{ S#, sqty := ( {{*}} join SP) .select{ fold(+,QTY) } }

The term {{*}} is the current tuple expressed as a relation, so that it can take part in a join, and thus create an RVA, which can then be summed. It's like image relation, but more general. Then the final result (the sum) is 'lifted' out of its containing tuple as the result of the expression. I suspect similar code can be written in TD using GROUP and an attribute extractor.

As a side-note, this is a case where a full join is not needed and a simpler algorithm would suffice. If this were expressed in a suitable algebra I would expect it to be possible to make that transformation (discarding unneeded attributes). That's where I expect a formal treatment to pay for its keep.

Andl - A New Database Language - andl.org
Quote from dandl on September 1, 2019, 12:30 am

Can't one easily be written in terms of the other? Is having two such similar operations really justified?

I know it can be done with image relations.

EXTEND S : { TOTQ := SUM ( ‼SP , QTY ) }

I vaguely recall past discussion where Hugh, perhaps acting in proxy for CJ Date, proposed dropping SUMMARIZE from Tutorial D in favour of image relations. I resisted. I appreciate the brevity of image relations, but I find SUMMARIZE more intuitive. It's true that the BY form can be written using PER -- SUMMARIZE r PER(r {A1, A2, ..., An}) ... is SUMMARIZE BY {A1, A2, ..., An} ... -- but having both is, again, more intuitive.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

Just for my curiosity, is it possible to write both BY and PER in Rel without using image relations? If so, how?

Andl - A New Database Language - andl.org
Quote from dandl on September 1, 2019, 1:43 pm

Just for my curiosity, is it possible to write both BY and PER in Rel without using image relations? If so, how?

Yes, it's documented in DTATRM, page 398 of the usual PDF. I've shown above how BY is shorthand for PER. In general, the following:

SUMMARIZE r1 PER (r2): {Z := agg(exp)}

Is equivalent to:

(EXTEND r2: {
  Y := r1 JOIN REL {TUP {A A, B B, ..., C C}}, 
  Z := agg(EXTEND Y: {X := exp} {X}, X)
}) {ALL BUT Y}

 

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on September 1, 2019, 8:02 am

I vaguely recall past discussion where Hugh, perhaps acting in proxy for CJ Date, proposed dropping SUMMARIZE from Tutorial D in favour of image relations.

Maybe I've misremembered, or maybe this is in addition, but Chapter 14 of DBE (page 259 of the usual PDF) argues in favour of image relations and against SUMMARIZE.

It also includes a better definition of SUMMARIZE on page 152. This:

SUMMARIZE r PER ( p ) : { X := SUM ( XX ) }

Is this:

EXTEND ( p ) : { X := SUM ( ( ( r ) MATCHING
  RELATION { TUPLE { B1 B1 , B2 B2 , ... , Bm Bm } } )
    { ALL BUT B1 , B2 , ... , Bm } , XX }

 

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
12