The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

SUMMARIZE PER, OUTER JOIN and image relations

Some time back we had some discussion around this topic. The question is how best to deal with queries where particular tuples are 'projected out' or 'joined out', but we want those lost tuples to contribute to the final result. SQL allows preserving the rows but filling in with NULLs; we don't allow that.

What I recall was broadly two main strategies. I had a feeling there were more, but I can't find them.

  1. Bundle up attributes that might go missing into an RVA, so missing data can be represented as an empty relation.
  2. Back fill the final query with the missing rows and pre-computed data.

Strategy 1 covers image relations. It also covers SUMMARIZE PER, which generates a temporary RVA behind the scenes. You can see this directly in the expansion offered by DBE. This one suffers from the difficulty that the aggregating function is required to return an acceptable value when passed an empty relation (what's the AVERAGE value of an empty argument list?). [SQL has a slightly different problem: the aggregating function may have to deal with NULLs.]

Strategy 2 is awkward to code, but gets the right result and is immune to the empty relation problem. I've seen variants of this used in SQL.

But are there others? I thought there were, but memory fails me.

As a side note, App-A glosses over the issue of SUMMARIZE, but again whatever the relcon looks like, it must have an RVA as per Strategy 1.

Andl - A New Database Language - andl.org
Quote from dandl on February 15, 2020, 10:34 am

Some time back we had some discussion around this topic. The question is how best to deal with queries where particular tuples are 'projected out' or 'joined out', but we want those lost tuples to contribute to the final result. SQL allows preserving the rows but filling in with NULLs; we don't allow that.

What I recall was broadly two main strategies. I had a feeling there were more, but I can't find them.

  1. Bundle up attributes that might go missing into an RVA, so missing data can be represented as an empty relation.
  2. Back fill the final query with the missing rows and pre-computed data.

Strategy 1 covers image relations. It also covers SUMMARIZE PER, which generates a temporary RVA behind the scenes. You can see this directly in the expansion offered by DBE. This one suffers from the difficulty that the aggregating function is required to return an acceptable value when passed an empty relation (what's the AVERAGE value of an empty argument list?). [SQL has a slightly different problem: the aggregating function may have to deal with NULLs.]

Strategy 2 is awkward to code, but gets the right result and is immune to the empty relation problem. I've seen variants of this used in SQL.

But are there others? I thought there were, but memory fails me.

As a side note, App-A glosses over the issue of SUMMARIZE, but again whatever the relcon looks like, it must have an RVA as per Strategy 1.

Re strategy 1, I don't really understand "bundle up attributes that might go missing" but in any case I can't see how you think this solves the problem with the likes of AVG.

Re strategy 2, what do you mean by "right answer"?

It seems to me that all cases need the use of r1 UNION r2 in which one of the operands (r2, say) is an ad hoc arrangement generating tuples of the same type as those of r1.  Here's my attempt to grapple with the AVG problem:

(EXTEND (SUMMARIZE r1 PER(r2 MATCHING r1) : {x := AVG(y)}) : {rx := rel{tup{x x}}}){ALL BUT x}
UNION
EXTEND (r2 NOT MATCHING r1) : {rx := REL{x RAT}{}}

Is that the kind of thing you meant?  I tried it out in Rel using my "Beyond the Four Fours" database, with Recalcitrant and Cases for r1 and r2, respectively.  This is what I got:

i
INTEGER
j
INTEGER
k
INTEGER
l
INTEGER
rx
RELATION

x
RATIONAL
1 1 1 1
x
RATIONAL
65.1
1 1 1 2
x
RATIONAL
72.28571428571429
1 1 1 3
x
RATIONAL
78.16666666666667
1 1 1 5
x
RATIONAL
66.66666666666667
1 1 1 6
x
RATIONAL
90.2
1 1 1 7
x
RATIONAL
77.75
1 1 1 8
x
RATIONAL
66.85714285714286
1 1 1 9
x
RATIONAL
63.666666666666664
1 1 2 2
x
RATIONAL
77.0
1 1 3 3
x
RATIONAL
91.5
1 1 6 7
x
RATIONAL
92.0
1 1 7 7
x
RATIONAL
93.4
1 1 8 8
x
RATIONAL
95.0
1 2 2 2
x
RATIONAL
80.0
1 2 2 7
x
RATIONAL
89.0
1 2 2 8
x
RATIONAL
93.0
1 2 8 8
x
RATIONAL
95.0
1 3 3 3
x
RATIONAL
84.0
1 3 3 6
x
RATIONAL
85.0
1 3 3 7
x
RATIONAL
95.0
1 3 6 7
x
RATIONAL
89.0
1 6 6 6
x
RATIONAL
91.5
1 6 6 7
x
RATIONAL
92.0
1 7 7 7
x
RATIONAL
69.7
1 7 7 8
x
RATIONAL
70.0
1 8 8 8
x
RATIONAL
54.0
2 2 2 2
x
RATIONAL
70.76470588235294
2 2 2 3
x
RATIONAL
88.66666666666667
2 2 2 4
x
RATIONAL
90.0
2 2 2 5
x
RATIONAL
81.4
2 2 2 6
x
RATIONAL
82.1
2 2 2 7
x
RATIONAL
91.5
2 2 2 8
x
RATIONAL
95.0
2 2 3 6
x
RATIONAL
97.0
2 2 3 7
x
RATIONAL
94.0
2 2 5 5
x
RATIONAL
73.0
2 2 6 6
x
RATIONAL
80.0
2 2 7 7
x
RATIONAL
93.25
2 2 8 8
x
RATIONAL
86.0
2 3 6 6
x
RATIONAL
81.0
2 3 6 8
x
RATIONAL
97.0
2 5 5 5
x
RATIONAL
82.0
2 5 8 8
x
RATIONAL
99.0
2 6 6 6
x
RATIONAL
76.77777777777777
2 6 6 7
x
RATIONAL
89.0
2 6 6 8
x
RATIONAL
62.0
2 6 7 7
x
RATIONAL
98.0
2 7 7 7
x
RATIONAL
80.83333333333333
2 7 7 8
x
RATIONAL
95.0
2 8 8 8
x
RATIONAL
71.57142857142857
3 3 3 3
x
RATIONAL
80.625
3 3 3 5
x
RATIONAL
79.0
3 3 3 6
x
RATIONAL
91.42857142857143
3 3 3 7
x
RATIONAL
84.14285714285714
3 3 6 6
x
RATIONAL
79.83333333333333
3 3 6 7
x
RATIONAL
92.0
3 3 7 7
x
RATIONAL
87.0
3 5 5 5
x
RATIONAL
83.0
3 6 6 6
x
RATIONAL
83.25
3 6 6 7
x
RATIONAL
93.5
3 6 7 7
x
RATIONAL
92.0
3 7 7 7
x
RATIONAL
74.5
3 8 8 8
x
RATIONAL
56.333333333333336
4 6 6 6
x
RATIONAL
84.125
4 6 7 7
x
RATIONAL
97.0
4 6 8 8
x
RATIONAL
69.0
4 7 7 7
x
RATIONAL
83.0
5 5 5 5
x
RATIONAL
78.0
5 5 6 6
x
RATIONAL
97.0
5 5 7 7
x
RATIONAL
94.0
5 6 6 6
x
RATIONAL
94.5
5 7 7 7
x
RATIONAL
76.66666666666667
6 6 6 6
x
RATIONAL
71.28571428571429
6 6 6 7
x
RATIONAL
88.5
6 6 6 8
x
RATIONAL
59.75
6 6 6 9
x
RATIONAL
88.0
6 6 7 7
x
RATIONAL
84.8
6 6 8 8
x
RATIONAL
69.5
6 6 9 9
x
RATIONAL
88.0
6 7 7 7
x
RATIONAL
83.63636363636364
6 8 8 8
x
RATIONAL
66.6
7 7 7 7
x
RATIONAL
54.6
7 7 7 8
x
RATIONAL
56.9
7 7 7 9
x
RATIONAL
68.0
7 7 8 8
x
RATIONAL
44.0
7 8 8 8
x
RATIONAL
66.83333333333333
8 8 8 8
x
RATIONAL
59.25
8 8 8 9
x
RATIONAL
43.0
1 1 1 4
x
RATIONAL
1 1 2 3
x
RATIONAL
1 1 2 4
x
RATIONAL
1 1 2 5
x
RATIONAL
1 1 2 6
x
RATIONAL
1 1 2 7
x
RATIONAL
1 1 2 8
x
RATIONAL
1 1 2 9
x
RATIONAL
1 1 3 4
x
RATIONAL
1 1 3 5
x
RATIONAL
1 1 3 6
x
RATIONAL
1 1 3 7
x
RATIONAL
1 1 3 8
x
RATIONAL
1 1 3 9
x
RATIONAL
1 1 4 4
x
RATIONAL
1 1 4 5
x
RATIONAL
1 1 4 6
x
RATIONAL
1 1 4 7
x
RATIONAL
1 1 4 8
x
RATIONAL
1 1 4 9
x
RATIONAL
1 1 5 5
x
RATIONAL
1 1 5 6
x
RATIONAL
1 1 5 7
x
RATIONAL
1 1 5 8
x
RATIONAL
1 1 5 9
x
RATIONAL
1 1 6 6
x
RATIONAL
1 1 6 8
x
RATIONAL
1 1 6 9
x
RATIONAL
1 1 7 8
x
RATIONAL
1 1 7 9
x
RATIONAL
1 1 8 9
x
RATIONAL
1 1 9 9
x
RATIONAL
1 2 2 3
x
RATIONAL
1 2 2 4
x
RATIONAL
1 2 2 5
x
RATIONAL
1 2 2 6
x
RATIONAL
1 2 2 9
x
RATIONAL
1 2 3 3
x
RATIONAL
1 2 3 4
x
RATIONAL
1 2 3 5
x
RATIONAL
1 2 3 6
x
RATIONAL
1 2 3 7
x
RATIONAL
1 2 3 8
x
RATIONAL
1 2 3 9
x
RATIONAL
1 2 4 4
x
RATIONAL
1 2 4 5
x
RATIONAL
1 2 4 6
x
RATIONAL
1 2 4 7
x
RATIONAL
1 2 4 8
x
RATIONAL
1 2 4 9
x
RATIONAL
1 2 5 5
x
RATIONAL
1 2 5 6
x
RATIONAL
1 2 5 7
x
RATIONAL
1 2 5 8
x
RATIONAL
1 2 5 9
x
RATIONAL
1 2 6 6
x
RATIONAL
1 2 6 7
x
RATIONAL
1 2 6 8
x
RATIONAL
1 2 6 9
x
RATIONAL
1 2 7 7
x
RATIONAL
1 2 7 8
x
RATIONAL
1 2 7 9
x
RATIONAL
1 2 8 9
x
RATIONAL
1 2 9 9
x
RATIONAL
1 3 3 4
x
RATIONAL
1 3 3 5
x
RATIONAL
1 3 3 8
x
RATIONAL
1 3 3 9
x
RATIONAL
1 3 4 4
x
RATIONAL
1 3 4 5
x
RATIONAL
1 3 4 6
x
RATIONAL
1 3 4 7
x
RATIONAL
1 3 4 8
x
RATIONAL
1 3 4 9
x
RATIONAL
1 3 5 5
x
RATIONAL
1 3 5 6
x
RATIONAL
1 3 5 7
x
RATIONAL
1 3 5 8
x
RATIONAL
1 3 5 9
x
RATIONAL
1 3 6 6
x
RATIONAL
1 3 6 8
x
RATIONAL
1 3 6 9
x
RATIONAL
1 3 7 7
x
RATIONAL
1 3 7 8
x
RATIONAL
1 3 7 9
x
RATIONAL
1 3 8 8
x
RATIONAL
1 3 8 9
x
RATIONAL
1 3 9 9
x
RATIONAL
1 4 4 4
x
RATIONAL
1 4 4 5
x
RATIONAL
1 4 4 6
x
RATIONAL
1 4 4 7
x
RATIONAL
1 4 4 8
x
RATIONAL
1 4 4 9
x
RATIONAL
1 4 5 5
x
RATIONAL
1 4 5 6
x
RATIONAL
1 4 5 7
x
RATIONAL
1 4 5 8
x
RATIONAL
1 4 5 9
x
RATIONAL
1 4 6 6
x
RATIONAL
1 4 6 7
x
RATIONAL
1 4 6 8
x
RATIONAL
1 4 6 9
x
RATIONAL
1 4 7 7
x
RATIONAL
1 4 7 8
x
RATIONAL
1 4 7 9
x
RATIONAL
1 4 8 8
x
RATIONAL
1 4 8 9
x
RATIONAL
1 4 9 9
x
RATIONAL
1 5 5 5
x
RATIONAL
1 5 5 6
x
RATIONAL
1 5 5 7
x
RATIONAL
1 5 5 8
x
RATIONAL
1 5 5 9
x
RATIONAL
1 5 6 6
x
RATIONAL
1 5 6 7
x
RATIONAL
1 5 6 8
x
RATIONAL
1 5 6 9
x
RATIONAL
1 5 7 7
x
RATIONAL
1 5 7 8
x
RATIONAL
1 5 7 9
x
RATIONAL
1 5 8 8
x
RATIONAL
1 5 8 9
x
RATIONAL
1 5 9 9
x
RATIONAL
1 6 6 8
x
RATIONAL
1 6 6 9
x
RATIONAL
1 6 7 7
x
RATIONAL
1 6 7 8
x
RATIONAL
1 6 7 9
x
RATIONAL
1 6 8 8
x
RATIONAL
1 6 8 9
x
RATIONAL
1 6 9 9
x
RATIONAL
1 7 7 9
x
RATIONAL
1 7 8 8
x
RATIONAL
1 7 8 9
x
RATIONAL
1 7 9 9
x
RATIONAL
1 8 8 9
x
RATIONAL
1 8 9 9
x
RATIONAL
1 9 9 9
x
RATIONAL
2 2 2 9
x
RATIONAL
2 2 3 3
x
RATIONAL
2 2 3 4
x
RATIONAL
2 2 3 5
x
RATIONAL
2 2 3 8
x
RATIONAL
2 2 3 9
x
RATIONAL
2 2 4 4
x
RATIONAL
2 2 4 5
x
RATIONAL
2 2 4 6
x
RATIONAL
2 2 4 7
x
RATIONAL
2 2 4 8
x
RATIONAL
2 2 4 9
x
RATIONAL
2 2 5 6
x
RATIONAL
2 2 5 7
x
RATIONAL
2 2 5 8
x
RATIONAL
2 2 5 9
x
RATIONAL
2 2 6 7
x
RATIONAL
2 2 6 8
x
RATIONAL
2 2 6 9
x
RATIONAL
2 2 7 8
x
RATIONAL
2 2 7 9
x
RATIONAL
2 2 8 9
x
RATIONAL
2 2 9 9
x
RATIONAL
2 3 3 3
x
RATIONAL
2 3 3 4
x
RATIONAL
2 3 3 5
x
RATIONAL
2 3 3 6
x
RATIONAL
2 3 3 7
x
RATIONAL
2 3 3 8
x
RATIONAL
2 3 3 9
x
RATIONAL
2 3 4 4
x
RATIONAL
2 3 4 5
x
RATIONAL
2 3 4 6
x
RATIONAL
2 3 4 7
x
RATIONAL
2 3 4 8
x
RATIONAL
2 3 4 9
x
RATIONAL
2 3 5 5
x
RATIONAL
2 3 5 6
x
RATIONAL
2 3 5 7
x
RATIONAL
2 3 5 8
x
RATIONAL
2 3 5 9
x
RATIONAL
2 3 6 7
x
RATIONAL
2 3 6 9
x
RATIONAL
2 3 7 7
x
RATIONAL
2 3 7 8
x
RATIONAL
2 3 7 9
x
RATIONAL
2 3 8 8
x
RATIONAL
2 3 8 9
x
RATIONAL
2 3 9 9
x
RATIONAL
2 4 4 4
x
RATIONAL
2 4 4 5
x
RATIONAL
2 4 4 6
x
RATIONAL
2 4 4 7
x
RATIONAL
2 4 4 8
x
RATIONAL
2 4 4 9
x
RATIONAL
2 4 5 5
x
RATIONAL
2 4 5 6
x
RATIONAL
2 4 5 7
x
RATIONAL
2 4 5 8
x
RATIONAL
2 4 5 9
x
RATIONAL
2 4 6 6
x
RATIONAL
2 4 6 7
x
RATIONAL
2 4 6 8
x
RATIONAL
2 4 6 9
x
RATIONAL
2 4 7 7
x
RATIONAL
2 4 7 8
x
RATIONAL
2 4 7 9
x
RATIONAL
2 4 8 8
x
RATIONAL
2 4 8 9
x
RATIONAL
2 4 9 9
x
RATIONAL
2 5 5 6
x
RATIONAL
2 5 5 7
x
RATIONAL
2 5 5 8
x
RATIONAL
2 5 5 9
x
RATIONAL
2 5 6 6
x
RATIONAL
2 5 6 7
x
RATIONAL
2 5 6 8
x
RATIONAL
2 5 6 9
x
RATIONAL
2 5 7 7
x
RATIONAL
2 5 7 8
x
RATIONAL
2 5 7 9
x
RATIONAL
2 5 8 9
x
RATIONAL
2 5 9 9
x
RATIONAL
2 6 6 9
x
RATIONAL
2 6 7 8
x
RATIONAL
2 6 7 9
x
RATIONAL
2 6 8 8
x
RATIONAL
2 6 8 9
x
RATIONAL
2 6 9 9
x
RATIONAL
2 7 7 9
x
RATIONAL
2 7 8 8
x
RATIONAL
2 7 8 9
x
RATIONAL
2 7 9 9
x
RATIONAL
2 8 8 9
x
RATIONAL
2 8 9 9
x
RATIONAL
2 9 9 9
x
RATIONAL
3 3 3 4
x
RATIONAL
3 3 3 8
x
RATIONAL
3 3 3 9
x
RATIONAL
3 3 4 4
x
RATIONAL
3 3 4 5
x
RATIONAL
3 3 4 6
x
RATIONAL
3 3 4 7
x
RATIONAL
3 3 4 8
x
RATIONAL
3 3 4 9
x
RATIONAL
3 3 5 5
x
RATIONAL
3 3 5 6
x
RATIONAL
3 3 5 7
x
RATIONAL
3 3 5 8
x
RATIONAL
3 3 5 9
x
RATIONAL
3 3 6 8
x
RATIONAL
3 3 6 9
x
RATIONAL
3 3 7 8
x
RATIONAL
3 3 7 9
x
RATIONAL
3 3 8 8
x
RATIONAL
3 3 8 9
x
RATIONAL
3 3 9 9
x
RATIONAL
3 4 4 4
x
RATIONAL
3 4 4 5
x
RATIONAL
3 4 4 6
x
RATIONAL
3 4 4 7
x
RATIONAL
3 4 4 8
x
RATIONAL
3 4 4 9
x
RATIONAL
3 4 5 5
x
RATIONAL
3 4 5 6
x
RATIONAL
3 4 5 7
x
RATIONAL
3 4 5 8
x
RATIONAL
3 4 5 9
x
RATIONAL
3 4 6 6
x
RATIONAL
3 4 6 7
x
RATIONAL
3 4 6 8
x
RATIONAL
3 4 6 9
x
RATIONAL
3 4 7 7
x
RATIONAL
3 4 7 8
x
RATIONAL
3 4 7 9
x
RATIONAL
3 4 8 8
x
RATIONAL
3 4 8 9
x
RATIONAL
3 4 9 9
x
RATIONAL
3 5 5 6
x
RATIONAL
3 5 5 7
x
RATIONAL
3 5 5 8
x
RATIONAL
3 5 5 9
x
RATIONAL
3 5 6 6
x
RATIONAL
3 5 6 7
x
RATIONAL
3 5 6 8
x
RATIONAL
3 5 6 9
x
RATIONAL
3 5 7 7
x
RATIONAL
3 5 7 8
x
RATIONAL
3 5 7 9
x
RATIONAL
3 5 8 8
x
RATIONAL
3 5 8 9
x
RATIONAL
3 5 9 9
x
RATIONAL
3 6 6 8
x
RATIONAL
3 6 6 9
x
RATIONAL
3 6 7 8
x
RATIONAL
3 6 7 9
x
RATIONAL
3 6 8 8
x
RATIONAL
3 6 8 9
x
RATIONAL
3 6 9 9
x
RATIONAL
3 7 7 8
x
RATIONAL
3 7 7 9
x
RATIONAL
3 7 8 8
x
RATIONAL
3 7 8 9
x
RATIONAL
3 7 9 9
x
RATIONAL
3 8 8 9
x
RATIONAL
3 8 9 9
x
RATIONAL
3 9 9 9
x
RATIONAL
4 4 4 4
x
RATIONAL
4 4 4 5
x
RATIONAL
4 4 4 6
x
RATIONAL
4 4 4 7
x
RATIONAL
4 4 4 8
x
RATIONAL
4 4 4 9
x
RATIONAL
4 4 5 5
x
RATIONAL
4 4 5 6
x
RATIONAL
4 4 5 7
x
RATIONAL
4 4 5 8
x
RATIONAL
4 4 5 9
x
RATIONAL
4 4 6 6
x
RATIONAL
4 4 6 7
x
RATIONAL
4 4 6 8
x
RATIONAL
4 4 6 9
x
RATIONAL
4 4 7 7
x
RATIONAL
4 4 7 8
x
RATIONAL
4 4 7 9
x
RATIONAL
4 4 8 8
x
RATIONAL
4 4 8 9
x
RATIONAL
4 4 9 9
x
RATIONAL
4 5 5 5
x
RATIONAL
4 5 5 6
x
RATIONAL
4 5 5 7
x
RATIONAL
4 5 5 8
x
RATIONAL
4 5 5 9
x
RATIONAL
4 5 6 6
x
RATIONAL
4 5 6 7
x
RATIONAL
4 5 6 8
x
RATIONAL
4 5 6 9
x
RATIONAL
4 5 7 7
x
RATIONAL
4 5 7 8
x
RATIONAL
4 5 7 9
x
RATIONAL
4 5 8 8
x
RATIONAL
4 5 8 9
x
RATIONAL
4 5 9 9
x
RATIONAL
4 6 6 7
x
RATIONAL
4 6 6 8
x
RATIONAL
4 6 6 9
x
RATIONAL
4 6 7 8
x
RATIONAL
4 6 7 9
x
RATIONAL
4 6 8 9
x
RATIONAL
4 6 9 9
x
RATIONAL
4 7 7 8
x
RATIONAL
4 7 7 9
x
RATIONAL
4 7 8 8
x
RATIONAL
4 7 8 9
x
RATIONAL
4 7 9 9
x
RATIONAL
4 8 8 8
x
RATIONAL
4 8 8 9
x
RATIONAL
4 8 9 9
x
RATIONAL
4 9 9 9
x
RATIONAL
5 5 5 6
x
RATIONAL
5 5 5 7
x
RATIONAL
5 5 5 8
x
RATIONAL
5 5 5 9
x
RATIONAL
5 5 6 7
x
RATIONAL
5 5 6 8
x
RATIONAL
5 5 6 9
x
RATIONAL
5 5 7 8
x
RATIONAL
5 5 7 9
x
RATIONAL
5 5 8 8
x
RATIONAL
5 5 8 9
x
RATIONAL
5 5 9 9
x
RATIONAL
5 6 6 7
x
RATIONAL
5 6 6 8
x
RATIONAL
5 6 6 9
x
RATIONAL
5 6 7 7
x
RATIONAL
5 6 7 8
x
RATIONAL
5 6 7 9
x
RATIONAL
5 6 8 8
x
RATIONAL
5 6 8 9
x
RATIONAL
5 6 9 9
x
RATIONAL
5 7 7 8
x
RATIONAL
5 7 7 9
x
RATIONAL
5 7 8 8
x
RATIONAL
5 7 8 9
x
RATIONAL
5 7 9 9
x
RATIONAL
5 8 8 8
x
RATIONAL
5 8 8 9
x
RATIONAL
5 8 9 9
x
RATIONAL
5 9 9 9
x
RATIONAL
6 6 7 8
x
RATIONAL
6 6 7 9
x
RATIONAL
6 6 8 9
x
RATIONAL
6 7 7 8
x
RATIONAL
6 7 7 9
x
RATIONAL
6 7 8 8
x
RATIONAL
6 7 8 9
x
RATIONAL
6 7 9 9
x
RATIONAL
6 8 8 9
x
RATIONAL
6 8 9 9
x
RATIONAL
6 9 9 9
x
RATIONAL
7 7 8 9
x
RATIONAL
7 7 9 9
x
RATIONAL
7 8 8 9
x
RATIONAL
7 8 9 9
x
RATIONAL
7 9 9 9
x
RATIONAL
8 8 9 9
x
RATIONAL
8 9 9 9
x
RATIONAL
9 9 9 9
x
RATIONAL

Hugh

Coauthor of The Third Manifesto and related books.
Quote from Hugh on February 15, 2020, 1:10 pm
Quote from dandl on February 15, 2020, 10:34 am

Some time back we had some discussion around this topic. The question is how best to deal with queries where particular tuples are 'projected out' or 'joined out', but we want those lost tuples to contribute to the final result. SQL allows preserving the rows but filling in with NULLs; we don't allow that.

What I recall was broadly two main strategies. I had a feeling there were more, but I can't find them.

  1. Bundle up attributes that might go missing into an RVA, so missing data can be represented as an empty relation.
  2. Back fill the final query with the missing rows and pre-computed data.

Strategy 1 covers image relations. It also covers SUMMARIZE PER, which generates a temporary RVA behind the scenes. You can see this directly in the expansion offered by DBE. This one suffers from the difficulty that the aggregating function is required to return an acceptable value when passed an empty relation (what's the AVERAGE value of an empty argument list?). [SQL has a slightly different problem: the aggregating function may have to deal with NULLs.]

Strategy 2 is awkward to code, but gets the right result and is immune to the empty relation problem. I've seen variants of this used in SQL.

But are there others? I thought there were, but memory fails me.

As a side note, App-A glosses over the issue of SUMMARIZE, but again whatever the relcon looks like, it must have an RVA as per Strategy 1.

Re strategy 1, I don't really understand "bundle up attributes that might go missing" but in any case I can't see how you think this solves the problem with the likes of AVG.

In the Supplier database, if you join S to P via SP, supplier S5 will be 'joined out' because there is no S5 in SP. A naive aggregation of QTY will omit S5, when our desired result is zero. SUMMARIZE PER constructs an RVA behind the scenes associating S5 with an empty list of parts supplied. The aggregation comes out right provided the aggregating function delivers the desired result when passed an empty list. In the image relation approach in DTATRM, that use of an RVA is explicit.

And no, it doesn't solve the problem of AVG unless AVG is defined to return a value (such as zero or NaN) for an empty list.

Re strategy 2, what do you mean by "right answer"?

It seems to me that all cases need the use of r1 UNION r2 in which one of the operands (r2, say) is an ad hoc arrangement generating tuples of the same type as those of r1.

Agreed. That's exactly what I meant by Strategy 2. This can solve the AVG problem by inserting a preset value.

Here's my attempt to grapple with the AVG problem:

(EXTEND (SUMMARIZE r1 PER(r2 MATCHING r1) : {x := AVG(y)}) : {rx := rel{tup{x x}}}){ALL BUT x}
UNION
EXTEND (r2 NOT MATCHING r1) : {rx := REL{x RAT}{}}

Is that the kind of thing you meant?

I would expect SUMMARIZE PER to deliver the desired result on its own. In this construct I would expect to see SUMMARIZE BY with NOT MATCHING, EXTEND, UNION.

Andl - A New Database Language - andl.org
Quote from dandl on February 15, 2020, 11:22 pm
Quote from Hugh on February 15, 2020, 1:10 pm
Quote from dandl on February 15, 2020, 10:34 am

Some time back we had some discussion around this topic. The question is how best to deal with queries where particular tuples are 'projected out' or 'joined out', but we want those lost tuples to contribute to the final result. SQL allows preserving the rows but filling in with NULLs; we don't allow that.

What I recall was broadly two main strategies. I had a feeling there were more, but I can't find them.

  1. Bundle up attributes that might go missing into an RVA, so missing data can be represented as an empty relation.
  2. Back fill the final query with the missing rows and pre-computed data.

Strategy 1 covers image relations. It also covers SUMMARIZE PER, which generates a temporary RVA behind the scenes. You can see this directly in the expansion offered by DBE. This one suffers from the difficulty that the aggregating function is required to return an acceptable value when passed an empty relation (what's the AVERAGE value of an empty argument list?). [SQL has a slightly different problem: the aggregating function may have to deal with NULLs.]

Strategy 2 is awkward to code, but gets the right result and is immune to the empty relation problem. I've seen variants of this used in SQL.

But are there others? I thought there were, but memory fails me.

As a side note, App-A glosses over the issue of SUMMARIZE, but again whatever the relcon looks like, it must have an RVA as per Strategy 1.

Re strategy 1, I don't really understand "bundle up attributes that might go missing" but in any case I can't see how you think this solves the problem with the likes of AVG.

In the Supplier database, if you join S to P via SP, supplier S5 will be 'joined out' because there is no S5 in SP. A naive aggregation of QTY will omit S5, when our desired result is zero. SUMMARIZE PER constructs an RVA behind the scenes associating S5 with an empty list of parts supplied. The aggregation comes out right provided the aggregating function delivers the desired result when passed an empty list. In the image relation approach in DTATRM, that use of an RVA is explicit.

And no, it doesn't solve the problem of AVG unless AVG is defined to return a value (such as zero or NaN) for an empty list.

Zero? Zero?

NaN, sure. Undefined, where the type is a union of RATIONAL and Undefined, sure.

But zero?!?

Abomination.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on February 15, 2020, 11:47 pm
Quote from dandl on February 15, 2020, 11:22 pm
Quote from Hugh on February 15, 2020, 1:10 pm
Quote from dandl on February 15, 2020, 10:34 am

Some time back we had some discussion around this topic. The question is how best to deal with queries where particular tuples are 'projected out' or 'joined out', but we want those lost tuples to contribute to the final result. SQL allows preserving the rows but filling in with NULLs; we don't allow that.

What I recall was broadly two main strategies. I had a feeling there were more, but I can't find them.

  1. Bundle up attributes that might go missing into an RVA, so missing data can be represented as an empty relation.
  2. Back fill the final query with the missing rows and pre-computed data.

Strategy 1 covers image relations. It also covers SUMMARIZE PER, which generates a temporary RVA behind the scenes. You can see this directly in the expansion offered by DBE. This one suffers from the difficulty that the aggregating function is required to return an acceptable value when passed an empty relation (what's the AVERAGE value of an empty argument list?). [SQL has a slightly different problem: the aggregating function may have to deal with NULLs.]

Strategy 2 is awkward to code, but gets the right result and is immune to the empty relation problem. I've seen variants of this used in SQL.

But are there others? I thought there were, but memory fails me.

As a side note, App-A glosses over the issue of SUMMARIZE, but again whatever the relcon looks like, it must have an RVA as per Strategy 1.

Re strategy 1, I don't really understand "bundle up attributes that might go missing" but in any case I can't see how you think this solves the problem with the likes of AVG.

In the Supplier database, if you join S to P via SP, supplier S5 will be 'joined out' because there is no S5 in SP. A naive aggregation of QTY will omit S5, when our desired result is zero. SUMMARIZE PER constructs an RVA behind the scenes associating S5 with an empty list of parts supplied. The aggregation comes out right provided the aggregating function delivers the desired result when passed an empty list. In the image relation approach in DTATRM, that use of an RVA is explicit.

And no, it doesn't solve the problem of AVG unless AVG is defined to return a value (such as zero or NaN) for an empty list.

Zero? Zero?

NaN, sure. Undefined, where the type is a union of RATIONAL and Undefined, sure.

But zero?!?

Abomination.

Not my problem, but at least the choice is up front.

But I still don't have an answer: apart from temp RVA and back fill UNION, is there another strategy for doing what SQL might handle with an outer join? I thought we discussed one, but I can't find anything.

Maybe it wasn't aggregation? DIVIDE? Should supplier S5 be treated differently in a query about "suppliers who supply every purple part"? The strategy in the PER form is just to take a different path (via S rather than SP).

I have a note to myself about 'outer join for substitution' but I don't know what it refers to. Any idea?

Andl - A New Database Language - andl.org
Quote from dandl on February 16, 2020, 12:44 am
Quote from Dave Voorhis on February 15, 2020, 11:47 pm
Quote from dandl on February 15, 2020, 11:22 pm
Quote from Hugh on February 15, 2020, 1:10 pm
Quote from dandl on February 15, 2020, 10:34 am

Some time back we had some discussion around this topic. The question is how best to deal with queries where particular tuples are 'projected out' or 'joined out', but we want those lost tuples to contribute to the final result. SQL allows preserving the rows but filling in with NULLs; we don't allow that.

What I recall was broadly two main strategies. I had a feeling there were more, but I can't find them.

  1. Bundle up attributes that might go missing into an RVA, so missing data can be represented as an empty relation.
  2. Back fill the final query with the missing rows and pre-computed data.

Strategy 1 covers image relations. It also covers SUMMARIZE PER, which generates a temporary RVA behind the scenes. You can see this directly in the expansion offered by DBE. This one suffers from the difficulty that the aggregating function is required to return an acceptable value when passed an empty relation (what's the AVERAGE value of an empty argument list?). [SQL has a slightly different problem: the aggregating function may have to deal with NULLs.]

Strategy 2 is awkward to code, but gets the right result and is immune to the empty relation problem. I've seen variants of this used in SQL.

But are there others? I thought there were, but memory fails me.

As a side note, App-A glosses over the issue of SUMMARIZE, but again whatever the relcon looks like, it must have an RVA as per Strategy 1.

Re strategy 1, I don't really understand "bundle up attributes that might go missing" but in any case I can't see how you think this solves the problem with the likes of AVG.

In the Supplier database, if you join S to P via SP, supplier S5 will be 'joined out' because there is no S5 in SP. A naive aggregation of QTY will omit S5, when our desired result is zero. SUMMARIZE PER constructs an RVA behind the scenes associating S5 with an empty list of parts supplied. The aggregation comes out right provided the aggregating function delivers the desired result when passed an empty list. In the image relation approach in DTATRM, that use of an RVA is explicit.

And no, it doesn't solve the problem of AVG unless AVG is defined to return a value (such as zero or NaN) for an empty list.

Zero? Zero?

NaN, sure. Undefined, where the type is a union of RATIONAL and Undefined, sure.

But zero?!?

Abomination.

Not my problem, but at least the choice is up front.

But I still don't have an answer: apart from temp RVA and back fill UNION, is there another strategy for doing what SQL might handle with an outer join? I thought we discussed one, but I can't find anything.

Maybe it wasn't aggregation? DIVIDE? Should supplier S5 be treated differently in a query about "suppliers who supply every purple part"? The strategy in the PER form is just to take a different path (via S rather than SP).

I have a note to myself about 'outer join for substitution' but I don't know what it refers to. Any idea?

When this came up before, I think it's where we deviated into a discussion about how outer join could be implemented in Tutorial D.

I've always used a union of {the tuples with x → result, the tuples without x → indicator } where result & indicator are either an RVA, or a union type of result | indicator.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from dandl on February 15, 2020, 11:22 pm
Quote from Hugh on February 15, 2020, 1:10 pm
Quote from dandl on February 15, 2020, 10:34 am

Some time back we had some discussion around this topic. The question is how best to deal with queries where particular tuples are 'projected out' or 'joined out', but we want those lost tuples to contribute to the final result. SQL allows preserving the rows but filling in with NULLs; we don't allow that.

What I recall was broadly two main strategies. I had a feeling there were more, but I can't find them.

  1. Bundle up attributes that might go missing into an RVA, so missing data can be represented as an empty relation.
  2. Back fill the final query with the missing rows and pre-computed data.

Strategy 1 covers image relations. It also covers SUMMARIZE PER, which generates a temporary RVA behind the scenes. You can see this directly in the expansion offered by DBE. This one suffers from the difficulty that the aggregating function is required to return an acceptable value when passed an empty relation (what's the AVERAGE value of an empty argument list?). [SQL has a slightly different problem: the aggregating function may have to deal with NULLs.]

Strategy 2 is awkward to code, but gets the right result and is immune to the empty relation problem. I've seen variants of this used in SQL.

But are there others? I thought there were, but memory fails me.

As a side note, App-A glosses over the issue of SUMMARIZE, but again whatever the relcon looks like, it must have an RVA as per Strategy 1.

Re strategy 1, I don't really understand "bundle up attributes that might go missing" but in any case I can't see how you think this solves the problem with the likes of AVG.

In the Supplier database, if you join S to P via SP, supplier S5 will be 'joined out' because there is no S5 in SP. A naive aggregation of QTY will omit S5, when our desired result is zero. SUMMARIZE PER constructs an RVA behind the scenes associating S5 with an empty list of parts supplied. The aggregation comes out right provided the aggregating function delivers the desired result when passed an empty list. In the image relation approach in DTATRM, that use of an RVA is explicit.

And no, it doesn't solve the problem of AVG unless AVG is defined to return a value (such as zero or NaN) for an empty list.

Re strategy 2, what do you mean by "right answer"?

It seems to me that all cases need the use of r1 UNION r2 in which one of the operands (r2, say) is an ad hoc arrangement generating tuples of the same type as those of r1.

Agreed. That's exactly what I meant by Strategy 2. This can solve the AVG problem by inserting a preset value.

Here's my attempt to grapple with the AVG problem:

(EXTEND (SUMMARIZE r1 PER(r2 MATCHING r1) : {x := AVG(y)}) : {rx := rel{tup{x x}}}){ALL BUT x}
UNION
EXTEND (r2 NOT MATCHING r1) : {rx := REL{x RAT}{}}

Is that the kind of thing you meant?

I would expect SUMMARIZE PER to deliver the desired result on its own. In this construct I would expect to see SUMMARIZE BY with NOT MATCHING, EXTEND, UNION.

Please explain how you expect SUMMARIZE PER to do that on its own.

Yes, I could have use BY in the first operand but you were concerned with the PER variety and I wanted a more general solution.

Hugh

Coauthor of The Third Manifesto and related books.
Quote from Dave Voorhis on February 16, 2020, 9:38 am
Quote from dandl on February 16, 2020, 12:44 am
Quote from Dave Voorhis on February 15, 2020, 11:47 pm
Quote from dandl on February 15, 2020, 11:22 pm
Quote from Hugh on February 15, 2020, 1:10 pm
Quote from dandl on February 15, 2020, 10:34 am

Some time back we had some discussion around this topic. The question is how best to deal with queries where particular tuples are 'projected out' or 'joined out', but we want those lost tuples to contribute to the final result. SQL allows preserving the rows but filling in with NULLs; we don't allow that.

What I recall was broadly two main strategies. I had a feeling there were more, but I can't find them.

  1. Bundle up attributes that might go missing into an RVA, so missing data can be represented as an empty relation.
  2. Back fill the final query with the missing rows and pre-computed data.

Strategy 1 covers image relations. It also covers SUMMARIZE PER, which generates a temporary RVA behind the scenes. You can see this directly in the expansion offered by DBE. This one suffers from the difficulty that the aggregating function is required to return an acceptable value when passed an empty relation (what's the AVERAGE value of an empty argument list?). [SQL has a slightly different problem: the aggregating function may have to deal with NULLs.]

Strategy 2 is awkward to code, but gets the right result and is immune to the empty relation problem. I've seen variants of this used in SQL.

But are there others? I thought there were, but memory fails me.

As a side note, App-A glosses over the issue of SUMMARIZE, but again whatever the relcon looks like, it must have an RVA as per Strategy 1.

Re strategy 1, I don't really understand "bundle up attributes that might go missing" but in any case I can't see how you think this solves the problem with the likes of AVG.

In the Supplier database, if you join S to P via SP, supplier S5 will be 'joined out' because there is no S5 in SP. A naive aggregation of QTY will omit S5, when our desired result is zero. SUMMARIZE PER constructs an RVA behind the scenes associating S5 with an empty list of parts supplied. The aggregation comes out right provided the aggregating function delivers the desired result when passed an empty list. In the image relation approach in DTATRM, that use of an RVA is explicit.

And no, it doesn't solve the problem of AVG unless AVG is defined to return a value (such as zero or NaN) for an empty list.

Zero? Zero?

NaN, sure. Undefined, where the type is a union of RATIONAL and Undefined, sure.

But zero?!?

Abomination.

Not my problem, but at least the choice is up front.

But I still don't have an answer: apart from temp RVA and back fill UNION, is there another strategy for doing what SQL might handle with an outer join? I thought we discussed one, but I can't find anything.

Maybe it wasn't aggregation? DIVIDE? Should supplier S5 be treated differently in a query about "suppliers who supply every purple part"? The strategy in the PER form is just to take a different path (via S rather than SP).

I have a note to myself about 'outer join for substitution' but I don't know what it refers to. Any idea?

When this came up before, I think it's where we deviated into a discussion about how outer join could be implemented in Tutorial D.

I've always used a union of {the tuples with x → result, the tuples without x → indicator } where result & indicator are either an RVA, or a union type of result | indicator.

Also... There's a natural inclination to try to meaningfully combine valid aggregate results, like averages, with aggregate results where a result is meaningless -- like an average of nothing.

Perhaps it shouldn't be a natural inclination.

When I seem to be heading in that direction, I reexamine the requirements that lead to trying to combine meaningful results with meaningless results. The result is often a new UX design that completely separates the average-ables from the non-average-ables. That often turns out to be much more pleasant to use, because it emphasises a natural distinction between the two groups that is otherwise undesirably blurred by combining them.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

Here's my attempt to grapple with the AVG problem:

(EXTEND (SUMMARIZE r1 PER(r2 MATCHING r1) : {x := AVG(y)}) : {rx := rel{tup{x x}}}){ALL BUT x}
UNION
EXTEND (r2 NOT MATCHING r1) : {rx := REL{x RAT}{}}

Is that the kind of thing you meant?

I would expect SUMMARIZE PER to deliver the desired result on its own. In this construct I would expect to see SUMMARIZE BY with NOT MATCHING, EXTEND, UNION.

Please explain how you expect SUMMARIZE PER to do that on its own.

Yes, I could have use BY in the first operand but you were concerned with the PER variety and I wanted a more general solution.

My poor phrasing: I meant I would have expected to see either the PER version on its own (because AVG already does the right thing) or the BY version plus UNION (because it does not). I don't understand the PER version used with UNION, and would appreciate some clarification.

Andl - A New Database Language - andl.org
Quote from dandl on February 16, 2020, 11:39 pm

Here's my attempt to grapple with the AVG problem:

(EXTEND (SUMMARIZE r1 PER(r2 MATCHING r1) : {x := AVG(y)}) : {rx := rel{tup{x x}}}){ALL BUT x}
UNION
EXTEND (r2 NOT MATCHING r1) : {rx := REL{x RAT}{}}

Is that the kind of thing you meant?

I would expect SUMMARIZE PER to deliver the desired result on its own. In this construct I would expect to see SUMMARIZE BY with NOT MATCHING, EXTEND, UNION.

Please explain how you expect SUMMARIZE PER to do that on its own.

Yes, I could have use BY in the first operand but you were concerned with the PER variety and I wanted a more general solution.

My poor phrasing: I meant I would have expected to see either the PER version on its own (because AVG already does the right thing) or the BY version plus UNION (because it does not). I don't understand the PER version used with UNION, and would appreciate some clarification.

SUMMARIZE r1 PER(r2 MATCHING r1) : {x := AVG(y)} takes the average y value for all r1 tuples for which r2 has a match.  It is equivalent to SUMMARIZE r1 BY attrs in the case where every r1 tuple matches some r2 tuple, otherwise not equivalent.  Call this expression t1.  Then:

EXTEND t1 : {rx := rel{tup{x x}}}){ALL BUT x} adds attribute rx of type RELATION{x RATIONAL} to t1.  The rx value in each tuple of the result is the singleton relation whose x value is the x value in that tuple of t1.  The projection {ALL BUT x} removes the x attribute.  The result, t2, has heading the attributes of r2 plus rx of the type I gave.

EXTEND (r2 NOT MATCHING r1) : {rx := REL{x RAT}{}} takes those r2 tuples that have no counterpart in t2 and extends them with an rx value of the same type as rx in t2, that value being an empty relation.  Call this t3.  Then t2 UNION t3 shows all the average y values that are shown in t2 plus all the cases where no such value exists.

What more can I say?  Well this: I have no idea what you mean by "the PER version on its own (because AVG already does the right thing)".  In my book, AVG does the "right thing" to a nonempty list of numbers but not to the empty list because it will raise an exception.  The BY version of TD's SUMMARIZE is irrelevant.

Hugh

 

Coauthor of The Third Manifesto and related books.