The Forum for Discussion about The Third Manifesto and Related Matters

What do set-based operations buy us?

Quote from Dave Voorhis on February 20, 2021, 3:46 pm
Quote from dandl on February 20, 2021, 10:22 am

I have always been firmly in the static typing camp, but I have to acknowledge that the research doesn't necessarily support a very strong stance. I mentioned how Uncle Bob has swung over and the basis of the argument seems to be that TDD mandates about the same amount of tests whatever your type system, but you avoid having to do all the type declaration stuff in a dynamic system. (On trying to look up solid arguments I found this which was somewhat interesting: https://labs.ig.com/static-typing-promise )

You still haven't quoted any research of any authority. What I see here is bunk.

I suppose it depends what you define as "authority."

In some circles, Bob Martin is considered as valid an authority as you can get. In others, he's ignored in favour of Knuth and other academic notables. In yet other circles, if it doesn't come straight from Microsoft, ignore it. Etc.

In computing in general we have a bit of a problem identifying authority, which is perhaps not a bad thing.

The problem with this argument (apart from the close resemblance to a religious war) is that it lumps every kind of programming in together. It's like comparing apples to oranges, battleships and the colour yellow. You just can't.

Speaking for myself, I like to have a scripting language for massaging text files or iterating over files. For this purpose I like Ruby, and I hate C#.

Then I like to have a language at the level of bits and bytes, to diddle with ports and memory, use protocols. I like C++, Python is a dog.

Then I write a few thousand lines of code for a language compiler and VM, hack it until it works, refactor until it's right. I want a static type compiler, modules and interfaces, assertions and tests (but not TDD). C# fits the bill. And so on.

When you show me research that reflects that, you have my attention.

One problem is that research of this sort often starts with a subject group consisting of a cohort of final-year university computer science (or software engineering, if you're lucky) students, which is about as unrepresentative of real industry practitioners as you can get.

Not that it matters, anyway. See my other post re language preferences. If definitive research was presented to condemn dynamic typing, would you give up Ruby?

Or if definitive research condemned static typing, would you give up C# or C++?

I think I know the answer, and it would be the same answer for the majority of us.

No, I'll never give up Ruby. I simply cannot, because I've never even used it. And I'll gladly give up REXX. At any rate, the implications you mentioned are irrelevant, because in both cases the antecedent is false. Both types of typing have their uses, so neither of them will ever be "condemned". It's just that you need to understand Smout's conjecture: systems that make it easy [for their users] to make errors will also make it easy [for those same users] to correct them, and systems that make it difficult [for their users] to make errors will also make it difficult [for those same users] to correct them. And you need to understand the implications of that conjecture (even if those implications may be a bit of a stretch) for how easy it will be to correct errors made in a system based on dynamic typing vs. those made in a system based on static typing.

Quote from Erwin on February 20, 2021, 7:28 pm
Quote from Dave Voorhis on February 20, 2021, 3:46 pm
Quote from dandl on February 20, 2021, 10:22 am

I have always been firmly in the static typing camp, but I have to acknowledge that the research doesn't necessarily support a very strong stance. I mentioned how Uncle Bob has swung over and the basis of the argument seems to be that TDD mandates about the same amount of tests whatever your type system, but you avoid having to do all the type declaration stuff in a dynamic system. (On trying to look up solid arguments I found this which was somewhat interesting: https://labs.ig.com/static-typing-promise )

You still haven't quoted any research of any authority. What I see here is bunk.

I suppose it depends what you define as "authority."

In some circles, Bob Martin is considered as valid an authority as you can get. In others, he's ignored in favour of Knuth and other academic notables. In yet other circles, if it doesn't come straight from Microsoft, ignore it. Etc.

In computing in general we have a bit of a problem identifying authority, which is perhaps not a bad thing.

The problem with this argument (apart from the close resemblance to a religious war) is that it lumps every kind of programming in together. It's like comparing apples to oranges, battleships and the colour yellow. You just can't.

Speaking for myself, I like to have a scripting language for massaging text files or iterating over files. For this purpose I like Ruby, and I hate C#.

Then I like to have a language at the level of bits and bytes, to diddle with ports and memory, use protocols. I like C++, Python is a dog.

Then I write a few thousand lines of code for a language compiler and VM, hack it until it works, refactor until it's right. I want a static type compiler, modules and interfaces, assertions and tests (but not TDD). C# fits the bill. And so on.

When you show me research that reflects that, you have my attention.

One problem is that research of this sort often starts with a subject group consisting of a cohort of final-year university computer science (or software engineering, if you're lucky) students, which is about as unrepresentative of real industry practitioners as you can get.

Not that it matters, anyway. See my other post re language preferences. If definitive research was presented to condemn dynamic typing, would you give up Ruby?

Or if definitive research condemned static typing, would you give up C# or C++?

I think I know the answer, and it would be the same answer for the majority of us.

No, I'll never give up Ruby. I simply cannot, because I've never even used it. And I'll gladly give up REXX. At any rate, the implications you mentioned are irrelevant, because in both cases the antecedent is false. Both types of typing have their uses, so neither of them will ever be "condemned".

I did present it as a rhetorical hypothetical. In short, we're not going to give up our preferred languages because some study says language x demonstrates fewer/more/larger/smaller whatevers than language y.

I'm not convinced dynamic typing has genuine uses (assuming competent developers) as much as it has places where it's acceptable. UNIX/Linux shell scripts, for example, are mostly string twiddling with some strings occasionally recognised as a number -- usually for counting, or incrementing a value to make a unique name, etc. -- so type annotations would be mostly redundant (they'd be 99% 'string') and the usual other static type restrictions and requirements unnecessarily annoying. Not necessarily bad -- and they'd actually be helpful in big scripts -- but annoying.

Which is fine. It's shell scripting.

The problem usually starts when some application developer in the back row claims that static typing is too inflexible to handle <insert application development problem here> which it can actually handle just fine -- the real problem being the claimant's programming ability, not static typing -- and because the claim comes from someone who has the ear of Management, soon Python or Ruby or PHP or shell scripts are being used for large-scale application development, where they really don't belong. They're more suited to something akin to shell scripting.

It's just that you need to understand Smout's conjecture: systems that make it easy [for their users] to make errors will also make it easy [for those same users] to correct them, and systems that make it difficult [for their users] to make errors will also make it difficult [for those same users] to correct them. And you need to understand the implications of that conjecture (even if those implications may be a bit of a stretch) for how easy it will be to correct errors made in a system based on dynamic typing vs. those made in a system based on static typing.

That alludes to a real problem: Proper use of static typing is difficult for weak programmers, and a certain heavy stream of mainstream programming relies on weak programmers.

In fact, everything is difficult for weak programmers, but dynamically-typed languages have less, for lack of a better term, stuff, so maybe there's consequently less difficulty for weak programmers.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on February 20, 2021, 8:18 pm
Quote from Erwin on February 20, 2021, 7:28 pm
Quote from Dave Voorhis on February 20, 2021, 3:46 pm
Quote from dandl on February 20, 2021, 10:22 am

I have always been firmly in the static typing camp, but I have to acknowledge that the research doesn't necessarily support a very strong stance. I mentioned how Uncle Bob has swung over and the basis of the argument seems to be that TDD mandates about the same amount of tests whatever your type system, but you avoid having to do all the type declaration stuff in a dynamic system. (On trying to look up solid arguments I found this which was somewhat interesting: https://labs.ig.com/static-typing-promise )

You still haven't quoted any research of any authority. What I see here is bunk.

I suppose it depends what you define as "authority."

In some circles, Bob Martin is considered as valid an authority as you can get. In others, he's ignored in favour of Knuth and other academic notables. In yet other circles, if it doesn't come straight from Microsoft, ignore it. Etc.

What struck me about the Uncle Bob piece you linked is how out of touch he is. In 2016 he has only just discovered 'Option' types (aka Maybe), and they're his chosen example of being "assiduous about types". They've been in ML since the '80s and in Haskell since 1989; and arguably they're the same idea as Pascal's discriminated unions from the '70s.

I would have thought a language construct that (in effect) stops you dereferencing Nil pointers is hardly 'assiduous'.
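To make AntC's point concrete, here is a minimal Python sketch of an Option-style return type using `typing.Optional`. Note the assumption: the Python interpreter does not enforce the annotation; it is an external static checker such as mypy that would reject doing arithmetic on `price` before the `None` case is handled. The price list and function names are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical price list (in cents), for illustration only.
PRICES: Dict[str, int] = {"widget": 250, "gadget": 1200}


def find_price(item: str) -> Optional[int]:
    """Return the price in cents, or None if the item is unknown."""
    return PRICES.get(item)


def price_label(item: str) -> str:
    price = find_price(item)
    # Writing `price / 100` before this check would be flagged by a static
    # checker such as mypy, because `price` may still be None here.
    if price is None:
        return f"{item}: not for sale"
    # After the check, the checker narrows `price` to int.
    return f"{item}: {price / 100:.2f}"
```

The "Nil dereference" becomes a case the type obliges you to handle, rather than a run-time surprise.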

In computing in general we have a bit of a problem identifying authority, which is perhaps not a bad thing.

I did present it as a rhetorical hypothetical. In short, we're not going to give up our preferred languages because some study says language x demonstrates fewer/more/larger/smaller whatevers than language y.

I'm not convinced dynamic typing has genuine uses (assuming competent developers) as much as it has places where it's acceptable. UNIX/Linux shell scripts, for example, are mostly string twiddling with some strings occasionally recognised as a number -- usually for counting, or incrementing a value to make a unique name, etc. -- so type annotations would be mostly redundant (they'd be 99% 'string') and the usual other static type restrictions and requirements unnecessarily annoying. Not necessarily bad -- and they'd actually be helpful in big scripts -- but annoying.

Isn't that because String is the universal type, rather than that typing is not applicable? The System/38 scripting language kept up some sort of differentiation between directory names vs file names vs error messages vs numbers vs Booleans. Your "99% 'string'" comment seems to be going back to regarding 'type' as tantamount to PhysRep. Whereas 'type' in richly-typed languages is much more to express PossRep/logical representation.
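A small Python sketch of the PhysRep vs logical-type distinction AntC describes, using the hypothetical names `DirName`, `FileName` and `full_path`. Both types share one physical representation (`str`), and `typing.NewType` costs nothing at run time, but a static checker such as mypy treats them as distinct, so swapping the arguments is reported before the script runs.

```python
from pathlib import PurePosixPath
from typing import NewType

# Same physical representation (str), different logical types.
DirName = NewType("DirName", str)
FileName = NewType("FileName", str)


def full_path(directory: DirName, file: FileName) -> str:
    """Join a directory and a file name into a POSIX-style path."""
    return str(PurePosixPath(directory) / file)


ok = full_path(DirName("/var/log"), FileName("app.log"))
# full_path(FileName("app.log"), DirName("/var/log"))  # rejected by mypy
```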

That alludes to a real problem: Proper use of static typing is difficult for weak programmers, and a certain heavy stream of mainstream programming relies on weak programmers.

In fact, everything is difficult for weak programmers, but dynamically-typed languages have less, for lack of a better term, stuff, so maybe there's consequently less difficulty for weak programmers.

I still don't see how any programmer could be happy with a language that's entirely comfortable with adding a quantity to a price, multiplying the sum by a length, and putting the result in a weight. I'm talking about SQL, and its miasma that 'type' means PhysRep.
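A minimal Python sketch of the restriction AntC is asking for, using hypothetical `Price` and `Quantity` wrapper classes (a `Length` and a `Weight` would follow the same pattern). In this sketch the check happens at run time: adding a `Quantity` to a `Price` raises `TypeError`. Given the annotations, a static checker such as mypy should also report that line before the program runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    units: int


@dataclass(frozen=True)
class Price:
    cents: int

    def __add__(self, other: "Price") -> "Price":
        # Only a Price can be added to a Price.
        if not isinstance(other, Price):
            return NotImplemented  # lets Python raise TypeError
        return Price(self.cents + other.cents)

    def __mul__(self, other: Quantity) -> "Price":
        # Price x Quantity is meaningful: the total for an order line.
        if not isinstance(other, Quantity):
            return NotImplemented
        return Price(self.cents * other.units)


line_total = Price(250) * Quantity(3)  # Price(cents=750)
subtotal = line_total + Price(100)     # Price(cents=850)

try:
    Price(250) + Quantity(3)  # a quantity added to a price
    rejected = False
except TypeError:
    rejected = True
```

Only the combinations that make sense are defined; everything else is an error by default.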

Quote from AntC on February 20, 2021, 10:02 pm
Quote from Dave Voorhis on February 20, 2021, 8:18 pm
Quote from Erwin on February 20, 2021, 7:28 pm
Quote from Dave Voorhis on February 20, 2021, 3:46 pm
Quote from dandl on February 20, 2021, 10:22 am

I have always been firmly in the static typing camp, but I have to acknowledge that the research doesn't necessarily support a very strong stance. I mentioned how Uncle Bob has swung over and the basis of the argument seems to be that TDD mandates about the same amount of tests whatever your type system, but you avoid having to do all the type declaration stuff in a dynamic system. (On trying to look up solid arguments I found this which was somewhat interesting: https://labs.ig.com/static-typing-promise )

You still haven't quoted any research of any authority. What I see here is bunk.

I suppose it depends what you define as "authority."

In some circles, Bob Martin is considered as valid an authority as you can get. In others, he's ignored in favour of Knuth and other academic notables. In yet other circles, if it doesn't come straight from Microsoft, ignore it. Etc.

What struck me about the Uncle Bob piece you linked is how out of touch he is. In 2016 he has only just discovered 'Option' types (aka Maybe), and they're his chosen example of being "assiduous about types". They've been in ML since the '80s and in Haskell since 1989; and arguably they're the same idea as Pascal's discriminated unions from the '70s.

I would have thought a language construct that (in effect) stops you dereferencing Nil pointers is hardly 'assiduous'.

Yes. But some venerate "Uncle Bob" for documenting the SOLID OO principles, Agile boosterism and TDD, though he's more of a promoter than an originator of any of it (except for some of the SOLID principles, I think.) His venerators would wave that away as irrelevant type stuff, so why should he know it? Why aren't you using Smalltalk or Python so you don't have to do that verbose math stuff?

Sigh. Etc.

In computing in general we have a bit of a problem identifying authority, which is perhaps not a bad thing.

I did present it as a rhetorical hypothetical. In short, we're not going to give up our preferred languages because some study says language x demonstrates fewer/more/larger/smaller whatevers than language y.

I'm not convinced dynamic typing has genuine uses (assuming competent developers) as much as it has places where it's acceptable. UNIX/Linux shell scripts, for example, are mostly string twiddling with some strings occasionally recognised as a number -- usually for counting, or incrementing a value to make a unique name, etc. -- so type annotations would be mostly redundant (they'd be 99% 'string') and the usual other static type restrictions and requirements unnecessarily annoying. Not necessarily bad -- and they'd actually be helpful in big scripts -- but annoying.

Isn't that because String is the universal type, rather than that typing is not applicable?

That was what I intended, yes. I didn't intend it to indicate that typing is not applicable, because it... sort of... is. But in shell scripting, you generally don't care.

Which, I suppose, is the ideal that dynamic typing proponents aspire to -- programming where type doesn't really matter because you're orchestrating the big picture. Or something. I get the impression it's rarely given in-depth consideration, but shell scripting is sometimes mentioned as a model.

The System/38 scripting language kept up some sort of differentiation between directory names vs file names vs error messages vs numbers vs Booleans. Your "99% 'string'" comment seems to be going back to regarding 'type' as tantamount to PhysRep. Whereas 'type' in richly-typed languages is much more to express PossRep/logical representation.

No, my point was only that in shell scripting you don't need to consider it. The notion of physical vs logical representation doesn't enter into it, at least not as the developer perceives it. You think in terms of getting the return value from running this program, and checking to see if it's greater than zero; and if so, generating a new file name by incrementing 'COUNTER' and passing it to that program to become its output filename, etc.

Types? What?

...

That alludes to a real problem: Proper use of static typing is difficult for weak programmers, and a certain heavy stream of mainstream programming relies on weak programmers.

In fact, everything is difficult for weak programmers, but dynamically-typed languages have less, for lack of a better term, stuff, so maybe there's consequently less difficulty for weak programmers.

I still don't see how any programmer could be happy with a language that's entirely comfortable with adding a quantity to a price, multiplying the sum by a length, and putting the result in a weight. I'm talking about SQL, and its miasma that 'type' means PhysRep.

As a strong programmer, that's important to you. You expect to be able to easily express some restriction that prevents you from adding a quantity to a price, multiplying the sum by a length, and putting the result in a weight. You appreciate the value of static type checking.

To a weak programmer, it's more than enough cognitive load to consciously avoid adding a quantity to a price, multiplying the sum by a length, and putting the result in a weight. "But why would you add a quantity to a price?" might be a typical response to being told the benefits of static type checking. Or worse -- and I've heard this one more than a few times -- "What if you want to add a quantity to a price?"

To the weak programmer, the effort to identify types so that the compiler prevents you from adding a quantity to a price, multiplying the sum by a length, and putting the result in a weight is cognitive overload.

Hence the popularity of scripting languages that evade it: dynamically (and stringly) typed languages. Hence Python and friends. Hence SQL.

Quote from tobega on February 20, 2021, 12:51 pm
Quote from dandl on February 20, 2021, 10:22 am

I have always been firmly in the static typing camp, but I have to acknowledge that the research doesn't necessarily support a very strong stance. I mentioned how Uncle Bob has swung over and the basis of the argument seems to be that TDD mandates about the same amount of tests whatever your type system, but you avoid having to do all the type declaration stuff in a dynamic system. (On trying to look up solid arguments I found this which was somewhat interesting: https://labs.ig.com/static-typing-promise )

You still haven't quoted any research of any authority. What I see here is bunk.

The problem with this argument (apart from the close resemblance to a religious war) is that it lumps every kind of programming in together. It's like comparing apples to oranges, battleships and the colour yellow. You just can't.

Speaking for myself, I like to have a scripting language for massaging text files or iterating over files. For this purpose I like Ruby, and I hate C#.

Then I like to have a language at the level of bits and bytes, to diddle with ports and memory, use protocols. I like C++, Python is a dog.

Then I write a few thousand lines of code for a language compiler and VM, hack it until it works, refactor until it's right. I want a static type compiler, modules and interfaces, assertions and tests (but not TDD). C# fits the bill. And so on.

When you show me research that reflects that, you have my attention.

You have a point about different languages being suited for different scenarios.

As regards your asking for solid research that supports my claim that the advantages of static typing are a lot less significant than what we generally want to believe, that is understandable and I suppose quite reasonable. Except I have no interest in proving anything to you and it really is your choice whether you take me seriously or not.

I am, however, interested in learning things, so your personal preferences listed above are of some interest, given that they reflect your experience. And if you want to claim that static typing is vastly superior in some sense, I would be happy to see what research you can provide to support that claim. If it were true, it should be easy to find, certainly a lot easier than it would be for me to cite the dozens of papers I've come across over dozens of years that fail to prove a huge advantage for static typing.

The claim I would make is that for substantial pieces of software under active use and development and change over a prolonged period of time, on platforms that permit it, stronger typing is always the better choice, usually by a good margin. The margin is at its greatest for 'system' software such as compilers and major utilities, and for 'product' software such as games, ERP, POS and the like. The test of that claim would be to find pairs of products, comparable in most respects, at least 100KLOC, but one static and one dynamic. Comparing them would involve comparing the rate of opening and closing issues, new features versus bugs, and the effort involved.

I only know one big dynamic product: VS Code written in JS, but I assume there are others. I don't think there are very many, and that says something.

Andl - A New Database Language - andl.org
Quote from AntC on February 20, 2021, 10:02 pm
Quote from Dave Voorhis on February 20, 2021, 8:18 pm
Quote from Erwin on February 20, 2021, 7:28 pm
Quote from Dave Voorhis on February 20, 2021, 3:46 pm
Quote from dandl on February 20, 2021, 10:22 am

I have always been firmly in the static typing camp, but I have to acknowledge that the research doesn't necessarily support a very strong stance. I mentioned how Uncle Bob has swung over and the basis of the argument seems to be that TDD mandates about the same amount of tests whatever your type system, but you avoid having to do all the type declaration stuff in a dynamic system. (On trying to look up solid arguments I found this which was somewhat interesting: https://labs.ig.com/static-typing-promise )

You still haven't quoted any research of any authority. What I see here is bunk.

I suppose it depends what you define as "authority."

In some circles, Bob Martin is considered as valid an authority as you can get. In others, he's ignored in favour of Knuth and other academic notables. In yet other circles, if it doesn't come straight from Microsoft, ignore it. Etc.

What struck me about the Uncle Bob piece you linked is how out of touch he is. In 2016 he has only just discovered 'Option' types (aka Maybe), and they're his chosen example of being "assiduous about types". They've been in ML since the '80s and in Haskell since 1989; and arguably they're the same idea as Pascal's discriminated unions from the '70s.

Well, Uncle Bob has been doing real work, devoting his efforts to maximizing the value added for customers who just want to use the resulting applications in their work, and to educating others to be more efficient.

That doesn't leave much time for looking into what's coming out of the ivory towers. Things like ML and Haskell are largely irrelevant on the ground. There's no point in developers even looking into them, even in their free time, because you know ahead of time that you will never be able to convince management to use them.

2016 might have been when he retired or something to that effect so he had more time to look around.

I would have thought a language construct that (in effect) stops you dereferencing Nil pointers is hardly 'assiduous'.

In computing in general we have a bit of a problem identifying authority, which is perhaps not a bad thing.

I did present it as a rhetorical hypothetical. In short, we're not going to give up our preferred languages because some study says language x demonstrates fewer/more/larger/smaller whatevers than language y.

I'm not convinced dynamic typing has genuine uses (assuming competent developers) as much as it has places where it's acceptable. UNIX/Linux shell scripts, for example, are mostly string twiddling with some strings occasionally recognised as a number -- usually for counting, or incrementing a value to make a unique name, etc. -- so type annotations would be mostly redundant (they'd be 99% 'string') and the usual other static type restrictions and requirements unnecessarily annoying. Not necessarily bad -- and they'd actually be helpful in big scripts -- but annoying.

Isn't that because String is the universal type, rather than that typing is not applicable? The System/38 scripting language kept up some sort of differentiation between directory names vs file names vs error messages vs numbers vs Booleans. Your "99% 'string'" comment seems to be going back to regarding 'type' as tantamount to PhysRep. Whereas 'type' in richly-typed languages is much more to express PossRep/logical representation.

That alludes to a real problem: Proper use of static typing is difficult for weak programmers, and a certain heavy stream of mainstream programming relies on weak programmers.

In fact, everything is difficult for weak programmers, but dynamically-typed languages have less, for lack of a better term, stuff, so maybe there's consequently less difficulty for weak programmers.

I still don't see how any programmer could be happy with a language that's entirely comfortable with adding a quantity to a price, multiplying the sum by a length, and putting the result in a weight. I'm talking about SQL, and its miasma that 'type' means PhysRep.

Indeed, but now we're talking about stronger type systems than those normally considered. Ada has one, but it got lost in the mad rush to C and its later derivatives. In Java you can of course wrap an integer in a class, validate its bounds and give it a more precise meaning, but it should be easier to do; as it stands, it's just a bit too much effort even for those who care. In Java we have previously been somewhat saved by the fact that interfaces carry semantic meaning, so two functions with the same signature are not equivalent if they belong to different interfaces. I suppose that is now lost with lambdas and method references.
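The "wrap an integer in a class and validate bounds" idiom, sketched in Python for consistency with the other examples here (the Java version has the same shape). The class name `Percentage`, the 0..100 bounds and `discounted` are illustrative assumptions. The constructor validates once, so any code receiving a `Percentage` can rely on it being in range; Ada, by comparison, lets you declare such a type directly with `type Percentage is range 0 .. 100;`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Percentage:
    value: int

    def __post_init__(self) -> None:
        # Validate once, at construction: every Percentage that exists is in range.
        if isinstance(self.value, bool) or not isinstance(self.value, int):
            raise TypeError(f"Percentage needs an int, got {type(self.value).__name__}")
        if not 0 <= self.value <= 100:
            raise ValueError(f"Percentage out of range: {self.value}")


def discounted(price_cents: int, discount: Percentage) -> int:
    """Apply a discount; no range check needed here, the type guarantees it."""
    return price_cents * (100 - discount.value) // 100


sale = discounted(1000, Percentage(15))

try:
    Percentage(150)
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True
```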

Quote from dandl on February 20, 2021, 11:54 pm
Quote from tobega on February 20, 2021, 12:51 pm
Quote from dandl on February 20, 2021, 10:22 am

I have always been firmly in the static typing camp, but I have to acknowledge that the research doesn't necessarily support a very strong stance. I mentioned how Uncle Bob has swung over and the basis of the argument seems to be that TDD mandates about the same amount of tests whatever your type system, but you avoid having to do all the type declaration stuff in a dynamic system. (On trying to look up solid arguments I found this which was somewhat interesting: https://labs.ig.com/static-typing-promise )

You still haven't quoted any research of any authority. What I see here is bunk.

The problem with this argument (apart from the close resemblance to a religious war) is that it lumps every kind of programming in together. It's like comparing apples to oranges, battleships and the colour yellow. You just can't.

Speaking for myself, I like to have a scripting language for massaging text files or iterating over files. For this purpose I like Ruby, and I hate C#.

Then I like to have a language at the level of bits and bytes, to diddle with ports and memory, use protocols. I like C++, Python is a dog.

Then I write a few thousand lines of code for a language compiler and VM, hack it until it works, refactor until it's right. I want a static type compiler, modules and interfaces, assertions and tests (but not TDD). C# fits the bill. And so on.

When you show me research that reflects that, you have my attention.

You have a point about different languages being suited for different scenarios.

As regards your asking for solid research that supports my claim that the advantages of static typing are a lot less significant than what we generally want to believe, that is understandable and I suppose quite reasonable. Except I have no interest in proving anything to you and it really is your choice whether you take me seriously or not.

I am, however, interested in learning things, so your personal preferences listed above are of some interest, given that they reflect your experience. And if you want to claim that static typing is vastly superior in some sense, I would be happy to see what research you can provide to support that claim. If it were true, it should be easy to find, certainly a lot easier than it would be for me to cite the dozens of papers I've come across over dozens of years that fail to prove a huge advantage for static typing.

The claim I would make is that for substantial pieces of software under active use and development and change over a prolonged period of time, on platforms that permit it, stronger typing is always the better choice, usually by a good margin. The margin is at its greatest for 'system' software such as compilers and major utilities, and for 'product' software such as games, ERP, POS and the like. The test of that claim would be to find pairs of products, comparable in most respects, at least 100KLOC, but one static and one dynamic. Comparing them would involve comparing the rate of opening and closing issues, new features versus bugs, and the effort involved.

I think you are probably correct; the consensus seems to be that types scale better than tests. A caveat is that to replace tests we generally need a much stronger type system than most languages provide, and/or one that is less arcane to use (and perhaps one that actually runs faster; I gather that the proving languages can take hours to do the type checking/proof).

A study similar to what you propose was done on GitHub data around 2017 and got a bit of buzz. It seemed to show an advantage for functional programming (but not, I think, for static typing). However, a second team failed to replicate it and also uncovered several difficulties in making the data comparable.

I only know one big dynamic product: VS Code written in JS, but I assume there are others. I don't think there are very many, and that says something.

YouTube is another, but I suppose that would have to be studied by someone at Google.

On the other hand, if you're mostly building microservices, you have several entirely separate small projects, so the effects of scale probably don't matter in the same way.

Quote from Erwin on February 20, 2021, 7:03 pm
Quote from tobega on February 11, 2021, 8:54 am

From an article I came across (https://towardsdatascience.com/apache-arrow-read-dataframe-with-zero-memory-69634092b1a):

Traditionally, data is stored on disk in a row-by-row manner. Columnar storage was born out of the necessity to analyze large datasets and aggregate them efficiently. Data analytics is less interested in rows of data (e.g. one customer transaction, one call log, …) but on aggregations thereof (e.g. total amount spent by customer, total call minutes by region, …).

which tied into a thought I had been pondering about the efficiency of checking for set-ness all the time, i.e. is the SQL bag way more efficient in general? What does a set-algebra buy us of value?

When it comes to storage, having each tuple be unique is just one more constraint, I suppose. I feel there could be problems when we involve blobs/clobs. Or what about just filenames, where the referenced files have identical contents? If we sometimes have to account for tricky notions of equality and distinctness, is it better to always have to consider these things?
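To make the cost tobega is asking about concrete: in a bag algebra, projection can simply emit one output row per input row, while a set algebra must also eliminate duplicates. A minimal Python sketch, assuming rows are dicts and using hashing for duplicate elimination (sorting is the other common strategy); the call-log data and function names are invented for illustration.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, object]

# Hypothetical call-log rows.
calls: List[Row] = [
    {"caller": "alice", "region": "north", "minutes": 12},
    {"caller": "bob", "region": "south", "minutes": 5},
    {"caller": "carol", "region": "north", "minutes": 7},
]


def project_bag(rows: Iterable[Row], attrs: Sequence[str]) -> List[Tuple[object, ...]]:
    """SQL-style projection: one output tuple per input row, duplicates kept."""
    return [tuple(r[a] for a in attrs) for r in rows]


def project_set(rows: Iterable[Row], attrs: Sequence[str]) -> List[Tuple[object, ...]]:
    """Relational projection: duplicates removed with a hash set.

    Expected O(n) time, plus memory for every distinct tuple seen so far.
    """
    seen = set()
    out = []
    for r in rows:
        t = tuple(r[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out
```

Equality here is plain value equality on the projected attributes, which is exactly where the filename question bites: two different filenames whose files have identical contents remain two distinct tuples unless you project onto something like a content hash instead.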

Getting back to data analytics, that is of course vastly different from managing a supplier database. You have data pumping in at an alarming rate, so it's maybe just a completely different thing that should have different tools.

I have skimmed most of the responses and I don't think what I want to mention in connection with this has already been touched on.  Apologies if it has and I'm redundantly stating facts ( :-) ) about bag algebra.

Bag algebra "creates" issues of "how to handle the duplicates" for every single operator of the relational algebra for which an equivalent/analogue/... is to be included in the bag algebra.  Every single one.

When mimicking the operators of the RA in some newly devised "bag algebra", the answers to the following basic questions determine the "conceptual integrity" (thank you, Fred Brooks) of that new algebra:

  • Can a tuple be "matched" more than once with tuples from the other argument of the [binary] operator at hand?
  • Are you even going to "match" at all?
  • What about commutativity of the operator at hand?
  • ... (which I'm hoping to find out as I write this)

For example, when INTERSECTing a bag of two identical tuples (say, B2) with a bag of three identical tuples (say, B3) that also happen to be identical to those in B2, are you going to make INTERSECT return a bag of cardinality 2 or of cardinality 3? Don't think for a moment that either option won't have its use cases.

  • If the answer to bullet 1 is Y, then, on the intuitive face of it, it seems like B2 INTERSECT B3 must yield cardinality 2 and B3 INTERSECT B2 must yield cardinality 3 (operator not commutative).
  • If the answer to bullet 1 is N, then the result is of cardinality 2 and B2 INTERSECT B3 === B3 INTERSECT B2 (BA operator is commutative).
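The two readings can be sketched in Python with `collections.Counter` standing in for a bag. Modelling a tuple as a plain Python tuple compared by value is an assumption of this illustration.

- When each tuple may be paired with at most one tuple from the other operand, the result keeps the smaller multiplicity. This is the rule SQL uses for INTERSECT ALL, and it is commutative.
- When a tuple is kept as soon as it matches anything on the other side, the result inherits the left operand's multiplicity, so operand order matters.

```python
from collections import Counter

t = ("A", "Z")        # one tuple value
B2 = Counter({t: 2})  # that tuple, twice
B3 = Counter({t: 3})  # that tuple, three times


def size(bag):
    """Cardinality of a bag: the total number of copies."""
    return sum(bag.values())


def intersect_pairwise(left, right):
    """Each copy is paired with at most one copy on the other side, so a value
    appears min(m, n) times; Counter's '&' computes exactly that."""
    return left & right


def intersect_match_any(left, right):
    """A left copy survives if its value occurs at all on the right."""
    return Counter({v: m for v, m in left.items() if right[v] > 0})


print(size(intersect_pairwise(B2, B3)), size(intersect_pairwise(B3, B2)))    # 2 2
print(size(intersect_match_any(B2, B3)), size(intersect_match_any(B3, B2)))  # 2 3
```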

When UNIONing B2 with B3 :

  • If the answer to bullet 2 is N, then you're going to return a bag of cardinality 5, but note that B2 UNION B2 ≠ B2 !!!
  • If the answer to bullet 2 is Y, then you're going to return a bag of cardinality 3, and B2 UNION B2 === B2
  • (observe that in both cases, B2 UNION B3 === B3 UNION B2 (BA operator is commutative))
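Python's `collections.Counter` can stand in for a bag here; that, and modelling a tuple as a plain Python tuple, are illustrative assumptions. With it, the two UNION options differ only in the combining operator.

- `+` keeps every copy. That gives cardinality 5 (SQL's UNION ALL behaves this way) and loses idempotence.
- `|` keeps max(m, n) copies. That gives cardinality 3 and keeps B2 UNION B2 === B2.

Both operators are commutative.

```python
from collections import Counter

t = ("A", "Z")
B2 = Counter({t: 2})
B3 = Counter({t: 3})


def size(bag):
    """Cardinality of a bag: the total number of copies."""
    return sum(bag.values())


def union_no_matching(left, right):
    """No matching at all: every copy from both sides is kept."""
    return left + right


def union_matching(left, right):
    """Copies are matched up pairwise, so a value appears max(m, n) times."""
    return left | right


print(size(union_no_matching(B2, B3)), union_no_matching(B2, B2) == B2)  # 5 False
print(size(union_matching(B2, B3)), union_matching(B2, B2) == B2)        # 3 True
```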

When MINUSing the two :

  • If the answer to bullet 1 is Y, then B2 MINUS B3 === B3 MINUS B2 === FI (empty bag)
  • If the answer to bullet 1 is N, then B3 MINUS B2 will yield a bag of cardinality 1.
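For MINUS, model bags as `collections.Counter` values and tuples as plain Python tuples (both modelling assumptions).

- Pairwise cancellation leaves max(m - n, 0) copies, which is the multiplicity rule of SQL's EXCEPT ALL. So B3 MINUS B2 has one tuple left.
- Removing every left copy whose value matches anything on the right empties both directions.

```python
from collections import Counter

t = ("A", "Z")
B2 = Counter({t: 2})
B3 = Counter({t: 3})


def size(bag):
    """Cardinality of a bag: the total number of copies."""
    return sum(bag.values())


def minus_pairwise(left, right):
    """Each right copy cancels at most one left copy; Counter's '-' keeps
    only positive counts, i.e. max(m - n, 0)."""
    return left - right


def minus_match_any(left, right):
    """A left copy is removed if its value occurs at all on the right."""
    return Counter({v: m for v, m in left.items() if right[v] == 0})


print(size(minus_pairwise(B3, B2)), size(minus_pairwise(B2, B3)))    # 1 0
print(size(minus_match_any(B3, B2)), size(minus_match_any(B2, B3)))  # 0 0
```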

For each of the three foregoing operators, think (and I mean long and thoroughly) about ***how*** you would formulate the predicate of the INTERSECT/UNION/MINUS expression such that its extension is indeed the resulting bag value (as is the case with the operators of the ***relational*** algebra).

JOIN is even worse.

  • Say B21 = { TUP {A1 'A' B2A2 'Z'} TUP {A1 'A' B2A2 'Z'} }
  • Say B22 = { TUP {A1 'A' B2A2 'Z'} TUP {A1 'A' B2A2 'X'} }   (in fact a relation)
  • Say B31 = { TUP {A1 'A' B3A2 'Z'} TUP {A1 'A' B3A2 'Z'} TUP {A1 'A' B3A2 'Z'} }
  • Say B32 = { TUP {A1 'A' B3A2 'Z'} TUP {A1 'A' B3A2 'Y'} TUP {A1 'A' B3A2 'X'} }   (in fact a relation)

When JOINing, we assume the answer to bullet 2 is Y, because at least superficially it doesn't seem to make sense to "not match at all" in a JOIN operator. Furthermore, the most intuitive option for defining what it means to "match" in a JOIN-like operator is "equality of the sub-tuple formed by the [tuple] projection on the set of JOIN attributes of both tuples considered for matching". Then, if the answer to bullet 1 is N (each and every tuple matched at most once):

  • B22 JOIN B32 is indeterminate in general, because there's no way of controlling/predicting/... which of the particular B32 tuples will be matched with some specific B22 tuple, so in general you get B22 JOIN B32 ≠ B22 JOIN B32 (!!!!!!!!!!!!). Ditto for B21 JOIN B32.
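Both JOIN readings can be run on the bags defined above.

- Under "match as often as possible" (SQL's bag join), multiplicities multiply and the result does not depend on input order. B21 JOIN B31 has 2 × 3 = 6 identical tuples.
- Under "at most once", some rule has to decide which pairs form.

This Python sketch assumes a greedy first-unused-match rule, with list order standing in for whatever physical order a DBMS would otherwise expose. Representing tuples as dicts is also an assumption. Scanning B32 in reverse then changes the result of B22 JOIN B32.

```python
from collections import Counter

# Bags as lists of dicts (duplicates allowed), using the attribute names above.
B21 = [{"A1": "A", "B2A2": "Z"}, {"A1": "A", "B2A2": "Z"}]
B22 = [{"A1": "A", "B2A2": "Z"}, {"A1": "A", "B2A2": "X"}]
B31 = [{"A1": "A", "B3A2": "Z"}, {"A1": "A", "B3A2": "Z"}, {"A1": "A", "B3A2": "Z"}]
B32 = [{"A1": "A", "B3A2": "Z"}, {"A1": "A", "B3A2": "Y"}, {"A1": "A", "B3A2": "X"}]


def matches(lt, rt):
    """Two tuples match when they agree on every attribute they share."""
    return all(lt[a] == rt[a] for a in lt.keys() & rt.keys())


def as_bag(tuples):
    """Order-insensitive view of a result, so two results can be compared."""
    return Counter(frozenset(t.items()) for t in tuples)


def join_all_matches(left, right):
    """Every left tuple pairs with every matching right tuple:
    multiplicities multiply and the result is deterministic."""
    return [{**lt, **rt} for lt in left for rt in right if matches(lt, rt)]


def join_at_most_once(left, right):
    """Greedy one-to-one pairing: each right tuple can be used only once, so
    which pairs form depends on the order the right tuples are scanned in."""
    unused = list(right)
    out = []
    for lt in left:
        for i, rt in enumerate(unused):
            if matches(lt, rt):
                out.append({**lt, **unused.pop(i)})
                break
    return out


print(len(join_all_matches(B21, B31)))  # 6
print(as_bag(join_all_matches(B22, B32)) == as_bag(join_all_matches(B22, B32[::-1])))    # True
print(as_bag(join_at_most_once(B22, B32)) == as_bag(join_at_most_once(B22, B32[::-1])))  # False
```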

So it seems that considering the analogue of a JOIN operator in "bag algebra" ***FORCES*** us to accept that the answer to bullet 1 must necessarily be Y, and then even in the very best of circumstances we're left (as a DBMS writer) with the problem of how to support the "reasonable" use cases of INTERSECT/UNION/MINUS where the answer to bullet 1 was in fact N.

I have no doubt this "analysis" is extremely superficial, but it's the best I have to offer at the moment: what "set-ness" has to offer is that it makes the ensuing, eurhm, "data-paradigm-algebra":

  • definable for the definers
  • implementable for the implementors
  • both understandable and manageable for the users (= it becomes manageable for users to store the entire set of operators in that part of their brains that in my native language we call "parate kennis" (roughly, "ready knowledge"): that proper subset of all the things we have stored in our brains that we use on a day-to-day basis and that our brains make accessible to us in a whimpse (*) )

 

(*) The first time I used that word [on this forum], it turned out that I had subconsciously conflated the words "whiff" and "glimpse", when I wasn't even aware that the former was indeed a word. I was so proud of that after reading the comment pointing it out to me that I've been looking for a good excuse to use it a second time ever since :-)  Thanks for the occasion :-)

 

Thanks, Erwin, this is exactly the type of thing I've been looking for. So I can definitely strike "bag" off the list and consider other ways to provide guidance for using the powerful relational operations.

On the other hand, if you're mostly building microservices, you have several entirely separate small projects, so the effects of scale probably don't matter in the same way.

The guiding principle for APL was said to be write-only code. I read it, I don't understand it, but I can rewrite it as a one-liner so who cares? Heaps of code is in that category.

Most academics and most bloggers spend most of their time with toy programs, talking to each other or to novices. They say weird things. Not interesting.

My interest, and TTM's, is in enterprise-grade, foundational, durable, scalable software. In that arena I don't see dynamic languages offering any serious competition. But feel free to surprise me.

Andl - A New Database Language - andl.org
Quote from Dave Voorhis on February 20, 2021, 3:34 pm
Quote from tobega on February 20, 2021, 1:13 pm

As I feared, we got sidetracked into a typing discussion, although I suppose it has its purpose too. I'm certainly looking again at what has come up on the internet about the matter.

Are there any comments on the idea that the quality of code performing relational operations can be assessed by the quality of the predicate being produced? Maybe that is just obvious to all but me?

Perhaps. Maybe that's because (to me, at least) a predicate just... Is. I'm not sure what "quality of the predicate" means.

I think an analogy for what I mean by "quality" of a predicate is what would be meant by "quality" of a name.

I once reviewed code purporting to be a framework to test that two images were equal. When I saw that the variable "pixel" actually referred to an index into the array of pixels, I asked for it to be changed to pixelIndex. A few similar changes down the line, it became embarrassingly obvious that the framework merely tested that two arrays of equal size had the same indexes.
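The post doesn't show the reviewed code. Below is a hypothetical Python reconstruction of the pattern it describes: a loop variable named `pixel` that actually holds an index. Read with honest names, the comparison is plainly index against index. That is always equal for arrays of the same length, so only the length check does any work. The function names here are made up for illustration.

```python
def images_equal_before(a, b):
    """Hypothetical reconstruction: 'pixel' reads like a pixel value but is
    really a position in the pixel array."""
    if len(a) != len(b):
        return False
    for pixel, other_pixel in zip(range(len(a)), range(len(b))):
        if pixel != other_pixel:  # with honest names: index != index, never true
            return False
    return True


def images_equal(a, b):
    """Renamed and fixed: compare the pixel values found at each index."""
    if len(a) != len(b):
        return False
    for pixel_index in range(len(a)):
        if a[pixel_index] != b[pixel_index]:
            return False
    return True


print(images_equal_before([1, 2, 3], [9, 9, 9]))  # True (the bug)
print(images_equal([1, 2, 3], [9, 9, 9]))         # False
```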

As for static-vs-dynamic typing, no matter how much research there is (or isn't) for one side or the other, it comes down heavily to personal preference. Some of us prefer statically-typed languages for everything, some for some things and not others, some prefer them for nothing, etc. I suspect personal language preference drives far, far, far more than we'd like to think (even knowing that it drives a lot), at least from an engineering point of view.

Indeed, it almost certainly drives far more than we'd like to think in all engineering disciplines, not just (relatively undisciplined) software engineering.

True

What things, if any, about a predicate statement could be considered to be a code smell? Can we say, for example, that a workable relation must consist of a conjunction of facts in definite form? Does a tuple need to be self-contained, or can it refer to other factors not specified in the tuple in any way?

Thinking further, I suppose the "type" of a relation should include the predicate? What if it could be represented as part of the type system and the compiler could type check the derived predicates resulting from operations?

Sorry, you've lost me a bit here. I can infer things, but they're probably not what you've intended. Examples?

I am speculating quite a bit, but if we were able to specify, as part of the type system, how the data items relate to each other, the compiler could do the inference. If that then doesn't match what you claim it to be elsewhere, we have a bug.
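A relation type that includes its predicate can at least be sketched at run time. In the relational algebra, each operator determines the result's predicate: JOIN conjoins the operands' predicates, and projection existentially quantifies the attributes it drops.

The Python below is a hypothetical sketch that derives the predicate text alongside the body. The `Relation` class, the `join`/`project` helpers and the supplier data are all made up for illustration. It does no checking. A compiler that compared derived predicates against declared ones would need symbolic reasoning about predicates, which this does not attempt.

```python
from dataclasses import dataclass


def tup(**attrs):
    """A tuple value: an order-free, hashable set of (attribute, value) pairs."""
    return frozenset(attrs.items())


@dataclass(frozen=True)
class Relation:
    """Hypothetical relation value that carries its predicate as text."""
    heading: tuple
    predicate: str
    body: frozenset  # a set of tup(...) values, so duplicates cannot exist


def join(r, s):
    """Natural join; the result's predicate is the conjunction of the inputs'."""
    common = set(r.heading) & set(s.heading)
    body = frozenset(
        t | u
        for t in r.body
        for u in s.body
        if all(dict(t)[a] == dict(u)[a] for a in common)
    )
    heading = tuple(sorted(set(r.heading) | set(s.heading)))
    return Relation(heading, f"({r.predicate}) AND ({s.predicate})", body)


def project(r, attrs):
    """Projection; the dropped attributes are existentially quantified."""
    keep = set(attrs)
    dropped = sorted(set(r.heading) - keep)
    body = frozenset(frozenset(p for p in t if p[0] in keep) for t in r.body)
    pred = f"EXISTS {', '.join(dropped)}: ({r.predicate})" if dropped else r.predicate
    return Relation(tuple(sorted(keep)), pred, body)


S = Relation(("CITY", "SNO"), "Supplier SNO is located in city CITY",
             frozenset({tup(SNO="S1", CITY="London"), tup(SNO="S2", CITY="Paris")}))
SP = Relation(("PNO", "SNO"), "Supplier SNO supplies part PNO",
              frozenset({tup(SNO="S1", PNO="P1"), tup(SNO="S1", PNO="P2"),
                         tup(SNO="S2", PNO="P1")}))

parts = project(join(S, SP), ["PNO"])
print(parts.predicate)
print(len(parts.body))  # 2: P1 is supplied twice but appears once
```

Running it prints the derived predicate `EXISTS CITY, SNO: ((Supplier SNO is located in city CITY) AND (Supplier SNO supplies part PNO))`. This is the kind of derived statement a predicate-aware type checker would have to compare with a declared one.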