What do set-based operations buy us?

#51 · February 22, 2021, 7:58 pm

Quote from tobega on February 22, 2021, 7:58 pm

Quote from Dave Voorhis on February 22, 2021, 4:39 pm

Quote from tobega on February 22, 2021, 4:31 pm

Quote from Dave Voorhis on February 22, 2021, 1:51 pm

Quote from tobega on February 22, 2021, 11:59 am

Quote from Erwin on February 21, 2021, 11:15 pm

Quote from tobega on February 21, 2021, 7:52 am

if we were able to specify the way the data relates to each other, as part of the type system

I want to comment on that one because imo it betrays a fundamental lack of understanding of how we ever arrived at making these digital computers do for us what they do.

Computers compute. Nothing more and nothing less. They carry out operations of some algebra, and to have an algebra in the first place requires having a "system" of types that the algebra is defined over. No meaning, no interpretation, just the pure simple fact that 1+1=2, regardless of whether it's humans or furlongs or grains of sand in the desert.

Interpretation and "meaning" and concepts such as "data relating to other data" are ***tacked onto that system of algebraic computation***, and in the "constructionist view" of how things are built, it means the algebraic system must exist before any question of "data relating to other data" can be answered. Therefore, making "data relating to other data" a problem to be solved by the type system necessarily leads to circular dependencies in whatever it is that gets set up this way.

Or maybe I'm just too deeply entrenched in my "constructionist view"

A type system is something that is outside the calculation/program itself in an attempt to prove, or partially prove, that the program is correct.

Adding 1 + 1 = 2, with types 1m+1m = 2m, but adding 1m + 1s = what? And dividing 1m/1s = 1 m/s, another type derived from the first two. This is types in action.
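A minimal sketch of that arithmetic in Python, assuming a hypothetical `Quantity` class that tracks only metres and seconds. Python checks this at run time; a static type system would aim to reject the bad addition before the program runs at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A number tagged with unit exponents (hypothetical helper, metres and seconds only)."""
    value: float
    m: int = 0  # exponent of metres
    s: int = 0  # exponent of seconds

    def __add__(self, other: "Quantity") -> "Quantity":
        # Addition is only defined when both operands carry the same unit.
        if (self.m, self.s) != (other.m, other.s):
            raise TypeError("cannot add quantities with different units")
        return Quantity(self.value + other.value, self.m, self.s)

    def __truediv__(self, other: "Quantity") -> "Quantity":
        # Division derives a new unit by subtracting the exponents.
        return Quantity(self.value / other.value, self.m - other.m, self.s - other.s)


one_metre = Quantity(1.0, m=1)
one_second = Quantity(1.0, s=1)

two_metres = one_metre + one_metre  # 1m + 1m = 2m
speed = one_metre / one_second      # 1m / 1s = 1 m/s (m=1, s=-1)

try:
    one_metre + one_second          # 1m + 1s has no sensible unit
    mixed_add_rejected = False
except TypeError:
    mixed_add_rejected = True
```

The derived unit of `speed` is never declared anywhere; it falls out of the exponents of the two operands, which is the "another type derived from the first two" part.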

Each attribute of a relation can be assigned a type to help us prove that we are doing something sane. Extending this thought, it's at least thinkable that a relation can be viewed as a type in itself, and that such a type can be something more specific than just being a collection of attributes. If it were possible to specify such things, then they could be used to more completely prove the correctness of a program.
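One way to picture "more specific than just a collection of attributes" is a heading plus a key constraint. The sketch below is only an illustration under that assumption: `Employee`, `make_relation` and the key on `emp_id` are hypothetical names, and the check happens at run time, whereas the idea above is that a type system might establish it statically.

```python
from typing import Callable, Hashable, Iterable, NamedTuple


class Employee(NamedTuple):
    """Heading of a hypothetical relation: attribute names and their types."""
    emp_id: int
    dept: str


def make_relation(rows: Iterable[Employee],
                  key: Callable[[Employee], Hashable]) -> frozenset:
    """Build a relation body (a set of tuples) and enforce a key constraint."""
    body = frozenset(rows)  # a relation is a set, so exact duplicates collapse
    keys = [key(row) for row in body]
    if len(keys) != len(set(keys)):
        raise ValueError("key constraint violated")
    return body


# Two distinct employees: accepted.
ok = make_relation([Employee(1, "A"), Employee(2, "A")], key=lambda r: r.emp_id)

# The same employee id in two departments: rejected by the key constraint.
try:
    make_relation([Employee(1, "A"), Employee(1, "B")], key=lambda r: r.emp_id)
    violation_rejected = False
except ValueError:
    violation_rejected = True
```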

Then the whole question is whether it is worth doing the extra work of specifying types or not, which gets conflated with personal preferences and, when you bring in economics, a trade-off between the cost of a bug versus the cost of the extra work. Interestingly, I just came across a study that seems to indicate that it takes less time to fix type errors in a dynamic language than it takes to write the type information in the statically typed language, https://courses.cs.washington.edu/courses/cse590n/10au/hanenberg-oopsla2010.pdf (in there is also referenced an experiment comparing Java and Groovy, which is essentially dynamically typed Java, as Dave postulated earlier). It also seems from other sources that type errors generally are a very small part of the totality of bugs, 1-3%.

Two things:

Type errors aren't really the point. Researchers often seem to think it's all about type safety. It's not all about type safety, though for many mission critical applications if the compiler catches 1% to 3% of bugs that might otherwise not get caught by unit tests or code reviews or whatever, then that's enough to justify static typing right there -- particularly on systems where any downtime due to run-time bugs (or delays due to development bugfix blockers) may represent significant cost.

The real point is enforced readability. That's more about appropriate use of type annotations vs no type annotations; the former usually being associated with statically typed languages and the latter with dynamically typed languages, though they're notionally orthogonal. What would be more meaningful than comparing fixing type bugs vs writing type annotations would be a comparison between the time it takes to write type annotations vs the time it takes to grok complex code in their absence.

But again, no amount of presented research is likely to shift any individual developer's personal preference, and (I point out somewhat pessimistically) it's not likely to shift management choices until it's so wrapped in vendor marketing efforts as to eliminate any real value.

99% of software isn't mission critical, but you are right, it is a cost function and a trade-off. Hopefully not as cynical as when an automobile manufacturer purportedly decided that it was cheaper to pay punitive damages for deaths from brake failure than it was to recall the line.

And there is support for that:

According to Hanenberg, S., Kleinschmager, S., Robbes, R. et al. in An empirical study on the impact of static typing on software maintainability. (2014):

This paper describes an experiment that tests whether static type systems improve the maintainability of software systems, in terms of understanding undocumented code, fixing type errors, and fixing semantic errors. The results show rigorous empirical evidence that static types are indeed beneficial to these activities, except when fixing semantic errors.

(found that quote, haven't paid for the paper)

I suspect almost any working developer who's had to maintain legacy code in both statically (manifestly) typed languages and dynamically typed languages would say you don't need to buy the paper...

Because it's bleedin' obvious.

Except it's still not obvious that it's a net gain. If it takes 5 years before a javascript application collapses under its own weight, maybe it was about time to rewrite it from scratch anyway, get rid of some old cruft and freshen everything up. It depends.

I suspect the majority of the same developers would point out that dynamic typing doesn't really gain you any development time, either -- or at least it doesn't save enough keystrokes to make up for the additional mental load and readability effort (unless your keyboard skillz are really sl-o-o-o--o--o---o----w.)

Actually the developers who prefer dynamic languages are saying that it saves you time even when they don't get code completion in their IDE. It probably depends on what you are working on and also how you like to work. You get a much faster turnaround time when you can just run snippets of your code to try things out.

Perhaps it saves time if you're writing wee scripts and toy programs.

Typical applications, no.

I don't know, I haven't tried anything dynamic in anger, and it isn't my personal preference, but there are people who have tried both who would disagree and yet others who would agree with you.

At the end of the day, other factors are orders of magnitude more important and more impactful.

#52 · February 22, 2021, 9:24 pm

Quote from tobega on February 22, 2021, 7:58 pm

Actually the developers who prefer dynamic languages are saying that it saves you time even when they don't get code completion in their IDE. It probably depends on ... You get a much faster turnaround time when you can just run snippets of your code to try things out.

depends on ... The decision you make about what exact point in time you're going to measure "time spent". The idea an end user (who is expecting a "working product" that is, say, 99% reliable) has about that subject (point in time of actual measurement of time spent) is quite drastically different from the idea the current modern average millennial developer has about it. ("run snippets of code [that allows someone] to try things out" never was, is not now, and is never going to be, the same thing as "deliver a working product to the user where even the user ultimately agrees that it is indeed a working product".)

Pardon the rant.

Author of SIRA_PRISE

#53 · February 22, 2021, 10:43 pm

Quote from tobega on February 22, 2021, 7:58 pm

You get a much faster turnaround time when you can just run snippets of your code to try things out.

Quote from Erwin on February 22, 2021, 9:24 pm

("run snippets of code [that allows someone] to try things out" never was, is not now, and is never going to be, the same thing as "deliver a working product to the user where even the user ultimately agrees that it is indeed a working product".)

We've probably squeezed all the tasty juice out of this topic and should probably draw it to a close before we start gnawing on the rind, but these comments do make me wonder how much anti-static-typing sentiment -- at least among capable developers who can effectively code in statically-typed languages; I'm excluding weak developers who don't "get it" or who quickly paint themselves into a corner with their own incompetence -- owes to the bad old days of C++ development where tweaking a header file could set off a multi-hour build in a big, dependency-laden project.

Compared to that, PHP -- as obviously bletcherous as it was (and is) -- was like opening a window in a dank basement to let in sunshine and fresh breezes.

Now, even C++ compilation mostly isn't a problem, and Java and C# compilation times are generally fast enough to support tweaking and code-nudging in full dynamic-language style.

Whether code-nudging is a smell or not is a different matter.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

The Forum for Discussion about The Third Manifesto and Related Matters