The Forum for Discussion about The Third Manifesto and Related Matters


"And the moral is, don't use Excel as a scientific database ..."

PreviousPage 3 of 5Next
Quote from dandl on January 29, 2020, 7:55 am
Quote from Dave Voorhis on January 28, 2020, 12:42 pm
Quote from dandl on January 28, 2020, 2:41 am

My bullet points are a single example of the kind of features I would like to see. There are at least dozens more, perhaps hundreds.

Yes, in Excel you can do all this and more, but you have to do it laboriously, manually. And that's what people do, because there is nothing better.

In Access you can do these things by writing code. You work on the abstract representation of your data model, not on the data itself.

In both Excel and Access you do these things by writing code. It's the usual spreadsheet expressions and dialog box settings in Excel, VBA and dialog box settings in reports in Access. The effort is notionally the same, because you're doing the same things.

What tends to be a conceptual hurdle for many users -- at least until they've seen it done (though sometimes after, if they reject the notion) -- is that in Access you put the data in tables and the formatting in reports. Excel conflates the two inseparably, which is fine (sort of...) until you need two reports on one set of data.

Or one report on that set of data over there. Then things fall apart.

It's a useful distinction, but that's not all. I don't want to debate what coding is, except to observe that Access is a lot further along the spectrum than Excel. The distinction I'm making is earlier than yours: it's about separating the data itself from the structure imposed on it. Excel allows you to work with raw data without imposing names, types and other attributes on it: data first; then patterns and structure; then manipulation; then presentation.

[Forgot to address this in my previous post...]

It's not just a useful distinction, it's a necessary one.

Only, Excel doesn't make that distinction -- or makes it awkward to implement (multiple "Sheets" being a typical approach, but not the only one; none are good) -- and it only lets you work with "raw" data in the same idiom as some of the worst weakly-typed languages: it infers types (which is fine, in and of itself) but commingles them in a manner that makes JavaScript seem rigorous and consistent. It almost invariably needs column names for all the useful things like pivot tables, filters, etc., but doesn't control where they're put -- they're decorative unless they're in the right place. And (this is the worst part) it blurs presentation, manipulation, patterns & structure, and types into an indistinct mess in a completely ad hoc fashion that ultimately means more work -- usually much more work -- than it should for anything more complex than the most trivial presentation.

Again, this would be fine purely as the endpoint of data processing -- the last step before, say, printing a report or emailing it as a PDF -- but that's "old school" interaction. Modern corporate analytics is about pipelines of processing, with the ability to tee into the pipe at any point for presentation purposes and no "final" endpoint because output is always input to something else.

That's what needs to be facilitated -- powerful processing pipelines with presentations (aka reports) able to be tee'd off at any point -- not yet more Excel-style conflation of processing, presentation, patterns, manipulation, you-name-it in one big ball of mud.
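That tee'd-pipeline shape is easy to sketch. Here's a minimal illustration in Python -- all names are hypothetical, not any particular product's API -- showing a report tapped off mid-pipeline while the data keeps flowing to the next consumer:

```python
from typing import Callable, Iterable, Iterator

def tee_to(sink: Callable[[dict], None]):
    """A pipeline stage that passes rows through unchanged while
    sending a copy to a presentation sink (a 'tee')."""
    def stage(rows: Iterable[dict]) -> Iterator[dict]:
        for row in rows:
            sink(row)    # presentation side-channel
            yield row    # the data continues down the pipe
    return stage

def pipeline(source: Iterable[dict], *stages):
    rows: Iterable[dict] = iter(source)
    for stage in stages:
        rows = stage(rows)
    return rows

# filter -> tee (presentation) -> transform; the output is itself input
# to whatever consumes it next -- there is no "final" endpoint.
source = [{"name": "a", "n": 3}, {"name": "b", "n": 7}]
report_rows = []  # what a report tee'd off mid-pipeline would see
result = list(pipeline(
    source,
    lambda rows: (r for r in rows if r["n"] > 2),
    tee_to(report_rows.append),
    lambda rows: ({**r, "n_squared": r["n"] ** 2} for r in rows),
))
```

The point of the sketch is that the tee is just another stage: presentation hangs off the pipe without terminating it.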

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from dandl on January 29, 2020, 7:55 am

it's about separating the data itself from the structure imposed on it. Excel allows you to work with raw data without imposing names, types and other attributes on it: data first; then patterns and structure; then manipulation; then presentation.

There is this man who relatively frequently keeps repeating that it is ***impossible*** to "separate the data from its structure", because if you "remove the structure" you no longer have "data"; the only thing you have left is random noise.  And he is right about that.  There is never ever any "data first, patterns and structure second and only after that".  From the moment I start typing "Erwin Smout TAB 7/11/1963" in a spreadsheet I ***already have*** the idea in my mind of what it means to me, as well as of the fact that that first thing is going to be a text representing a name, and that second thing a date.  And the trouble starts at times such as when I type "1/2" (FEB 1st) or "3/4" (APR 3rd) in that second thing.  If I'm typing in bulk, it risks going (and passing!) unnoticed.
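The "1/2" trap is easy to demonstrate: both readings of the same keystrokes parse successfully, so nothing forces the ambiguity to surface. A quick Python sketch:

```python
from datetime import datetime

# "1/2" typed into a date column is ambiguous: both parses succeed,
# so bulk entry can silently record the wrong date.
raw = "1/2"
feb_first = datetime.strptime(raw, "%d/%m")   # day/month: 1 February
jan_second = datetime.strptime(raw, "%m/%d")  # month/day: 2 January
assert feb_first != jan_second                # same keystrokes, two different dates
```

Neither parse raises an error, which is exactly why the mistake goes (and passes!) unnoticed.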

Quote from Erwin on January 29, 2020, 1:07 pm
Quote from dandl on January 29, 2020, 7:55 am

it's about separating the data itself from the structure imposed on it. Excel allows you to work with raw data without imposing names, types and other attributes on it: data first; then patterns and structure; then manipulation; then presentation.

There is this man who relatively frequently keeps repeating that it is ***impossible*** to "separate the data from its structure", because if you "remove the structure" you no longer have "data"; the only thing you have left is random noise.  And he is right about that.  There is never ever any "data first, patterns and structure second and only after that".  From the moment I start typing "Erwin Smout TAB 7/11/1963" in a spreadsheet I ***already have*** the idea in my mind of what it means to me, as well as of the fact that that first thing is going to be a text representing a name, and that second thing a date.  And the trouble starts at times such as when I type "1/2" (FEB 1st) or "3/4" (APR 3rd) in that second thing.  If I'm typing in bulk, it risks going (and passing!) unnoticed.

You may do that, but most people don't AFAIK. It goes like this:

  1. Create a spreadsheet or a page in an existing spreadsheet.
  2. Use the top two or three rows to add a title and make some notes: "TTM forum users, Notes on people who use the forum, sourced from: ...".
  3. Insert a row of column names: "Forum name, real name, first/last posted, no of posts, active?, company/institution, D implementation (if any), salary".
  4. Start adding data. Excel will auto-recognise date, number, money fields, autofill yes/no.
  5. Convert to List.
  6. Oops: separate first/last posted, add title.
  7. Add more data, as available.
  8. Add calculated columns: days since last post, rate of posting (/month).
  9. Sort descending by rate of posting.
  10. Add more data, as available.

You can't do that with Access or Rel or any SQL-based product I know of.
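Steps 4, 8 and 9 of that workflow can be roughly approximated in plain Python -- a hypothetical sketch only, with made-up column names and an arbitrary 12-month window for the posting rate -- to show the "data first, structure layered on afterwards" shape:

```python
from datetime import date

# Step 4: raw rows, loosely typed as they are entered (data first).
rows = [
    {"forum_name": "dandl", "last_posted": date(2020, 1, 30), "posts": 120},
    {"forum_name": "Erwin", "last_posted": date(2020, 1, 29), "posts": 85},
]

TODAY = date(2020, 1, 31)

# Step 8: calculated columns layered on top of the raw data.
def extend(row: dict) -> dict:
    days = (TODAY - row["last_posted"]).days
    # 12-month window is an arbitrary assumption for the sketch.
    return {**row, "days_since": days,
            "rate_per_month": round(row["posts"] / 12, 1)}

# Step 9: sort descending by rate of posting.
report = sorted((extend(r) for r in rows),
                key=lambda r: r["rate_per_month"], reverse=True)
```

The original rows are never mutated; the calculated columns and the sort order exist only in the derived view.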

Andl - A New Database Language - andl.org
Quote from Dave Voorhis on January 29, 2020, 1:00 pm
Quote from dandl on January 29, 2020, 7:55 am
Quote from Dave Voorhis on January 28, 2020, 12:42 pm
Quote from dandl on January 28, 2020, 2:41 am

My bullet points are a single example of the kind of features I would like to see. There are at least dozens more, perhaps hundreds.

Yes, in Excel you can do all this and more, but you have to do it laboriously, manually. And that's what people do, because there is nothing better.

In Access you can do these things by writing code. You work on the abstract representation of your data model, not on the data itself.

In both Excel and Access you do these things by writing code. It's the usual spreadsheet expressions and dialog box settings in Excel, VBA and dialog box settings in reports in Access. The effort is notionally the same, because you're doing the same things.

What tends to be a conceptual hurdle for many users -- at least until they've seen it done (though sometimes after, if they reject the notion) -- is that in Access you put the data in tables and the formatting in reports. Excel conflates the two inseparably, which is fine (sort of...) until you need two reports on one set of data.

Or one report on that set of data over there. Then things fall apart.

It's a useful distinction, but that's not all. I don't want to debate what coding is, except to observe that Access is a lot further along the spectrum than Excel. The distinction I'm making is earlier than yours: it's about separating the data itself from the structure imposed on it. Excel allows you to work with raw data without imposing names, types and other attributes on it: data first; then patterns and structure; then manipulation; then presentation.

[Forgot to address this in my previous post...]

It's not just a useful distinction, it's a necessary one.

But I already agreed, I just added one.

Only, Excel doesn't make that distinction -- or makes it awkward to implement (multiple "Sheets" being a typical approach, but not the only one; none are good) -- and it only lets you work with "raw" data in the same idiom as some of the worst weakly-typed languages: it infers types (which is fine, in and of itself) but commingles them in a manner that makes JavaScript seem rigorous and consistent. It almost invariably needs column names for all the useful things like pivot tables, filters, etc., but doesn't control where they're put -- they're decorative unless they're in the right place. And (this is the worst part) it blurs presentation, manipulation, patterns & structure, and types into an indistinct mess in a completely ad hoc fashion that ultimately means more work -- usually much more work -- than it should for anything more complex than the most trivial presentation.

Again, this would be fine purely as the endpoint of data processing -- the last step before, say, printing a report or emailing it as a PDF -- but that's "old school" interaction. Modern corporate analytics is about pipelines of processing, with the ability to tee into the pipe at any point for presentation purposes and no "final" endpoint because output is always input to something else.

That's what needs to be facilitated -- powerful processing pipelines with presentations (aka reports) able to be tee'd off at any point -- not yet more Excel-style conflation of processing, presentation, patterns, manipulation, you-name-it in one big ball of mud.

No argument about the need for something better, but I don't see anything remotely as accessible for those early stages of creating and rearranging tables. Access makes it way too hard to get started, and too hard to just move things around. Given how old Excel and Access are, surely it's time for something better? AFAICT 'modern corporate analytics' is quite well served by heavyweight products extracting profit from big data. That's not my interest.

Google just released a search engine for millions of datasets, and I'm betting that lots of very competent people are going to download those datasets into Excel first, just to get a look at the data and do a bit of poking around. I know I would. Later I might import selected tables into something like Access for serious reporting, but not at the design end of the process.

Andl - A New Database Language - andl.org
Quote from dandl on January 30, 2020, 4:24 am
Quote from Erwin on January 29, 2020, 1:07 pm
Quote from dandl on January 29, 2020, 7:55 am

it's about separating the data itself from the structure imposed on it. Excel allows you to work with raw data without imposing names, types and other attributes on it: data first; then patterns and structure; then manipulation; then presentation.

There is this man who relatively frequently keeps repeating that it is ***impossible*** to "separate the data from its structure", because if you "remove the structure" you no longer have "data"; the only thing you have left is random noise.  And he is right about that.  There is never ever any "data first, patterns and structure second and only after that".  From the moment I start typing "Erwin Smout TAB 7/11/1963" in a spreadsheet I ***already have*** the idea in my mind of what it means to me, as well as of the fact that that first thing is going to be a text representing a name, and that second thing a date.  And the trouble starts at times such as when I type "1/2" (FEB 1st) or "3/4" (APR 3rd) in that second thing.  If I'm typing in bulk, it risks going (and passing!) unnoticed.

You may do that, but most people don't AFAIK. It goes like this:

  1. Create a spreadsheet or a page in an existing spreadsheet.
  2. Use the top two or three rows to add a title and make some notes: "TTM forum users, Notes on people who use the forum, sourced from: ...".
  3. Insert a row of column names: "Forum name, real name, first/last posted, no of posts, active?, company/institution, D implementation (if any), salary".
  4. Start adding data. Excel will auto-recognise date, number, money fields, autofill yes/no.
  5. Convert to List.
  6. Oops: separate first/last posted, add title.
  7. Add more data, as available.
  8. Add calculated columns: days since last post, rate of posting (/month).
  9. Sort descending by rate of posting.
  10. Add more data, as available.

You can't do that with Access or Rel or any SQL-based product I know of.

Of course you can't. You explicitly precluded it in Step 1, "Create a spreadsheet or a page in an existing spreadsheet."

Indeed, your list is a brief summary of spreadsheetish interaction. I don't expect a post-database/post-spreadsheet product to be a spreadsheet.

My tool is more like this:

  1. Enter some data in a grid.
  2. Give the grid a name.
  3. Change the column headings to "Forum name, real name, first/last posted, no of posts, active?, company/institution, D implementation (if any), salary."
  4. Start adding data to the grid. It will auto-recognise date, number, money fields, autofill yes/no. You can select them too, from options available at the top of each column.
  5. Oops: separate first/last posted, add title.
  6. Add more data to the grid, as available.
  7. Create an extend filter. Link it to the new grid.
  8. Add calculated columns to the extend filter: days since last post, rate of posting (/month).
  9. Add a sort filter. Link it to the extend filter.
  10. Add more data to the grid, as available.
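That grid-and-linked-filters model can be sketched in a few lines of Python -- illustrative names only, nothing here is a real product's API. Because each filter re-evaluates from its source on demand, step 10's new data flows through automatically:

```python
# Grids and filters as linked nodes: each filter holds a reference to its
# source and recomputes on demand, so edits to the grid flow through.
class Grid:
    def __init__(self, name: str, rows: list):
        self.name, self.rows = name, list(rows)
    def eval(self) -> list:
        return list(self.rows)

class ExtendFilter:
    """Adds calculated columns; calcs maps column name -> fn(row)."""
    def __init__(self, source, **calcs):
        self.source, self.calcs = source, calcs
    def eval(self) -> list:
        return [{**r, **{k: f(r) for k, f in self.calcs.items()}}
                for r in self.source.eval()]

class SortFilter:
    def __init__(self, source, key: str, desc: bool = False):
        self.source, self.key, self.desc = source, key, desc
    def eval(self) -> list:
        return sorted(self.source.eval(),
                      key=lambda r: r[self.key], reverse=self.desc)

grid = Grid("forum_users", [{"name": "a", "posts": 3}, {"name": "b", "posts": 9}])
ext = ExtendFilter(grid, double=lambda r: r["posts"] * 2)  # step 7/8
srt = SortFilter(ext, key="double", desc=True)             # step 9

grid.rows.append({"name": "c", "posts": 12})  # step 10: more data to the grid
top = srt.eval()[0]                           # the linked filters see it
```

The grid stays raw data; structure (calculated columns, sort order) lives entirely in the linked filters downstream.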
I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

Of course you can't. You explicitly precluded it in Step 1, "Create a spreadsheet or a page in an existing spreadsheet."

Indeed, your list is a brief summary of spreadsheetish interaction. I don't expect a post-database/post-spreadsheet product to be a spreadsheet.

Grid ≈ spreadsheet. Just different names.

My tool is more like this:

  1. Enter some data in a grid.
  2. Give the grid a name.
  3. Change the column headings to "Forum name, real name, first/last posted, no of posts, active?, company/institution, D implementation (if any), salary."
  4. Start adding data to the grid. It will auto-recognise date, number, money fields, autofill yes/no. You can select them too, from options available at the top of each column.
  5. Oops: separate first/last posted, add title.
  6. Add more data to the grid, as available.
  7. Create an extend filter. Link it to the new grid.
  8. Add calculated columns to the extend filter: days since last post, rate of posting (/month).
  9. Add a sort filter. Link it to the extend filter.
  10. Add more data to the grid, as available.

I'm fine with that. I think there is a level above: a 'table of tables', with columns like: Name, Description, Source, etc. And perhaps another, Datasets=collections of tables.

And yes, I agree entirely with the idea of keeping the data as just data, and picking from lists, and having separate views (filters). Sort, select, join, all the TTM/SQL stuff we know and love, formatting and header/footer/total/subtotal reports like Access, charts and pivots like Excel just fit naturally on top, but separated.

More ideas.

  1. Tables all the way down. Datasets, tables, fields, rows, views/filters, report groups, etc all look like tables. I'm not sure how far that goes in how the user interacts, but a fair way I think.
  2. Instant update, total undo. Change a column data type, see the results instantly. Don't like it? Undo. Time travel.
  3. A canvas tool rather than a single grid. Show multiple tables, and links between them. See the fields grid alongside the rows grid for a table. Edit either.
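"Instant update, total undo" can be sketched as an undo stack of table snapshots -- a hypothetical illustration in Python; a real tool would likely use persistent data structures rather than deep copies:

```python
import copy

# Keep every state on an undo stack so any change -- including a
# column type change -- can be reverted instantly (time travel).
class UndoableTable:
    def __init__(self, rows: list):
        self._history = [copy.deepcopy(rows)]
    @property
    def rows(self) -> list:
        return self._history[-1]
    def apply(self, change):
        """change: rows -> new rows; applied to a copy, pushed on the stack."""
        self._history.append(change(copy.deepcopy(self.rows)))
    def undo(self):
        if len(self._history) > 1:
            self._history.pop()

t = UndoableTable([{"n": "1"}, {"n": "2"}])
t.apply(lambda rows: [{**r, "n": int(r["n"])} for r in rows])  # change column type
assert t.rows[0]["n"] == 1   # see the result instantly
t.undo()                     # don't like it? Undo.
assert t.rows[0]["n"] == "1"
```

Every change is a pure function of the previous state, so "undo" is just popping the stack -- no change ever needs to be reversible in itself.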

That's a big job, UI-wise, just to get to square one.

Andl - A New Database Language - andl.org
Quote from dandl on January 30, 2020, 4:36 am

lots of very competent people are going to download those datasets into Excel first, just to get a look at the data and do a bit of poking around. I know I would. Later I might import selected tables into something like Access for serious reporting, but not at the design end of the process.

It's not the poking around that's the problem.  It's sticking to the poking tool when it's time to move on to the non-poking tools for the non-poking stage.  But people don't do that because they feel like they've invested time in something "that works" and it feels like wasting a "valuable investment" to just throw it away.  And then they start incurring the costs and the losses that are predictable to those who know better.

Quote from Erwin on January 30, 2020, 7:44 am
Quote from dandl on January 30, 2020, 4:36 am

lots of very competent people are going to download those datasets into Excel first, just to get a look at the data and do a bit of poking around. I know I would. Later I might import selected tables into something like Access for serious reporting, but not at the design end of the process.

It's not the poking around that's the problem.  It's sticking to the poking tool when it's time to move on to the non-poking tools for the non-poking stage.  But people don't do that because they feel like they've invested time in something "that works" and it feels like wasting a "valuable investment" to just throw it away.  And then they start incurring the costs and the losses that are predictable to those who know better.

Yes.

That went on constantly at my previous workplace. The Excel people were usually still poking around whilst those comfortable with Rel and MS Access (i.e., me) were pumping out -- or finished pumping out -- useful reports and analyses.

The deceptive ease of Excel is just that -- deceptive. It's not good for much, except -- as previously mentioned -- creating presentations or summaries at the ultimate endpoint of a data processing pipeline. It's awful for everything else in the pipeline, and increasingly irrelevant as "ultimate endpoints" are going away in favour of my output data becoming your input data.

 

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

The role of the programmer is to create a reproducible process in the form of a program for a computer. The process will be executed many times, with varying inputs. The terms 'non-poking stage' and 'pipeline' reflect that. The program is akin to a tool such as a hammer or chisel that can be applied to different pieces of timber.

There is another, different role, which is to create a single work of some kind by employing a variety of tools. The game designer uses Maya to create an animated model. The architect uses AutoCAD to produce a set of drawings. The author uses Word to write a paper. The data worker uses XXX to create and publish/distribute a data set.

Excel is a rotten fit for the programmer's tool, but arguably the best generally available desktop tool to fill the XXX slot. It's certainly the most widely used.

So take off your programmer hats and put on a data worker hat and tell me something better than Excel for doing that job.

Jupyter is worth a look, although definitely too much coding. But this is closer: https://en.wikipedia.org/wiki/Orange_(software).

Andl - A New Database Language - andl.org
Quote from dandl on January 31, 2020, 12:42 am

The role of the programmer is to create a reproducible process in the form of a program for a computer. The process will be executed many times, with varying inputs. The terms 'non-poking stage' and 'pipeline' reflect that. The program is akin to a tool such as a hammer or chisel that can be applied to different pieces of timber.

There is another, different role, which is to create a single work of some kind by employing a variety of tools. The game designer uses Maya to create an animated model. The architect uses AutoCAD to produce a set of drawings. The author uses Word to write a paper. The data worker uses XXX to create and publish/distribute a data set.

Excel is a rotten fit for the programmer's tool, but arguably the best generally available desktop tool to fill the XXX slot. It's certainly the most widely used.

So take off your programmer hats and put on a data worker hat and tell me something better than Excel for doing that job.

Jupyter is worth a look, although definitely too much coding. But this is closer: https://en.wikipedia.org/wiki/Orange_(software).

I don't think any professional "data worker" considers Excel to be the best tool for anything; at best it's a usable tool that everyone knows, one that consumes and emits a file format everyone recognises. A few minutes ago, I coincidentally saw an apt quote:

"Excel is a bit like a Swiss Army Knife; very handy and can be turned to almost any task… … but it would never be your first choice for carving a roast."

Orange has been around for a long time -- I've had multiple students try it and be disappointed by it. I think Knime (https://www.knime.com) is currently the preferred open source choice in that space. My other half Nikki -- who until recently was heavily in the data science / analytics space before moving into a business analysis role -- used to grumble about it regularly, preferring commercial tools like Alteryx.

 

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org