Kaldanes and high-speed joins
Quote from Hugh on January 29, 2019, 3:11 pm
I've received an email pointing me at https://github.com/charlesone/Kaldanes. It's not something I feel able to study or comment on nowadays, but I thought it might be of interest to some people here.
Hugh
Quote from dandl on January 31, 2019, 12:19 am
Given how quiet the list has been in recent times, I hate to see a post go past in deathly silence.
But I do struggle to see the interest. I like the weird Sci-Fi reference (I read the book a very long time ago) but it just seems an arcane way to speed up some joins on some data for some applications. Or did I miss something?
Quote from cjohnson on January 31, 2019, 9:30 am
Relative increases in performance and efficiency over everything previous are one thing. Microsecond times to complete joins across large numbers of potentially enormous tables are, I hate to use the trite phrase, a "game-changer".
Think of it this way: these servers could live in dark silicon, fire up to complete their task, having projection and selection predicates pushed down to them in the request, and stream rowsets back in 1/100 of a second against mmapped data on a memory fabric. This could allow a single server to saturate the NIC while using little power. A datacenter could shrink to fit in a rack at every internet point of presence.
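For concreteness, here is a minimal sketch of that picture, assuming fixed-width rows in an mmapped file; the record layout, file name, and predicate are invented for illustration, not taken from the Kaldanes code:

    // Sketch: scan an mmapped slab of fixed-width rows, applying a
    // pushed-down selection predicate and projecting one column.
    // Layout, file name, and predicate are assumptions for illustration.
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    struct Row {          // fixed-width record: field offsets are compile-time constants
        std::uint64_t key;
        std::uint64_t qty;
        char          name[48];
    };

    int main() {
        int fd = open("rows.bin", O_RDONLY);
        if (fd < 0) return 1;
        struct stat st;
        if (fstat(fd, &st) != 0) return 1;
        std::size_t n = st.st_size / sizeof(Row);

        // One mapping, zero copies: the "buffer" is the file itself.
        void* p = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) return 1;
        const Row* rows = static_cast<const Row*>(p);

        // Selection (qty > 100) and projection (key) pushed into the scan;
        // the rows are a linear array, so the hardware prefetcher streams
        // the slab through cache ahead of the loop.
        for (std::size_t i = 0; i < n; ++i)
            if (rows[i].qty > 100)
                std::printf("%llu\n", static_cast<unsigned long long>(rows[i].key));

        munmap(p, st.st_size);
        close(fd);
    }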
Oh, and yes, new kinds of generic computing in C++ using deep inference do appear arcane, that is, until they become mainstream, which they will because of the performance and efficiency boost. Believe me, I'm no genius, and I was able to systematize these methods, so anyone can do this. It only remains to apply these techniques to graph computing, min-cut and max-flow, and so on. That will become (IMHO) a revolution that you cannot miss.
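As a toy illustration of the kind of compile-time genericity meant here (not the Kaldanes code itself; the types are invented), the comparison strategy below is chosen by the type system, so each compiled scan contains no runtime branch on column kind:

    // Toy sketch of compile-time dispatch: the branch on column kind is
    // resolved by the compiler, not at run time. Types are illustrative.
    #include <cstring>
    #include <type_traits>

    struct FixedString { char bytes[16]; };   // fixed-width string column

    template <typename T>
    int compare(const T& a, const T& b) {
        if constexpr (std::is_arithmetic_v<T>) {
            return (a < b) ? -1 : (b < a) ? 1 : 0;                 // numeric columns
        } else {
            return std::memcmp(a.bytes, b.bytes, sizeof a.bytes);  // string columns
        }
    }

    // compare<int> and compare<FixedString> are separate specializations;
    // each contains only the code path for its own column type.
    int main() {
        FixedString a{"abc"}, b{"abd"};
        return (compare(1, 2) < 0 && compare(a, b) < 0) ? 0 : 1;
    }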
The mental picture of sorting heads was a good mnemonic for the data-access scale-up methods. It also fits the demographic of many decision makers, luckily.
Quote from cjohnson on February 9, 2019, 7:43 pm
One of the reasons the Kaldanes approach scales arbitrarily better than C++ standard library programs is presented here.
Quote from Erwin on February 11, 2019, 12:28 pm
Avoid disk, keep all in memory is not a revolutionary idea. And it has its limits. Avoid memory, keep it in the CPU cache is not a revolutionary idea. And it has its limits. Avoid context switching is not a revolutionary idea. And it has its limits. Highly optimized strategies for string processing could be interesting, but don't have much to do with JOINs in general (perhaps up to 99% of all real-life JOINs may well not be over strings to boot; "avoid strings, keep it numeric" is also an age-old idea known to benefit efficiency). They're just one more implementation technique. And I seriously doubt any such strategy would suddenly "make an entire datacenter fit in a rack". I also doubt performance improvements that start once the data is already in memory will make any significant difference in large data banks in general, seeing as in those environments the major portion of the cost is still getting the data into memory in the first place.
Quote from cjohnson on February 12, 2019, 4:31 am
I agree with that last point: "the major portion of the cost is still getting the data in memory in the first place." And the data in the posting I pointed to makes the case that the containers, and the objects in them, built from the C++ standard library are the culprit committing that crime.
It really does not have to take all that long to load a database into memory with single-allocation slabs; it's mighty quick without the extra overhead of fine-grained thread-mutable objects and fine-grained thread-mutable containers that are not indexed by linear array offset. Array indexing, offset multiplication, and pointer addition are all constant-time. Add in C++ generic programming at compile time and all you have left is the data movement, and that is cache-prefetched in slabs.
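A rough sketch of that claim, under assumed record types rather than the actual Kaldanes slab format: one contiguous allocation per table, constant-time row addressing by offset arithmetic, and a merge-style join that just walks two linear arrays:

    // Sketch: single-allocation slabs indexed like plain arrays, joined by
    // two linear cursors. Record layouts and names are illustrative assumptions.
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    template <typename Rec>
    class Slab {
        std::vector<Rec> data_;   // one contiguous allocation for the whole table
    public:
        explicit Slab(std::size_t n) : data_(n) {}
        // Row i is at base + i * sizeof(Rec): one multiply, one add, constant time.
        Rec&       operator[](std::size_t i)       { return data_[i]; }
        const Rec& operator[](std::size_t i) const { return data_[i]; }
        std::size_t size() const { return data_.size(); }
    };

    struct Order    { std::uint64_t key; std::uint64_t qty;   };  // assumed layouts
    struct Customer { std::uint64_t key; std::uint64_t score; };

    // Merge join over two slabs sorted on key (unique keys assumed for brevity):
    // no per-row allocation, no pointer chasing, just sequential slab traffic.
    template <typename L, typename R, typename F>
    void merge_join(const Slab<L>& l, const Slab<R>& r, F emit) {
        std::size_t i = 0, j = 0;
        while (i < l.size() && j < r.size()) {
            if      (l[i].key < r[j].key) ++i;
            else if (r[j].key < l[i].key) ++j;
            else { emit(l[i], r[j]); ++i; ++j; }
        }
    }

    int main() {
        Slab<Order> o(3);    o[0] = {1, 10}; o[1] = {2, 20}; o[2] = {4, 40};
        Slab<Customer> c(3); c[0] = {2, 7};  c[1] = {3, 8};  c[2] = {4, 9};
        merge_join(o, c, [](const Order& a, const Customer& b) {
            std::printf("%llu %llu %llu\n",
                        static_cast<unsigned long long>(a.key),
                        static_cast<unsigned long long>(a.qty),
                        static_cast<unsigned long long>(b.score));
        });
    }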
The C++ standard library is way too slow, and scales way too poorly, to survive into the next generation of computer architecture: memory-centric computing. The future is constructed of ... nothing ... no buffering, no marshaling of data into network messages, and no file writes. What we have now will be fine for a little compute, just not for very much of it.
The large shared-memory systems are coming, in fits and starts, with Intel (of all things) leading the way with 3D XPoint, which is basically a warmed-over version of old-style virtual memory (and not shared, either). Sooner or later the crossbar fabrication problem will be solved, and someone will be using "ions for memory" at deep density, shared across nodes over the Gen-Z memory fabric.
For now, 3D XPoint is sufficient to make my point. The days of marshaling and multiplexing data, shipping buffers, and de-multiplexing and de-marshaling data are numbered. Stonebraker made this very case in pushing VoltDB, and we are standing on the shoulders of that giant (and listening very closely).