Answer to: Recommended data structures/algorithms for checking peoples' availability schedules
Score: 4
One way to do it is to represent each user and each event as a bit map. Essentially, each half hour corresponds to a bit, the entire day corresponding to 6 bytes, one week being 42 bytes. Regarding the bits, 1 means the person is available, and 0—that the person is not available. For the events, 1 means the event is going on, and 0—quite the opposite.
For instance:
u₁ = 0b00011011100...
means that the user is available between 1:30 a.m. UTC and 2:30 a.m. UTC on Monday, then busy for half an hour, then available for an hour and a half, then busy again... The patten will go on for the remaining 336 bits (7×24×2).
Similarly, an event can be represented a bit map.
e₁ = 0b00000001100...
means that it's a one hour event that starts at 3:30 a.m. UTC on Monday. The user from the previous example can attend it, as u₁ ∧ e₁ = e₁. This very basic formula is all it gets to see if the user can attend an event.
Now, all operations become operations on bit maps (XOR, etc.), which are usually very fast and still relatively space efficient—you can use SIMD, or do the computation in GPU—essentially rely to any techniques that are used when working with graphics. Chances are, you can just load everything in memory at once, do the computation you need, then save the result.
My original answer was suggesting to use bit maps in a very narrow way: essentially, the database should store the schedules and the events in a structured way. But then, Hans-Martin Mosner
said in his comment:
In addition to being a very viable solution when keeping the bit maps in memory, this would likely even work well in the database as PostgreSQL has bit string data types with boolean operations (including aggregate functions) which may lend themselves well to combining recurrent schedules, assigned events and holiday exemptions for users through SQL statements. The list of people available to be booked on an event could be returned from a single database query.
In parallel, I was thinking (and overthinking) about the performance of my approach, but even more about the readability. Playing with code, it appeared that bit maps are not just a hack to get better performance. Instead:
They are actually the most intuitive thing when you think about schedules. Even the UI would likely represent the schedule as a bit map, where the user would be able to toggle half-hours on and off.
The code to manipulate bit maps—at least this is my guess—would be much simpler compared to code that would have to handle start time, duration and end time. Although the answer by JimmyJames could prove that I'm wrong.
Back to the subject of performance, I was wondering, how long would it actually take to compute how many people can participate to every event. After all, if bit maps are used, the problem is as simple as un ∧ en = en, and this sort of things is what computers are very efficient at.
So I did a few attempts that can be found at this GitHub repository.
The tests were running on a Intel i5-10600 3.30 GHz CPU, AMD Radeon RX 5700 XT, Force MP510, using PostgreSQL 15, Python 3.13.5. The measurements were done by running, from Bash, a given Python or C++ program, that is expected to:
Load 15,000 users and 5,000 events from the database.
Do its magic.
Output 5,000 lines containing the identifier of the event and the number of users that are available during the event.
The collected time is the “real” metric from time command.
The identifiers of the events range from 1 to 5,000 (inclusive). This is important, since one of the programs cheats by not reading the identifiers at all and assuming they will be in order and within this exact range.
The output of each program is stored to a temporary file (in tmpfs, so no actual disk access), and then the files are compared to check if the data they contain is correct.
The programs were ran three times, in different order.
Here are the averaged results.
The simplest approach, in Python, performed in 4.2 seconds.
users_slots = list(load_users_slots(connection))
return {
e.id: sum(1 for u in users_slots if u & e.slots == e.slots)
for e
in load_events(connection)}
It has a benefit of being extremely easy. Also, Python's int can grow forever, which means that a single value can store all 42 bytes. Working with integers instead of having to think about the underlying data structure simplified things a lot.
Pandas was better, with 2.7 seconds. The code was slightly more complex, but still very simple.
Map-reduce with multiprocessing.Pool and 5 pages in parallel made it even faster, with 1.0 seconds. I decided to keep it simple and load all the 15,000 users every time. Not exactly sure how PostgreSQL handles cache, but is was very impressive to see that it's not the loading of the users which was the bottleneck here.
Map-reduce with shared memory for the users made the code way too complex. Using a temporary file instead made the code slightly simpler, but the performance gain is close to zero: just a few milliseconds.
At this stage, I wanted to try SIMD, and so I moved to C++. The plain C++ approach gave me 0.96 seconds.
Having to loop through the 42 bytes was silly, so I decided to rely on SIMD, representing the value as a series of three 128 bit values with SSE or two values: an AVX2 256 bits value plus a 128 bit value. In both cases, I got 0.15 seconds.
But one can get very similar performance by using five int64_t values, and letting C++ compiler use SIMD under the hood. I got 0.16 seconds.
When using threads, it takes only 0.07 seconds with four threads. Playing with the number of threads revealed this:
Number of threads
Result
5
60 ms.
8
48 ms.
10
44 ms.
12
50 ms.
100
80 ms.
200
125 ms.
500
pqxx::broken_connection (sorry!)
And finally, the GPU approach. After all, this is what GPUs are all about, right?
Well, the performance was not so great. I got 0.19 seconds with HIP, given that around 95 milliseconds are wasted at startup (and so can be excluded from the benchmark, as one can run a program that pays the cost of 95 ms. once, and then processes the requests for matches as they arrive). Nevertheless, even considering a metric of 0.09 seconds**, this is more than the 0.04 seconds I got with simple threads.
Conclusion
Bit maps are an intuitive way to represent schedules.
There is a way to reach outstanding performance with minimal effort. Being able to match 5,000 events with 15,000 users in 44 milliseconds on a single machine is pretty much a good result, I would say.
No need to be stupidly clever: many of my attempts at optimizing stuff appeared to be overly complex and actually much, much slower compared to a simple multithreading.
The observed results are not surprising when you think about the actual thing the computer should do. In fact, u₁ ∧ e₁ = e₁ is an extremely basic operation—there is no need for a GPU for that, as a modern CPU can easily do a lot of those operations per second. What is much more important here, however, is how the actual data gets transferred from one place to another—essentially from the database to the location where the computation would be performed. In my GPU example, this transfer was the most complex, and therefore slowest, thing, which is why its performance was poorer.
In real world example and if one imagines that there are now one billion users and one billion events, the simple multithreading approach could be combined with a map-reduce on multiple servers, each server having its own local copy of all the users (and sharded copy of the events). This way, horizontal scalability could ensure optimal performance even for extremely large data.
View Question ↗
Question
Parent Entity
Score: 5 • Views: 595
Site: softwareengineering
Other Comments / Reviews
SaaS Metrics