
Tags

algorithms data-structures scheduling

Answers (7)

November 26, 2025 Score: 4 Rep: 139,531 Quality: Medium Completeness: 100%

One way to do it is to represent each user and each event as a bit map. Essentially, each half hour corresponds to a bit, the entire day corresponding to 6 bytes, one week being 42 bytes. Regarding the bits, 1 means the person is available, and 0—that the person is not available. For the events, 1 means the event is going on, and 0—quite the opposite.

For instance:

u₁ = 0b00011011100...

means that the user is available between 1:30 a.m. UTC and 2:30 a.m. UTC on Monday, then busy for half an hour, then available for an hour and a half, then busy again... The pattern goes on for the remaining bits, 336 (7×24×2) in total.

Similarly, an event can be represented as a bit map.

e₁ = 0b00000001100...

means that it's a one-hour event that starts at 3:30 a.m. UTC on Monday. The user from the previous example can attend it, as u₁ ∧ e₁ = e₁. This very basic formula is all it takes to see whether a user can attend an event.
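
As a minimal illustration in Python, using the two example values above (truncated to their first eleven bits), the check is a single AND plus a comparison:

user = 0b00011011100   # u₁ from the example above
event = 0b00000001100  # e₁ from the example above

def can_attend(user_bits, event_bits):
    # u ∧ e = e: every half hour occupied by the event is a half hour
    # during which the user is available.
    return user_bits & event_bits == event_bits

print(can_attend(user, event))  # True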

Now, all operations become operations on bit maps (AND, XOR, etc.), which are usually very fast and still relatively space efficient. You can use SIMD, or do the computation on a GPU; essentially, rely on any technique that is used when working with graphics. Chances are, you can just load everything into memory at once, do the computation you need, then save the result.

My original answer suggested using bit maps in a rather narrow way: essentially, the database should store the schedules and the events in a structured form. But then, Hans-Martin Mosner said in his comment:

In addition to being a very viable solution when keeping the bit maps in memory, this would likely even work well in the database as PostgreSQL has bit string data types with boolean operations (including aggregate functions) which may lend themselves well to combining recurrent schedules, assigned events and holiday exemptions for users through SQL statements. The list of people available to be booked on an event could be returned from a single database query.

In parallel, I was thinking (and overthinking) about the performance of my approach, but even more about its readability. Playing with code, it appeared that bit maps are not just a hack to get better performance. Instead:

  • They are actually the most intuitive thing when you think about schedules. Even the UI would likely represent the schedule as a bit map, where the user would be able to toggle half-hours on and off.

  • The code to manipulate bit maps would, at least this is my guess, be much simpler than code that has to handle start time, duration and end time. Although the answer by JimmyJames could prove me wrong.

Back to the subject of performance, I was wondering how long it would actually take to compute how many people can participate in every event. After all, if bit maps are used, the problem is as simple as uₙ ∧ eₙ = eₙ, and this sort of thing is what computers are very efficient at.

So I did a few attempts that can be found at this GitHub repository.

The tests were run on an Intel i5-10600 3.30 GHz CPU, an AMD Radeon RX 5700 XT, and a Corsair Force MP510 SSD, using PostgreSQL 15 and Python 3.13.5. The measurements were done by running, from Bash, a given Python or C++ program that is expected to:

  1. Load 15,000 users and 5,000 events from the database.
  2. Do its magic.
  3. Output 5,000 lines containing the identifier of the event and the number of users that are available during the event.

The collected time is the “real” metric reported by Bash's time command.

The identifiers of the events range from 1 to 5,000 (inclusive). This is important, since one of the programs cheats by not reading the identifiers at all and assuming they will be in order and within this exact range.

The output of each program is stored in a temporary file (in tmpfs, so no actual disk access), and the files are then compared to check that the data they contain is correct.

The programs were run three times, in different orders.

Here are the averaged results.

The simplest approach, in Python, performed in 4.2 seconds.

users_slots = list(load_users_slots(connection))
return {
    e.id: sum(1 for u in users_slots if u & e.slots == e.slots)
    for e in load_events(connection)}

It has the benefit of being extremely simple. Also, Python's int can grow arbitrarily large, which means that a single value can store all 42 bytes. Working with plain integers instead of having to think about the underlying data structure simplified things a lot.
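
For instance, a fully available week fits into one ordinary int:

week = (1 << 336) - 1     # all 336 half-hour bits set
print(week.bit_length())  # 336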

Pandas was better, with 2.7 seconds. The code was slightly more complex, but still very simple.
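
For reference, a minimal sketch of what that pandas variant might look like, reusing the hypothetical load_users_slots and load_events helpers from the snippet above:

import pandas as pd

users = pd.DataFrame({"slots": list(load_users_slots(connection))})

counts = {
    e.id: int(users["slots"].apply(lambda u: u & e.slots == e.slots).sum())
    for e in load_events(connection)}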

Map-reduce with multiprocessing.Pool and 5 pages in parallel made it even faster, at 1.0 seconds. I decided to keep it simple and load all 15,000 users every time. I'm not exactly sure how PostgreSQL handles caching, but it was very impressive to see that it's not the loading of the users that was the bottleneck here.
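
A sketch of that idea, where connect and load_events_page are hypothetical helpers splitting the events into five pages:

from multiprocessing import Pool

def count_page(page):
    # Each worker opens its own connection and, for simplicity,
    # reloads all 15,000 users before processing its page of events.
    connection = connect()
    users_slots = list(load_users_slots(connection))
    return {
        e.id: sum(1 for u in users_slots if u & e.slots == e.slots)
        for e in load_events_page(connection, page)}

if __name__ == "__main__":
    with Pool(5) as pool:
        result = {}
        for partial in pool.map(count_page, range(5)):
            result.update(partial)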

Map-reduce with shared memory for the users made the code way too complex. Using a temporary file instead made the code slightly simpler, but the performance gain was close to zero: just a few milliseconds.

At this stage, I wanted to try SIMD, and so I moved to C++. The plain C++ approach gave me 0.96 seconds.

Having to loop through the 42 bytes was silly, so I decided to rely on SIMD, representing the value either as three 128-bit values with SSE, or as two values: a 256-bit AVX2 value plus a 128-bit value. In both cases, I got 0.15 seconds.

But one can get very similar performance by using five int64_t values and letting the C++ compiler use SIMD under the hood. I got 0.16 seconds.

When using threads, it took only 0.07 seconds with four threads. Playing with the number of threads revealed this:

Number of threads    Result
5                    60 ms.
8                    48 ms.
10                   44 ms.
12                   50 ms.
100                  80 ms.
200                  125 ms.
500                  pqxx::broken_connection (sorry!)

And finally, the GPU approach. After all, this is what GPUs are all about, right?

Well, the performance was not so great. I got 0.19 seconds with HIP, of which around 95 milliseconds are startup overhead (and so could be excluded from the benchmark, as one can run a program that pays the 95 ms. cost once and then processes requests for matches as they arrive). Nevertheless, even counted as 0.09 seconds, this is more than the 0.04 seconds I got with simple threads.

Conclusion

  1. Bit maps are an intuitive way to represent schedules.

  2. There is a way to reach outstanding performance with minimal effort. Being able to match 5,000 events with 15,000 users in 44 milliseconds on a single machine is a pretty good result, I would say.

  3. No need to be stupidly clever: many of my attempts at optimizing things turned out to be overly complex and actually much, much slower than simple multithreading.

  4. The observed results are not surprising when you think about what the computer actually has to do. In fact, u₁ ∧ e₁ = e₁ is an extremely basic operation; there is no need for a GPU, as a modern CPU can easily do a huge number of those operations per second. What matters much more here is how the actual data gets transferred from one place to another, essentially from the database to the location where the computation is performed. In my GPU example, this transfer was the most complex, and therefore the slowest, part, which is why its performance was poorer.

  5. In a real-world setting, if one imagines that there are now one billion users and one billion events, the simple multithreading approach could be combined with a map-reduce across multiple servers, each server having its own local copy of all the users (and a shard of the events). This way, horizontal scalability could ensure good performance even for extremely large data sets.

November 30, 2025 Score: 4 Rep: 31,152 Quality: Medium Completeness: 100%

I've been overthinking this problem. I think there's a simple answer that is easy to implement and should perform more than adequately for your needs.

This is easily solved using Postgres. The high-level logical structure is this:

[diagram: relational structure, option A]

I have my doubts about whether it's worthwhile but if you want to normalize this structure, you can add another table:

[diagram: relational structure, option B]

And then what you want is the difference of two sets:

  1. users with availability that contains the event
  2. users with assignments that conflict with the event

That is, you want the list of users for 1 without any of the users from 2.

Postgres has an EXCEPT clause for this purpose. It works similarly to a UNION clause in that you use it between two compatible select statements. But instead of combining the results, the results from the second query are removed. With the simple design above (the first diagram) the query would look like this:

-- Completed with assumed table and column names; the actual schema is the
-- one shown in the first diagram above.
SELECT a.user_id FROM availability a
WHERE a.start_time <= :event_start AND a.end_time >= :event_end
EXCEPT
SELECT s.user_id FROM assignment s
WHERE s.start_time < :event_end AND s.end_time > :event_start;

November 26, 2025 Score: 3 Rep: 220,779 Quality: Medium Completeness: 50%

At the moment, we run through all the users, their availability slots, and their assigned events in order to determine whether they're available for a specific event.

The first thing you want to eliminate from this process is the repeated checking of the assigned events of a user:

  • A user has availability slots (time intervals, pairs of from-to values).

  • When a user gets assigned to an event, his or her availability slots are reduced (maybe one availability slot will be split). As a result, the user has a changed set of availability slots. Full stop. There is no further need to look at where these availability slots came from.

  • The availability slots of a user don't overlap, so they can be strictly ordered in a simple array. This makes it possible to look for the starting time of the specific event with a binary search and find out whether it falls into one of the availability slots (since availability slots don't overlap, this yields at most one slot).

  • It remains to check whether the end time of the specific event fits into the found availability slot, or not.

Average running time per single query: in theory, this will be O(log(A) × U), where A is the average number of availability slots per user, and U the number of users. Beware: when you need to retrieve all the availability slots for each user, for each query, the retrieval itself becomes O(A), and the binary search gains you nothing. Ideally, you will cache the availability slots of the users once beforehand and then run multiple event queries. If you can't, the running time becomes O(A × U), which is what you seem to have now.
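
A minimal Python sketch of that test, assuming the cached slots are kept as a sorted list of (start, end) pairs:

from bisect import bisect_right

def is_available(slots, event_start, event_end):
    # slots: sorted, non-overlapping (start, end) pairs.
    # Binary search for the last slot starting at or before the event.
    i = bisect_right(slots, (event_start, float("inf"))) - 1
    if i < 0:
        return False
    start, end = slots[i]
    # The event fits iff it also ends within that same slot.
    return event_end <= end

Applied to every user's cached slot array, this gives the O(log(A) × U) behaviour described above.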

As long as you don't have millions of users where only a small percentage are eligible for a certain event, I don't see any need for further optimizing the data structures. Just run through each user, apply the test above (which should be quick), and return the found users. Whenever an event gets assigned to a user, update the availability slots of that user.

Regarding your edit: it seems you do not even need to run your queries against the full set of users, only against smaller subsets. That makes it even more feasible to determine available users by just iterating over the candidates.

November 28, 2025 Score: 2 Rep: 4,556 Quality: Low Completeness: 80%

Doing the search service-side is a very bad idea. You would have to keep assignments synchronized with the DB, doing error-prone cache invalidation and potentially moving significant volumes of data around.

Do not do full scans service-side. Use the DB to do the searches:

CREATE TABLE workers
(
  name VARCHAR NOT NULL PRIMARY KEY
);

-- The exclusion constraint below needs the btree_gist extension
-- for the equality operator on worker.
CREATE TABLE assignments
(
  worker VARCHAR NOT NULL REFERENCES workers(name),
  during tsrange NOT NULL,
  CONSTRAINT overlapping_times EXCLUDE USING GIST (
        worker WITH =,
        during WITH &&
    )
);

INSERT INTO workers VALUES
  ('John'),
  ('Mary');

INSERT INTO assignments VALUES
  ('John', '[2021-01-01 00:00:01, 2021-01-01 00:09:01)'),
  ('John', '[2021-01-01 00:13:01, 2021-01-01 00:14:01)'),
  ('Mary', '[2021-01-01 00:05:01, 2021-01-01 00:16:01)');

Find busy workers:

SELECT worker FROM assignments
WHERE during && '[2021-01-01 00:10:01, 2021-01-01 00:11:01)'::tsrange;

 worker
--------
 Mary

Find available workers:

SELECT name FROM workers LEFT JOIN assignments
  ON assignments.worker = workers.name
  AND during && '[2021-01-01 00:10:01, 2021-01-01 00:11:01)'::tsrange
WHERE assignments.worker IS NULL;

 name
------
 John

fiddle

I'm not sure which additional indexes would be optimal for this case, but the btree_gist module surely has the necessary operators to build them. There is a chance that all the indexes that might be needed are already created by the CONSTRAINT directives.

November 28, 2025 Score: 0 Rep: 16,589 Quality: Medium Completeness: 50%

Interesting problem. My university used simulated annealing to shift lectures, rooms, times, and students around to form a schedule which they manually tweaked before publishing for the semester.

Sounds to me like you want this to be somewhat more dynamic though.

A spatial tree feels right for this. An R*-tree would be ideal, but others can do the job. What follows is a region tree with leaf-node walking.

We need two of them for a given period of time (a day, a week, a month; up to you, really). The first stores the events, the second the attendees' availability.

Both trees start with a single node containing no entries, marked for the entire time that the tree represents. This is so that later we don't have to check whether nodes have consecutive time stamps: we know the structure is dense, therefore the next node is always for the slot of time right after this one.

When a user says “I'm available between time x and time y”, you insert them into the tree for that region of time. We follow the tree down and break apart the node found if it doesn't start at the same time the attendee starts being available. Add them into that node, and into each subsequent leaf node, until you find the node that matches the attendee's end of availability (break it apart if that node extends past it).

Repeat for each interval of each user's time. Given your system is probably online, this allows users to add new availability easily at their leisure. Bulk inserts are also possible.

Removing time is similar; just avoid the merge step. Because time is always marching forward, you'll pay more to remove unnecessary nodes. Perhaps rebuild the tree offline once some threshold of deletes has been reached, or when a tree built for, say, two years from now becomes next month's tree.

Do all of the above for the events as well.


To find the list of available attendees for an event, look up the attendee tree at the start of the time period, and grab the list from each leaf from there to the leaf containing the end time. Intersect the lists on those nodes. Whoever is left is available.
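
As a rough Python sketch of the dense leaf layer only (the internal tree levels are omitted, and all names here are made up):

import bisect

class Leaf:
    def __init__(self, start, end):
        self.start, self.end = start, end
        self.members = set()

class DenseTimeline:
    def __init__(self, start, end):
        # Dense: the leaves always tile [start, end) with no gaps.
        self.leaves = [Leaf(start, end)]

    def _split_at(self, t):
        # Make sure a leaf boundary falls exactly on t.
        i = bisect.bisect_right([l.start for l in self.leaves], t) - 1
        leaf = self.leaves[i]
        if leaf.start < t < leaf.end:
            right = Leaf(t, leaf.end)
            right.members = set(leaf.members)
            leaf.end = t
            self.leaves.insert(i + 1, right)

    def insert(self, member, start, end):
        self._split_at(start)
        self._split_at(end)
        for leaf in self.leaves:
            if leaf.start >= start and leaf.end <= end:
                leaf.members.add(member)

    def available(self, start, end):
        # Intersect the member sets of every leaf covering [start, end).
        covering = [l.members for l in self.leaves
                    if l.start < end and l.end > start]
        return set.intersection(*covering) if covering else set()

Here available(start, end) is exactly the intersection described above: whoever appears in every leaf covering the event's span can attend the whole event.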

To find the available events, take the event tree. For each interval of the attendee's available time, find the first leaf node that starts after the attendee becomes available, and union it with each subsequent node until a node ends after the attendee becomes unavailable. Grab the nodes on either side of the union and subtract them from the union. This is the list of available events for that attendee.

Why?

  • For events finding attendees: the fact that attendees are listed as available in a region of time is all that is needed. Intersecting each leaf node that makes up that time ensures they are available for the entire block of time. If that's a single node, great.
  • For attendees finding events: the event needs to be present at least once in the period of time they are available. But the event might start earlier or end later; that is why we subtract the node starting before and the node ending after the attendee's availability, even if those nodes are also partially within the region of availability. Their presence there means the event must start before or end after.

An optimisation might be to make a tree for each set of prerequisites that an attendee has, or that an event needs. This allows you to reduce the sets to be explored upfront. The downside is that the event/attendee will need to be added to multiple trees for the same time period, as attendees should always show up in the no-prerequisites tree.

Another optimisation is to make the lists of events/users a bit table, and have a secondary entry table to properly reference them. This allows quick bit operations for intersection, union, and subtraction. It also makes node splitting a faster operation (copying a binary blob versus duplicating references). It does add a translation step from mask to actual data, but that isn't much of a slowdown.


This works great for one-off events, but you probably want to offer some better services, like highlighting whether there is enough event capacity for those who wish to attend, and how flexible your attendees are.

So attendees wishlist an event type.

An organiser wants to know whether a given set of times satisfies demand. When an attendee wishes for an event type, we add their availabilities to a new tree just for that event type. The organiser can now see this tree plotted as a graph over a day/week/month and can position events over periods of time where availability is high enough to warrant it.

Even better: with a set of classes in mind, we can ask for lists of attendees. We just count how many times an attendee is available for an entire class, which gives us a flexibility measure for each attendee. The organiser knows whom to prioritise, and how good their offering of times is.


You should be able to do something similar for the availability of events given conflicting wishes, to guide an attendee into choosing events that give them more flexibility later.

December 1, 2025 Score: -1 Rep: 183 Quality: Low Completeness: 100%

I am not completely sure whether this is applicable to your scenario, but a sparse matrix is what comes to my mind as a data structure.

I am not sure whether this is usable in your scenario, as:

  • your data is stored inside a database (which limits the possibilities for custom adaptations)
  • the number of users is not fixed

A quick search showed that at least some databases support sparse matrices, but I am not aware of how to use them, especially as the number of users can change, so this might not fit.
For a custom storage implementation this wouldn't be too complicated to do, but on a database I simply don't know.

The idea of a sparse matrix is not to store every possible entry of a row or column, but to expect the matrix to be mainly empty and to keep pointers to the next (and optionally previous) occupied entry along the same axis.
The assumption that justifies the sparse matrix is that most combinations of [user, time, event] won't be filled.

In theory we could fill a 3-dimensional matrix and thus search in any axis direction:

  • next/previous user on same time slot and event
  • next/previous time slot on same user and event
  • next/previous event on same user and time
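
To make the next/previous navigation concrete, here is a small Python sketch of one such axis, using a sorted lookup instead of real links (all names hypothetical):

from bisect import bisect_right

entries = {}  # (user_id, slot) -> event_id; only occupied cells are stored

def occupy(user_id, slot, event_id):
    entries[(user_id, slot)] = event_id

def next_slot_for_user(user_id, slot):
    # Next occupied time slot of the same user, or None. A real sparse
    # matrix would maintain per-row links instead of sorting on demand.
    slots = sorted(s for (u, s) in entries if u == user_id)
    i = bisect_right(slots, slot)
    return slots[i] if i < len(slots) else None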

Of course, this is not yet exactly what is needed. As you said, each person can attend only one event at a time, and each event is usually a single block of adjacent time.

Thus event + time could be merged into a single entry which represents both (one entry per time slot, e.g. 15 or 30 minutes, so the same event appears on multiple slots); the tricky part is to allow searching by either of the two identifiers instead of a combined identifier.
Thus we get back to a 2-dimensional matrix [user, time + event], or, as a workaround to search by any individual key, two similarly filled matrices [user, time] and [user, event].
On such matrices you can search:

  • next/previous user on same time slot (not sure if helpful without the event)
  • next/previous user on same event (list of participants)
  • next/previous time slot on same user (user's occupied time)
  • next/previous event on same user (user's events)

I hope this helps you out and that you get this running on your database.

December 1, 2025 Score: -1 Rep: 1 Quality: Low Completeness: 30%

A sparse matrix can model this kind of data, but it is usually not practical in a database environment and not ideal when the number of users or events is dynamic.

Sparse matrices work best when:

  • the dimensions (rows/columns) are fixed
  • the matrix is mostly empty
  • you control the in-memory representation

In your case:

  • The number of users is not fixed, so the matrix dimensions change constantly.
  • Your data lives in a database, where you cannot easily implement the linked-node structure that sparse matrices depend on.
  • Most relational and NoSQL databases already store sparse data efficiently using indexes, join tables, or adjacency lists, so you gain very little by trying to emulate a sparse matrix manually.

Why a sparse matrix isn’t ideal here

Sparse matrices let you quickly traverse neighbors in each dimension (next user, next time slot, next event). But your domain has additional constraints:

  • A user can attend only one event per time slot.
  • An event typically spans multiple adjacent time slots.

This collapses “event × time” into a single interval, which removes the benefit of a true multidimensional sparse structure.

What works better

In a database setting, the normal and efficient design is:

  • users
  • events
  • event_time_slots
  • user_event_participation (user_id, event_id, time_slot_id)

With proper indexes on (user_id, time_slot_id) and (event_id, user_id), you can efficiently query:

  • which users attend an event
  • what times a user is busy
  • which event occupies a time slot
  • conflict detection across users/events/time

Databases are already optimized for exactly these operations, so a sparse matrix brings no practical advantage.

Conclusion

A sparse matrix is conceptually interesting but not a good fit:

  • The dimensions change dynamically.
  • The data is managed by a database.
  • Database indexes and join tables already give you the performance benefits you would expect from a sparse matrix.

Stick with relational modeling + proper indexing — it will be simpler, scalable, and much easier to maintain.