Question Details

No question body available.

Tags

performance optimization hardware

Answers (5)

October 24, 2025 Score: 23 Rep: 79,044 Quality: Expert Completeness: 50%

When dealing with a process that takes too much time, the tool you need from your toolbox is profiling: find out how much time is spent in each portion of the process.

As we are talking about long time scales (on the order of hours), I would use a logging system to support me in collecting the profiling information. In concrete steps:

  1. Configure your logging system to include timestamps, if that is not already the case.
  2. Break down the slow process into concrete steps at a granularity that you think will give you good information without having to wade through tons of logging. Preferably, each step performs either a set of calculations, an interaction over the network, or an interaction with the storage device, but not a combination of those. If that makes your steps too small for your comfort, you can also handle those smaller steps in a second or third iteration.
  3. Add logging to each step to record when it starts and when it finishes (see the sketch after this list).
  4. Run the process and collect the logs.
  5. Identify which steps take the most time and check for each of them if the time taken is reasonable and to be expected or if it is too long and needs to be optimized. If a step is repeated multiple times, take all repetitions into account for determining how long that step takes within the overall process.
  6. If you identified steps that take too long, but they are still too granular to suggest optimization opportunities, then identify sub-steps within that step and repeat the profiling process.
  7. If the target time is not a hard requirement, but based on a gut feeling of "it should take about that long", then also consider the possibility that the requirement is wrong. In this case, you can use the measurements you collected and your analysis of them as supporting evidence for why the process takes as long as it does.
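
For illustration, here is a minimal Java sketch of steps 1-3: wrapping each coarse step in timed logging. The step names and the runStep helper are hypothetical placeholders; any logging setup that records timestamps will do.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.logging.Logger;

public class StepTimer {
    private static final Logger LOG = Logger.getLogger(StepTimer.class.getName());

    // Runs one named step of the process and logs when it starts and finishes.
    static void runStep(String name, Runnable step) {
        Instant start = Instant.now();
        LOG.info(() -> "START " + name + " at " + start);
        step.run();
        Instant end = Instant.now();
        LOG.info(() -> "END   " + name + " at " + end
                + " (took " + Duration.between(start, end).toMillis() + " ms)");
    }

    public static void main(String[] args) {
        // Hypothetical breakdown of the slow process into coarse steps.
        runStep("load-input", () -> { /* read data from storage */ });
        runStep("query-db", () -> { /* run database queries */ });
        runStep("compute", () -> { /* perform the calculations */ });
        runStep("write-output", () -> { /* write results back */ });
    }
}
```
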
October 25, 2025 Score: 7 Rep: 4,556 Quality: Medium Completeness: 30%

The database server already shows significant CPU and disk load. They may not be at 100%, but that is explained by inefficiencies in pipeline scheduling: the CPU occasionally waits for IO and occasionally fails to schedule a disk read in advance.

Your DB IS a bottleneck. Profile and optimize queries and their scheduling.

Ensure the DB is always executing at least some queries. The application may have an inefficient approach where results are processed in batches and the request for the next batch is not sent until the previous one has been processed. If possible, schedule the next batch/request before processing the last one, as sketched below. Note that while a finished query's results are waiting to be consumed by the client, the database is not executing anything for you, which may mean it is underutilized.
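
As a rough illustration of that overlap, not tied to any particular driver: prefetch the next batch on a background thread while the current batch is being processed. fetchBatch and process are hypothetical placeholders for your own query and client-side processing code.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BatchPipeline {
    // Placeholder: run the query for batch n and return its rows.
    static List<String> fetchBatch(int n) { return List.of(); }

    // Placeholder: process one batch of rows on the client side.
    static void process(List<String> rows) { }

    public static void main(String[] args) {
        ExecutorService fetcher = Executors.newSingleThreadExecutor();
        int batchCount = 100; // hypothetical number of batches

        CompletableFuture<List<String>> next =
                CompletableFuture.supplyAsync(() -> fetchBatch(0), fetcher);

        for (int i = 0; i < batchCount; i++) {
            List<String> current = next.join();   // wait for the batch we need now
            if (i + 1 < batchCount) {             // immediately request the next one
                final int nextIndex = i + 1;
                next = CompletableFuture.supplyAsync(() -> fetchBatch(nextIndex), fetcher);
            }
            process(current);                     // DB keeps working while we process
        }
        fetcher.shutdown();
    }
}
```
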

If possible, try accessing the DB from multiple threads; depending on the nature of the task and the index layout this may speed things up, but it may also slow them down.
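
If you want to experiment with that, a sketch along these lines splits the work across a small pool of connections. The connection URL, table, column, and key range below are all made up and would need to match your schema.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelRanges {
    private static final String URL = "jdbc:sqlserver://dbhost;databaseName=mydb"; // placeholder
    private static final int THREADS = 4;

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        long maxId = 1_000_000; // hypothetical upper bound of the key range
        long slice = maxId / THREADS;

        for (int t = 0; t < THREADS; t++) {
            long lo = t * slice;
            long hi = (t == THREADS - 1) ? maxId : lo + slice;
            pool.submit(() -> queryRange(lo, hi));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    // Each thread uses its own connection and scans one slice of the key range.
    static void queryRange(long lo, long hi) {
        String sql = "SELECT id, payload FROM work_items WHERE id >= ? AND id < ?";
        try (Connection con = DriverManager.getConnection(URL, "user", "password");
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setLong(1, lo);
            ps.setLong(2, hi);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // process the row here
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```
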

Measure and write down execution times before and after every change; otherwise you may get confused while exploring bad options.

October 30, 2025 Score: 4 Rep: 29,055 Quality: Medium Completeness: 50%

You have quite a lot of angles of attack for your problem, especially as you don't give us a lot of details on the specifics of your system, its intent, and the way the various components interact with each other.

Notably, you can look at:

  • infrastructure aspects,
  • programming aspects,
  • configuration aspects,
  • data modelling aspects.

Infrastructure

You didn't give us details on your overall architecture and infrastructure, so at this stage we can only conjecture. While your network speed may be fast, your latency may be bad.

In any case, try to measure the time taken by a few simple queries between the querying system and the database server to confirm that your latency is good and stable.
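
One way to get a quick read on that, assuming a JDBC-accessible database, is to time a trivial query a number of times. The URL and credentials below are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class LatencyProbe {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:sqlserver://dbhost;databaseName=mydb"; // placeholder
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement("SELECT 1")) {
            for (int i = 0; i < 20; i++) {
                long start = System.nanoTime();
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                }
                long micros = (System.nanoTime() - start) / 1_000;
                // Stable sub-millisecond to low-millisecond round trips suggest
                // latency is fine; large or erratic values point at the network.
                System.out.println("round trip " + i + ": " + micros + " us");
            }
        }
    }
}
```
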

October 28, 2025 Score: 2 Rep: 31,152 Quality: Medium Completeness: 50%

I don't think it's possible to know which issue is the exact cause without a much greater level of detail, and I don't think it's possible or desirable to provide that level of information here. There are a number of other possible causes you should consider, though. Here are a few that I would look into based on your description:

  1. This is my main suspect: the client is executing queries and only beginning to process the results after all of them have been fully retrieved (see the sketch after this list). This is a very common mistake, and I have had many experiences with developers who are committed to the (very wrong) idea that this improves performance. You may already know this, but if you need me to elaborate, I will happily do so.
  2. There's some sort of contention within the system, most likely on your client but potentially on the database side as well. For example, all your threads are synchronized on the same lock so only one executes at a time. I've seen issues in databases where things like a missing foreign key can cause unexpected lock contention.
  3. This might sound silly but things like printing a lot to a terminal session can have really surprising negative performance impacts. Simply redirecting to a file can resolve this.
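
For item 1 above, the difference looks roughly like this with plain JDBC; the table, query, and per-row handling are made-up placeholders. The first variant drains everything into memory before doing any work, the second processes each row as it arrives.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class StreamVsBuffer {
    // Anti-pattern: copy the entire result set into memory, then process it.
    static void bufferedThenProcess(Connection con) throws Exception {
        List<String> all = new ArrayList<>();
        try (PreparedStatement ps = con.prepareStatement("SELECT payload FROM work_items");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                all.add(rs.getString(1));
            }
        }
        for (String row : all) {
            handle(row);                  // work starts only after the last row has arrived
        }
    }

    // Better: process rows as they stream in, overlapping client work with retrieval.
    static void processWhileStreaming(Connection con) throws Exception {
        try (PreparedStatement ps = con.prepareStatement("SELECT payload FROM work_items");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                handle(rs.getString(1));  // work overlaps with fetching the next rows
            }
        }
    }

    static void handle(String row) { /* placeholder for per-row processing */ }
}
```
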

This is by no means comprehensive. These are a few of the kinds of common issues I have found when resolving performance problems. There are many more, though. For example, I was reminded the other day that JDBC, by default, uses a tiny 10 record fetch size when retrieving results. For a lot of needs this is fine, but if you are retrieving a large number of results, it creates a lot of chattiness and latency.
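
If that is the situation, the fix is usually one line. The default fetch size differs per driver, and 1000 below is just an illustrative value; the query is a placeholder.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class FetchSizeExample {
    static void readLargeResult(Connection con) throws Exception {
        try (PreparedStatement ps = con.prepareStatement("SELECT id, payload FROM work_items")) {
            // Ask the driver to fetch rows in larger chunks instead of its default,
            // reducing the number of round trips for big result sets.
            ps.setFetchSize(1000);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // process each row
                }
            }
        }
    }
}
```
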

I had another thought of maybe low usefulness: I've been really surprised at how much async IO (or non-blocking IO) can improve performance for IO-bound systems. I know that's the point, but my intuition tends to underestimate the impact of introducing non-blocking IO calls. I'm not sure how stable non-blocking database drivers are for SQLServer, but on a cursory search it seems there may be some available.

October 29, 2025 Score: 2 Rep: 6,407 Quality: Low Completeness: 70%

Now, what do I do next to understand where the bottleneck is, given that none of the resources seem to be used at 100%?

A fundamental problem is latency. If the CPU makes many small requests to the disk or to a database, the time may be dominated by various fixed costs, like the time to actually transmit the information. This time may not show up in performance metrics, since both sides are mostly waiting for each other and would have capacity for other work if such work were available.

The solution is usually to make fewer but larger requests or queries to reduce the effect of latency. Simply starting to process database results as they are received can also help a fair bit. But concurrency and granularity can be complicated: while they can greatly help with performance, they can also be difficult to get right.
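
As a concrete, hypothetical illustration of "fewer but larger requests": instead of issuing one query per key, fetch a whole chunk of keys in a single round trip. The table, column, and chunking are placeholders, and very large IN lists would need to be split into reasonable chunks.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.List;
import java.util.stream.Collectors;

public class FewerLargerQueries {
    // Latency-bound: one round trip per id.
    static void onePerId(Connection con, List<Long> ids) throws Exception {
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT id, payload FROM work_items WHERE id = ?")) {
            for (long id : ids) {
                ps.setLong(1, id);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) { /* process */ }
                }
            }
        }
    }

    // One round trip for the whole chunk; the fixed costs are paid once.
    static void oneForAll(Connection con, List<Long> ids) throws Exception {
        String placeholders = ids.stream().map(id -> "?").collect(Collectors.joining(","));
        String sql = "SELECT id, payload FROM work_items WHERE id IN (" + placeholders + ")";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            for (int i = 0; i < ids.size(); i++) {
                ps.setLong(i + 1, ids.get(i));
            }
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) { /* process */ }
            }
        }
    }
}
```
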

How do I figure out which one of those scenarios is correct—and eventually find how to optimize the task by improving either the hardware or the actual task?

Use the tools available to see if you can confirm or disprove each hypothesis:

  • There are tools to simulate bandwidth restrictions and latency. If this has little effect it is probably not the network.
  • 99% memory and 80% disk load is high enough that I would be concerned. Upgrading hardware can be relatively cheap and may be enough. Simulating added memory/disk/CPU load could also reveal potential bottlenecks.
  • If a database is involved it should be one of the primary suspects. There are specialized tools available to check for common problems, like swapping, lack of indexes, lots of small queries, lock contention and so on. Check the documentation for your database for the specifics.
  • There are CPU profilers to check what an application is doing and what takes most time, even if CPU time is less likely to be a problem in your case.
  • Just reading the source code can be illuminating, even if just to get a rough understanding of the overall code quality and what kind of problems you can expect.

Chances are that the main limitation is the software design and architecture. Computers are ridiculously fast when used well, but most software development stops once something works well enough. Ensuring the solution scales well is often not considered or tested enough.

It is not that uncommon to improve performance by orders of magnitude with some fairly simple fixes. But that might require a good understanding of the application, both to make sure you understand the problems and to ensure it still works correctly after any fixes. Gaining that understanding can be expensive if the project lacks documentation and automated tests and the original developers are gone.