Question Details

No question body available.

Tags

python parallel-processing

Answers (3)

January 3, 2026 Score: 1 Rep: 21,415 Quality: Medium Completeness: 80%

We execute this once per chunk:

    [p.wait() for p in procs]

Inevitably there will be one of the ten workers that finishes first, and another that finishes last. In many, many practical situations there is an "interesting" amount of variation between those elapsed times. We call that the Straggler Effect, and it needlessly idles some of your cores.

Your code should be doing wait4() in parallel across the ten workers, and immediately forking off a new worker when one exits (not when all ten exit).
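
Here is a minimal hand-rolled sketch of that refill-on-exit behaviour (the run_all name, the ten-slot default, and the 0.1-second poll interval are illustrative assumptions, not code from the question); it keeps at most ten children alive and starts a new one as soon as any child exits:

import subprocess
import time

def run_all(commands, max_live=10):
    """Keep at most max_live children running; refill as soon as any one exits."""
    pending = list(commands)   # command lines not yet started
    live = []                  # Popen objects still running
    while pending or live:
        # top up to the concurrency limit
        while pending and len(live) < max_live:
            live.append(subprocess.Popen(pending.pop(0)))
        # drop whichever children have finished, without waiting on the whole batch
        live = [p for p in live if p.poll() is None]
        time.sleep(0.1)        # short poll interval, to avoid a busy loop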

The usual idiom for that is to use a multiprocessing task pool, but you did not import that standard library so I have to assume you have good reasons. The next best thing would be to let make worry about those details. Create a Makefile:

OUT = \
    task00.txt \
    ... \
    task99.txt

all: $(OUT)

%.txt:
	python mpirun options
	touch $@

Then make -j10 executes what you want, staying within the memory budget.

That touch is arguably an ugly wart. Perhaps the mpirun child produces a result file that make can see? Or perhaps GNU parallel is a better fit for your use case.
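
For completeness, here is a minimal sketch of the multiprocessing task-pool idiom mentioned above (the run_one helper, the ten-process count, and the placeholder task list are assumptions, not anything from the question):

import subprocess
from multiprocessing import Pool

def run_one(command):
    # one complete command line, e.g. a single-rank mpirun invocation
    return subprocess.call(command)

if __name__ == '__main__':
    tasks = [['mpirun', '-n', '1', 'program', 'task%02d' % i] for i in range(100)]  # placeholder commands
    with Pool(processes=10) as pool:  # ten workers, much like make -j10
        for returncode in pool.imap_unordered(run_one, tasks):
            pass  # a nonzero return code flags a failed task

imap_unordered hands back return codes as tasks finish, and each pool worker picks up the next command as soon as its previous one exits, which is exactly the refill behaviour described above.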


it = 10

nit: That's not a terrific identifier. Commonly we use the name it to describe an iterator or an iterable.

January 4, 2026 Score: 0 Rep: 20,056 Quality: Medium Completeness: 100%

While arguably a little wasteful, that's a huge benefit of Python: ease of development in exchange for a slower runtime.

Instead of waiting for each chunk, consider a design where you

  • create a collection of tasks
  • start a thread for each parallel job
    • which consumes the tasks
    • and manages the subprocess
  • join all the threads

I also highly recommend checking the result of each process and retrying, or failing early, when one doesn't complete. In this design you can even recreate a failed job right away just by adding it back to the queue (sketched after the code below). Do this sparingly, though, and track how many failed jobs you have, either individually (the job carries a counter for its errors) or in total (a global counter, or the length of a collection of failed jobs, which can also hold their stderr, etc.), so that an attempt or the entire process can end early.

from collections import deque  # friendly API, or consider queue.SimpleQueue
from subprocess import Popen, PIPE
from threading import Thread

def worker(Qjobs, Qresults):
    while True:
        job = Qjobs.popleft()  # FIFO: take the oldest job first
        if job is None:
            break  # one of many exit designs
        name, command = job
        p = Popen(command, stdout=PIPE, stderr=PIPE)
        out, err = p.communicate()  # wait for completion and get the results
        Qresults.append((name, p.returncode))  # track results

def main():
    Qjobs, Qresults = deque(), deque()
    count = 9  # set from cores or arg

    # (name, command) jobs would be appended to Qjobs here, before the sentinels

    workers = []
    for index in range(count):
        t = Thread(target=worker, args=(Qjobs, Qresults))
        workers.append(t)
        Qjobs.append(None)  # one sentinel per Thread, to end it
    for t in workers:
        t.start()
    # work has now started, opportunity to monitor progress too

    # wait for the workers to exit
    for t in workers:
        t.join()

    # check the results
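
As a sketch of the retry idea described above (the MAX_RETRIES limit and the three-element job tuple are hypothetical, not part of the original design), a variation of the worker can put a failed job straight back on the queue and give up after a few attempts:

from subprocess import Popen, PIPE  # same imports as the design above

MAX_RETRIES = 3  # hypothetical per-job limit

def worker(Qjobs, Qresults):
    while True:
        job = Qjobs.popleft()
        if job is None:
            break
        name, command, errors = job  # the job carries its own error counter
        p = Popen(command, stdout=PIPE, stderr=PIPE)
        out, err = p.communicate()
        if p.returncode != 0 and errors + 1 < MAX_RETRIES:
            # recreate the failed job right away; appendleft keeps it ahead of the None sentinels
            Qjobs.appendleft((name, command, errors + 1))
        else:
            Qresults.append((name, p.returncode, err))  # success, or gave up: keep stderr for analysis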

Finally, this might be easier to achieve with a ThreadPool, but starting with exactly what you want written out in full, and then transitioning the design to something more succinct, can help a great deal with understanding any bad behavior inside it:

import subprocess
import multiprocessing.pool
from subprocess import DEVNULL, CalledProcessError

def worker(command):
    try:  # I recommend collecting the output, if only briefly for analysis
        return subprocess.check_call(command, stdout=DEVNULL, stderr=DEVNULL)
    except CalledProcessError as ex:
        return repr(ex)

jobs = [command1, command2, ...]
with multiprocessing.pool.ThreadPool() as pool:
    results = pool.map(worker, jobs)

January 4, 2026 Score: -1 Rep: 1 Quality: Low Completeness: 50%

When you call mpirun several times at the same time, the MPI runtime tries to use all of the available cores by default!

Try this code:

import subprocess

chunks = 10
it = 10
for i in range(0, chunks):
    procs = []
    for j in range(0, it):
        # Bind this process to core j
        proc = subprocess.Popen([
            'mpirun', '-n', '1',
            '--bind-to', 'core',
            '--map-by', 'core',
            '-cpu-set', str(j),  # This syntax varies by MPI implementation
            program, option1, option2,
        ])
        procs.append(proc)
    for p in procs:
        p.wait()