Question Details

No question body available.

Tags

python parallel-processing

Answers (3)

January 3, 2026 Score: 1 Rep: 21,415 Quality: Medium Completeness: 80%

We execute this once per chunk:

    [p.wait() for p in procs]

Inevitably there will be one of the ten workers that finishes first, and another that finishes last. In many, many practical situations there is an "interesting" amount of variation between those elapsed times. We call that the Straggler Effect, and it needlessly idles some of your cores.

Your code should be doing wait4() in parallel across the ten workers, and immediately forking off a new worker when one exits (not when all ten exit).
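
Here is a minimal hand-rolled sketch of that refill-on-exit behaviour (the run_all name, the ten-slot default, and the 0.1-second poll interval are illustrative assumptions, not code from the question); it keeps at most ten children alive and starts a new one as soon as any child exits:

import subprocess
import time

def run_all(commands, max_live=10):
    """Keep at most max_live children running; refill as soon as any one exits."""
    pending = list(commands)   # command lines not yet started
    live = []                  # Popen objects still running
    while pending or live:
        # top up to the concurrency limit
        while pending and len(live) < max_live:
            live.append(subprocess.Popen(pending.pop(0)))
        # drop whichever children have finished, without waiting on the whole batch
        live = [p for p in live if p.poll() is None]
        time.sleep(0.1)        # short poll interval, to avoid a busy loop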

The usual idiom for that is to use a multiprocessing task pool, but you did not import that standard library so I have to assume you have good reasons. The next best thing would be to let make worry about those details. Create a Makefile:

OUT = \
    task00.txt \
    ... \
    task99.txt

all: $(OUT)

%.txt:
	python mpirun options
	touch $@

Then make -j10 executes what you want, staying within the memory budget.

That touch is arguably an ugly wart. Perhaps the mpirun child produces a result file that make can see? Or perhaps GNU parallel is a better fit for your use case.
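
For completeness, here is a minimal sketch of the multiprocessing task-pool idiom mentioned above (the run_one helper, the ten-process count, and the placeholder task list are assumptions, not anything from the question):

import subprocess
from multiprocessing import Pool

def run_one(command):
    # one complete command line, e.g. a single-rank mpirun invocation
    return subprocess.call(command)

if __name__ == '__main__':
    tasks = [['mpirun', '-n', '1', 'program', 'task%02d' % i] for i in range(100)]  # placeholder commands
    with Pool(processes=10) as pool:  # ten workers, much like make -j10
        for returncode in pool.imap_unordered(run_one, tasks):
            pass  # a nonzero return code flags a failed task

imap_unordered hands back return codes as tasks finish, and each pool worker picks up the next command as soon as its previous one exits, which is exactly the refill behaviour described above.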


it = 10

nit: That's not a terrific identifier. Commonly we use the name it to describe an iterator or an iterable.

January 4, 2026 Score: 0 Rep: 20,056 Quality: Medium Completeness: 100%

While arguably a little wasteful, that's a huge benefit of Python: ease of development in exchange for a slower runtime.

Instead of waiting for each chunk, consider a design where you

  • create a collection of tasks
  • start a thread for each parallel job
    • which consumes the tasks
    • and manages the subprocess
  • join all the threads

I also highly recommend checking the result of each process and retrying, or failing early, when one doesn't complete. In this design you can even recreate a failed job right away just by adding it back to the queue (sketched after the code below). Do this sparingly, though, and track how many failed jobs you have, either individually (the job carries a counter for its errors) or in total (a global counter, or the length of a collection of failed jobs, which can also hold their stderr, etc.), so that an attempt or the entire process can end early.

from collections import deque  # friendly API, or consider queue.SimpleQueue
from subprocess import Popen, PIPE
from threading import Thread

def worker(Qjobs, Qresults):
    while True:
        job = Qjobs.popleft()  # FIFO: take the oldest job first
        if job is None:
            break  # one of many exit designs
        name, command = job
        p = Popen(command, stdout=PIPE, stderr=PIPE)
        out, err = p.communicate()  # wait for completion and get the results
        Qresults.append((name, p.returncode))  # track results

def main():
    Qjobs, Qresults = deque(), deque()
    count = 9  # set from cores or arg

    # (name, command) jobs would be appended to Qjobs here, before the sentinels

    workers = []
    for index in range(count):
        t = Thread(target=worker, args=(Qjobs, Qresults))
        workers.append(t)
        Qjobs.append(None)  # one sentinel per Thread, to end it
    for t in workers:
        t.start()
    # work has now started, opportunity to monitor progress too

    # wait for the workers to exit
    for t in workers:
        t.join()

    # check the results
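
As a sketch of the retry idea described above (the MAX_RETRIES limit and the three-element job tuple are hypothetical, not part of the original design), a variation of the worker can put a failed job straight back on the queue and give up after a few attempts:

from subprocess import Popen, PIPE  # same imports as the design above

MAX_RETRIES = 3  # hypothetical per-job limit

def worker(Qjobs, Qresults):
    while True:
        job = Qjobs.popleft()
        if job is None:
            break
        name, command, errors = job  # the job carries its own error counter
        p = Popen(command, stdout=PIPE, stderr=PIPE)
        out, err = p.communicate()
        if p.returncode != 0 and errors + 1 < MAX_RETRIES:
            # recreate the failed job right away; appendleft keeps it ahead of the None sentinels
            Qjobs.appendleft((name, command, errors + 1))
        else:
            Qresults.append((name, p.returncode, err))  # success, or gave up: keep stderr for analysis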

Finally, this might be easier to achieve with a ThreadPool, but starting with exactly what you want written out in full, and then transitioning the design to something more succinct, can help a great deal with understanding any bad behavior inside it:

import subprocess
import multiprocessing.pool
from subprocess import DEVNULL, CalledProcessError

def worker(command):
    try:  # I recommend collecting the output, if only briefly for analysis
        return subprocess.check_call(command, stdout=DEVNULL, stderr=DEVNULL)
    except CalledProcessError as ex:
        return repr(ex)

jobs = [command1, command2, ...]
with multiprocessing.pool.ThreadPool() as pool:
    results = pool.map(worker, jobs)

January 4, 2026 Score: -1 Rep: 1 Quality: Low Completeness: 50%

When you call mpirun several times at the same time, the MPI runtime tries to use all of the available cores by default!

Try this code:

import subprocess

chunks = 10
it = 10
for i in range(0, chunks):
    procs = []
    for j in range(0, it):
        # Bind this process to core j
        proc = subprocess.Popen([
            'mpirun', '-n', '1',
            '--bind-to', 'core',
            '--map-by', 'core',
            '-cpu-set', str(j),  # This syntax varies by MPI implementation
            program, option1, option2,
        ])
        procs.append(proc)
    for p in procs:
        p.wait()