Question Details

No question body available.

Tags

postgresql aggregate-functions

Answers (2)

October 30, 2025 Score: 1 Rep: 31,178 Quality: Medium Completeness: 80%

sum values for each type (eg, for type=1 - 1+2=3 and for type=2 - 3+4=7) and then combine it with by for example multiplying the subset results (eg, 37=21).

That sounds like you need a nested aggregate, not a partial aggregate (not necessarily):
demo at dbfiddle

select my_multiply_agg(sum_per_type)
from(select sum(value) as sum_per_type
     from my_table
     group by type) as subquery;
my_multiply_agg
21

It's using the regular, built-in sum() in a subquery (you can also use a CTE) to collect 1+2=3 for type 1 and 3+4=7 for type 2, then it multiplies those using a custom aggregate:

CREATE OR REPLACE FUNCTION my_multiply_agg_sfunc(state numeric, value bigint) 
RETURNS numeric RETURN state  value;

CREATE OR REPLACE FUNCTION mymultiplyaggcombinefunc(state1 numeric, state2 numeric) RETURNS numeric STRICT RETURN state1 * state2;

CREATE AGGREGATE mymultiplyagg(bigint) ( SFUNC = mymultiplyaggsfunc, STYPE = numeric, INITCOND = '1', COMBINEFUNC = mymultiplyaggcombinefunc, PARALLEL = SAFE );

Partial aggregation is there only to let Postgres speed things up in some scenarios (e.g. high row count, partitioning), by splitting the work between parallel workers. It's good practice to add the feature to user-defined functions, but it's not strictly necessary for what you're doing.

Also, it does not expose any way to control which rows end up in each part, which seems to be a requirement in the logic you laid out. On its own it wouldn't let you implement it.

Myself included, it seems that most people here initially focused on the partial/parallel aggregate aspect way more than the actual thing you described you're trying to do.


The functions you originally showed would always get you a zero if parallel execution kicked in. The initcond should typically be a neutral element - in multiplication, that's 1. It is 0 when you add things. You're mixing both.

Also, if the aggregate definition specifies a non-null initcond, keep in mind that that will be used not only as the initial state for each partial aggregation run, but also as the initial state for the combine function, which will be called to combine each partial result into that state.

CREATE OR REPLACE FUNCTION my
sumsfunc(state bigint, value integer) RETURNS bigint RETURN coalesce(state,0) + value;

CREATE OR REPLACE FUNCTION mysumcombinefunc(state1 bigint, state2 bigint) RETURNS bigint STRICT RETURN state1*state2;

CREATE AGGREGATE mysum(integer) ( SFUNC = mysumsfunc, STYPE = bigint, --initcond replaced by coalesce() in sfunc and making combinefunc strict COMBINEFUNC = mysumcombinefunc, PARALLEL = SAFE );

After "fixing" that, you'll find your function is nondeterministic. The order of + and is significant: 23+4 =10 but 2+3*4 =14. Since there's no way to guarantee which parallel worker will pick which subset of rows, same input can lead to different outputs at different times.

Another thing to keep in mind is that regular sum() of int returns bigint, otherwise it'd risk overflow. Since you end up multiplying sums, that's even more of a concern in your case. Above bigint, you'll need numeric, which is what I switched to, in the example up top.

These are logical problems but functionally, your functions are fine. As immediately pointed out by @Bergi, confirmed by @Laurenz Albe and demonstrated on dbfiddle, you just need more rows to trigger parallel execution. Here's your example on 300k:

explain analyze SELECT mysum(v.value) FROM mytable v;
QUERY PLAN
Finalize Aggregate (cost=48504.82..48504.83 rows=1 width=8) (actual time=5025.803..5025.879 rows=1 loops=1)

  -> Gather (cost=48504.46..48504.57 rows=1 width=8) (actual time=5020.055..5025.590 rows=2 loops=1)

        Workers Planned: 1

        Workers Launched: 1

        -> Partial Aggregate (cost=47504.46..47504.47 rows=1 width=8) (actual time=5008.662..5008.663 rows=1 loops=2)

              -> Parallel Seq Scan on mytable v (cost=0.00..3386.71 rows=176471 width=4) (actual time=0.012..81.643 rows=150000 loops=2)

Planning Time: 0.212 ms
Execution Time: 5025.948 ms

SELECT mysum(v.value) FROM mytable v;
my_sum
0
October 30, 2025 Score: 1 Rep: 258,989 Quality: Low Completeness: 40%

Your aggregate definition looks correct.

The reason that the combine function is never called is that PostgreSQL doesn't choose a parallel plan, since the table is so tiny. Insert many more rows into my_table, and PostgreSQL will consider a parallel plan. Use EXPLAIN to see the execution plan — if there is a Gather node, PostgreSQL uses a parallel plan.