Tags

python pandas dataframe group-by

Answers (3)

Accepted Answer
September 8, 2025 Score: 1 Rep: 27,110 Quality: Medium Completeness: 40%

One possible solution is to filter the dataframe by idx_kept first, and then do the grouping:

    groups_filtered = df[df.index.isin(idx_kept)].groupby(by=["g0", "g1"], sort=False)

In case you need to keep the unfiltered groups:

    groups = df.groupby(by=["g0", "g1"], sort=False)
    [x[x.index.isin(idx_kept)] for _, x in groups]
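
To make the difference between the two variants concrete, here is a minimal sketch. The sample frame and the value of idx_kept are assumptions reconstructed from the outputs shown elsewhere on this page, not taken from the question itself.

```python
import pandas as pd

# Assumed sample data, rebuilt from the outputs shown on this page.
df = pd.DataFrame(
    {
        "g0": ["foo", "foo", "bar", "bar"],
        "g1": ["baz", "baz", "baz", "qux"],
        "gn": ["xxx", "xxx", "xxx", "xxx"],
        "data": [0.1, 0.3, 0.4, 0.2],
    },
    index=["a", "b", "c", "d"],
)
idx_kept = ["a", "b", "d"]  # assumption: row "c" is the one filtered out

# Variant 1: filter first, then group.
# A group whose rows are all filtered out no longer exists.
groups_filtered = df[df.index.isin(idx_kept)].groupby(by=["g0", "g1"], sort=False)

# Variant 2: group first, then filter inside each group.
# Every group survives, but some may end up empty.
groups = df.groupby(by=["g0", "g1"], sort=False)
per_group = {key: x[x.index.isin(idx_kept)] for key, x in groups}

print(groups_filtered.ngroups)         # number of groups after filtering first
print(per_group[("bar", "baz")].empty)  # whether this group was emptied
```

With this data the first variant loses the ("bar", "baz") group entirely, while the second keeps it as an empty frame, which matters if every original group has to appear in the result.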
September 8, 2025 Score: 1 Rep: 267,513 Quality: Medium Completeness: 80%

Once the groups are formed, it is not directly possible to filter prior to an aggregation with groupby.agg. However, you could do this with groupby.apply.

For instance, if your aggregation is groupby.sum/groupby.agg('sum'), you could use:

    # without filtering
    groups[['data']].apply(lambda x: x.sum())

    # with filtering
    groups[['data']].apply(lambda x: x[x.index.isin(idx_kept)].sum())

Example output:

    # without filtering
                 data
    g0  g1  gn
    foo baz xxx   0.4
    bar baz xxx   0.4
        qux xxx   0.2

    # with filtering
                 data
    g0  g1  gn
    foo baz xxx   0.4
    bar baz xxx   0.0
        qux xxx   0.2

However, groupby.apply is often considerably less efficient than groupby.agg (or the native aggregation functions). Efficiency should be tested on the real data, but I would not be surprised if performing two separate groupby.agg calls turned out to be faster.
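
As a rough illustration of the "two aggregations" alternative, the sketch below aggregates the raw and the pre-filtered frame separately and then aligns the results. The sample data and idx_kept are assumptions reconstructed from the outputs on this page, and sum stands in for whatever aggregation is actually needed.

```python
import pandas as pd

# Assumed sample data, rebuilt from the outputs shown on this page.
df = pd.DataFrame(
    {
        "g0": ["foo", "foo", "bar", "bar"],
        "g1": ["baz", "baz", "baz", "qux"],
        "gn": ["xxx", "xxx", "xxx", "xxx"],
        "data": [0.1, 0.3, 0.4, 0.2],
    },
    index=["a", "b", "c", "d"],
)
idx_kept = ["a", "b", "d"]  # assumption: row "c" is filtered out
keys = ["g0", "g1", "gn"]

# First aggregation: all rows.
raw = df.groupby(by=keys, sort=False)["data"].sum()

# Second aggregation: only the kept rows.
kept = df[df.index.isin(idx_kept)].groupby(by=keys, sort=False)["data"].sum()

# Groups that lost all their rows are absent from `kept`;
# reindexing on the unfiltered result restores them with a sum of 0.
out = pd.concat(
    {"data": raw, "data_filtered": kept.reindex(raw.index, fill_value=0.0)},
    axis=1,
)
print(out)
```

The fill value of 0.0 only makes sense for a sum. For other aggregations a different fill value, or leaving the gap as NaN, may be more appropriate.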

Another option, assuming your aggregation functions are not sensitive to NaN, could be to join a filtered version of your data before aggregation to be able to use efficient aggregation simultaneously on the raw and filtered data:

    data_cols = ['data']
    groups = (df.join(df.loc[df.index.isin(idx_kept), data_cols]
                        .add_suffix('_filtered'))
                .groupby(by=['g0', 'g1', 'gn'], sort=False)
             )
    out = groups.sum()

Output:

                 data  data_filtered
    g0  g1  gn
    foo baz xxx   0.4            0.4
    bar baz xxx   0.4            0.0
        qux xxx   0.2            0.2

Intermediate output of join:

        g0   g1   gn  data  data_filtered
    a  foo  baz  xxx   0.1            0.1
    b  foo  baz  xxx   0.3            0.3
    c  bar  baz  xxx   0.4            NaN
    d  bar  qux  xxx   0.2            0.2
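
The caveat about aggregations being sensitive to NaN can be seen by comparing count and size on the joined column. This is only a sketch: the sample data and idx_kept are assumptions reconstructed from the outputs on this page.

```python
import pandas as pd

# Assumed sample data, rebuilt from the outputs shown on this page.
df = pd.DataFrame(
    {
        "g0": ["foo", "foo", "bar", "bar"],
        "g1": ["baz", "baz", "baz", "qux"],
        "gn": ["xxx", "xxx", "xxx", "xxx"],
        "data": [0.1, 0.3, 0.4, 0.2],
    },
    index=["a", "b", "c", "d"],
)
idx_kept = ["a", "b", "d"]  # assumption: row "c" is filtered out
data_cols = ["data"]

# Same join as in the answer: filtered-out rows become NaN in data_filtered.
joined = df.join(df.loc[df.index.isin(idx_kept), data_cols].add_suffix("_filtered"))
groups = joined.groupby(by=["g0", "g1", "gn"], sort=False)

# count() skips NaN, so it respects the filter.
counts = groups["data_filtered"].count()

# size() counts rows regardless of NaN, so it silently ignores the filter.
sizes = groups["data_filtered"].size()

print(counts.tolist())
print(sizes.tolist())
```

For the ("bar", "baz", "xxx") group, count reports no kept rows while size still reports one row, so NaN-skipping aggregations such as sum, mean or count fit this approach, whereas size or a custom function that treats NaN as a value would not.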
December 11, 2025 Score: 0 Rep: 2,084 Quality: Low Completeness: 60%

Using Polars:

    idx_keep = ["a", "b", "d"]

    res_using_polars = df_pl.with_columns(
        pl.when(pl.col('id').is_in(idx_keep))
        .then(pl.col('data'))
        .otherwise(None)
        .alias('data_filtered')
    )
    '''
    ┌─────┬─────┬─────┬──────┬───────────────┐
    │ id  ┆ g0  ┆ g1  ┆ data ┆ data_filtered │
    │ --- ┆ --- ┆ --- ┆ ---  ┆ ---           │
    │ str ┆ str ┆ str ┆ f64  ┆ f64           │
    ╞═════╪═════╪═════╪══════╪═══════════════╡
    │ a   ┆ foo ┆ baz ┆ 0.1  ┆ 0.1           │
    │ b   ┆ foo ┆ baz ┆ 0.3  ┆ 0.3           │
    │ c   ┆ bar ┆ baz ┆ 0.4  ┆ null          │
    │ d   ┆ bar ┆ qux ┆ 0.2  ┆ 0.2           │
    └─────┴─────┴─────┴──────┴───────────────┘
    '''

Using Pandas:

    mask_pandas = df_pandas.index.isin(idx_keep)
    res_using_pandas = df_pandas['data'].where(mask_pandas)
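
The masked column can then be fed into an ordinary grouped aggregation. The sketch below is one way this might look. The sample frame, the grouping keys and the use of sum are assumptions made for illustration, since this answer only shows the masking step.

```python
import pandas as pd

# Assumed sample data matching the table shown in the Polars output above.
df_pandas = pd.DataFrame(
    {
        "g0": ["foo", "foo", "bar", "bar"],
        "g1": ["baz", "baz", "baz", "qux"],
        "data": [0.1, 0.3, 0.4, 0.2],
    },
    index=["a", "b", "c", "d"],
)
idx_keep = ["a", "b", "d"]

# Mask the rows that are not kept: their value becomes NaN.
mask_pandas = df_pandas.index.isin(idx_keep)
df_pandas["data_filtered"] = df_pandas["data"].where(mask_pandas)

grouped = df_pandas.groupby(["g0", "g1"], sort=False)[["data", "data_filtered"]]

# Default sum: a group with only NaN values sums to 0.0.
out_default = grouped.sum()

# min_count=1: a group with no valid values stays NaN instead.
out_min_count = grouped.sum(min_count=1)

print(out_default)
print(out_min_count)
```

Whether an entirely filtered-out group should report 0.0 or NaN depends on the use case. min_count=1 keeps the two situations distinguishable.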