stackoverflow September 8, 2025 Rep: 63

Remove items within pandas DataFrameGroupBy groups


Question Details

No question body available.

Answers (3)


Accepted Answer

September 8, 2025 Score: 1 Rep: 27,110 Quality: Medium Completeness: 40%

A possible solution, which first filters the DataFrame by idxkept and then does the grouping:


groupsfiltered = df[df.index.isin(idxkept)].groupby(by=["g0", "g1"], sort=False)
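A minimal runnable sketch of this approach. It assumes sample data reconstructed from the intermediate join output in the second answer, and idxkept = ["a", "b", "d"], which is consistent with every output on this page; neither is given by the question itself. One consequence to be aware of: a group whose rows are all filtered out (bar/baz here) disappears entirely.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the intermediate join
# output in the second answer
df = pd.DataFrame(
    {
        "g0": ["foo", "foo", "bar", "bar"],
        "g1": ["baz", "baz", "baz", "qux"],
        "gn": ["xxx", "xxx", "xxx", "xxx"],
        "data": [0.1, 0.3, 0.4, 0.2],
    },
    index=["a", "b", "c", "d"],
)
idxkept = ["a", "b", "d"]  # assumed row labels to keep

# Keep only the wanted rows, then group what is left
groupsfiltered = df[df.index.isin(idxkept)].groupby(by=["g0", "g1"], sort=False)
sums = groupsfiltered["data"].sum()
print(sums)

# ("bar", "baz") only contained row "c", so it no longer exists as a group
print(("bar", "baz") in sums.index)  # False
```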

If you need to keep the original (unfiltered) groups, filter each group after grouping instead:


groups = df.groupby(by=["g0", "g1"], sort=False)
[x[x.index.isin(idxkept)] for _, x in groups]
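A sketch of the same per-group filtering, using a dict keyed by group instead of a list (our variation, not the answer's) so it is visible that the bar/baz group survives as an empty DataFrame. The sample data and idxkept are assumptions reconstructed from the outputs in the second answer.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the second answer's join output
df = pd.DataFrame(
    {
        "g0": ["foo", "foo", "bar", "bar"],
        "g1": ["baz", "baz", "baz", "qux"],
        "gn": ["xxx", "xxx", "xxx", "xxx"],
        "data": [0.1, 0.3, 0.4, 0.2],
    },
    index=["a", "b", "c", "d"],
)
idxkept = ["a", "b", "d"]  # assumed row labels to keep

groups = df.groupby(by=["g0", "g1"], sort=False)
# Each group keeps its original row labels, so it can be filtered by index
filtered = {key: x[x.index.isin(idxkept)] for key, x in groups}
for key, part in filtered.items():
    print(key, list(part.index))
# ('foo', 'baz') ['a', 'b']
# ('bar', 'baz') []
# ('bar', 'qux') ['d']
```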

September 8, 2025 Score: 1 Rep: 267,513 Quality: Medium Completeness: 80%

Once the groups are formed, it is not directly possible to filter prior to an aggregation with groupby.agg. However, you could do this with groupby.apply.

For instance, if your aggregation is groupby.sum/groupby.agg('sum'), you could use:


# without filtering
groups[['data']].apply(lambda x: x.sum())
# with filtering
groups[['data']].apply(lambda x: x[x.index.isin(idxkept)].sum())

Example output:

# without filtering
             data
g0  g1  gn
foo baz xxx   0.4
bar baz xxx   0.4
    qux xxx   0.2

# with filtering
             data
g0  g1  gn
foo baz xxx   0.4
bar baz xxx   0.0
    qux xxx   0.2
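A self-contained sketch that reproduces the example output above. The grouping keys ["g0", "g1", "gn"] are inferred from the output's index, and the sample data and idxkept are assumptions reconstructed from the intermediate join output further down.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the join output below
df = pd.DataFrame(
    {
        "g0": ["foo", "foo", "bar", "bar"],
        "g1": ["baz", "baz", "baz", "qux"],
        "gn": ["xxx", "xxx", "xxx", "xxx"],
        "data": [0.1, 0.3, 0.4, 0.2],
    },
    index=["a", "b", "c", "d"],
)
idxkept = ["a", "b", "d"]  # assumed row labels to keep

groups = df.groupby(by=["g0", "g1", "gn"], sort=False)

# The lambda receives each group's sub-DataFrame, whose index still holds
# the original row labels, so rows can be dropped before summing
unfiltered = groups[["data"]].apply(lambda x: x.sum())
filtered = groups[["data"]].apply(lambda x: x[x.index.isin(idxkept)].sum())
print(unfiltered)
print(filtered)
```

An empty sub-DataFrame sums to 0.0, which is why bar/baz shows 0.0 instead of vanishing.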

However, groupby.apply is often much less efficient than groupby.agg (or native aggregation methods such as groupby.sum). Efficiency should be tested on the real data, but I would not be surprised if performing two separate groupby.agg calls (one on the raw data, one on the filtered data) turned out to be faster.
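A sketch of the "two aggregations" alternative, assuming the same reconstructed sample data and idxkept as elsewhere on this page. Filling missing groups with 0.0 via reindex is our choice and only makes sense for sum; whether this actually beats groupby.apply should be measured on real data.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the join output below
df = pd.DataFrame(
    {
        "g0": ["foo", "foo", "bar", "bar"],
        "g1": ["baz", "baz", "baz", "qux"],
        "gn": ["xxx", "xxx", "xxx", "xxx"],
        "data": [0.1, 0.3, 0.4, 0.2],
    },
    index=["a", "b", "c", "d"],
)
idxkept = ["a", "b", "d"]  # assumed row labels to keep

keys = ["g0", "g1", "gn"]
# Two native aggregations: one on all rows, one on the kept rows only
raw = df.groupby(keys, sort=False)["data"].sum()
kept = df[df.index.isin(idxkept)].groupby(keys, sort=False)["data"].sum()

# Groups with no kept rows are absent from `kept`; align them to `raw`
# and use 0.0, the sum of an empty group
out = pd.concat(
    {"data": raw, "datafiltered": kept.reindex(raw.index, fill_value=0.0)},
    axis=1,
)
print(out)
```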

Another option, assuming your aggregation functions ignore NaN, is to join a filtered copy of the data columns before grouping, so that efficient native aggregations run on the raw and filtered data in a single groupby:


datacols = ['data']
groups = (df.join(df.loc[df.index.isin(idxkept), datacols]
                    .add_suffix('filtered'))
            .groupby(by=['g0', 'g1', 'gn'], sort=False)
         )
out = groups.sum()

Output:

             data  datafiltered
g0  g1  gn
foo baz xxx   0.4           0.4
bar baz xxx   0.4           0.0
    qux xxx   0.2           0.2

Intermediate output of join:


    g0   g1   gn  data  datafiltered
a  foo  baz  xxx   0.1            0.1
b  foo  baz  xxx   0.3            0.3
c  bar  baz  xxx   0.4            NaN
d  bar  qux  xxx   0.2            0.2
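A self-contained version of the join approach, built on sample data taken from the intermediate output above (idxkept = ["a", "b", "d"] is assumed). Using agg(["sum", "count"]) is our choice, to illustrate the NaN caveat: both functions skip NaN, so on the filtered column they behave as if the excluded rows were absent, and count gives the number of kept rows per group. Something that counts NaN rows, such as GroupBy.size, would not.

```python
import pandas as pd

# Sample data taken from the intermediate join output above
df = pd.DataFrame(
    {
        "g0": ["foo", "foo", "bar", "bar"],
        "g1": ["baz", "baz", "baz", "qux"],
        "gn": ["xxx", "xxx", "xxx", "xxx"],
        "data": [0.1, 0.3, 0.4, 0.2],
    },
    index=["a", "b", "c", "d"],
)
idxkept = ["a", "b", "d"]  # assumed row labels to keep

datacols = ["data"]
# Rows outside idxkept get NaN in the "datafiltered" column after the join
joined = df.join(df.loc[df.index.isin(idxkept), datacols].add_suffix("filtered"))
out = joined.groupby(by=["g0", "g1", "gn"], sort=False).agg(["sum", "count"])
print(out)
```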

December 11, 2025 Score: 0 Rep: 2,084 Quality: Low Completeness: 60%

Using Polars:

import polars as pl

idxkeep = ["a", "b", "d"]

resusingpolars = dfpl.with_columns(
    pl.when(pl.col('id').is_in(idxkeep))
    .then(pl.col('data'))
    .otherwise(None)
    .alias('datafiltered')
)
'''
┌─────┬─────┬─────┬──────┬──────────────┐
│ id  ┆ g0  ┆ g1  ┆ data ┆ datafiltered │
│ --- ┆ --- ┆ --- ┆ ---  ┆ ---          │
│ str ┆ str ┆ str ┆ f64  ┆ f64          │
╞═════╪═════╪═════╪══════╪══════════════╡
│ a   ┆ foo ┆ baz ┆ 0.1  ┆ 0.1          │
│ b   ┆ foo ┆ baz ┆ 0.3  ┆ 0.3          │
│ c   ┆ bar ┆ baz ┆ 0.4  ┆ null         │
│ d   ┆ bar ┆ qux ┆ 0.2  ┆ 0.2          │
└─────┴─────┴─────┴──────┴──────────────┘
'''

Using Pandas:


maskpandas = dfpandas.index.isin(idxkeep)
resusingpandas = dfpandas['data'].where(maskpandas)
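The answer stops at the masked column. A sketch of the grouping step that could follow, assuming dfpandas holds the same rows as the Polars table (with the id values as the index) and grouping by g0 and g1; both are assumptions, not part of the answer.

```python
import pandas as pd

# Hypothetical stand-in for the answer's dfpandas (ids moved into the index)
dfpandas = pd.DataFrame(
    {
        "g0": ["foo", "foo", "bar", "bar"],
        "g1": ["baz", "baz", "baz", "qux"],
        "data": [0.1, 0.3, 0.4, 0.2],
    },
    index=["a", "b", "c", "d"],
)
idxkeep = ["a", "b", "d"]

maskpandas = dfpandas.index.isin(idxkeep)
# where() keeps values where the mask is True and puts NaN elsewhere;
# sum() then skips those NaN values within each group
out = (
    dfpandas.assign(datafiltered=dfpandas["data"].where(maskpandas))
    .groupby(by=["g0", "g1"], sort=False)[["data", "datafiltered"]]
    .sum()
)
print(out)
```

This does the same job as the join in the second answer, but builds the filtered column in place instead of joining it.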
