
Tags

r data.table

Answers (2)

February 13, 2026 Score: 3 Rep: 17,757 Quality: Medium Completeness: 60%

Updated -- much faster row-wise processing

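For data of this size (the test below has 1,000,000 rows and 12 columns), a specialized Rcpp function that caps each row in a single pass and updates the table by reference makes sense: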

Rcpp::cppFunction("
void cap(DataFrame dt, double ind_limit, double agg_limit) {
  // updates dt by reference; every column must be double (REAL() rejects integer columns)
  int ncols = dt.size();
  int nrows = dt.nrow();

  // get pointers for each column
  std::vector<double*> colptrs(ncols);
  for (int j = 0; j < ncols; ++j) colptrs[j] = REAL(dt[j]);

  for (int i = 0; i < nrows; ++i) {
    double running_agg = 0.0;
    bool capped = false;

    for (int j = 0; j < ncols; ++j) {
      if (capped) { colptrs[j][i] = 0.0; continue; }

      double val = colptrs[j][i];
      if (val > ind_limit) val = ind_limit;

      if (running_agg + val > agg_limit) {
        val = agg_limit - running_agg;
        running_agg = agg_limit;
        capped = true;
      } else {
        running_agg += val;
      }

      colptrs[j][i] = val;
    }
  }
}
")
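The row-wise rule can be tried outside R with a minimal standalone C++ sketch. It assumes the table is held as one `std::vector<double>` per column (a stand-in for the data.table's numeric columns); `Columns` and `cap_rows` are hypothetical names, and the logic repeats the loop body of `cap()`: clip each value to the individual limit, cut the value that would push the running row total past the aggregate limit down to what is left of it, and zero every later column in that row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the data.table: one std::vector<double> per column.
using Columns = std::vector<std::vector<double>>;

// cap_rows is a hypothetical name; it applies the same row-wise rule as the
// Rcpp cap() function, modifying the columns in place.
void cap_rows(Columns& cols, double ind_limit, double agg_limit) {
    if (cols.empty()) return;
    const std::size_t nrows = cols[0].size();
    for (const auto& col : cols) assert(col.size() == nrows);  // rectangular input

    for (std::size_t i = 0; i < nrows; ++i) {
        double running_agg = 0.0;
        bool capped = false;
        for (auto& col : cols) {
            if (capped) { col[i] = 0.0; continue; }  // row budget used up: zero the rest
            double val = col[i];
            if (val > ind_limit) val = ind_limit;    // individual cap
            if (running_agg + val > agg_limit) {     // this value crosses the aggregate cap
                val = agg_limit - running_agg;       // keep only what is left of the budget
                running_agg = agg_limit;
                capped = true;
            } else {
                running_agg += val;
            }
            col[i] = val;
        }
    }
}
```

For example, with an individual limit of 20 and an aggregate limit of 50, a row holding 25, 30, 18, 12, 9 becomes 20, 20, 10, 0, 0: the third value keeps only the 10 remaining in the budget and the last two are zeroed.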

Testing

dt2
#>    user  system elapsed
#>    0.03    0.00    0.03
dt2
#>           A  B  C  D  E  F  G  H  I  J  K  L
#>       1: 20 20 20 20 14  6  0  0  0  0  0  0
#>       2: 14  6 20 20 10 20 10  0  0  0  0  0
#>       3: 20 20 20 20 20  0  0  0  0  0  0  0
#>       4: 19 20 20 14 20  7  0  0  0  0  0  0
#>       5:  5 20 20 20 20 15  0  0  0  0  0  0
#>     ---
#>  999996: 15  1 20 20 20 20  4  0  0  0  0  0
#>  999997: 20  6  1 20 20 20 13  0  0  0  0  0
#>  999998: 20 20 20 20 12  8  0  0  0  0  0  0
#>  999999: 20  1 20 20 20 19  0  0  0  0  0  0
#> 1000000: 20 20 20 20 20  0  0  0  0  0  0  0
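The printed rows are consistent with an individual limit of 20 and an aggregate limit of 100 (an inference from the output, since the call itself is not shown); for example, row 1 is 20 + 20 + 20 + 20 + 14 + 6 = 100. A sketch of that sanity check in C++, with a hypothetical `row_is_valid` helper that assumes non-negative inputs: every value must lie in [0, individual limit] and the row total must not exceed the aggregate limit. It is a necessary condition only; it does not compare the result against the uncapped input.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: checks that one capped row respects both limits.
// Assumes the original values were non-negative, so a valid capped row has
// every value in [0, ind_limit] and a total of at most agg_limit (which also
// forces zeros after the point where the running total reaches agg_limit).
bool row_is_valid(const std::vector<double>& row, double ind_limit, double agg_limit) {
    assert(ind_limit >= 0.0 && agg_limit >= 0.0);  // limits assumed non-negative
    double total = 0.0;
    for (double v : row) {
        if (v < 0.0 || v > ind_limit) return false;  // individual cap violated
        total += v;
    }
    return total <= agg_limit;  // aggregate cap respected
}
```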

Original answer

Rcpp::cppFunction("
List update_col(NumericVector x, NumericVector agg, LogicalVector capped,
                const double ind_limit, const double agg_limit) {
  // writes go straight into the vectors' memory (in place, unless Rcpp had to coerce them)
  const int n = x.size();
  double* x_ptr = REAL(x);
  double* agg_ptr = REAL(agg);
  int* cap_ptr = LOGICAL(capped);

  for (int i = 0; i < n; ++i) {
    if (cap_ptr[i]) { x_ptr[i] = 0.0; continue; }

    double xi = x_ptr[i];

    if (xi > ind_limit) xi = ind_limit;

    double new_agg = agg_ptr[i] + xi;

    if (new_agg > agg_limit) { cap_ptr[i] = TRUE; xi -= (new_agg - agg_limit); }

    x_ptr[i] = xi;
    agg_ptr[i] = new_agg;
  }

  return List::create(x, agg, capped);
}
")

A function that uses update_col:

f = indlimit) numcols
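The column-at-a-time approach can be sketched in C++ as well. This is an assumption about how `update_col` is meant to be driven, not the answer's own R function: a hypothetical `cap_by_column` keeps one running total and one "capped" flag per row and passes them to an `update_col` that mirrors the Rcpp version, column by column from left to right. Plain `std::vector`s stand in for the R vectors, and the sketch updates them in place instead of returning a list.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the Rcpp update_col(): caps one column in place, using and updating
// the per-row running total (agg) and the per-row "already capped" flags.
void update_col(std::vector<double>& x, std::vector<double>& agg,
                std::vector<bool>& capped, double ind_limit, double agg_limit) {
    assert(agg.size() == x.size() && capped.size() == x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (capped[i]) { x[i] = 0.0; continue; }  // row already hit agg_limit
        double xi = x[i];
        if (xi > ind_limit) xi = ind_limit;       // individual cap
        const double new_agg = agg[i] + xi;
        if (new_agg > agg_limit) {                // keep only the remaining budget
            capped[i] = true;
            xi -= (new_agg - agg_limit);
        }
        x[i] = xi;
        agg[i] = new_agg;
    }
}

// Hypothetical driver: visits the columns left to right, sharing agg/capped state.
void cap_by_column(std::vector<std::vector<double>>& cols,
                   double ind_limit, double agg_limit) {
    if (cols.empty()) return;
    const std::size_t nrows = cols[0].size();
    std::vector<double> agg(nrows, 0.0);
    std::vector<bool> capped(nrows, false);
    for (auto& col : cols) update_col(col, agg, capped, ind_limit, agg_limit);
}
```

Each column pass reads and writes the per-row `agg` and `capped` vectors, whereas the row-wise `cap()` keeps that state in two local variables; the extra memory traffic is one plausible reason the row-wise update is faster.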
February 13, 2026 Score: 2 Rep: 34,722 Quality: Medium Completeness: 60%

The problem has two parts. The first part seems fine as is, but I found it slightly more efficient to use pmin() directly and skip the i argument.
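`pmin()` is R's element-wise (parallel) minimum, so `pmin(x, limit)` caps every element of `x` at `limit` in one vectorised call. The same operation in C++, as a sketch with a hypothetical `pmin_scalar` helper for a scalar limit:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: element-wise min(x[i], limit), i.e. what R's
// pmin(x, limit) computes for a numeric vector x and a scalar limit.
// Like pmin(), it returns a new vector and leaves x unchanged.
std::vector<double> pmin_scalar(const std::vector<double>& x, double limit) {
    std::vector<double> out(x.size());
    std::transform(x.begin(), x.end(), out.begin(),
                   [limit](double v) { return std::min(v, limit); });
    return out;
}
```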

# First add L for integer num_rows