ROIpad ← Back to Search
stackoverflow › answer

Answer to: Creating a smoother profile of a set of number using R

Score: 4 • Accepted
Answered: Jun 21, 2025
User Rep: 167,708
I think you want a 2-step use of cummin/cummax: #' @param x numeric or integer #' @param i integer; if negative, it will be converted to `length(x)+1+i`, #' indexing from the right-side of the vector, -1 maps to `length(x)`; #' it is an error if `i == 0` or `i > length(x)` fun <- function(x, i) { if (length(x) == 0) return(x) # negative is "from right side", -1 converts to length(x) if (i < 0) i <- length(x) + 1 + i stopifnot("'abs(i)' must be between 1 and 'length(x)'" = i > 0 && i <= length(x)) i <- max(1, min(length(x), i)) x[1:i] <- cummin(x[1:i]) x[i:length(x)] <- cummax(x[i:length(x)]) x } dat = c(3, 4, 2, 10, 1, 23, 11, 44) fun(dat, 0) # Error in fun(dat, 0) : 'abs(i)' must be between 1 and 'length(x)' fun(dat, 1) # [1] 3 4 4 10 10 23 23 44 fun(dat, 4) # [1] 3 3 2 2 2 23 23 44 fun(dat, length(dat)+1) # Error in fun(dat, length(dat) + 1) : 'abs(i)' must be between 1 and 'length(x)' fun(dat, length(dat)) # [1] 3 3 2 2 1 1 1 1 fun(dat, -1) # [1] 3 3 2 2 1 1 1 1 FYI, you said but for a large set, it would take long time That is not always true, it can run faster with smaller data. While I find the logic of cummin/cummax to be easier to read and therefore easier to maintain, we can also do this in literal for-loops. fun2 <- function(x, i) { if (length(x) == 0) return(x) if (i < 0) i <- length(x) + 1 + i stopifnot("'abs(i)' must be between 1 and 'length(x)'" = i > 0 && i <= length(x)) if (i > 1) for (ind in 2:i) x[ind] <- min(x[ind], x[ind-1]) if (i < length(x)) for (ind in i:(length(x)-1)) x[ind+1] <- max(x[ind], x[ind+1]) x } bench::mark(fun(dat, 4), fun2(dat, 4)) # # A tibble: 2 × 13 # expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc # <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list> # 1 fun(dat, 4) 2.54µs 3.28µs 287165. 0B 28.7 9999 1 34.8ms <dbl [8]> <Rprofmem [0 × 3]> <bench_tm [10,000]> <tibble [10,000 × 3]> # 2 fun2(dat, 4) 1.93µs 2.5µs 392359. 0B 39.2 9999 1 25.5ms <dbl [8]> <Rprofmem [0 × 3]> <bench_tm [10,000]> <tibble [10,000 × 3]> If we look at the execution speed `itr/sec`, we see that fun2 is 36% faster. With larger data: set.seed(42) dat <- sample(1000, size=1e5, replace=TRUE) bench::mark(fun(dat, 500), fun2(dat, 500)) # # A tibble: 2 × 13 # expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc # <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list> # 1 fun(dat, 500) 490.2µs 585.9µs 1588. 1.91MB 2.21 719 1 453ms <int [100,000]> <Rprofmem [9 × 3]> <bench_tm [720]> <tibble [720 × 3]> # 2 fun2(dat, 500) 11.5ms 12.2ms 81.5 390.67KB 55.6 22 15 270ms <int [100,000]> <Rprofmem [1 × 3]> <bench_tm [37]> <tibble [37 × 3]> Clearly cummin/cummax is faster with larger data. The tipping point is likely somewhere between 8 and 10,000. (I'll leave it as an exercise to find that point.) Personally, I prefer fun over fun2, but that's my choice.
r
View Question ↗
Question
Parent Entity
Score: 3 • Views: 106
Site: stackoverflow