Answer to: Creating a smoother profile of a set of number using R
Score: 4 • Accepted
I think you want a 2-step use of cummin/cummax:
#' @param x numeric or integer
#' @param i integer; if negative, it will be converted to `length(x)+1+i`,
#' indexing from the right-side of the vector, -1 maps to `length(x)`;
#' it is an error if `i == 0` or `i > length(x)`
fun <- function(x, i) {
if (length(x) == 0) return(x)
# negative is "from right side", -1 converts to length(x)
if (i < 0) i <- length(x) + 1 + i
stopifnot("'abs(i)' must be between 1 and 'length(x)'" = i > 0 && i <= length(x))
i <- max(1, min(length(x), i))
x[1:i] <- cummin(x[1:i])
x[i:length(x)] <- cummax(x[i:length(x)])
x
}
dat = c(3, 4, 2, 10, 1, 23, 11, 44)
fun(dat, 0)
# Error in fun(dat, 0) : 'abs(i)' must be between 1 and 'length(x)'
fun(dat, 1)
# [1] 3 4 4 10 10 23 23 44
fun(dat, 4)
# [1] 3 3 2 2 2 23 23 44
fun(dat, length(dat)+1)
# Error in fun(dat, length(dat) + 1) : 'abs(i)' must be between 1 and 'length(x)'
fun(dat, length(dat))
# [1] 3 3 2 2 1 1 1 1
fun(dat, -1)
# [1] 3 3 2 2 1 1 1 1
FYI, you said
but for a large set, it would take long time
That is not always true, it can run faster with smaller data. While I find the logic of cummin/cummax to be easier to read and therefore easier to maintain, we can also do this in literal for-loops.
fun2 <- function(x, i) {
if (length(x) == 0) return(x)
if (i < 0) i <- length(x) + 1 + i
stopifnot("'abs(i)' must be between 1 and 'length(x)'" = i > 0 && i <= length(x))
if (i > 1) for (ind in 2:i) x[ind] <- min(x[ind], x[ind-1])
if (i < length(x)) for (ind in i:(length(x)-1)) x[ind+1] <- max(x[ind], x[ind+1])
x
}
bench::mark(fun(dat, 4), fun2(dat, 4))
# # A tibble: 2 × 13
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
# 1 fun(dat, 4) 2.54µs 3.28µs 287165. 0B 28.7 9999 1 34.8ms <dbl [8]> <Rprofmem [0 × 3]> <bench_tm [10,000]> <tibble [10,000 × 3]>
# 2 fun2(dat, 4) 1.93µs 2.5µs 392359. 0B 39.2 9999 1 25.5ms <dbl [8]> <Rprofmem [0 × 3]> <bench_tm [10,000]> <tibble [10,000 × 3]>
If we look at the execution speed `itr/sec`, we see that fun2 is 36% faster. With larger data:
set.seed(42)
dat <- sample(1000, size=1e5, replace=TRUE)
bench::mark(fun(dat, 500), fun2(dat, 500))
# # A tibble: 2 × 13
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
# 1 fun(dat, 500) 490.2µs 585.9µs 1588. 1.91MB 2.21 719 1 453ms <int [100,000]> <Rprofmem [9 × 3]> <bench_tm [720]> <tibble [720 × 3]>
# 2 fun2(dat, 500) 11.5ms 12.2ms 81.5 390.67KB 55.6 22 15 270ms <int [100,000]> <Rprofmem [1 × 3]> <bench_tm [37]> <tibble [37 × 3]>
Clearly cummin/cummax is faster with larger data. The tipping point is likely somewhere between 8 and 10,000. (I'll leave it as an exercise to find that point.)
Personally, I prefer fun over fun2, but that's my choice.
View Question ↗
Question
Parent Entity
Score: 3 • Views: 106
Site: stackoverflow
SaaS Metrics