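I have a large dataset and I want to compute a moving one-year sum of one column. The window has to be exactly one calendar year, so I can't use rollapply, because it works on a fixed number of days rather than on actual dates.
As an example, I have the following code: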
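    dates = seq.Date(as.Date('2006-01-01'), as.Date('2007-12-31'), by='days')
    ld = as.numeric(dates)  # not defined in the original post; inferred from the output below
    num = 1:length(dates)
    y = cbind(ld, num)
    head(y)

            ld num
    [1,] 13149   1
    [2,] 13150   2
    [3,] 13151   3
    [4,] 13152   4
    [5,] 13153   5
    [6,] 13154   6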
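I would like a rolling one-year historical sum of the num column.
The only way I have managed to solve this is with a loop that subsets the data frame. That is not very efficient, and I was hoping someone could tell me how to use the embed function in a way that accounts for leap years, since it is much faster.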
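The loop itself is not shown in the post. A minimal sketch of what such a loop-and-subset baseline might look like is given here, assuming the window is (date minus one calendar year, date] and that `seq(..., by = "-1 year")` is an acceptable way to step back one year. The names `d`, `start`, `in_window` and `rollsum` are illustrative only.

```r
# Hypothetical loop-and-subset baseline (the original loop is not shown).
dates <- seq.Date(as.Date("2006-01-01"), as.Date("2007-12-31"), by = "days")
d <- data.frame(date = dates, num = seq_along(dates))

rollsum <- numeric(nrow(d))
for (i in seq_len(nrow(d))) {
  # Date exactly one calendar year before the current row
  start <- seq(d$date[i], length.out = 2, by = "-1 year")[2]
  # Rows inside the window (start, current date]
  in_window <- d$date > start & d$date <= d$date[i]
  rollsum[i] <- sum(d$num[in_window])
}
d$rollsum <- rollsum
tail(d, 3)
```

This does one full scan of the data per row, so the cost grows quadratically with the number of rows, which is presumably why it feels slow on a large dataset.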
Using the embed function, I have the following code, which works as long as no leap year is involved.
    b = embed(y[,2], 366)
    sums = colSums(b)
    a = ld[(length(dates)-365):length(dates)]  # parentheses added: `:` binds tighter than `-`
    final = cbind(dates = a, rollsum = rev(sums))
    head(final)

         dates rollsum
    [1,] 13513   66795
    [2,] 13514   67160
    [3,] 13515   67525
    [4,] 13516   67890
    [5,] 13517   68255
    [6,] 13518   68620
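To illustrate why a fixed-width window breaks down, the short sketch below counts the days in the trailing calendar-year window ending on a given date. The helper `days_in_year_window` is a hypothetical name introduced for this illustration, and it again assumes `seq(..., by = "-1 year")` defines "one year earlier".

```r
# Number of days in the trailing calendar-year window ending on `end`.
days_in_year_window <- function(end) {
  end <- as.Date(end)
  start <- seq(end, length.out = 2, by = "-1 year")[2]
  as.numeric(end - start)
}

days_in_year_window("2007-12-31")  # 365
days_in_year_window("2008-12-31")  # 366, the window contains 2008-02-29
```

Because the window length depends on the end date, no single width passed to embed can be right for every row.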
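Does anyone have a more efficient way to compute a moving sum based on specific dates rather than on a number of days?
You can add a column to the data holding the date one year earlier (which takes leap years into account) and use sqldf to compute the rolling sum.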
    # Sample data
    dates <- seq.Date(as.Date('2006-01-01'),as.Date('2007-12-31'),by='days')
    d <- data.frame( date = dates, value = rnorm(length(dates)) )
    #d <- d[ sample(length(dates), length(dates)/2), ]  # For more irregular data
    d <- d[ order(d$date), ]

    # Compute the date one year ago (you can also use lubridate, for date arithmetic)
    d$previous_year <- sapply(
      d$date,
      function(u) as.character(seq(u, length=2, by="-1 years")[2])
    )
    d$date <- as.character(d$date)

    # Compute the rolling sum
    library(sqldf)
    sqldf( "
      SELECT A.date        AS date,
             SUM( B.value ) AS sum,
             MIN( B.date )  AS start,
             MAX( B.date )  AS end,
             COUNT(*)       AS observations
      FROM d A, d B
      WHERE A.previous_year < B.date AND B.date <= A.date
      GROUP BY A.date
    " )
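If the self-join is too slow on a large table, the same window (previous_year, date] can also be computed in base R with a cumulative sum and `findInterval`. This is a different technique from the sqldf query, offered only as a sketch. It assumes the data are sorted by date, and it uses `value = seq_along(dates)` instead of `rnorm` so the result is easy to check by hand. The names `cs`, `n_upto` and `n_before` are illustrative.

```r
dates <- seq.Date(as.Date("2006-01-01"), as.Date("2007-12-31"), by = "days")
d <- data.frame(date = dates, value = seq_along(dates))
d <- d[order(d$date), ]  # findInterval needs sorted dates

# Date one calendar year earlier, vectorised via POSIXlt year arithmetic
lt <- as.POSIXlt(d$date)
lt$year <- lt$year - 1
previous_year <- as.Date(lt)

# cs[k + 1] is the sum of the first k values
cs <- c(0, cumsum(d$value))

# Number of rows with date <= current date, and with date <= previous_year
n_upto   <- findInterval(as.numeric(d$date), as.numeric(d$date))
n_before <- findInterval(as.numeric(previous_year), as.numeric(d$date))

# Sum over rows with previous_year < date <= current date
d$rollsum <- cs[n_upto + 1] - cs[n_before + 1]
tail(d, 3)
```

Each row then costs one binary search rather than a scan of the table, and irregularly spaced or duplicated dates are handled as long as the rows stay sorted.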