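I want to aggregate data by group. However, each observation can belong to several groups (e.g. observation 1 belongs to groups A and B). I could not find a nice way to do this with data.table. Currently, I create a logical variable for each possible group, whose value is TRUE if the observation belongs to that group.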

    library(data.table)

    # Data
    set.seed(1)
    TF <- c(TRUE, FALSE)
    time <- rep(1:4, each = 5)
    df <- data.table(time = time,
                     x = rnorm(20),
                     groupA = sample(TF, size = 20, replace = TRUE),
                     groupB = sample(TF, size = 20, replace = TRUE),
                     groupC = sample(TF, size = 20, replace = TRUE))

    # This should be nicer and less repetitive
    df[groupA == TRUE, .(A = sum(x)), by = time][
      df[groupB == TRUE, .(B = sum(x)), by = time], on = "time"][
      df[groupC == TRUE, .(C = sum(x)), by = time], on = "time"]

    # desired output
    #    time          A          B         C
    # 1:    1         NA  0.9432955 0.1331984
    # 2:    2  1.2257538  0.2427420 0.1882493
    # 3:    3 -0.1992284 -0.1992284 1.9016244
    # 4:    4  0.5327774  0.9438362 0.9276459

Here is a data.table solution:

    df[, lapply(.SD[, .(groupA, groupB, groupC)]*x, sum), time]
    #    time     groupA     groupB    groupC
    # 1:    1  0.0000000  0.9432955 0.1331984
    # 2:    2  1.2257538  0.2427420 0.1882493
    # 3:    3 -0.1992284 -0.1992284 1.9016244
    # 4:    4  0.5327774  0.9438362 0.9276459

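The trick above relies on R coercing logicals to numbers when they are multiplied: `TRUE` becomes 1 and `FALSE` becomes 0, so rows outside a group contribute 0 to the sum. A minimal sketch on a small hand-made table (the name `dt` and its values are made up for illustration, not taken from the question's random data):

```r
library(data.table)

# Hypothetical toy data: two time points, two overlapping groups
dt <- data.table(
  time   = c(1, 1, 2, 2),
  x      = c(10, 20, 30, 40),
  groupA = c(TRUE, FALSE, FALSE, FALSE),
  groupB = c(TRUE, TRUE, FALSE, TRUE)
)

# logical * numeric coerces TRUE -> 1 and FALSE -> 0,
# so only members of a group add their x to that group's sum
res <- dt[, lapply(.SD * x, sum), by = time, .SDcols = c("groupA", "groupB")]

res$groupA  # 10 for time 1, 0 for time 2 (no members at time 2)
res$groupB  # 30 for time 1, 40 for time 2
```

Note that a time point with no members in a group yields 0 here, whereas the chained-join version in the question yields `NA` for that cell (compare `A` at `time == 1` in the two outputs).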
Or (thanks to @chinsoon12 for the comment), more programmatically:

    df[, lapply(.SD*x, sum), by=.(time), .SDcols=paste0("group", c("A","B","C"))]

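If the group columns share a naming prefix, the column names need not be spelled out at all. The following is a sketch under that assumption, using base R's `grep()` to collect them; `dt` and `grp_cols` are illustrative names, and the data is made up:

```r
library(data.table)

# Hypothetical toy data, same shape as the question's table
dt <- data.table(
  time   = c(1, 1, 2, 2),
  x      = c(10, 20, 30, 40),
  groupA = c(TRUE, FALSE, FALSE, FALSE),
  groupB = c(TRUE, TRUE, FALSE, TRUE)
)

# Assumption: every membership column starts with "group"
grp_cols <- grep("^group", names(dt), value = TRUE)

# Same aggregation as above, with the columns discovered by name
res <- dt[, lapply(.SD * x, sum), by = time, .SDcols = grp_cols]
```

This keeps working unchanged when a `groupC`, `groupD`, and so on are added later, as long as no unrelated column also starts with `group`.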
If you want the result in long format, you can do:

    df[, colSums(.SD*x), by=.(time), .SDcols=paste0("group", c("A","B","C"))]

    ### with indicator for the group:
    df[, .(colSums(.SD*x), c("A","B","C")), by=.(time), .SDcols=paste0("group", c("A","B","C"))]

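A different route to the long format, not part of the answer above, is to reshape first with `melt()` and then aggregate only the rows that are actual members. This is a sketch on made-up data; the names `dt`, `long`, `sums`, `wide`, `member`, and `total` are illustrative. Casting back with `dcast()` leaves `NA` for time/group combinations without members, which matches the `NA` in the question's desired output:

```r
library(data.table)

# Hypothetical toy data
dt <- data.table(
  time   = c(1, 1, 2, 2),
  x      = c(10, 20, 30, 40),
  groupA = c(TRUE, FALSE, FALSE, FALSE),
  groupB = c(TRUE, TRUE, FALSE, TRUE)
)

# One row per (observation, group) pair, with a logical membership flag
long <- melt(dt,
             id.vars       = c("time", "x"),
             measure.vars  = c("groupA", "groupB"),
             variable.name = "group",
             value.name    = "member")

# Keep members only, then sum x within each time/group combination
sums <- long[member == TRUE, .(total = sum(x)), by = .(time, group)]

# Optional: back to wide; missing combinations become NA instead of 0
wide <- dcast(sums, time ~ group, value.var = "total")
```

The intermediate `long` table has one row per observation and group, so for many groups and rows it uses more memory than the `.SD * x` approach.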