When a user completes a step digitally, is_digitally_signed changes to YES.
What I want to do: if any step was completed digitally, I want to retrieve all rows for the same application_id and user_id. Please see my desired output further down.
R code to reproduce my data set:
    library(data.table)
    library(dplyr)

    # The three status/date values repeat once per application
    df <- data.table(
      application_id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
      user_id = c(123, 123, 123, 456, 456, 456, 789, 789, 789),
      application_status = rep(c("incomplete", "details_verified", "complete"), times = 3),
      date = rep(c("01/01/2018", "02/01/2018", "03/01/2018"), times = 3),
      is_digitally_signed = c("NULL", "NULL", "YES", "NULL", "NULL", "NULL", "NULL", "YES", "NULL")
    ) %>%
      mutate(date = as.Date(date, "%d/%m/%Y"))
which prints as:
    df
    application_id user_id application_status date       is_digitally_signed
    1              123     incomplete         2018-01-01 NULL
    1              123     details_verified   2018-01-02 NULL
    1              123     complete           2018-01-03 YES
    2              456     incomplete         2018-01-01 NULL
    2              456     details_verified   2018-01-02 NULL
    2              456     complete           2018-01-03 NULL
    3              789     incomplete         2018-01-01 NULL
    3              789     details_verified   2018-01-02 YES
    3              789     complete           2018-01-03 NULL
My (unsuccessful) attempt:
    df %>%
      group_by(application_id, user_id) %>%
      filter_all(all.vars(. == "YES"))
Desired result:
    application_id user_id application_status date       is_digitally_signed
    1              123     incomplete         2018-01-01 NULL
    1              123     details_verified   2018-01-02 NULL
    1              123     complete           2018-01-03 YES
    3              789     incomplete         2018-01-01 NULL
    3              789     details_verified   2018-01-02 YES
    3              789     complete           2018-01-03 NULL
dplyr
We can use filter with any, which checks whether a given group has at least one record with is_digitally_signed == "YES":
    library(dplyr)

    df %>%
      group_by(application_id, user_id) %>%
      filter(any(is_digitally_signed == "YES"))
Or use all to keep the groups in which not every is_digitally_signed value is "NULL":
    df %>%
      group_by(application_id, user_id) %>%
      filter(!all(is_digitally_signed == "NULL"))
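As a quick sanity check, the sketch below rebuilds the sample data as a plain data.frame (an assumption made only to keep the example self-contained; the question builds it with data.table) and confirms that both filters keep the same rows. The names by_any and by_not_all are illustrative and do not come from the question.

```r
library(dplyr)

# Sample data rebuilt inline (assumption: same values as in the question, date column omitted)
df <- data.frame(
  application_id = rep(1:3, each = 3),
  user_id = rep(c(123, 456, 789), each = 3),
  application_status = rep(c("incomplete", "details_verified", "complete"), times = 3),
  is_digitally_signed = c("NULL", "NULL", "YES", "NULL", "NULL", "NULL", "NULL", "YES", "NULL"),
  stringsAsFactors = FALSE
)

# Keep groups that contain at least one "YES"
by_any <- df %>%
  group_by(application_id, user_id) %>%
  filter(any(is_digitally_signed == "YES")) %>%
  ungroup()

# Keep groups in which not every value is "NULL"
by_not_all <- df %>%
  group_by(application_id, user_id) %>%
  filter(!all(is_digitally_signed == "NULL")) %>%
  ungroup()

# Both approaches should select the same rows
all.equal(by_any, by_not_all)
unique(by_any$application_id)
```

The two conditions agree here only because the column holds nothing but "YES" and "NULL". If a third value could appear, the !all(...) version would also keep groups containing that value, so the any(...) version is probably the closer match to the stated goal.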
data.table
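We can also use data.table, since you have already loaded the data as a data.table:
    library(data.table)

    dt <- setDT(df)
    # .I holds each group's row numbers; keep them only for groups with a "YES"
    dt[dt[, .I[any(is_digitally_signed == "YES")], by = .(application_id, user_id)]$V1, ]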
Or with .SD:
    dt[, .SD[any(is_digitally_signed == "YES")], by = .(application_id, user_id)]
Output:
    # A tibble: 6 x 5
    # Groups:   application_id, user_id [2]
      application_id user_id application_status date       is_digitally_signed
               <dbl>   <dbl> <fct>              <date>     <fct>
    1              1     123 incomplete         2018-01-01 NULL
    2              1     123 details_verified   2018-01-02 NULL
    3              1     123 complete           2018-01-03 YES
    4              3     789 incomplete         2018-01-01 NULL
    5              3     789 details_verified   2018-01-02 YES
    6              3     789 complete           2018-01-03 NULL
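To see why the .I version of the data.table approach works, it can help to look at the intermediate result. In this sketch the sample data is rebuilt inline and the names idx, res_I and res_SD are illustrative only. Within each group, .I holds that group's row numbers, and subsetting it with a single TRUE or FALSE keeps either all of them or none:

```r
library(data.table)

# Sample data rebuilt inline (assumption: same values as in the question, fewer columns)
dt <- data.table(
  application_id = rep(1:3, each = 3),
  user_id = rep(c(123, 456, 789), each = 3),
  is_digitally_signed = c("NULL", "NULL", "YES", "NULL", "NULL", "NULL", "NULL", "YES", "NULL")
)

# Row numbers of every group that contains at least one "YES"
idx <- dt[, .I[any(is_digitally_signed == "YES")], by = .(application_id, user_id)]$V1
idx
# 1 2 3 7 8 9

# Subset the original table by those row numbers
res_I <- dt[idx]

# Same rows via .SD, which is the per-group subset of the data
res_SD <- dt[, .SD[any(is_digitally_signed == "YES")], by = .(application_id, user_id)]
```

Both forms should return the six rows for applications 1 and 3. The .I form is often suggested as the faster of the two on large tables, since it avoids materialising .SD for every group, but that is worth measuring on your own data.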