
Retrieve all rows of a group containing a specific text, after grouping by application and user ID

Source: Internet. Collected by: 自由互联. Published: 2021-06-16
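When a user completes a step digitally, is_digitally_signed changes to YES.
What I want to do: if any step was completed digitally, retrieve all rows with the same application_id and user_id. Please see my desired result below.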

R code to reproduce my data set

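library(data.table)
library(dplyr)

# Three applications, each passing through the same three statuses and dates.
# rep() makes the recycling of the shorter vectors explicit.
df <- data.table(application_id = c(1,1,1,2,2,2,3,3,3), 
                 user_id = c(123,123,123,456,456,456,789,789,789), 
                 application_status = rep(c("incomplete", "details_verified", "complete"), times = 3),
                 date = rep(c("01/01/2018", "02/01/2018", "03/01/2018"), times = 3),
                 is_digitally_signed = c("NULL", "NULL", "YES", "NULL", "NULL", "NULL", "NULL", "YES", "NULL")) %>%
  mutate(date = as.Date(date, "%d/%m/%Y"))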

which gives the output

df
  application_id user_id application_status       date is_digitally_signed
              1     123         incomplete  2018-01-01                NULL
              1     123   details_verified  2018-01-02                NULL
              1     123           complete  2018-01-03                 YES
              2     456         incomplete  2018-01-01                NULL
              2     456   details_verified  2018-01-02                NULL
              2     456           complete  2018-01-03                NULL
              3     789         incomplete  2018-01-01                NULL
              3     789   details_verified  2018-01-02                 YES
              3     789           complete  2018-01-03                NULL

My (unsuccessful) attempt

df %>% group_by(application_id,user_id) %>% filter_all(all.vars(. == "YES"))

Desired result

application_id user_id application_status       date is_digitally_signed
              1     123         incomplete 2018-01-01                NULL
              1     123   details_verified 2018-01-02                NULL
              1     123           complete 2018-01-03                 YES
              3     789         incomplete 2018-01-01                NULL
              3     789   details_verified 2018-01-02                 YES
              3     789           complete 2018-01-03                NULL

dplyr

We can use filter together with any, which checks whether a given group has at least one record with is_digitally_signed == "YES":

library(dplyr)

df %>% 
  group_by(application_id, user_id) %>%
  filter(any(is_digitally_signed == "YES"))

Alternatively, in dplyr, use all to keep the groups in which not every is_digitally_signed value is "NULL":

df %>% 
  group_by(application_id, user_id) %>%
  filter(!all(is_digitally_signed == "NULL"))
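
The two filters above agree on this data set because the column only ever holds "YES" or "NULL". The base R sketch below illustrates where they could diverge; the vectors only_no and with_na are hypothetical values invented for illustration and do not appear in the original data.

```r
# Hypothetical group values (assumptions, not part of the original data set)
only_no <- c("NULL", "NO", "NULL")  # a third value besides "YES" and "NULL"
with_na <- c("NULL", NA, "NULL")    # a missing value in the group

# With a third value, the two conditions disagree:
any_yes      <- any(only_no == "YES")    # FALSE, no step was signed
not_all_null <- !all(only_no == "NULL")  # TRUE, this group would be kept

# With a missing value and no "YES", any() returns NA rather than FALSE.
# dplyr::filter() drops rows whose condition is NA, so the group is removed.
any_na    <- any(with_na == "YES")                # NA
any_na_rm <- any(with_na == "YES", na.rm = TRUE)  # FALSE
```

If the column may contain other values, the any(... == "YES") form states the intent more directly.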

data.table

We can also use data.table, since you have already loaded the data as a data.table:

library(data.table)
dt = setDT(df)
dt[dt[,.I[any(is_digitally_signed == "YES")], by=.(application_id, user_id)]$V1,]

Or with .SD:

dt[,.SD[any(is_digitally_signed == "YES")], by=.(application_id, user_id)]
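
A further data.table variant, offered here as a sketch and not taken from the original answer, returns .SD only for the groups that pass the test, so groups without a "YES" contribute nothing to the result. It assumes the data.table package is installed; the date column is omitted to keep the example short.

```r
library(data.table)

# Rebuild the example data (date column left out for brevity)
dt <- data.table(
  application_id = rep(c(1, 2, 3), each = 3),
  user_id = rep(c(123, 456, 789), each = 3),
  application_status = rep(c("incomplete", "details_verified", "complete"), times = 3),
  is_digitally_signed = c("NULL", "NULL", "YES", "NULL", "NULL", "NULL", "NULL", "YES", "NULL")
)

# For each group, return its rows (.SD) only if at least one row is "YES";
# otherwise the expression yields NULL and the group is dropped.
res <- dt[, if (any(is_digitally_signed == "YES")) .SD,
          by = .(application_id, user_id)]
```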

Output (from the dplyr version):

# A tibble: 6 x 5
# Groups:   application_id, user_id [2]
  application_id user_id application_status date       is_digitally_signed
           <dbl>   <dbl> <fct>              <date>     <fct>              
1              1     123 incomplete         2018-01-01 NULL               
2              1     123 details_verified   2018-01-02 NULL               
3              1     123 complete           2018-01-03 YES                
4              3     789 incomplete         2018-01-01 NULL               
5              3     789 details_verified   2018-01-02 YES                
6              3     789 complete           2018-01-03 NULL
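
For comparison, the same "keep the whole group if any row matches" idea can be sketched in base R without extra packages. This is an illustrative alternative, not part of the original answer; it rebuilds the data as a plain data.frame and omits the date column.

```r
# Example data as a plain data.frame (date column left out for brevity)
df <- data.frame(
  application_id = rep(1:3, each = 3),
  user_id = rep(c(123, 456, 789), each = 3),
  application_status = rep(c("incomplete", "details_verified", "complete"), times = 3),
  is_digitally_signed = c("NULL", "NULL", "YES", "NULL", "NULL", "NULL", "NULL", "YES", "NULL"),
  stringsAsFactors = FALSE
)

# ave() applies any() within each application_id/user_id group and spreads
# the single TRUE/FALSE result across every row of that group.
signed <- df$is_digitally_signed == "YES"
keep <- ave(signed, df$application_id, df$user_id, FUN = any)

# Keep only the rows belonging to groups with at least one "YES"
result <- df[keep, ]
```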