
Custom aggregation over two columns of fruit names

Source: Internet  Collected by: 自由互联  Published: 2021-06-22
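I want to aggregate two columns of a data frame by name, as follows:

>Aggregate over the two columns fruits and parts, and carry the parts values into the result
>For Apple, Banana and Strawberry the parts values don't matter and everything is summed together; for Grape and Kiwi the parts values should become the new fruit names
>The result (at the bottom) should have 8 aggregated rows instead of 20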

This sounds simple, but after hours of trial and error I have not found a working solution. Here is an example:

library(lubridate)  # provides today()

# 20 rows: one shared date, 5 fruits with 4 parts each, and a random stock value
theDF <- data.frame(dates = as.Date(c(today()+20)),
    fruits = c("Apple","Apple","Apple","Apple","Banana","Banana","Banana","Banana",
      "Strawberry","Strawberry","Strawberry","Strawberry","Grape", "Grape",
      "Grape","Grape", "Kiwi","Kiwi","Kiwi","Kiwi"),
    parts = c("Big Green Apple","Apple2","Blue Apple","XYZ Apple4",
      "Yellow Banana1","Small Banana","Banana3","Banana4",
      "Red Small Strawberry","Red StrawberryY","Big Strawberry",
       "StrawberryZ","Green Grape", "Blue Grape", "Blue Grape",
       "Blue Grape","Big Kiwi","Small Kiwi","Big Kiwi","Middle Kiwi"),
    stock = as.vector(sample(1:20)) )

Current data frame:

Desired output:

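We can use data.table. If there is a pattern, for example a trailing uppercase letter or digit in the 'parts' column that should be removed, we can strip it with sub, use the result as a grouping variable together with 'dates', and take the sum of 'stock'.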

library(data.table)
setDT(theDF)[,.(stock = sum(stock)) , .(dates, fruits = sub("([0-9]|[A-Z])$", "", parts))]
#        dates      fruits stock
#1: 2016-06-19       Apple    46
#2: 2016-06-19      Banana    35
#3: 2016-06-19  Strawberry    38
#4: 2016-06-19 Green Grape    12
#5: 2016-06-19  Blue Grape    21
#6: 2016-06-19    Big Kiwi    37
#7: 2016-06-19  Small Kiwi    14 
#8: 2016-06-19 Middle Kiwi     7

Or, using dplyr, the same approach can be implemented in a similar way.

library(dplyr)
theDF %>%
    group_by(dates, fruits = sub('([0-9]|[A-Z])$', '', parts)) %>% 
    summarise(stock = sum(stock))
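
To make the grouping key in both versions easier to follow, here is a minimal sketch of what the sub call does on its own. The values in the vector are chosen for illustration and mirror labels in the 'parts' column of the data at the bottom.

# Illustrative values mirroring labels in the 'parts' column
parts <- c("Apple1", "StrawberryX", "Blue Grape", "Big Kiwi")

# Drop a single trailing digit or uppercase letter, if there is one
cleaned <- sub("([0-9]|[A-Z])$", "", parts)
print(cleaned)

Labels that end in a lowercase letter pass through unchanged, so this only collapses the Apple, Banana and Strawberry rows when their 'parts' values differ from the fruit name by exactly one trailing character.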

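Update

If there is no pattern and the elements of 'fruits' are simply identified by hand, create a vector of those elements, use %chin% to get a logical index in 'i', assign (:=) the values of 'parts' that correspond to 'i' to 'fruits', then group by 'dates' and 'fruits' and take the sum of 'stock'.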

setDT(theDF)[as.character(fruits) %chin% c("Grape", "Kiwi"),
          fruits := parts][, .(stock = sum(stock)), .(dates, fruits)]
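
The same relabel-then-sum idea can also be sketched in base R, without data.table. This is only an illustration under assumptions: the small data frame df below is made up for the example (it is not theDF), and its columns are stored as character rather than factor.

# Made-up data frame with the same column names as theDF (character columns)
df <- data.frame(
  dates  = as.Date("2016-06-19"),
  fruits = c("Apple", "Apple", "Grape", "Grape", "Kiwi", "Kiwi"),
  parts  = c("Apple1", "Apple2", "Blue Grape", "Blue Grape", "Big Kiwi", "Small Kiwi"),
  stock  = c(8, 19, 13, 5, 17, 14),
  stringsAsFactors = FALSE
)

# For Grape and Kiwi, use the 'parts' label as the new fruit name
df$fruits <- ifelse(df$fruits %in% c("Grape", "Kiwi"), df$parts, df$fruits)

# Sum 'stock' within each dates/fruits combination
result <- aggregate(stock ~ dates + fruits, data = df, FUN = sum)
print(result)

If the columns are factors, as in the data below, they would need to be converted with as.character first, because ifelse on a factor returns its integer codes.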

Data

theDF <- structure(list(dates = structure(c(16971, 16971, 16971, 16971, 
16971, 16971, 16971, 16971, 16971, 16971, 16971, 16971, 16971, 
16971, 16971, 16971, 16971, 16971, 16971, 16971), class = "Date"), 
    fruits = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 5L, 
    5L, 5L, 5L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("Apple", 
    "Banana", "Grape", "Kiwi", "Strawberry"), class = "factor"), 
    parts = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 14L, 
    15L, 16L, 16L, 11L, 10L, 10L, 10L, 9L, 13L, 9L, 12L), .Label = c("Apple1", 
    "Apple2", "Apple3", "Apple4", "Banana1", "Banana2", "Banana3", 
    "Banana4", "Big Kiwi", "Blue Grape", "Green Grape", "Middle Kiwi", 
    "Small Kiwi", "StrawberryX", "StrawberryY", "StrawberryZ"
    ), class = "factor"), stock = c(8, 19, 15, 4, 6, 18, 1, 10, 
    9, 16, 11, 2, 12, 13, 5, 3, 17, 14, 20, 7)), .Names = c("dates", 
"fruits", "parts", "stock"), row.names = c(NA, -20L), class = "data.frame")