
Performance: the most efficient way to subset a data frame

Source: collected from the internet. Published by 自由互联 on 2021-06-22.


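Can anyone suggest a more efficient way to subset a data frame without using the SQL / indexing / data.table options?

I looked for similar questions, and this one suggests the indexing option.

Here are the subsetting approaches I tried, with timings.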

#Dummy data
dat <- data.frame(x = runif(1000000, 1, 1000), y=runif(1000000, 1, 1000))

#Subset and time
system.time(x <- dat[dat$x > 500, ])
#   user  system elapsed 
#  0.092   0.000   0.090 
system.time(x <- dat[which(dat$x > 500), ])
#   user  system elapsed 
#  0.040   0.032   0.070 
system.time(x <- subset(dat, x > 500))
#   user  system elapsed 
#  0.108   0.004   0.109

Edit:
As Roland suggested, I used microbenchmark. It seems that which() performs best.

library("ggplot2")
library("microbenchmark")

#Dummy data
dat <- data.frame(x = runif(1000000, 1, 1000), y=runif(1000000, 1, 1000))

#Benchmark
res <- microbenchmark( dat[dat$x > 500, ],
                       dat[which(dat$x > 500), ],
                       subset(dat, x > 500))
#Plot (ggplot2's autoplot() generic dispatches to the microbenchmark method)
autoplot(res)
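To read the numbers rather than the plot, the benchmark can also be summarised as a table. This is only a sketch: the expression labels (`logical`, `which`, `subset`), the fixed seed and `times = 20` are my own choices, not part of the original benchmark.

```r
library("microbenchmark")

set.seed(1)  # assumption: fixed seed so reruns use the same dummy data
dat <- data.frame(x = runif(1000000, 1, 1000), y = runif(1000000, 1, 1000))

# Named expressions make the summary table easier to read
res <- microbenchmark(
  logical = dat[dat$x > 500, ],
  which   = dat[which(dat$x > 500), ],
  subset  = subset(dat, x > 500),
  times = 20  # assumption: fewer runs than the default of 100, to keep it short
)

# One row per expression, timings reported in milliseconds
tab <- summary(res, unit = "ms")

# Sort by median time, fastest first
tab[order(tab$median), c("expr", "median", "neval")]
```

The median is usually a steadier number to compare than a single system.time() call, since individual runs can be disturbed by garbage collection.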

