
How to retrieve more than 100,000 rows from Redshift using R and dplyr

Source: collected from the Internet. Publisher: 自由互联. Published: 2021-06-16


I'm analyzing data from a Redshift database, working in R through a dplyr connection. This works:

library(dplyr)

my_db <- src_postgres(host = 'my-cluster-blahblah.redshift.amazonaws.com', port = '5439', dbname = 'dev', user = 'me', password = 'mypw')
mytable <- tbl(my_db, "mytable")

viewstation<-mytable %>%
    filter(stationname=="something")

When I try to convert that output to a data frame like this:

thisdata<-data.frame(viewstation)

I get the following warning message:

Only first 100,000 results retrieved. Use n = -1 to retrieve all.

Where should I set n?

Instead of using

thisdata<-data.frame(viewstation)

use

thisdata <- collect(viewstation)

collect() pulls all of the data from the database back into R. As described in the dplyr databases vignette:

When working with databases, dplyr tries to be as lazy as possible. It’s lazy in two ways:

It never pulls data back to R unless you explicitly ask for it.

It delays doing any work until the last possible minute, collecting together everything you want to do then sending that to the database in one step.
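The laziness described above can be seen in a small, self-contained sketch. It assumes the DBI, RSQLite and dbplyr packages are installed, and it uses an in-memory SQLite database as a stand-in for Redshift so it runs locally; the table contents and the row count (`n_rows`) are hypothetical. It also passes a DBI connection straight to `tbl()`, the style of recent dplyr/dbplyr versions, rather than `src_postgres()`. `filter()` only builds a query, `show_query()` prints the SQL that would be sent, and `collect()` is what actually runs it and returns every matching row.

```r
library(DBI)
library(dplyr)  # dbplyr must also be installed for database-backed tbls

# Assumption: in-memory SQLite stands in for Redshift; the dplyr verbs
# are the same for any DBI backend.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table: 250,000 rows, half of them with stationname == "something",
# so the filtered result (125,000 rows) exceeds the 100,000-row limit.
n_rows <- 250000
DBI::dbWriteTable(con, "mytable", data.frame(
  id = seq_len(n_rows),
  stationname = rep(c("something", "other"), length.out = n_rows),
  stringsAsFactors = FALSE
))

mytable <- tbl(con, "mytable")

# Lazy: this only builds a query; no rows are fetched yet.
viewstation <- mytable %>%
  filter(stationname == "something")

# Print the SQL that dplyr will send to the database.
show_query(viewstation)

# collect() runs the query and pulls every matching row into a local tibble.
thisdata <- collect(viewstation)
nrow(thisdata)  # 125000

DBI::dbDisconnect(con)
```

Because `thisdata` is an ordinary in-memory tibble after `collect()`, it stays usable after the connection is closed, whereas `viewstation` is only a query description tied to the connection.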
