I am analyzing data from a Redshift database, working in R over a dplyr connection, and this works:
my_db <- src_postgres(host='my-cluster-blahblah.redshift.amazonaws.com', port='5439', dbname='dev', user='me', password='mypw')
mytable <- tbl(my_db, "mytable")
viewstation <- mytable %>% filter(stationname=="something")
When I try to convert that output to a data frame, like so:
thisdata<-data.frame(viewstation)
I get this message:
Warning message:
Only first 100,000 results retrieved. Use n = -1 to retrieve all.
Where am I supposed to set n?
Instead of using thisdata<-data.frame(viewstation),
use
thisdata <- collect(viewstation)
collect() pulls all of the data from the database back into R. As described in the dplyr databases vignette:
When working with databases, dplyr tries to be as lazy as possible. It’s lazy in two ways:
It never pulls data back to R unless you explicitly ask for it.
It delays doing any work until the last possible minute, collecting together everything you want to do then sending that to the database in one step.
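The laziness described above can be seen in a small sketch. This is only an illustration under assumptions: an in-memory SQLite database (via the DBI, RSQLite and dbplyr packages) stands in for the Redshift cluster, and the sample rows are made up. In current dbplyr, as far as I know, collect() takes an n argument that defaults to Inf, so it is also the place to set n if you want fewer rows.

```r
library(dplyr)
library(dbplyr)

# Assumption: in-memory SQLite stands in for the Redshift connection.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical sample data, only for illustration.
DBI::dbWriteTable(con, "mytable", data.frame(
  stationname = c("something", "other", "something"),
  value = c(1, 2, 3)
))

mytable <- tbl(con, "mytable")

# Nothing is fetched yet: this only builds a lazy query.
viewstation <- mytable %>% filter(stationname == "something")

# Print the SQL that will be sent to the database.
show_query(viewstation)

# Run the query and pull every matching row into a local data frame.
thisdata <- collect(viewstation)

# Pull at most one row instead; n is set on collect().
firstrow <- collect(viewstation, n = 1)

DBI::dbDisconnect(con)
```

With the original connection, the same collect(viewstation) call should apply unchanged; only the connection setup differs.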