This should be easy, but I can't get it to work. I want to read the following URL, which serves a CSV file but has no ".csv" suffix:
http://www.nwrfc.noaa.gov/water_supply/ws_text.cgi?id=TDAO3&wy=2013&per=APR-SEP&type=ESP10
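Another "non-standard" aspect of the data is that the file begins with two comment lines starting with "#". Here are the first few lines of the file: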
    # Water Supply Forecast for COLUMBIA - THE DALLES DAM (TDAO3)
    # ESP Generated Forecasts with 10 day QPF
    ID,Forecast Date,Start Month,End Month
I thought the syntax would be simple:
    fname <- "http://www.nwrfc.noaa.gov/water_supply/ws_text.cgi?id=TDAO3&wy=2013&per=APR-SEP&type=ESP10"
    df <- read.table(fname, header=TRUE, sep=",", skip=2)
Any help would be greatly appreciated.
Here is another approach, which uses regular expressions to replace the HTML tags appropriately:

    x <- readLines(fname)
    # you want the "third" line
    xx <- x[3]
    ## replace <br> with \n
    xn <- gsub('<br>', '\n', xx)
    ## remove all other html tags (<pre> <body> etc)
    xtext <- gsub("<(.|\n)*?>", "", xn)
    ## read in: lines starting with # are automagically read as comments
    ## (and discarded) because comment.char = '#' by default
    mydata <- read.table(textConnection(xtext), header = TRUE, sep = ',')
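To try the tag-stripping idea without hitting the server, here is a minimal offline sketch. It assumes the response arrives as a single HTML line with `<br>` separators, as the answer above implies. The string `raw_line`, its data row, and the helper name `html_to_csv_df` are all made up for illustration, and the real page may be laid out differently.

```r
# Hypothetical stand-in for readLines(fname)[3]; the data row is invented
# and the real response may differ.
raw_line <- paste0(
  "<pre># Water Supply Forecast for COLUMBIA - THE DALLES DAM (TDAO3)<br>",
  "# ESP Generated Forecasts with 10 day QPF<br>",
  "ID,Forecast Date,Start Month,End Month<br>",
  "TDAO3,2013-01-01,APR,SEP<br></pre>"
)

# Hypothetical helper: strip the HTML wrapper and parse what is left as CSV
html_to_csv_df <- function(line) {
  # Turn <br> into real line breaks, then drop every remaining tag
  text <- gsub("<br>", "\n", line, fixed = TRUE)
  text <- gsub("<[^>]+>", "", text)
  # comment.char = "#" is the read.table default, so the two leading
  # comment lines are discarded and skip = 2 is not needed
  read.table(text = text, header = TRUE, sep = ",", stringsAsFactors = FALSE)
}

demo <- html_to_csv_df(raw_line)
print(demo)
```

Note that `read.table` rewrites the header names with `make.names`, so "Forecast Date" becomes `Forecast.Date`. Passing `check.names = FALSE` should keep the original names if that matters.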