Reading a CSV file from a URL that has no ".csv" suffix

Source: Internet | Collected by: 自由互联 | Published: 2021-06-16
This should be easy, but I can't get it to work. I want to read the following URL, which serves a CSV file but has no ".csv" suffix:

http://www.nwrfc.noaa.gov/water_supply/ws_text.cgi?id=TDAO3&wy=2013&per=APR-SEP&type=ESP10

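Another "non-standard" aspect of the data is that the file begins with two comment lines that start with "#". Here are the first few lines of the file: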

# Water Supply Forecast for COLUMBIA - THE DALLES DAM (TDAO3) 
# ESP Generated Forecasts with 10 day QPF
ID,Forecast Date,Start Month,End Month,90% FCST,50% FCST,10% FCST
TDAO3,2013-04-29,APR,SEP,82280.5,87857.86,93216.58
TDAO3,2013-04-28,APR,SEP,81707.62,87079.18,93104.28
TDAO3,2013-04-27,APR,SEP,81298.03,86753.18,92658.75
TDAO3,2013-04-26,APR,SEP,83142.93,88694.89,94804.66
TDAO3,2013-04-25,APR,SEP,83937.66,89378.74,95840.54
TDAO3,2013-04-24,APR,SEP,83045.52,88362.98,95224.37
TDAO3,2013-04-23,APR,SEP,82921.77,88242.32,95658.01
TDAO3,2013-04-22,APR,SEP,82992.71,88539.25,95768.61
TDAO3,2013-04-20,APR,SEP,82637.34,88036.47,95859.98
TDAO3,2013-04-19,APR,SEP,83258.96,88906.11,96523.07
TDAO3,2013-04-18,APR,SEP,82768.39,88486.72,96165.99

I thought the syntax would be simple:

fname <- "http://www.nwrfc.noaa.gov/water_supply/ws_text.cgi?id=TDAO3&wy=2013&per=APR-SEP&type=ESP10"
df <- read.table(fname, header=TRUE, sep=",", skip=2)

Any help would be greatly appreciated.

Here is another approach, which uses regular expressions to replace the HTML tags appropriately:

x <- readLines(fname)
# you want the third line of the response
xx <- x[3]
## replace <br> with \n
xn <- gsub('<br>', '\n', xx)
## remove all other html tags (<pre>, <body>, etc.)
xtext <- gsub("<(.|\n)*?>", "", xn)
## read in; lines starting with # are automatically treated as comments
## (and discarded) because comment.char = '#' by default
mydata <- read.table(textConnection(xtext), header = TRUE, sep = ',')
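
The same tag-stripping idea can be tried offline before pointing it at the live URL. The sketch below is only an illustration: `raw_line` is a hypothetical stand-in for the third line of the response, built from the sample rows in the question, and the real markup returned by the server may differ. The names `raw_line`, `plain` and `forecast` are made up for this example.

```r
# Hypothetical stand-in for the HTML-wrapped third line of the response.
# The real markup may differ; inspect readLines(fname) before relying on this.
raw_line <- paste0(
  "<pre># Water Supply Forecast for COLUMBIA - THE DALLES DAM (TDAO3)<br>",
  "# ESP Generated Forecasts with 10 day QPF<br>",
  "ID,Forecast Date,Start Month,End Month,90% FCST,50% FCST,10% FCST<br>",
  "TDAO3,2013-04-29,APR,SEP,82280.5,87857.86,93216.58<br>",
  "TDAO3,2013-04-28,APR,SEP,81707.62,87079.18,93104.28<br></pre>"
)

# Turn <br> into newlines, then drop every remaining tag.
plain <- gsub("<br>", "\n", raw_line, fixed = TRUE)
plain <- gsub("<[^>]+>", "", plain)

# read.table() skips the "#" lines because comment.char = "#" by default.
# check.names = FALSE keeps headers such as "90% FCST" unchanged.
forecast <- read.table(text = plain, header = TRUE, sep = ",",
                       check.names = FALSE, stringsAsFactors = FALSE)

# Parse the date column so rows can be sorted or filtered by date.
forecast[["Forecast Date"]] <- as.Date(forecast[["Forecast Date"]])
```

Passing `text = plain` does the same job as `textConnection()` here. Because the "#" lines are dropped as comments, no `skip` argument is needed.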