Say I have this XML file:
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<TimeSeries>
  <timeZone>1.0</timeZone>
  <series>
    <header/>
    <event date="2009-09-30" time="10:00:00" value="0.0" flag="2"></event>
    <event date="2009-09-30" time="10:15:00" value="0.0" flag="2"></event>
    <event date="2009-09-30" time="10:30:00" value="0.0" flag="2"></event>
    <event date="2009-09-30" time="10:45:00" value="0.0" flag="2"></event>
    <event date="2009-09-30" time="11:00:00" value="0.0" flag="2"></event>
    <event date="2009-09-30" time="11:15:00" value="0.0" flag="2"></event>
  </series>
  <series>
    <header/>
    <event date="2009-09-30" time="08:00:00" value="1.0" flag="2"></event>
    <event date="2009-09-30" time="08:15:00" value="2.6" flag="2"></event>
    <event date="2009-09-30" time="09:00:00" value="6.3" flag="2"></event>
    <event date="2009-09-30" time="09:15:00" value="4.4" flag="2"></event>
    <event date="2009-09-30" time="09:30:00" value="3.9" flag="2"></event>
    <event date="2009-09-30" time="09:45:00" value="2.0" flag="2"></event>
    <event date="2009-09-30" time="10:00:00" value="1.7" flag="2"></event>
    <event date="2009-09-30" time="10:15:00" value="2.3" flag="2"></event>
    <event date="2009-09-30" time="10:30:00" value="2.0" flag="2"></event>
  </series>
  <series>
    <header/>
    <event date="2009-09-30" time="10:00:00" value="0.0" flag="2"></event>
    <event date="2009-09-30" time="10:15:00" value="0.0" flag="2"></event>
    <event date="2009-09-30" time="10:30:00" value="0.0" flag="2"></event>
    <event date="2009-09-30" time="10:45:00" value="0.0" flag="2"></event>
    <event date="2009-09-30" time="11:00:00" value="0.0" flag="2"></event>
  </series>
</TimeSeries>
```
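Let's say I want to do something with its `series` elements, and I want to put into practice the advice "vectorize, vectorize!"... I load the XML library and do the following: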
```r
R> library("XML")
R> doc <- xmlTreeParse('/home/mario/Desktop/sample.xml')
R> TimeSeriesNode <- xmlRoot(doc)
R> seriesNodes <- xmlElementsByTagName(TimeSeriesNode, "series")
R> length(seriesNodes)
[1] 3
R> (function(x){length(xmlElementsByTagName(x[['series']], 'event'))}
+ )(seriesNodes)
[1] 6
R>
```
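What I don't understand is why I only get the result of applying the function to the first element. I expected three values, one for each element of `seriesNodes`, just as I get here: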
```r
R> mapply(length, seriesNodes)
series series series 
     7     10      6 
```
Oops! I already have the answer: "use mapply":
```r
R> mapply(function(x){length(xmlElementsByTagName(x, 'event'))}, seriesNodes)
series series series 
     6      9      5 
```
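A minimal base-R sketch of why the first attempt printed a single 6. It uses hypothetical stand-ins instead of real XML nodes: each "series" is just a character vector of child tag names (one `header` plus the events), and the hypothetical `countEvents` plays the role of `length(xmlElementsByTagName(x, 'event'))`. The anonymous function is called only once, on the whole list, and `[[` with a name returns only the first element carrying that name. Calling the function once per element (with `mapply`, `sapply` or `vapply`) gives one count per series. The stand-ins also suggest why `mapply(length, seriesNodes)` gave 7 10 6: `length()` presumably counts the `<header/>` child as well.

```r
# Hypothetical stand-ins for the three <series> nodes: each element is a
# character vector of child tag names (one header followed by the events).
seriesNodes <- list(
  series = c("header", rep("event", 6)),
  series = c("header", rep("event", 9)),
  series = c("header", rep("event", 5))
)

# Hypothetical stand-in for length(xmlElementsByTagName(x, 'event'))
countEvents <- function(x) sum(x == "event")

# First attempt: the function runs ONCE on the whole list, and
# `[[` with a name picks only the first element named "series".
(function(x) countEvents(x[["series"]]))(seriesNodes)   # 6

# One call per element: these three give the same counts (named "series")
mapply(countEvents, seriesNodes)                 # 6 9 5
sapply(seriesNodes, countEvents)                 # 6 9 5
vapply(seriesNodes, countEvents, integer(1))     # 6 9 5, return type checked

# length() also counts the header, mirroring the 7 10 6 shown above
sapply(seriesNodes, length)                      # 7 10 6
```

All three apply variants still iterate over the list in R; they differ only in how the results are simplified and whether the return type is checked.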
But then I ran into the following problem: The R Inferno tells me that I am "loop-hiding", not "vectorizing"! Can I avoid the loop? …
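You can also use `xpathApply` or `xpathSApply`: these functions use an XPath specification to extract a node set and then apply a function to each node in the set. Both are provided by the XML package. To use them, the XML document must be parsed with `xmlInternalTreeParse`, or with `xmlTreeParse` with the `useInternalNodes` option set to `TRUE`:

```r
library(XML)

# Count the <event> children of a single <series> node
countEvents <- function(series) {
  events <- xmlElementsByTagName(series, 'event')
  return(length(events))
}

# Parse into internal nodes so that XPath queries can be run on the document
doc <- xmlTreeParse("sample.xml", useInternalNodes = TRUE)

# Select every <series> node and apply countEvents to each one
xpathSApply(doc, '/TimeSeries/series', countEvents)
# [1] 6 9 5
```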
I don't know whether it is "faster", but the code is very clear and explicit to anyone who knows XPath syntax and how the apply functions work.
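A self-contained variant of the `xpathSApply` answer, sketched under a few assumptions: it parses a trimmed copy of the sample (2, 3 and 1 events per series, not the full file) from a string, so no `sample.xml` is needed, and it uses `xmlParse()`, which the XML package documents as `xmlTreeParse()` with `useInternalNodes = TRUE`. The `asText = TRUE` argument tells the parser to treat the string as XML content rather than a file name.

```r
library(XML)

# Trimmed copy of the sample: 2, 3 and 1 events per series,
# each series still starting with its <header/> element.
sample_xml <- '<?xml version="1.0" encoding="UTF-8"?>
<TimeSeries>
  <timeZone>1.0</timeZone>
  <series>
    <header/>
    <event date="2009-09-30" time="10:00:00" value="0.0" flag="2"></event>
    <event date="2009-09-30" time="10:15:00" value="0.0" flag="2"></event>
  </series>
  <series>
    <header/>
    <event date="2009-09-30" time="08:00:00" value="1.0" flag="2"></event>
    <event date="2009-09-30" time="08:15:00" value="2.6" flag="2"></event>
    <event date="2009-09-30" time="09:00:00" value="6.3" flag="2"></event>
  </series>
  <series>
    <header/>
    <event date="2009-09-30" time="10:00:00" value="0.0" flag="2"></event>
  </series>
</TimeSeries>'

# Same per-series function as in the answer: count only <event> children,
# so the <header/> element is not included in the count.
countEvents <- function(series) length(xmlElementsByTagName(series, "event"))

# Parse the string into internal nodes (xmlParse = internal-node parsing)
doc <- xmlParse(sample_xml, asText = TRUE)

# Apply countEvents to every node matched by the XPath expression
counts <- xpathSApply(doc, "/TimeSeries/series", countEvents)
print(counts)
```

Because the function selects children by tag name, the `<header/>` elements (and any whitespace text nodes the parser may keep) do not affect the counts.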