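I have (HTML) text in which I want to turn entities such as &ouml; into the real characters ä, ü, ö, etc., because otherwise the xml package does not accept it. So I wrote a small function that loops over a replacement table (link1, link2) and replaces each entity with its special character. The function looks like this (only much longer):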
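library(stringr)  # for str_replace_all()

html.charconv <- function(text){
  # Column 1: the real character, column 2: the HTML entity it replaces
  replacer <- matrix(c(
    "Á", "&Aacute;",
    "á", "&aacute;",
    "Â", "&Acirc;",
    "â", "&acirc;",
    "´", "&acute;"
  ),
  ncol = 2, byrow = TRUE)
  # Each pass starts from the result of the previous one
  for(i in seq_len(nrow(replacer))){
    text <- str_replace_all(text, replacer[i, 2], replacer[i, 1])
  }
  text
}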
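How can I speed this up? I thought about vectorization but found nothing that helps, because in each cycle the result of the previous cycle is the starting point.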
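The dependency described here is a left fold over the replacement table: each step consumes the text produced by the previous step. A minimal sketch of the same loop written with base R's Reduce(), assuming the entities and the characters are passed as two parallel vectors (html_charconv_fold is a hypothetical name, not from the question). It keeps the sequential dependency, so it is not vectorized over the table; it does use gsub(fixed = TRUE), which treats each entity as a literal string rather than a regular expression.

```r
# Hypothetical helper: apply each (old -> new) pair in turn, feeding the
# result of one replacement into the next, i.e. a left fold over the table.
html_charconv_fold <- function(text, old, new) {
  stopifnot(length(old) == length(new))
  Reduce(
    function(acc, i) gsub(old[i], new[i], acc, fixed = TRUE),
    seq_along(old),
    text          # initial value of the fold
  )
}

old <- c("&Aacute;", "&aacute;", "&Acirc;", "&acirc;", "&acute;")
new <- c("\u00C1", "\u00E1", "\u00C2", "\u00E2", "\u00B4")  # Á á Â â ´

res <- html_charconv_fold("&Aacute;dd som&aacute; et&acute; cetera", old, new)
res
```

gsub() is vectorized over text, so a whole character vector can be passed in at once.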
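You can get a significant speedup by constructing the function a bit differently and forgetting about the text tools. Basically you:
- split the string on every & and ;
- match the pieces against the entity names you want to replace and swap in the new characters
- paste everything back together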
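To make the three steps concrete, here they are traced by hand on a single string, using a small hypothetical table that already holds the bare entity names (without & and ;):

```r
x   <- "&Aacute;dd som&aacute; et&acute; cetera"
old <- c("Aacute", "aacute", "acute")        # entity names without & and ;
new <- c("\u00C1", "\u00E1", "\u00B4")       # Á á ´

# 1. Split the string on every '&' and ';'
pieces <- strsplit(x, "&|;")[[1]]
# pieces: "", "Aacute", "dd som", "aacute", " et", "acute", " cetera"

# 2. Look each piece up in the table; 0 means "not an entity name"
id <- match(pieces, old, nomatch = 0L)
# id: 0 1 0 2 0 3 0
pieces[id != 0] <- new[id]                   # new[id] silently drops the 0s

# 3. Glue the pieces back together
result <- paste(pieces, collapse = "")
result
# "Ádd somá et´ cetera"
```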
You can do this with the following function:
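html.fastconv <- function(x, old, new){
  # 1. Split every string on '&' and ';', so "&Aacute;dd" -> "", "Aacute", "dd".
  #    Note: a literal '&' or ';' that is not part of an entity is dropped too.
  xs <- strsplit(x, "&|;")
  # Strip '&' and ';' from the lookup table so it holds the bare entity names
  old <- gsub("&|;", "", old)
  # 2. Replace every piece that is an entity name with its character
  xs <- lapply(xs, function(i){
    id <- match(i, old, 0L)
    i[id != 0] <- new[id]
    i
  })
  # 3. Paste the pieces of each string back together
  sapply(xs, paste, collapse = "")
}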
Applied to some sample text:
> sometext <- c("&Aacute;dd som&aacute; le&Acirc;tter&acirc; acute problems et&acute; cetera",
+ "&Aacute;dd som&aacute; le&Acirc;tter&acirc; acute p ..." ... [TRUNCATED]
> newchar <- c("Á","á","Â","â","´")
> oldchar <- c("&Aacute;","&aacute;","&Acirc;","&acirc;","&acute;")
> html.fastconv(sometext,oldchar,newchar)
[1] "Ádd somá leÂtterâ acute problems et´ cetera" "Ádd somá leÂtterâ acute problems et´ cetera"
For the record, some benchmarks (elapsed time in seconds for 1000 replications):
library(rbenchmark)
benchmark(html.fastconv(sometext, oldchar, newchar), html.charconv(sometext),
          columns = c("test", "elapsed", "relative"),
          replications = 1000)
                                       test elapsed relative
2                   html.charconv(sometext)    0.79    5.643
1 html.fastconv(sometext, oldchar, newchar)    0.14    1.000
