I want to retrieve specific text from a column in a SAS dataset.
The file looks like this:

Patient  Location  infoTxt
001      B         Admission Code: 123456 X
                   Exit Code: 98765W
002      C         Admission Code: 4567 WY
                   Exit Code: 76543Z
003      D         Admission Code: 67890 L
                   Exit Code: 4321Z
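I want to retrieve only the information after the colon for the admission code and the exit code, and put each in its own column. A "code" can be any combination of letters, numbers, and spaces. The new data would look like this:

Patient  Location  AdmissionCode  ExitCode
001      B         123456 X       98765W
002      C         4567 WY        76543Z
003      D         67890 L        4321Z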
I'm not familiar with the functions in SAS, but the logic might look something like this:
data want;
    set have;
    do i = 1 to dim(infoTxt)
        AdmissionCode = substring(string1, regexpr(":", string) + 1);
        ExitCode = substring(string2, regexpr(":", string) + 1);
run;
In the code above, string1 stands for the first line of text in infoTxt, and string2 for the second line of text in infoTxt.
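SAS can use Perl regular expressions through the family of functions whose names start with PRX. If you are familiar with regular expressions, the tip sheet is a good summary. PRXMATCH tests a string against a regular expression pattern, and PRXPOSN retrieves the text matched by each capture group.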
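data have;
    input;
    text = _infile_;    /* keep the whole input line as one string */
    datalines;
Admission Code: 123456 X Exit Code: 98765W
Admission Code: 4567 WY Exit Code: 76543Z
Admission Code: 67890 L Exit Code: 4321Z
;
run;

data want;
    set have;
    /* compile the pattern once and keep the id across rows */
    if _n_ = 1 then do;
        retain rx;
        /* \s* keeps the blanks around each code out of the capture groups */
        rx = prxparse('/Admission Code:\s*(.*?)\s*Exit Code:\s*(.*)/');
    end;
    length AdmissionCode ExitCode $50;
    if prxmatch(rx, text) then do;
        AdmissionCode = prxposn(rx, 1, text);    /* first capture group */
        ExitCode = prxposn(rx, 2, text);         /* second capture group */
    end;
    drop rx;
run;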
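The example above reads bare text lines. As a rough sketch of how the same pattern might be applied to the dataset in the question, the step below keeps Patient and Location and reads the codes from infoTxt. It assumes that infoTxt holds both codes in a single character value separated by a blank; if the two parts are really separated by a line break, the pattern would need adjusting. The pipe-delimited datalines and the variable lengths are made up for illustration only.

```sas
/* Hypothetical sample data: one row per patient, infoTxt on one line */
data have;
    infile datalines dsd dlm='|' truncover;
    length Patient $3 Location $1 infoTxt $100;
    input Patient Location infoTxt;
    datalines;
001|B|Admission Code: 123456 X Exit Code: 98765W
002|C|Admission Code: 4567 WY Exit Code: 76543Z
003|D|Admission Code: 67890 L Exit Code: 4321Z
;
run;

data want;
    set have;
    retain rx;
    /* compile the pattern once, on the first row */
    if _n_ = 1 then
        rx = prxparse('/Admission Code:\s*(.*?)\s*Exit Code:\s*(.*)/');
    length AdmissionCode ExitCode $50;
    if prxmatch(rx, infoTxt) then do;
        AdmissionCode = prxposn(rx, 1, infoTxt);   /* text after the first colon  */
        ExitCode = prxposn(rx, 2, infoTxt);        /* text after the second colon */
    end;
    drop rx infoTxt;   /* leave Patient, Location and the two new columns */
run;
```

Rows where the pattern does not match are left with blank AdmissionCode and ExitCode rather than being dropped, which makes unexpected input easier to spot.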