I want to split an XML-like string into tokens in C# or SQL.
For example, given an input string like this:
<entry><AUTHOR>C. Qiao</AUTHOR> and <AUTHOR>R.Melhem</AUTHOR>, "<TITLE>Reducing Communication </TITLE>",<DATE>1995</DATE>. </entry>
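I want this output:
C AUTHOR . AUTHOR Qiao AUTHOR and R AUTHOR . AUTHOR Melhem AUTHOR , " Reducing TITLE Communication TITLE " , 1995 DATE .
Here is a first attempt at solving this, under the following assumptions:
1. The XML string is valid (that is, there are no invalid characters between the tags).
Like this: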
string xml = @"<ENTRY><AUTHOR>C. Qiao</AUTHOR>
<AUTHOR>R.Melhem</AUTHOR>
<TITLE>Reducing Communication </TITLE>
<DATE>1995</DATE>
</ENTRY>";
2. Splitting is done on the space character (' ').
string xml = @"<ENTRY><AUTHOR>C. Qiao</AUTHOR>
<AUTHOR>R.Melhem</AUTHOR>
<TITLE>Reducing Communication </TITLE>
<DATE>1995</DATE>
</ENTRY>";
XElement doc = XElement.Parse(xml);
foreach (XElement element in doc.Elements())
{
var values = element.Value.Split(' ');
foreach (string value in values)
{
Console.WriteLine(element.Name + " " + value);
}
}
This prints:
AUTHOR C. AUTHOR Qiao AUTHOR R.Melhem TITLE Reducing TITLE Communication TITLE DATE 1995
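EDIT:
To split on both "." and spaces, the simplest option is a regular expression. Because the delimiters are inside a capturing group, Regex.Split returns them as tokens too. Like this: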
var values = Regex.Split(element.Value, @"(\.| )");
foreach (string value in values.Where(x=>!String.IsNullOrWhiteSpace(x)))
{
Console.WriteLine(element.Name + " " + value);
}
You can add more delimiters to the pattern if you like. The code above gives you the following:
AUTHOR C AUTHOR . AUTHOR Qiao AUTHOR R AUTHOR . AUTHOR Melhem TITLE Reducing TITLE Communication DATE 1995
EDIT2:
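Here is an example that works with the original string. It is most likely not the best approach, because it does not keep the tokens in the correct order, but it should get you very close: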
string xml = @" <entry>
<AUTHOR>C. Qiao</AUTHOR>
and
<AUTHOR>R.Melhem</AUTHOR>,
""<TITLE>Reducing Communication </TITLE>""
,<DATE>1995</DATE>.
</entry>";
//Parse xml to XDocument
XDocument doc = XDocument.Parse(xml);
// Get first element (we only have one)
XElement element = doc.Descendants().FirstOrDefault();
//Create a copy of an element for use by child elements.
XElement copyElement = new XElement(element);
//Remove all child nodes from root leaving only text
element.Elements().Remove();
//Splitting based on the tokens specified
var values = Regex.Split(element.Value, @"(\.| |\,|\"")");
foreach (string value in values.Where(x => !String.IsNullOrWhiteSpace(x)))
{
Console.WriteLine(value);
}
//Getting children nodes and splitting the same way
foreach (XElement elem in copyElement.Elements())
{
var val = Regex.Split(elem.Value, @"(\.| |\,|\"")");
foreach (string value in val.Where(x => !String.IsNullOrWhiteSpace(x)))
{
Console.WriteLine(value + " " + elem.Name);
}
}
//You can try to play with DescendantsAndSelf
//to see if you can do it in single action and with order preserved.
//foreach (XElement elem in element.DescendantsAndSelf())
//{
// //....
//}
This prints the following:
and , " " , . C AUTHOR . AUTHOR Qiao AUTHOR R AUTHOR . AUTHOR Melhem AUTHOR Reducing TITLE Communication TITLE 1995 DATE
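One possible way to keep the tokens in document order, along the lines of the commented-out hint in the code above, is to walk the text nodes with DescendantNodes() instead of handling the root text and the child elements separately. The following is only a sketch under a few assumptions: the names OrderedTokenizer, Tokenize and Delimiters are made up for illustration, text directly under the root element is printed without a tag, and each token is labelled with its innermost enclosing element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

static class OrderedTokenizer
{
    // Hypothetical delimiter set: '.', ',', '"' and runs of whitespace.
    // The capturing group makes Regex.Split return the delimiters as well.
    static readonly Regex Delimiters = new Regex(@"(\.|,|""|\s+)");

    public static IEnumerable<string> Tokenize(XElement root)
    {
        // DescendantNodes() yields nodes in document order,
        // so the tokens come out in the order they appear in the input.
        foreach (XText text in root.DescendantNodes().OfType<XText>())
        {
            // Text that sits directly under the root gets no tag name.
            string tag = text.Parent == root ? null : text.Parent.Name.LocalName;

            foreach (string token in Delimiters.Split(text.Value)
                                               .Where(t => !string.IsNullOrWhiteSpace(t)))
            {
                yield return tag == null ? token : token + " " + tag;
            }
        }
    }
}

class Program
{
    static void Main()
    {
        string xml = @"<entry><AUTHOR>C. Qiao</AUTHOR> and <AUTHOR>R.Melhem</AUTHOR>, ""<TITLE>Reducing Communication </TITLE>"",<DATE>1995</DATE>. </entry>";

        XElement entry = XElement.Parse(xml);

        // One token per line, for example "C AUTHOR", ". AUTHOR", "Qiao AUTHOR", "and", ...
        foreach (string token in OrderedTokenizer.Tokenize(entry))
        {
            Console.WriteLine(token);
        }
    }
}
```

For the input string from the question this should produce the tokens in the requested order, with "and", the commas, the quotes and the final "." left untagged. If the elements can be nested and you want the outer tag rather than the innermost one, the labelling rule would need to change.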
