I have the following input XML:
<Type>
<Source>
<TimeStamp>2016-02-19T12:27:06.387Z</TimeStamp>
<IPAddress IPVersion="IPv4">x.xx.xxx.xxx</IPAddress>
<Port>64435</Port>
<DNS_Name>x.xx.xxx.xxx.range9-27.abc.com</DNS_Name>
</Source>
</Type>
I am trying to retrieve all of the values from the tags above using the code below.
REGISTER piggybank-0.15.0.jar;
DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();
-- Load each <Type>...</Type> element as a single chararray
A = LOAD 'test.xml' using org.apache.pig.piggybank.storage.XMLLoader('Type') as (x:chararray);
B = FOREACH A GENERATE
    XPath(x, 'Source/TimeStamp'),
    XPath(x, 'Source/IPAddress'),
    XPath(x, 'Source/IPAddress/@IPVersion'),  -- attribute lookup
    XPath(x, 'Source/Port'),
    XPath(x, 'Source/DNS_Name');
When I dump B, I get the following output, in which the value of IPVersion is missing.
(2016-02-19T12:27:06.387Z,x.xx.xxx.xxx,,64435,x.xx.xxx.xxx.range9-27.abc.com)
Can anyone help me solve this?
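There are two bugs in piggybank's XPath class:
> The ignoreNamespace logic breaks searches for XML attributes
https://issues.apache.org/jira/browse/PIG-4751
> The ignoreNamespace parameter defaults to true and cannot be overridden
https://issues.apache.org/jira/browse/PIG-4752
As a workaround, take a look at XPathAll.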
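To check that the expression itself is fine and that the empty field comes from the UDF, the same paths can be run through the JDK's standard javax.xml.xpath evaluator. The following is only a minimal sketch under some assumptions: the class name XPathAttributeDemo and the helper evaluate are hypothetical, and evaluating relative to the root element is my choice for the illustration, not necessarily how piggybank's XPath or XPathAll is implemented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: evaluates XPath expressions with the plain JDK API.
public class XPathAttributeDemo {

    // Hypothetical helper: parses the XML string and evaluates the expression
    // relative to the root element (here, <Type>), returning the string value.
    public static String evaluate(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc.getDocumentElement());
        } catch (Exception e) {
            // Wrap checked exceptions so callers do not need a throws clause.
            throw new RuntimeException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Same record as in the question.
        String xml = "<Type><Source>"
                + "<TimeStamp>2016-02-19T12:27:06.387Z</TimeStamp>"
                + "<IPAddress IPVersion=\"IPv4\">x.xx.xxx.xxx</IPAddress>"
                + "<Port>64435</Port>"
                + "<DNS_Name>x.xx.xxx.xxx.range9-27.abc.com</DNS_Name>"
                + "</Source></Type>";

        // Element text and attribute value, using the paths from the Pig script.
        System.out.println(evaluate(xml, "Source/IPAddress"));
        System.out.println(evaluate(xml, "Source/IPAddress/@IPVersion"));
    }
}
```

With a standard evaluator the attribute path yields IPv4 for this record, which suggests the query in the Pig script is valid and the missing value is down to the UDF behaviour described in the two JIRA issues above.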
