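Study Notes
The lxml module
- About lxml
The lxml parsing library can use XPath expressions to match content in an HTML string.
- Installing the lxml parsing library
Open cmd and run the following command to install it: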
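pip install lxml
- Syntax
from lxml import etree
# html is the page source as a string, for example:
# html = requests.get(url, headers=headers).content.decode('utf-8')
# Create the parse object
parse_html = etree.HTML(html)
# Call xpath on the parse object
r_list = parse_html.xpath('xpath expression')
# A path expression that selects nodes, attributes, or text always returns a list
# (XPath functions such as count() return a number or string instead)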
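To make the return types concrete, here is a minimal sketch. The sample markup and the variable names are made up for this illustration and are not part of the notes above. It shows that path expressions give back a list even when nothing matches, while an XPath function such as count() gives back a single number.

```python
from lxml import etree

# Hypothetical sample markup, used only for this illustration.
html = "<ul><li id='a'>one</li><li id='b'>two</li></ul>"
parse_html = etree.HTML(html)

# Path expressions return lists, even when nothing matches.
texts = parse_html.xpath('//li/text()')
ids = parse_html.xpath('//li/@id')
missing = parse_html.xpath('//table')

# An XPath function such as count() returns a single float, not a list.
count = parse_html.xpath('count(//li)')

print(texts)
print(ids)
print(missing)
print(count)
```

Because a path expression can return an empty list, it is safer to check the result before indexing into it with r_list[0].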
- An example
For the HTML document below, we use XPath to get all li node objects, the class attribute value of every name node, and the text content of every food node:
<ol>
<li class="Ra01">
<name class = 'Bunny01'>小黄</name>
<age>8</age>
<food>胡萝卜</food>
</li>
<li class="Ra01">
<name class = 'Bunny02'>大白</name>
<age>9</age>
<food>白菜</food>
</li>
<li class="Ra02">
<name class = 'Bunny03'>奥尼尔</name>
<age>20</age>
<food>提草</food>
</li>
<li class="Ra03">
<name class = 'Bunny03'>王子</name>
<age>30</age>
<food>进口提草</food>
</li>
</ol>
Code:
# -*- coding: utf-8 -*-
from lxml import etree
html = \
"""
<ol>
<li class="Ra01">
<name class = 'Bunny01'>小黄</name>
<age>8</age>
<food>胡萝卜</food>
</li>
<li class="Ra01">
<name class = 'Bunny02'>大白</name>
<age>9</age>
<food>白菜</food>
</li>
<li class="Ra02">
<name class = 'Bunny03'>奥尼尔</name>
<age>20</age>
<food>提草</food>
</li>
<li class="Ra03">
<name class = 'Bunny03'>王子</name>
<age>30</age>
<food>进口提草</food>
</li>
</ol>
"""
parse_html = etree.HTML(html)
# Get all li node objects
li_list = parse_html.xpath('//ol/li')
print(li_list)
print('-'*20)
# Get the class attribute value of every name node
name_list = parse_html.xpath('//ol/li/name/@class')
print(name_list)
print('-'*20)
# Get the text content of every food node
food_list = parse_html.xpath('//ol/li/food/text()')
print(food_list)
Console output:
[<Element li at 0xad2d7371c8>, <Element li at 0xad2d737448>, <Element li at 0xad2d737288>, <Element li at 0xad2d737488>]
--------------------
['Bunny01', 'Bunny02', 'Bunny03', 'Bunny03']
--------------------
['胡萝卜', '白菜', '提草', '进口提草']
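The first result is a list of Element objects rather than text, and each element can itself call xpath. As a rough sketch, assuming a shortened two-item copy of the document above and a hypothetical records list that is not in the original notes, a relative expression starting with "./" collects the fields of each li into one record:

```python
from lxml import etree

# Shortened copy of the example document (two items only).
html = """
<ol>
<li class="Ra01">
<name class="Bunny01">小黄</name>
<age>8</age>
<food>胡萝卜</food>
</li>
<li class="Ra02">
<name class="Bunny03">奥尼尔</name>
<age>20</age>
<food>提草</food>
</li>
</ol>
"""

parse_html = etree.HTML(html)

records = []  # hypothetical container, one dict per li node
for li in parse_html.xpath('//ol/li'):
    # A leading "./" makes the expression relative to the current li element.
    name = li.xpath('./name/text()')
    food = li.xpath('./food/text()')
    records.append({
        'class': li.get('class'),
        # xpath returned a list, so guard against an empty result before indexing.
        'name': name[0].strip() if name else None,
        'food': food[0].strip() if food else None,
    })

print(records)
```

Grouping by li keeps each name paired with its own food, which the three separate flat lists above do not guarantee if a node is missing from one item.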