
Basic Elements of the BeautifulSoup Library

Source: Internet. Collected by: 自由互联. Published: 2021-06-12


The BeautifulSoup library

<html>
    <body>
        <p class='title'></p>
    </body>
</html>

BeautifulSoup is a library for parsing, traversing, and maintaining a "tag tree".
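The three verbs above (parse, traverse, maintain) can be illustrated with a minimal sketch. The markup string is a hypothetical sample that mirrors the HTML skeleton shown earlier, and the new class value and text are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, mirroring the skeleton shown earlier.
markup = "<html><body><p class='title'>Hello</p></body></html>"

# Parse: build the tag tree from the markup.
soup = BeautifulSoup(markup, "html.parser")

# Traverse: collect the name of every tag in document order.
tag_names = [tag.name for tag in soup.find_all(True)]

# Maintain: modify the tree in place, then serialize it again.
soup.p["class"] = "headline"
soup.p.string = "Hi"
result = str(soup)

print(tag_names)
print(result)
```

Passing True to find_all matches every tag, which is a convenient way to see the whole tree at once.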

Understanding tags

<p class='title'></p>
<!-- A tag is a pair of angle-bracket markers plus its attributes -->

Importing the BeautifulSoup library

from bs4 import BeautifulSoup

import bs4

Constructing a BeautifulSoup object to parse HTML

from bs4 import BeautifulSoup
soup1=BeautifulSoup("<html>data</html>","html.parser")
soup2=BeautifulSoup(open("D://demo.html"),"html.parser")

A BeautifulSoup object corresponds to the entire content of one HTML/XML document.
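When the markup comes from a file, passing a bare open(...) call leaves the handle open and relies on the platform's default encoding. The following is a sketch of a more defensive variant; the temporary file and its contents are hypothetical stand-ins for a real HTML file.

```python
import os
import tempfile
from bs4 import BeautifulSoup

# Write a small hypothetical HTML file so the example is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".html", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("<html><head><title>Demo</title></head>"
              "<body><p>data</p></body></html>")
    path = tmp.name

# Open with an explicit encoding; the with-block closes the handle.
with open(path, encoding="utf-8") as fh:
    soup = BeautifulSoup(fh, "html.parser")

os.remove(path)  # Clean up the temporary file.

# The soup object stands for the whole document.
title_text = soup.title.string
body_text = soup.p.string
print(title_text, body_text)
```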

Four parsers

Parser                  Usage                              Requirement
bs4's HTML parser       BeautifulSoup(mk, 'html.parser')   install the bs4 library
lxml's HTML parser      BeautifulSoup(mk, 'lxml')          pip install lxml
lxml's XML parser       BeautifulSoup(mk, 'xml')           pip install lxml
html5lib's parser       BeautifulSoup(mk, 'html5lib')      pip install html5lib
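Because three of the four parsers depend on optional packages, code that runs on different machines may want a fallback. The sketch below assumes a hypothetical helper named make_soup and an arbitrary preference order; it relies on bs4 raising FeatureNotFound when a requested parser is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Hypothetical helper: try parsers in order, return (soup, parser_name)."""
    for parser in preferred:
        try:
            return BeautifulSoup(markup, parser), parser
        except FeatureNotFound:
            continue  # Parser library not installed; try the next one.
    raise RuntimeError("no usable parser found")

# The <p> tag is deliberately left unclosed; each parser repairs it.
soup, used = make_soup("<p class='title'>Hello")
print(used, soup.p.get_text())
```

The parsers can build slightly different trees from malformed markup, so pinning one parser explicitly is usually the safer choice when results must be reproducible.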

Five basic elements

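Element           Description
Tag               A tag; opens with <> and closes with </>
Name              The tag's name; accessed as <tag>.name
Attributes        The tag's attributes, organized as a dict; accessed as <tag>.attrs
NavigableString   The non-attribute string inside a tag; accessed as <tag>.string
Comment           The comment portion of a string inside a tag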
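The five elements can be checked offline against a small inline document. This is a minimal sketch; the markup and the attribute values are hypothetical.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Hypothetical markup containing one text node and one comment.
markup = '<p class="title" id="intro">Hello</p><b><!--a note--></b>'
soup = BeautifulSoup(markup, "html.parser")

p = soup.p
is_tag = isinstance(p, Tag)     # Tag
tag_name = p.name               # Name
attributes = p.attrs            # Attributes, a dict
text = p.string                 # NavigableString
note = soup.b.string            # Comment

# class may hold several values, so bs4 stores it as a list.
print(tag_name, attributes)
# Comment is a subclass of NavigableString, so test for Comment first.
print(type(text).__name__, type(note).__name__)
```

Note that .string returns a comment's text without the surrounding markers, so checking the type is the only reliable way to tell a comment from ordinary text.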

Demo: extracting information from a page

from bs4 import BeautifulSoup
import requests

html = requests.get('http://python123.io/ws/demo.html').text
soup = BeautifulSoup(html, 'html.parser')
tag = soup.a                      # first <a> tag in the document
name = tag.name                   # 'a', the tag's name
parentName = soup.a.parent.name   # name of the parent node
attr = tag.attrs                  # the tag's attributes, as a dict
attr['class']                     # look up one attribute of the tag
type(attr)                        # dict
tag.string                        # text between the opening and closing tags
newsoup = BeautifulSoup('<b><!--This is a comment--></b><p>This is not a comment</p>', 'html.parser')
type(newsoup.b.string)            # bs4.element.Comment
type(newsoup.p.string)            # bs4.element.NavigableString
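The demo above assumes the page contains an <a> tag that carries a class attribute; if either is missing, soup.a is None or the dictionary lookup raises KeyError. One possible guard is sketched below. The helper name first_link_info and the sample markup are hypothetical, and the sketch works on an inline string so it runs without network access.

```python
from bs4 import BeautifulSoup

def first_link_info(html):
    """Hypothetical helper: summarize the first <a> tag, or return None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.a
    if tag is None:  # The document has no <a> tag.
        return None
    return {
        "name": tag.name,
        "parent": tag.parent.name,
        "class": tag.get("class"),  # None instead of KeyError when absent
        "href": tag.get("href"),
        "text": tag.string,
    }

# Hypothetical sample markup standing in for a downloaded page.
sample = '<p>See <a href="http://example.com/a" id="link1">Basic Python</a></p>'
info = first_link_info(sample)
missing = first_link_info("<p>no links here</p>")
print(info)
print(missing)
```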
