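Approach for scraping dynamically loaded pages: first disable JavaScript in Chrome and check whether the page still renders completely. If something is missing, that content is being loaded by JavaScript, and those requests are what you need to analyze next.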
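The same check can be done in code: the raw HTML returned by a plain HTTP request is what the browser sees before any script runs, so text that is missing from it must be filled in by JavaScript. Below is a minimal sketch of that idea. The helper name `missing_markers` and the sample HTML are hypothetical, made up for illustration and not taken from the real page.

```python
def missing_markers(html, markers):
    """Return the markers that do NOT appear in the raw (script-free) HTML."""
    return [marker for marker in markers if marker not in html]


# Hypothetical static HTML: the title is in the markup, but the view-count
# element is empty because a script is expected to fill it in later.
static_html = '<h1>Some article</h1><span id="browseCount"></span>'

# Anything listed here is a candidate for "loaded dynamically".
print(missing_markers(static_html, ["Some article", "1400"]))  # ['1400']
```

In practice you would pass in something like `requests.get(page_url, headers=header).text` and use values you can see in the fully rendered page as markers. Whatever comes back as missing is worth searching for in the XHR requests of the DevTools Network panel.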
import requests

# No need to overthink it: in the captured traffic (DevTools Network panel),
# just filter for the Ajax (XHR) requests and call that endpoint directly.

# Other URLs tried earlier:
# url = "www.baidu.com"
# url = "https://wenku.baidu.com/view/aed039c7a1116c175f0e7cd184254b35eefd1a05.html?from=search"
url = "https://www.moer.cn/articleCountAndStatus.json"
header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36"
}
article_id = 308470
# requests appends these to the URL as a query string: ?articleId=308470
params = {"articleId": article_id}
response = requests.post(url=url, headers=header, params=params, timeout=10).text
print(response)
"""
Response:
{"code":0,"message":"success","result":{"authorArticle_saleCount":"0","authorArticle_browseCount":1400,"col":"N","authorArticle_zanCount":"1","zan":"N"}}
"""
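The endpoint returns JSON, so the next step is usually to parse it rather than print the raw text. The sketch below works on the sample response shown above. The function name `parse_article_stats` and the output keys are my own choices, and reading `zanCount` as a like count is an assumption based on the field name. Note that the sample mixes types: two counts arrive as strings and one as an integer, so all three are normalized with `int()`.

```python
import json


def parse_article_stats(text):
    """Parse the JSON body and return the three counters as integers."""
    payload = json.loads(text)
    # In the sample response, code 0 goes together with message "success".
    if payload.get("code") != 0:
        raise ValueError("unexpected response: {}".format(payload.get("message")))
    result = payload["result"]
    return {
        "sales": int(result["authorArticle_saleCount"]),
        "views": int(result["authorArticle_browseCount"]),
        "likes": int(result["authorArticle_zanCount"]),  # assumed: zan = like
    }


# Sample body copied from the response shown above.
sample = (
    '{"code":0,"message":"success","result":{"authorArticle_saleCount":"0",'
    '"authorArticle_browseCount":1400,"col":"N","authorArticle_zanCount":"1","zan":"N"}}'
)
print(parse_article_stats(sample))  # {'sales': 0, 'views': 1400, 'likes': 1}
```

With a live request, `requests` can also decode the body directly through `response.json()`. The explicit `json.loads` is used here only so the example runs without network access.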