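beautifulsoup4 Tutorial (1): Basics and a First Crawler
beautifulsoup4 Tutorial (2): The Four Main Objects in bs4
beautifulsoup4 Tutorial (3): Traversing and Searching the Document Tree
beautifulsoup4 Tutorial (4): CSS Selectors
6. CSS Selectors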
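The snippets in this section use a `soup` object that is not defined in this part of the series. The sketch below shows one way to build it, assuming the sample document is the "three sisters" HTML that the printed results suggest (with the first link's text inside a comment). The variable name `html_doc` and the choice of `html.parser` are assumptions, not taken from the original.

```python
from bs4 import BeautifulSoup

# Assumed sample document, reconstructed from the results printed in this section.
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

# "html.parser" is Python's built-in parser, so nothing extra has to be installed.
soup = BeautifulSoup(html_doc, "html.parser")

# select() takes a CSS selector string and returns a list of matching tags.
print(len(soup.select("a")))       # number of <a> tags
print(len(soup.select(".story")))  # number of tags with class "story"
```

With this setup in place, the remaining snippets can be run as written.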
6.1 Searching by tag name
print(soup.select('title'))
print(soup.select('a'))
print(soup.select('b'))
result:
[<title>The Dormouse's story</title>]
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
[<b>The Dormouse's story</b>]
6.2 Searching by class name
print(soup.select('.story'))
result:
[<p class="story">Once upon a time there were three little sisters; and their names were\n<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,\n<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and\n<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;\nand they lived at the bottom of a well.</p>, <p class="story">...</p>]
6.3 Searching by id
print(soup.select('#link1'))
result:
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
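6.4 Combined selectors
Separate multiple selectors with spaces. They filter level by level from left to right, so each selector applies to a descendant of the previous match, not to the same node. The second query below returns nothing because no element with id link1 sits inside an <a> tag.
print(soup.select('p #link1'))
print(soup.select('a #link1'))
result:
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
[]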
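To make the "not the same node" point concrete, the sketch below contrasts a selector with a space (descendant) against one without a space (both conditions on the same tag). The one-line HTML fragment is a made-up stand-in for the tutorial's document.

```python
from bs4 import BeautifulSoup

# Hypothetical minimal fragment: one <a id="link1"> inside a <p>.
html = '<p class="story"><a class="sister" id="link1">Elsie</a></p>'
soup = BeautifulSoup(html, "html.parser")

# With a space: an element with id="link1" somewhere inside a <p>.
inside_p = soup.select("p #link1")

# With a space: an element with id="link1" inside an <a>. The <a> is link1
# itself and contains no such element, so nothing matches.
inside_a = soup.select("a #link1")

# Without a space: an <a> tag that itself has id="link1".
same_node = soup.select("a#link1")

print(len(inside_p), len(inside_a), len(same_node))  # 1 0 1
```

Dropping the space is the way to apply several conditions to one node, for example `a.sister#link1`.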
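Writing the relationship explicitly with the child combinator > makes this easier to follow: the element on the right must be a direct child of the element on the left.
print(soup.select('p > #link1'))
print(soup.select('a > #link1'))
result:
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
[]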
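In the tutorial's document the space and `>` forms give the same answer, because the link is a direct child of the paragraph. The sketch below uses a hypothetical fragment with one extra nesting level to show where the two differ.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the <a> is a grandchild of the <div>, not a child.
html = '<div id="outer"><p><a id="link1">Elsie</a></p></div>'
soup = BeautifulSoup(html, "html.parser")

# Space: any depth below <div> counts, so the <a> is found.
any_depth = soup.select("div a")

# ">": only direct children of <div> count, and the <a> is one level too deep.
direct_only = soup.select("div > a")

# Spelling out every level with ">" finds it again.
full_path = soup.select("div > p > a")

print(len(any_depth), len(direct_only), len(full_path))  # 1 0 1
```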
6.5 Searching by attribute
print(soup.select('p > a'))
print(soup.select('p > a[href="http://example.com/tillie"]'))
result:
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
[<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
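Besides an exact match, standard CSS attribute selectors also allow testing for presence, prefix, suffix, and substring. The sketch below tries these on a hypothetical set of links; the URLs and the fourth link without an `href` are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical links: two on example.com, one on example.org, one without href.
html = """
<p class="story">
<a href="http://example.com/elsie" id="link1">Elsie</a>
<a href="http://example.com/lacie" id="link2">Lacie</a>
<a href="https://example.org/tillie" id="link3">Tillie</a>
<a id="link4">No href</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

has_href = soup.select("a[href]")                         # attribute is present
starts = soup.select('a[href^="http://example.com/"]')    # value starts with
ends = soup.select('a[href$="tillie"]')                   # value ends with
contains = soup.select('a[href*="lac"]')                  # value contains

print(len(has_href), len(starts), len(ends), len(contains))  # 3 2 1 1
```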
6.6 Iterating over the result list
print(soup.select('p > a'))
print(type(soup.select('p > a')))
print("====")
print(soup.select('p > a')[0])
print("====")
for a in soup.select('p > a'):
    print(a)
result:
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
<class 'list'>
====
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>
====
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
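Since the result is an ordinary list of tags, each item can be used like any other tag while looping. The sketch below pulls out attributes and text, and also shows `select_one()`, which returns the first match or `None` instead of a list. The fragment is a simplified stand-in for the tutorial's document, with plain text in every link.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the tutorial's document.
html = """
<p class="story">
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

# Collect (id, href, text) for every matching link.
rows = []
for a in soup.select("p > a"):
    rows.append((a["id"], a["href"], a.get_text()))
    print(a["id"], a["href"], a.get_text())

# select_one() returns a single tag, or None when nothing matches,
# so no [0] index (and no IndexError on an empty list) is needed.
first = soup.select_one("p > a")
missing = soup.select_one("p > span")
print(first["id"], missing)  # link1 None
```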