Building on the previous lessons, we now have the basic knowledge needed to scrape a website's content. As an example, let's scrape the top 100 ranked movies from the Maoyan movie site at https://maoyan.com/board/4.
Visiting https://maoyan.com/board/4, we find that each page lists 10 movies in ranking order. The second page is https://maoyan.com/board/4?offset=10 and the third is https://maoyan.com/board/4?offset=20, so we can infer that page N is https://maoyan.com/board/4?offset=(N-1)*10 (this holds for N >= 1; page 1 has offset 0). By looping over the offset ten times, we can fetch all top 100 movies.
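The offset pattern above can be sketched as a small helper that generates the ten page URLs (the function name board_urls is illustrative, not from the original):

```python
# Sketch: generate the ten board URLs from the offset pattern above.
BASE = 'https://maoyan.com/board/4?offset={}'

def board_urls(pages=10):
    # Page N (1-based) has offset (N - 1) * 10.
    return [BASE.format((n - 1) * 10) for n in range(1, pages + 1)]

for url in board_urls():
    print(url)
```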
Let's start by fetching the first page:
import requests

def get_one_page(url):
    # Fetch one page and return its HTML, or None on a non-200 response.
    response = requests.get(url)
    if response.status_code == 200:
        return response.text
    return None

def main():
    url = 'https://maoyan.com/board/4'
    html = get_one_page(url)
    print(html)

main()
Run it and check the result: we have fetched the first page's content. Of course, besides the top 10 movies, it also contains the rest of the page's markup. We won't filter it for now, since we haven't yet covered parsing HTML in Python, and doing it with regular expressions would be too cumbersome.
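One caveat: sites like Maoyan may reject requests that don't look like they come from a browser, so if the response is empty or a verification page, try sending a User-Agent header. This is a hedged sketch (the exact anti-scraping behavior can change, and the UA string below is just a common browser example):

```python
import requests

# Illustrative browser User-Agent string; any mainstream browser UA should do.
HEADERS = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/120.0 Safari/537.36')
}

def get_one_page(url):
    # Pass headers so the request looks like an ordinary browser visit;
    # timeout avoids hanging forever on a slow server.
    response = requests.get(url, headers=HEADERS, timeout=10)
    if response.status_code == 200:
        return response.text
    return None
```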
Now let's put the full code together:
import requests
from requests.exceptions import RequestException
import time

def get_one_page(url):
    # Fetch one page; return its HTML, or None on any error or non-200 status.
    try:
        response = requests.get(url)
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        return None

def write_to_file(content):
    # Append the page content to result.txt.
    with open('result.txt', 'a') as f:
        f.write(content)

def main(offset):
    url = 'https://maoyan.com/board/4?offset=' + str(offset)
    html = get_one_page(url)
    if html:  # skip writing when the fetch failed and returned None
        write_to_file(html)

if __name__ == '__main__':
    for i in range(10):
        main(offset=i * 10)
        time.sleep(1)  # pause between requests to be polite to the server
In this example, the write_to_file function appends each page's content to a file; since the output is large and messy, it is not shown here.
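One detail worth noting when writing to the file: open() without an explicit encoding uses the platform default, which on some systems (e.g. GBK on Windows) can garble or fail on Chinese text. A safer variant of write_to_file might look like this (the path parameter is an addition for illustration):

```python
def write_to_file(content, path='result.txt'):
    # Append with an explicit UTF-8 encoding so Chinese text survives intact.
    with open(path, 'a', encoding='utf-8') as f:
        f.write(content)
```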
In the next section, we will parse this HTML.