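This post uses BeautifulSoup4 to scrape PM2.5 data from www.pm25.com. I picked this site because it publishes a ranking of cities by PM2.5 concentration (the real reason: it is the first result Baidu returns when you search for "PM2.5"!).
The program only compares two cities, so the speedup from multithreading is not very noticeable. Try 10 cities with 10 threads to see a clearer difference.
One last gripe: how is Shanghai's air quality this bad?!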
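The BeautifulSoup lookups the script relies on can be tried offline first. The following is a minimal sketch for Python 3, not the author's code: only the CSS class names are taken from the script, while the HTML fragment, its values, and the helper name parse_fragment are made up for illustration. The real page layout on www.pm25.com may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: only the class names come from the script;
# the tag layout and the values are invented for illustration.
SAMPLE_HTML = """
<div class="bi_loaction_city">Hefei</div>
<a class="bi_aqiarea_num">80</a>
<div class="bi_aqiarea_right"><span>Good</span></div>
<div class="bi_aqiarea_bottom">Sample description.</div>
"""


def parse_fragment(html):
    """Extract the four fields using the same kinds of lookups as the script."""
    soup = BeautifulSoup(html, 'html.parser')
    city = soup.find(class_='bi_loaction_city')            # keyword filter on class
    aqi = soup.find('a', {'class': 'bi_aqiarea_num'})      # attribute filter as a dict
    quality = soup.select('.bi_aqiarea_right span')        # CSS selector, returns a list
    result = soup.find('div', class_='bi_aqiarea_bottom')  # tag name plus class
    return {
        'city': city.text,
        'aqi': aqi.text,
        'quality': quality[0].text,
        'description': result.text,
    }


if __name__ == '__main__':
    print(parse_fragment(SAMPLE_HTML))
```

Note that find returns a single element (or None when nothing matches), whereas select always returns a list, which is why the quality lookup is indexed with [0].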
PM25.py
The code is as follows:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# by ustcwq
# Written for Python 2 (urllib2, print statements).
import urllib2
import threading
from time import ctime
from bs4 import BeautifulSoup


def getPM25(cityname):
    site = 'http://www.pm25.com/' + cityname + '.html'
    html = urllib2.urlopen(site)
    soup = BeautifulSoup(html, 'html.parser')

    city = soup.find(class_='bi_loaction_city')  # city name
    aqi = soup.find("a", {"class": "bi_aqiarea_num"})  # AQI index
    quality = soup.select(".bi_aqiarea_right span")  # air quality level
    result = soup.find("div", class_='bi_aqiarea_bottom')  # air quality description

    print city.text + u' AQI index: ' + aqi.text + u'\nAir quality: ' + quality[0].text + result.text
    print '*'*20 + ctime() + '*'*20


def one_thread():  # single-threaded: fetch the cities one after another
    print 'One_thread Start: ' + ctime() + '\n'
    getPM25('hefei')
    getPM25('shanghai')


def two_thread():  # multithreaded: one thread per city
    print 'Two_thread Start: ' + ctime() + '\n'
    threads = []
    t1 = threading.Thread(target=getPM25, args=('hefei',))
    threads.append(t1)
    t2 = threading.Thread(target=getPM25, args=('shanghai',))
    threads.append(t2)
    for t in threads:
        # t.setDaemon(True)
        t.start()
    for t in threads:
        t.join()  # wait for every fetch to finish before returning


if __name__ == '__main__':
    one_thread()
    print '\n' * 2
    two_thread()
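To get a feel for the "10 cities, 10 threads" experiment suggested earlier without hitting the site, here is a rough sketch for Python 3 that uses only the standard library. Everything in it is an assumption made for illustration: fake_fetch is a stand-in that sleeps instead of making a request, the 0.05 second delay is arbitrary, and the names city0 to city9 are placeholders rather than real pages on www.pm25.com.

```python
import threading
import time


def fake_fetch(cityname, delay=0.05):
    """Stand-in for getPM25: sleeps to imitate network latency (no real request)."""
    time.sleep(delay)
    return cityname + ': done'


def run_sequential(cities):
    """Fetch the cities one after another and return (results, elapsed seconds)."""
    start = time.time()
    results = [fake_fetch(c) for c in cities]
    return results, time.time() - start


def run_threaded(cities):
    """Fetch the cities with one thread each and return (results, elapsed seconds)."""
    results = [None] * len(cities)

    def worker(i, city):
        results[i] = fake_fetch(city)  # each thread writes only to its own slot

    start = time.time()
    threads = [threading.Thread(target=worker, args=(i, c))
               for i, c in enumerate(cities)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait so the elapsed time covers every fetch
    return results, time.time() - start


if __name__ == '__main__':
    cities = ['city%d' % i for i in range(10)]  # placeholder names
    _, seq_time = run_sequential(cities)
    _, thr_time = run_threaded(cities)
    print('sequential: %.2fs' % seq_time)
    print('threaded:   %.2fs' % thr_time)
```

Because the simulated work is pure waiting, the threaded run should take roughly as long as a single fetch while the sequential run takes about ten times that. Real requests will vary with network conditions and with how the server handles parallel connections.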