Heibanke Crawler Challenge, Level 4

youncyb, posted 2017-08-11, 1720 views, Python


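The password for this level is a bit nasty. There are 13 pages: the first 12 hold 8 digits each and the last holds 4, so the password is 100 digits long. A hundred digits! Fetch them one at a time and you'll be waiting forever. Luckily there is the threading module for multithreaded concurrency. The module does have a few pitfalls, though; you'll find out where once you've studied it.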
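
To make the threading idea concrete, here is a minimal sketch, not the code used for this level: several threads each fetch one page and merge their results into a shared dict, and the main thread joins them all before reading it. fetch_page, worker and collect are hypothetical names, and fetch_page fabricates its data instead of contacting the site.

```python
import threading

def fetch_page(page):
    # Hypothetical stand-in for one HTTP request: page p "returns"
    # positions 8*(p-1)+1 .. 8*p (capped at 100), with digit = position % 10.
    start = 8 * (page - 1) + 1
    end = min(8 * page, 100)
    return {pos: pos % 10 for pos in range(start, end + 1)}

def worker(page, shared, lock):
    pairs = fetch_page(page)
    with lock:  # guard the shared dict while merging this page's results
        shared.update(pairs)

def collect(pages=13):
    shared = {}
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(p, shared, lock))
               for p in range(1, pages + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before reading the dict
    return shared

result = collect()
print(len(result))
```

The lock is not strictly needed for a single dict update in CPython, but it keeps the sketch safe if the merge step grows beyond one call.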

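1. Logging in works exactly as in level 3. After logging in, submit any password; the response points you to a password list page (pw_list in the code below), where each row gives a position in the password and the digit at that position. The positions shown are random and differ on every request. Sigh.

Luckily the built-in sorted function can order the dictionary entries by position.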
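
As a small illustration of the sorting step, with made-up sample data rather than real values from the site: sorting the dict's items by key puts the digits back in position order, and the last two lines show why the keys should be ints and not strings.

```python
# Hypothetical sample: positions arrive in random order, as on the site.
pwd_dict = {3: 7, 1: 4, 2: 0, 5: 9, 4: 1}

# items() yields (position, digit) pairs; sort them by position.
ordered = sorted(pwd_dict.items(), key=lambda kv: kv[0])
password = ''.join(str(digit) for _, digit in ordered)
print(password)

# String keys sort lexicographically, so '10' lands before '2'.
str_order = sorted(['1', '2', '10'])
int_order = sorted([1, 2, 10])
print(str_order, int_order)
```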

Enough talk, here's the code.

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import requests
from lxml import etree
import threading
import time
def getPassword(session,url,pwd_dict):
    # Fetch one listing page and record every (position, digit) pair it shows.
    req=session.get(url).text
    tree=etree.HTML(req)
    tree_pos=tree.xpath('//td[@title="password_pos"]/text()')
    tree_val=tree.xpath('//td[@title="password_val"]/text()')
    print(tree_pos,tree_val)
    for i,j in zip(tree_pos,tree_val):
        # Cast explicitly, otherwise this raises an error. The dict keys must
        # also be ints, or sorting them later is useless.
        pwd_dict[int(i)]=int(j)
def main():
    pwd=''
    pwd_dict={}
    threads=[]
    url1="http://www.heibanke.com/accounts/login/?next=/lesson/crawler_ex03/"
    session=requests.Session()
    req=session.get(url1)
    cookies=requests.utils.dict_from_cookiejar(session.cookies)
    data1={'username':'youncyb','password':'heibanke163com','csrfmiddlewaretoken':cookies['csrftoken']}
    headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0',
              'Referer':'http://www.heibanke.com/accounts/login/?next=/lesson/crawler_ex02/'
            }
    req=session.post(url1,data=data1,headers=headers)
    cookies=requests.utils.dict_from_cookiejar(session.cookies)
    url="http://www.heibanke.com/lesson/crawler_ex03/pw_list/?page=1"
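    for i in range(50):# make sure the password reaches 100 digits
        t=threading.Thread(target=getPassword,args=(session,url,pwd_dict))
        t.start()
        time.sleep(8)# anything under 8 seconds triggers an error; adding a random offset to the 8 would probably be better
        threads.append(t)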
    for t in threads:
        t.join()
    pwd_list=sorted(pwd_dict.items(),key=lambda k:k[0])
    print(pwd_list)
    for (i,j) in pwd_list:
        pwd+=str(j)
    print(pwd)
    print(len(pwd))
    url2="http://www.heibanke.com/lesson/crawler_ex03/"
    data2={'username':'adc','password':pwd,'csrfmiddlewaretoken':cookies['csrftoken']}
    req=session.post(url2,data=data2,headers=headers).text
    content=str(etree.HTML(req).xpath('//h3/text()'))
    key="您输入的密码错误, 请重新输入"
    if key not in content:
        print(content)
        print(i)
if __name__ == '__main__':
    main()
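
The comment on the sleep call suggests adding a random offset to the delay, and a fixed count of 50 requests is not guaranteed to cover all 100 positions if each request really returns a random handful of them. A possible variation, sketched here under that assumption, keeps requesting until every position has been seen. fake_fetch and collect_all are hypothetical names, and fake_fetch simulates the page instead of calling the site.

```python
import random
import time

TOTAL = 100  # password length stated in the post

def fake_fetch(rng):
    # Hypothetical stand-in for one request to the listing page:
    # returns 8 random (position, digit) pairs.
    positions = rng.sample(range(1, TOTAL + 1), 8)
    return {pos: pos % 10 for pos in positions}

def collect_all(fetch, base_delay=0.0, jitter=0.0, max_requests=1000):
    # Keep fetching until all positions are known or the request cap is hit.
    pwd_dict = {}
    requests_made = 0
    while len(pwd_dict) < TOTAL and requests_made < max_requests:
        pwd_dict.update(fetch())
        requests_made += 1
        # Pause for base_delay plus a random extra of up to `jitter` seconds.
        time.sleep(base_delay + random.uniform(0, jitter))
    return pwd_dict, requests_made

rng = random.Random(0)  # seeded so the simulation is repeatable
pwd_dict, n = collect_all(lambda: fake_fetch(rng))
print(len(pwd_dict), n)
```

Against the real site one would pass something like base_delay=8 and jitter=2, following the 8-second limit noted in the code; the zero defaults here only keep the simulation fast.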