Larry_liu92 2016-12-30 06:56 Acceptance rate: 0%
Views: 2879

JD crawler's simulated login gets stuck at the CAPTCHA

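I want to write a small Python program to scrape information from JD (京东), but it keeps getting stuck at the CAPTCHA step. I have checked that the URL JD serves the CAPTCHA from should be correct, yet every CAPTCHA I receive is one of the same few fixed fake codes, so the login never succeeds.
Could anyone offer some pointers?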

import json
import os
import random
import sys
import time

import requests
from selenium import webdriver


class JDWrapper(object):
    '''
    Simulates logging in to JD.
    '''

    def __init__(self, usr_name, usr_pwd):
        # cookie info
        self.trackid = ''
        self.uuid = ''
        self.eid = ''
        self.fp = ''

        self.usr_name = usr_name
        self.usr_pwd = usr_pwd

        self.interval = 0

        # init url related
        self.home = 'https://passport.jd.com/new/login.aspx'
        self.login = 'https://passport.jd.com/uc/loginService'
        self.imag = 'https://authcode.jd.com/verify/image'
        self.auth = 'https://passport.jd.com/uc/showAuthCode'

        self.sess = requests.Session()
        # default headers sent with every request made through this session
        self.sess.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36',
            'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8',
            'Connection': 'keep-alive',
        })

        try:
            self.browser = webdriver.PhantomJS('phantomjs.exe')
        except Exception as e:
            print('Phantomjs initialize failed :', e)
            sys.exit(1)

    @staticmethod
    def print_json(resp_text):
        '''
        Pretty-print the response content, one key per line.
        '''
        # the body may be wrapped in parentheses: ({...})
        if resp_text[0] == '(':
            resp_text = resp_text[1:-1]

        for k, v in json.loads(resp_text).items():
            print(u'%s : %s' % (k, v))

    @staticmethod
    def response_status(resp):
        # True only for HTTP 200; otherwise log the status and URL
        if resp.status_code != requests.codes.OK:
            print('Status: %d, Url: %s' % (resp.status_code, resp.url))
            return False
        return True

    def need_auth_code(self, usr_name):
        # check whether an auth code is required for this account
        auth_dat = {
            'loginName': usr_name,
        }
        payload = {
            'r': random.random(),
            'version': 2015,
        }

        resp = self.sess.post(self.auth, data=auth_dat, params=payload)
        if self.response_status(resp):
            # strip the surrounding parentheses before parsing
            js = json.loads(resp.text[1:-1])
            return js['verifycode']

        print(u'Failed to check whether an auth code is required')
        return False

    def get_auth_code(self, uuid):
        # image save path
        image_file = os.path.join(os.getcwd(), 'authcode.jfif')

        payload = {
            'a': 1,
            'acid': uuid,
            'uid': uuid,
            'yys': str(int(time.time() * 1000)),
        }

        # get auth code
        r = self.sess.get(self.imag, params=payload)
        if not self.response_status(r):
            print(u'Failed to fetch the auth code image')
            return False

        # the with-block closes the file on exit
        with open(image_file, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024):
                f.write(chunk)

        # open the image in the default viewer (Windows only), then read the code typed by the user
        os.system('start ' + image_file)
        return input('Auth Code: ')

    def login_once(self, login_data):
        # url parameter
        payload = {
            'r': random.random(),
            'uuid': login_data['uuid'],
            'version': 2015,
        }

        resp = self.sess.post(self.login, data=login_data, params=payload)
        if self.response_status(resp):
            # strip the surrounding parentheses before parsing
            js = json.loads(resp.text[1:-1])
            # self.print_json(resp.text)

            if not js.get('success'):
                print(js.get('emptyAuthcode'))
                return False
            else:
                return True

        return False
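
Both need_auth_code and login_once drop the first and last character of the response before parsing it, on the assumption that the body is always wrapped in parentheses. A slightly more defensive helper might look like the sketch below. The name parse_wrapped_json and the sample payloads are hypothetical and are not taken from JD's actual responses.

```python
import json


def parse_wrapped_json(text):
    """Parse a JSON body that may be wrapped in one pair of parentheses."""
    body = text.strip()
    # Only strip the wrapper when it is actually present.
    if body.startswith('(') and body.endswith(')'):
        body = body[1:-1]
    return json.loads(body)


# Hypothetical payloads, for illustration only.
print(parse_wrapped_json('({"verifycode": true})'))
print(parse_wrapped_json('{"success": "ok"}'))
```

With this, an unwrapped body is parsed as-is instead of losing its opening and closing braces.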

3 answers

  • oyljerry 2016-12-30 07:28

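    First save the image and check whether you are actually getting the correct image each time; after that it becomes an image-recognition problem. Isolate and analyze the problem step by step.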
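
One way to carry out the first isolation step is to download the CAPTCHA several times and compare content hashes, which shows whether the server keeps returning the same few images. This is only a minimal sketch: the helper names are hypothetical, and the byte strings stand in for the raw image bytes (for example r.content in get_auth_code).

```python
import hashlib
from collections import Counter


def fingerprint(image_bytes):
    """Return a hex digest that identifies the exact image content."""
    return hashlib.sha256(image_bytes).hexdigest()


def summarize(samples):
    """Count how often each distinct image appears in a list of byte strings."""
    return Counter(fingerprint(s) for s in samples)


# Stand-in data; real samples would be the bytes of each downloaded CAPTCHA.
samples = [b'image-A', b'image-B', b'image-A', b'image-A']
counts = summarize(samples)
print('%d downloads, %d distinct images' % (len(samples), len(counts)))
```

If many downloads collapse into a handful of distinct hashes, the fixed images are coming from the server. That would point at the request itself (cookies, headers, the uuid parameter) rather than at image recognition.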
