实战1:知乎破解加密算法模拟登陆

1.1 目标站点网址

https://www.zhihu.com/signin?next=%2F

知乎上所有的数据都是得登录之后才能获取的,所以我们首先得用爬虫模拟知乎的登录操作

1.2 站点分析

1.2.1 登录过程分析

首先我们访问登录页面的时候,服务器会给我们返回一系列的cookie

这些cookie我们需要保存下来

接着,咱点击密码登录会发现,如下的一个验证码的请求:

这个请求是用来询问服务器,我需不需要填写验证码,’show_captcha’为false,则不需要验证码,’show_captcha’为true,则需要验证码

经过多次测试,会发现,lang这个字段,控制的是验证码格式,en为英文验证码,cn为中文验证码(这个需要你点击哪些中文是倒置的).因为我们选择使用英文验证码

当’show_captcha’为true的时候,浏览器会继续像这个url发送请求,来获取验证码的图片内容

虽然是同一个url,只不过这次请求的请求方法变成了PUT,返回的内容为base64处理过的图片内容,这个我们只需要用python中base64这个库去解析就好了

当我输入验证码并提交的时候,浏览器会向这个url发送第三次请求,用来验证我的验证码是否输入正确

提交验证码的url依旧是这个url,不过这次请求方法为POST,请求体为{"input_text":’验证码内容’}

如果验证通过,’success’这个字段的值就为true

然后紧接着,浏览器就会向服务器提交我的用户名和密码.

这个时候,咱就懵逼了,因为请求体的数据全部是前端js加密过的密文

第一,我不知道请求体都需要哪些字段

第二,我就算知道都有哪些字段,不知道加密规则,也无法正常提交

所以,我们下面就需要找到请求体的加密算法

1.2.2 加密算法破解

正常情况下,网站程序员为了便于识别,对于加密算法,一般都会带有关键字"encrypt",所以,我们使用控制台上的搜索功能,来尝试定位到他们的js加密算法

这个结果让人心情愉悦,咱一共找到3个文件,共6个匹配.

每个匹配都点进去瞅一眼

最后,我们会找到下面这块代码,咱打个断点,然后重新点击登录,程序会被卡主

这个时候我们注意看上边e这个变量

"client_id=c3cef7c66a1843f8b3a9e6a1e3160e20&grant_type=password&timestamp=1566371889615&source=com.zhihu.web&signature=849409fe69f76b28a7ebfa95f0acc784d7c812bf&username=%2B8618896530856&password=dadasdasdas&captcha=nngt&lang=en&utm_source=&ref_source=other_https%3A%2F%2Fwww.zhihu.com%2Fsignin%3Fnext%3D%252F"

很明显了,我们找到了post请求的请求体数据,但是这是url编码的,有些字段看不懂呀,咋办?

python里调url解析库解个码就好了:

from urllib.parse import unquote_plus

msg = '''
client_id=c3cef7c66a1843f8b3a9e6a1e3160e20&grant_type=password&timestamp=1566371889615&source=com.zhihu.web&signature=849409fe69f76b28a7ebfa95f0acc784d7c812bf&username=%2B8618896530856&password=dadasdasdas&captcha=nngt&lang=en&utm_source=&ref_source=other_https%3A%2F%2Fwww.zhihu.com%2Fsignin%3Fnext%3D%252F
'''
print(unquote_plus(msg))

输出结果如下:

client_id=c3cef7c66a1843f8b3a9e6a1e3160e20&grant_type=password&timestamp=1566371889615&source=com.zhihu.web&signature=849409fe69f76b28a7ebfa95f0acc784d7c812bf&username=+8618896530856&password=dadasdasdas&captcha=nngt&lang=en&utm_source=&ref_source=other_https://www.zhihu.com/signin?next=%2F

里面需要包含的字段如下:

client_id      用户id(固定值)
grant_type     验证方式(固定值)
timestamp      时间戳*1000,去尾
source         (固定值)
signature      签名(js加密,变动)
username       用户名
password       密码
captcha        验证码
lang           验证码方式(固定值)
utm_source     (固定值)
ref_source     (固定值)other_https://www.zhihu.com/signin?next=%2F

这些字段都很好处理,但是让人老壳疼的地方又来了,这个signature是加密的,我们还得找出它的js加密算法

怎么找,很容易,咱搜索关键字"signature",在一堆匹配中会找到下面的匹配

ok,咱找着了,hamc的加密方式,sha-1的加密算法,咱用python的hmac,hashlib这两个包就能搞定

import hmac
from hashlib import sha1

咱接着说请求体的加密,刚刚我们找到了加密的地方,怎么处理呢,只要把它所在的js函数全部复制到python里,在python里调用node.js去执行就可以了,复制结果如下:

    function t(e) {
        return (t = "function" == typeof Symbol && "symbol" == typeof Symbol.A ? function(e) {
            return typeof e
        }
        : function(e) {
            return e && "function" == typeof Symbol && e.constructor === Symbol && e !== Symbol.prototype ? "symbol" : typeof e
        }
        )(e)
    }
    Object.defineProperty(exports, "__esModule", {
        value: !0
    });
    var A = "2.0"
      , __g = {};
    function s() {}
    function i(e) {
        this.t = (2048 & e) >> 11,
        this.s = (1536 & e) >> 9,
        this.i = 511 & e,
        this.h = 511 & e
    }
    function h(e) {
        this.s = (3072 & e) >> 10,
        this.h = 1023 & e
    }
    function a(e) {
        this.a = (3072 & e) >> 10,
        this.c = (768 & e) >> 8,
        this.n = (192 & e) >> 6,
        this.t = 63 & e
    }
    function c(e) {
        this.s = e >> 10 & 3,
        this.i = 1023 & e
    }
    function n() {}
    function e(e) {
        this.a = (3072 & e) >> 10,
        this.c = (768 & e) >> 8,
        this.n = (192 & e) >> 6,
        this.t = 63 & e
    }
    function o(e) {
        this.h = (4095 & e) >> 2,
        this.t = 3 & e
    }
    function r(e) {
        this.s = e >> 10 & 3,
        this.i = e >> 2 & 255,
        this.t = 3 & e
    }
    s.prototype.e = function(e) {
        e.o = !1
    }
    ,
    i.prototype.e = function(e) {
        switch (this.t) {
        case 0:
            e.r[this.s] = this.i;
            break;
        case 1:
            e.r[this.s] = e.k[this.h]
        }
    }
    ,
    h.prototype.e = function(e) {
        e.k[this.h] = e.r[this.s]
    }
    ,
    a.prototype.e = function(e) {
        switch (this.t) {
        case 0:
            e.r[this.a] = e.r[this.c] + e.r[this.n];
            break;
        case 1:
            e.r[this.a] = e.r[this.c] - e.r[this.n];
            break;
        case 2:
            e.r[this.a] = e.r[this.c] * e.r[this.n];
            break;
        case 3:
            e.r[this.a] = e.r[this.c] / e.r[this.n];
            break;
        case 4:
            e.r[this.a] = e.r[this.c] % e.r[this.n];
            break;
        case 5:
            e.r[this.a] = e.r[this.c] == e.r[this.n];
            break;
        case 6:
            e.r[this.a] = e.r[this.c] >= e.r[this.n];
            break;
        case 7:
            e.r[this.a] = e.r[this.c] || e.r[this.n];
            break;
        case 8:
            e.r[this.a] = e.r[this.c] && e.r[this.n];
            break;
        case 9:
            e.r[this.a] = e.r[this.c] !== e.r[this.n];
            break;
        case 10:
            e.r[this.a] = t(e.r[this.c]);
            break;
        case 11:
            e.r[this.a] = e.r[this.c]in e.r[this.n];
            break;
        case 12:
            e.r[this.a] = e.r[this.c] > e.r[this.n];
            break;
        case 13:
            e.r[this.a] = -e.r[this.c];
            break;
        case 14:
            e.r[this.a] = e.r[this.c] < e.r[this.n];
            break;
        case 15:
            e.r[this.a] = e.r[this.c] & e.r[this.n];
            break;
        case 16:
            e.r[this.a] = e.r[this.c] ^ e.r[this.n];
            break;
        case 17:
            e.r[this.a] = e.r[this.c] << e.r[this.n];
            break;
        case 18:
            e.r[this.a] = e.r[this.c] >>> e.r[this.n];
            break;
        case 19:
            e.r[this.a] = e.r[this.c] | e.r[this.n];
            break;
        case 20:
            e.r[this.a] = !e.r[this.c]
        }
    }
    ,
    c.prototype.e = function(e) {
        e.Q.push(e.C),
        e.B.push(e.k),
        e.C = e.r[this.s],
        e.k = [];
        for (var t = 0; t < this.i; t++)
            e.k.unshift(e.f.pop());
        e.g.push(e.f),
        e.f = []
    }
    ,
    n.prototype.e = function(e) {
        e.C = e.Q.pop(),
        e.k = e.B.pop(),
        e.f = e.g.pop()
    }
    ,
    e.prototype.e = function(e) {
        switch (this.t) {
        case 0:
            e.u = e.r[this.a] >= e.r[this.c];
            break;
        case 1:
            e.u = e.r[this.a] <= e.r[this.c];
            break;
        case 2:
            e.u = e.r[this.a] > e.r[this.c];
            break;
        case 3:
            e.u = e.r[this.a] < e.r[this.c];
            break;
        case 4:
            e.u = e.r[this.a] == e.r[this.c];
            break;
        case 5:
            e.u = e.r[this.a] != e.r[this.c];
            break;
        case 6:
            e.u = e.r[this.a];
            break;
        case 7:
            e.u = !e.r[this.a]
        }
    }
    ,
    o.prototype.e = function(e) {
        switch (this.t) {
        case 0:
            e.C = this.h;
            break;
        case 1:
            e.u && (e.C = this.h);
            break;
        case 2:
            e.u || (e.C = this.h);
            break;
        case 3:
            e.C = this.h,
            e.w = null
        }
        e.u = !1
    }
    ,
    r.prototype.e = function(e) {
        switch (this.t) {
        case 0:
            for (var t = [], n = 0; n < this.i; n++)
                t.unshift(e.f.pop());
            e.r[3] = e.r[this.s](t[0], t[1]);
            break;
        case 1:
            for (var r = e.f.pop(), a = [], o = 0; o < this.i; o++)
                a.unshift(e.f.pop());
            e.r[3] = e.r[this.s][r](a[0], a[1]);
            break;
        case 2:
            for (var i = [], s = 0; s < this.i; s++)
                i.unshift(e.f.pop());
            e.r[3] = new e.r[this.s](i[0],i[1])
        }
    }
    ;
    var k = function(e) {
        for (var t = 66, n = [], r = 0; r < e.length; r++) {
            var a = 24 ^ e.charCodeAt(r) ^ t;
            n.push(String.fromCharCode(a)),
            t = a
        }
        return n.join("")
    };
    function Q(e) {
        this.t = (4095 & e) >> 10,
        this.s = (1023 & e) >> 8,
        this.i = 1023 & e,
        this.h = 63 & e
    }
    function C(e) {
        this.t = (4095 & e) >> 10,
        this.a = (1023 & e) >> 8,
        this.c = (255 & e) >> 6
    }
    function B(e) {
        this.s = (3072 & e) >> 10,
        this.h = 1023 & e
    }
    function f(e) {
        this.h = 4095 & e
    }
    function g(e) {
        this.s = (3072 & e) >> 10
    }
    function u(e) {
        this.h = 4095 & e
    }
    function w(e) {
        this.t = (3840 & e) >> 8,
        this.s = (192 & e) >> 6,
        this.i = 63 & e
    }
    function G() {
        this.r = [0, 0, 0, 0],
        this.C = 0,
        this.Q = [],
        this.k = [],
        this.B = [],
        this.f = [],
        this.g = [],
        this.u = !1,
        this.G = [],
        this.b = [],
        this.o = !1,
        this.w = null,
        this.U = null,
        this.F = [],
        this.R = 0,
        this.J = {
            0: s,
            1: i,
            2: h,
            3: a,
            4: c,
            5: n,
            6: e,
            7: o,
            8: r,
            9: Q,
            10: C,
            11: B,
            12: f,
            13: g,
            14: u,
            15: w
        }
    }
    Q.prototype.e = function(e) {
        switch (this.t) {
        case 0:
            e.f.push(e.r[this.s]);
            break;
        case 1:
            e.f.push(this.i);
            break;
        case 2:
            e.f.push(e.k[this.h]);
            break;
        case 3:
            e.f.push(k(e.b[this.h]))
        }
    }
    ,
    C.prototype.e = function(A) {
        switch (this.t) {
        case 0:
            var t = A.f.pop();
            A.r[this.a] = A.r[this.c][t];
            break;
        case 1:
            var s = A.f.pop()
              , i = A.f.pop();
            A.r[this.c][s] = i;
            break;
        case 2:
            var h = A.f.pop();
            A.r[this.a] = eval(h)
        }
    }
    ,
    B.prototype.e = function(e) {
        e.r[this.s] = k(e.b[this.h])
    }
    ,
    f.prototype.e = function(e) {
        e.w = this.h
    }
    ,
    g.prototype.e = function(e) {
        throw e.r[this.s]
    }
    ,
    u.prototype.e = function(e) {
        var t = this
          , n = [0];
        e.k.forEach(function(e) {
            n.push(e)
        });
        var r = function(r) {
            var a = new G;
            return a.k = n,
            a.k[0] = r,
            a.v(e.G, t.h, e.b, e.F),
            a.r[3]
        };
        r.toString = function() {
            return "() { [native code] }"
        }
        ,
        e.r[3] = r
    }
    ,
    w.prototype.e = function(e) {
        switch (this.t) {
        case 0:
            for (var t = {}, n = 0; n < this.i; n++) {
                var r = e.f.pop();
                t[e.f.pop()] = r
            }
            e.r[this.s] = t;
            break;
        case 1:
            for (var a = [], o = 0; o < this.i; o++)
                a.unshift(e.f.pop());
            e.r[this.s] = a
        }
    }
    ,
    G.prototype.D = function(e) {
        for (var t = atob(e), n = t.charCodeAt(0) << 8 | t.charCodeAt(1), r = [], a = 2; a < n + 2; a += 2)
            r.push(t.charCodeAt(a) << 8 | t.charCodeAt(a + 1));
        this.G = r;
        for (var o = [], i = n + 2; i < t.length; ) {
            var s = t.charCodeAt(i) << 8 | t.charCodeAt(i + 1)
              , c = t.slice(i + 2, i + 2 + s);
            o.push(c),
            i += s + 2
        }
        this.b = o
    }
    ,
    G.prototype.v = function(e, t, n) {
        for (t = t || 0,
        n = n || [],
        this.C = t,
        "string" == typeof e ? this.D(e) : (this.G = e,
        this.b = n),
        this.o = !0,
        this.R = Date.now(); this.o; ) {
            var r = this.G[this.C++];
            if ("number" != typeof r)
                break;
            var a = Date.now();
            if (500 < a - this.R)
                return;
            this.R = a;
            try {
                this.e(r)
            } catch (e) {
                this.U = e,
                this.w && (this.C = this.w)
            }
        }
    }
    ,
    G.prototype.e = function(e) {
        var t = (61440 & e) >> 12;
        new this.J[t](e).e(this)
    }
    ,
    "undefined" != typeof window && (new G).v("AxjgB5MAnACoAJwBpAAAABAAIAKcAqgAMAq0AzRJZAZwUpwCqACQACACGAKcBKAAIAOcBagAIAQYAjAUGgKcBqFAuAc5hTSHZAZwqrAIGgA0QJEAJAAYAzAUGgOcCaFANRQ0R2QGcOKwChoANECRACQAsAuQABgDnAmgAJwMgAGcDYwFEAAzBmAGcSqwDhoANECRACQAGAKcD6AAGgKcEKFANEcYApwRoAAxB2AGcXKwEhoANECRACQAGAKcE6AAGgKcFKFANEdkBnGqsBUaADRAkQAkABgCnBagAGAGcdKwFxoANECRACQAGAKcGKAAYAZx+rAZGgA0QJEAJAAYA5waoABgBnIisBsaADRAkQAkABgCnBygABoCnB2hQDRHZAZyWrAeGgA0QJEAJAAYBJwfoAAwFGAGcoawIBoANECRACQAGAOQALAJkAAYBJwfgAlsBnK+sCEaADRAkQAkABgDkACwGpAAGAScH4AJbAZy9rAiGgA0QJEAJACwI5AAGAScH6AAkACcJKgAnCWgAJwmoACcJ4AFnA2MBRAAMw5gBnNasCgaADRAkQAkABgBEio0R5EAJAGwKSAFGACcKqAAEgM0RCQGGAYSATRFZAZzshgAtCs0QCQAGAYSAjRFZAZz1hgAtCw0QCQAEAAgB7AtIAgYAJwqoAASATRBJAkYCRIANEZkBnYqEAgaBxQBOYAoBxQEOYQ0giQKGAmQABgAnC6ABRgBGgo0UhD/MQ8zECALEAgaBxQBOYAoBxQEOYQ0gpEAJAoYARoKNFIQ/zEPkAAgChgLGgkUATmBkgAaAJwuhAUaCjdQFAg5kTSTJAsQCBoHFAE5gCgHFAQ5hDSCkQAkChgBGgo0UhD/MQ+QACAKGAsaCRQCOYGSABoAnC6EBRoKN1AUEDmRNJMkCxgFGgsUPzmPkgAaCJwvhAU0wCQFGAUaCxQGOZISPzZPkQAaCJwvhAU0wCQFGAUaCxQMOZISPzZPkQAaCJwvhAU0wCQFGAUaCxQSOZISPzZPkQAaCJwvhAU0wCQFGAkSAzRBJAlz/B4FUAAAAwUYIAAIBSITFQkTERwABi0GHxITAAAJLwMSGRsXHxMZAAk0Fw8HFh4NAwUABhU1EBceDwAENBcUEAAGNBkTGRcBAAFKAAkvHg4PKz4aEwIAAUsACDIVHB0QEQ4YAAsuAzs7AAoPKToKDgAHMx8SGQUvMQABSAALORoVGCQgERcCAxoACAU3ABEXAgMaAAsFGDcAERcCAxoUCgABSQAGOA8LGBsPAAYYLwsYGw8AAU4ABD8QHAUAAU8ABSkbCQ4BAAFMAAktCh8eDgMHCw8AAU0ADT4TGjQsGQMaFA0FHhkAFz4TGjQsGQMaFA0FHhk1NBkCHgUbGBEPAAFCABg9GgkjIAEmOgUHDQ8eFSU5DggJAwEcAwUAAUMAAUAAAUEADQEtFw0FBwtdWxQTGSAACBwrAxUPBR4ZAAkqGgUDAwMVEQ0ACC4DJD8eAx8RAAQ5GhUYAAFGAAAABjYRExELBAACWhgAAVoAQAg/PTw0NxcQPCQ5C3JZEBs9fkcnDRcUAXZia0Q4EhQgXHojMBY3MWVCNT0uDhMXcGQ7AUFPHigkQUwQFkhaAkEACjkTEQspNBMZPC0ABjkTEQsrLQ==");
    var b = function(e) {
        return __g._encrypt(encodeURIComponent(e))
    };
    exports.ENCRYPT_VERSION = A,
    exports.default = b

python里执行js的库叫做 PyExecJS,需要下载

pip install  PyExecJS

不过还没完,咱node.js中的运行环境和浏览器的运行环境是不一样的,浏览器中会有document,window对象,咱node.js的运行环境就一个单纯的jsv8引擎

所以,我们需要依赖一个node.js的库jsdom来模拟浏览器环境,这也需要下载:

npm install jsdom

然后在js代码上面加上一段代码:

const jsdom = require("jsdom");
const { JSDOM } = jsdom;
const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);
window = dom.window;
document = window.document;
XMLHttpRequest = window.XMLHttpRequest;
atob = window.atob

说到这儿,知乎的登录过程咱就分析完毕了,所有的准备工作也搞定了,那剩下的就是写代码来模拟登录啦

1.3 代码部分

1.3.1 导包

from requests_html import HTMLSession     #请求解析库
import base64                             #base64解密加密库
from PIL import Image                     #图片处理库
import hmac                               #加密库
from hashlib import sha1                  #加密库
import time                               
from urllib.parse import urlencode        #url编码库
import execjs                             #python调用node.js

1.3.2 定义爬虫类

联系管理员微信tutu19192010,注册账号

上一篇
下一篇
Copyright © 2022 Egon的技术星球 egonlin.com 版权所有 青浦区尚茂路798弄 联系方式-13697081366