dsa99349 2018-06-22 17:45

RegEx (preg_match_all): retrieving the authenticity token from a hidden login form

Long story short, I'm trying to use cURL to log in to the eCommerce platform Bonanza so that I can automatically print new orders as they come in.

I searched GitHub and found an auto-login script for Twitter, whose login flow looks extremely similar to how Bonanza operates.

The login page, which my cURL request has to fetch first, is https://www.bonanza.com/home/login

It includes a form that POSTs the following fields to log you in:

utf8: ✓
authenticity_token: 0tMPrfH0+Tt7z05jxu61pN10RveVp6o0dsfgf=4cS6g7kyeMsztpDmWj2P1ZYasfdf3QjNl/og==
username: myusername
password: mypassword
commit: Log in

Viewing the source of the form, you can see name="authenticity_token" and the value= of the token I need to retrieve:

 <form class="user_session_form"
 action="https://www.bonanza.com/sessions" accept-charset="UTF-8"
 method="post"><input name="utf8" type="hidden" value="&#x2713;"
 /><input type="hidden" name="authenticity_token" value="siKgYUtSqTs8DHCXmj8gbV6Gp3L7gaQ9C/B0rLM9/V94+FnSxTb+x6vXADSFROCxxMLB3RAqOMeL/IJQADq6dk8A=="
 />

As stated, this seems very similar to how the Twitter login script works: it finds the authenticity token and makes a POST request to https://twitter.com/sessions with these fields to log in.

The Twitter script uses this preg_match_all wrapper to obtain the authenticity token:

function ara($ilk, $son, $text) {
    @preg_match_all('/' . preg_quote($ilk, '/') .
    '(.*?)'. preg_quote($son, '/').'/i', $text, $m);
    return @$m[1];
}

And here is how the function is used to get the authentication token...

$baslik = ara("<input type=\"hidden\" value=", "\" name=\"authenticity_token\">", $html);

Note: $html is the result of curl_exec for the login page.

So, to summarize, the form at https://www.bonanza.com/home/login submits the following form data to log in:

utf8=%E2%9C%93&authenticity_token=SFrh%2FvFx7%2BH%2FA3kMQ2WEfZ23423AlbtP3bfT%2FaxQw7CwlgeUz5BBTMgtU7eHb%2BqyTnxs1TC30h64mT98mvA%3D%3D&username=myusername&password=mypassword&commit=Log+in

It makes a POST with these variables to https://www.bonanza.com/sessions to log in successfully.

I'm trying to adapt the Twitter script as best I can; here's what I have so far:

$username = "example@stackoverflow.com";
$password = "password"; 

$ch = curl_init();
$rand = rand(1,99999);
$cookie =  $_SERVER['DOCUMENT_ROOT'] . "/cookie-$rand.txt";
$sTarget = "https://www.bonanza.com/home/login";
curl_setopt($ch, CURLOPT_URL, $sTarget);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_REFERER, "https://www.bonanza.com/home/login");
$html = curl_exec($ch);
preg_match_all('/' . preg_quote("<input type=\"hidden\" value=", '/') .
'(.*?)'. preg_quote("\" name=\"authenticity_token\">", '/').'/i', $html, $m);

// Not Working.. Need to retrieve $authtoken in $m preg_match_all array output

$sPost = "utf8=%E2%9C%93&authenticity_token=$authtoken&username=$username&password=$password&commit=Log+in";
$sTarget = "https://www.bonanza.com/sessions";
curl_setopt($ch, CURLOPT_URL, $sTarget);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $sPost);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, false);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-type: application/x-www-form-urlencoded"));
curl_exec($ch);

I've tried to debug by dumping $m after the preg_match_all call, but it contains only empty arrays:

Array
(
    [0] => Array
        (
        )

    [1] => Array
        (
        )

)

How can I modify my preg_match_all call (or use another method) to retrieve the authenticity token required to submit the login form successfully, and is there anything else I should be aware of when logging in programmatically via cURL in this way?

3 answers

  • doulanli6146 2018-06-22 19:52

    You could use this regex to get the authenticity token.
    It comes out in capture group 4.

    The order of the attributes doesn't matter: the lookaheads find type,
    name and value anywhere inside a valid input tag.

    (?s)<input(?=\s)(?=(?:[^>"']|"[^"]*"|'[^']*')*?\stype\s*=\s*(?:(['"])\s*hidden\s*\1))(?=(?:[^>"']|"[^"]*"|'[^']*')*?\sname\s*=\s*(?:(['"])\s*authenticity_token\s*\2))(?=(?:[^>"']|"[^"]*"|'[^']*')*?\svalue\s*=\s*(?:(['"])\s*(.*?)\s*\3))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>

    https://regex101.com/r/NCjFxc/1
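For context on why attribute order matters here, a minimal sketch: the pattern borrowed from the Twitter script expects type, then value, then name, and a tag closed by `">`, while the Bonanza form puts name before value and closes the tag with ` />` on the next line. The markup below is modeled on the form in the question, the token value is a made-up placeholder, and the second pattern is only an order-specific illustration, not the regex from this answer.

```php
<?php
// Markup modeled on the form in the question; the token value is a made-up placeholder.
$html = '<input name="utf8" type="hidden" value="&#x2713;"' . "\n"
      . ' /><input type="hidden" name="authenticity_token" value="EXAMPLE+token/123=="' . "\n"
      . ' />';

// Pattern from the question: expects type, then value, then name, closed by ">.
$old = '/' . preg_quote('<input type="hidden" value=', '/')
     . '(.*?)' . preg_quote('" name="authenticity_token">', '/') . '/i';
$oldCount = preg_match_all($old, $html, $m);

// Order-specific illustration for this markup: name first, then value.
$new = '/name="authenticity_token"\s+value="([^"]*)"/i';
$newCount = preg_match($new, $html, $n);

var_dump($oldCount);      // number of matches found by the question's pattern
var_dump($n[1] ?? null);  // token captured by the order-specific pattern
```

The order-specific pattern would break again if the site reordered its attributes, which is the case the lookahead-based regex is meant to cover.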

    Quoting

    Single-quoted PHP string, tilde as regex delimiter:
    '~(?s)<input(?=\s)(?=(?:[^>"\']|"[^"]*"|\'[^\']*\')*?\stype\s*=\s*(?:([\'"])\s*hidden\s*\1))(?=(?:[^>"\']|"[^"]*"|\'[^\']*\')*?\sname\s*=\s*(?:([\'"])\s*authenticity_token\s*\2))(?=(?:[^>"\']|"[^"]*"|\'[^\']*\')*?\svalue\s*=\s*(?:([\'"])\s*(.*?)\s*\3))\s+(?:"[\S\s]*?"|\'[\S\s]*?\'|[^>]*?)+>~'

    Double-quoted PHP string, tilde as regex delimiter:
    "~(?s)<input(?=\\s)(?=(?:[^>\"']|\"[^\"]*\"|'[^']*')*?\\stype\\s*=\\s*(?:(['\"])\\s*hidden\\s*\\1))(?=(?:[^>\"']|\"[^\"]*\"|'[^']*')*?\\sname\\s*=\\s*(?:(['\"])\\s*authenticity_token\\s*\\2))(?=(?:[^>\"']|\"[^\"]*\"|'[^']*')*?\\svalue\\s*=\\s*(?:(['\"])\\s*(.*?)\\s*\\3))\\s+(?:\"[\\S\\s]*?\"|'[\\S\\s]*?'|[^>]*?)+>~"

    Readable version

     (?s)
    
     # Begin Input tag
     < input                # input tag
    
     (?= \s )
     (?=                    # Type Hidden (a pseudo atomic group)
          (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
          \s type \s* = \s*      # Type
          (?:
               ( ['"] )               # (1), Quote
               \s* hidden \s*         # Hidden
               \1 
          )
     )
     (?=                    # Name authenticity_token
          (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
          \s name \s* = \s*      # Name
          (?:
               ( ['"] )               # (2), Quote
               \s* authenticity_token \s*   # "Authenticity Token"
               \2 
          )
     )
     (?=                    # Value of authenticity_token
          (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
          \s value \s* = \s*     # Value
          (?:
               ( ['"] )               # (3), Quote
               \s* 
               ( .*? )                # (4), Authenticity Token Value 
               \s* 
               \3 
          )
     )
     # Have the Authenticity Token, just match the rest of tag
     \s+ 
     (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
    
     >                      # End tag
    
    This answer was accepted by the asker as the best answer.
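A rough sketch of how the accepted answer's pattern might be dropped into the script from the question: it uses the single-quoted variant unchanged, reads capture group 4, and then builds the POST body. The sample markup and token value are placeholders, and using `html_entity_decode` and `http_build_query` is my assumption about a sensible way to finish the job, not something the answer prescribes. The encoding step matters because the token is Base64-like and can contain `+`, `/` and `=`, which would be misread if interpolated raw into `$sPost`.

```php
<?php
// Markup modeled on the form in the question; the token value is a made-up placeholder.
$html = <<<'HTML'
<form class="user_session_form" action="https://www.bonanza.com/sessions" accept-charset="UTF-8" method="post">
<input name="utf8" type="hidden" value="&#x2713;"
 /><input type="hidden" name="authenticity_token" value="EXAMPLE+token/123=="
 />
</form>
HTML;

// The single-quoted pattern from the answer, unchanged.
$pattern = '~(?s)<input(?=\s)(?=(?:[^>"\']|"[^"]*"|\'[^\']*\')*?\stype\s*=\s*(?:([\'"])\s*hidden\s*\1))(?=(?:[^>"\']|"[^"]*"|\'[^\']*\')*?\sname\s*=\s*(?:([\'"])\s*authenticity_token\s*\2))(?=(?:[^>"\']|"[^"]*"|\'[^\']*\')*?\svalue\s*=\s*(?:([\'"])\s*(.*?)\s*\3))\s+(?:"[\S\s]*?"|\'[\S\s]*?\'|[^>]*?)+>~';

// Group 4 holds the token; decode entities in case the page escapes any characters.
if (preg_match($pattern, $html, $m) !== 1) {
    throw new RuntimeException('authenticity_token not found');
}
$authtoken = html_entity_decode($m[4], ENT_QUOTES | ENT_HTML5, 'UTF-8');

// http_build_query() percent-encodes every value, so "+", "/" and "=" in the
// token are not mistaken for spaces or separators by the server.
$sPost = http_build_query([
    'utf8'               => "\u{2713}", // the check mark the form sends
    'authenticity_token' => $authtoken,
    'username'           => 'myusername',
    'password'           => 'mypassword',
    'commit'             => 'Log in',
], '', '&');

echo $sPost, PHP_EOL;
```

In the real script, `$html` would be the `curl_exec` result for the login page and `$sPost` would go to `CURLOPT_POSTFIELDS` on the same cURL handle, so that the session cookie set by the first request is sent along with the token.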