Fetching a web page returns encrypted characters

I have tried several ways of downloading the page below using PHP, but I always receive a page of seemingly encrypted characters.

$url = 'https://kat.cr/usearch/life%20of%20pi/';

I searched for possible solutions before posting and tried a few, but I haven't been able to get any of them to work.

Please see the methods I have tried below and suggest a solution. I am looking for a PHP solution.

Approach 1 - using file_get_contents - returns encrypted characters

<?php
//$contents = file_get_contents($url, $use_include_path, $context, $offset);

include('simple_html_dom.php');

$url = 'https://kat.cr/usearch/life%20of%20pi/';
$html = str_get_html(utf8_encode(file_get_contents($url)));

echo $html;


?>

Approach 2 - using file_get_html - returns encrypted characters

<?php
include('simple_html_dom.php');

$url = 'https://kat.cr/usearch/life%20of%20pi/';

$encoded = htmlentities(utf8_encode(file_get_html($url)));
echo $encoded;

?>

Approach 3 - using gzread - returns blank page

<?php

include('simple_html_dom.php');

$url = 'https://kat.cr/usearch/life%20of%20pi/';

$fp = gzopen($url,'r');

$contents = '';

while($html = gzread($fp , 256000))
{
    $contents .= $html;
}

gzclose($fp);

?>

Approach 4 - using gzinflate - returns empty page

<?php

include('simple_html_dom.php');
//function gzdecode($data)
//{
//    return gzinflate(substr($data,10,-8));
//}

//$contents = file_get_contents($url, $use_include_path, $context, $offset);



$url = 'https://kat.cr/usearch/life%20of%20pi/';
$html = str_get_html(utf8_encode(file_get_contents($url)));

echo gzinflate(substr($html,10,-8));


?>

Approach 5 - using fopen and fgets - returns encrypted characters

<?php
$url = 'https://kat.cr/usearch/life%20of%20pi/';
$handle = fopen($url, "r");

if ($handle)
{
    // Echo the response line by line.
    while (($line = fgets($handle)) !== false)
    {
        echo $line;
    }
    // Close the handle only if it was opened successfully.
    fclose($handle);
}
else
{
    // error opening the URL.
    echo "could not open the URL!";
}
?>

Approach 6 - adding ob_start at the beginning of the script - page does not load

<?php
ob_start("ob_gzhandler");

$url = 'https://kat.cr/usearch/life%20of%20pi/';
$handle = fopen($url, "r");

if ($handle)
{
    // Echo the response line by line.
    while (($line = fgets($handle)) !== false)
    {
        echo $line;
    }
    // Close the handle only if it was opened successfully.
    fclose($handle);
}
else
{
    // error opening the URL.
    echo "could not open the URL!";
}
?>

Approach 7 - using curl - returns empty page

<?php
include('simple_html_dom.php'); // provides str_get_html()

$url = 'https://kat.cr/usearch/life%20of%20pi/';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); // Define target site
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return page in string
// Set the user agent on the same handle ($ch) as the other options.
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2');
curl_setopt($ch, CURLOPT_ENCODING, "gzip"); // Accept and decode gzip responses
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects

$return = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);

$html = str_get_html($return);
echo $html;

?>

Approach 8 - using R - returns encrypted characters

> thepage = readLines('https://kat.cr/usearch/life%20of%20pi/')
There were 29 warnings (use warnings() to see them)
> thepage[1:5]
[1] "\037‹\b"                                                                                                                                                                                                                                                                                                         
[2] "+SC®\037\035ÕpšÐ\032«F°{¼…àßá$\030±ª\022ù˜ú×Gµ."                                                                                                                                                                                                                                                                
[3] "\023\022&ÒÅdDjÈÉÎŽj\t¹Iê¬©\003ä\fp\024“ä(M<©U«ß×Ðy2\tÈÂæœ8ž\036â!9ª]ûd<¢QR*>öÝdpä’kß!\022?ÙG~è'>\016¤ØÁ\0019Re¥†\0264æ’Ø‰üQâÓ°Ô^—\016\tÂ¡‹\\:\016\003Š]4¤aLiˆ†8ìS\022Ão€'ðÿ\020a;¦Aš`‚<\032!/\"DF=\034'EåX^ÔˆÚ4‰KDCê‡.¹©¡ˆ\004Gµ4&8r\006EÍÄO\002r|šóóZðóú\026?\0274Š ½\030!\týâ;W8Ž‹k‡õ¬™¬ÉÀ\017¯2b1ÓA< \004„š€&J"
[4] "@ƒˆxGµz\035\032Jpâ;²C‡u\034\004’Ñôp«e^*Wz-Óz!ê\022\001èÌI\023ä;LÖ\v›õ‡¸Oâº‡¯Y!\031þ\024-mÍ·‡G#°›„¦Î@º¿ÉùÒò(ìó¶³f\177¤?}\017½<Cæ_eÎ\0276\t\035®ûÄœ\025À}rÌ\005òÃŸ$t}ï/IºM»µ*íÖšh\006\t#kåd³¡€âÈ¹E÷CÌG·!\017ý°èø‡x†ä\a|³&jÇ‡õìè>\016ú\t™aá¾ž[\017—z¹«K¸çeØ¿=/"                                                    
[5] "\035æ\034vÎ÷Gûx?Ú'ûÝý`ßßwö¯v‹bÿFç\177F\177\035±?ÿýß\177þupþ'ƒ\035ösT´°ûï¢<+(Òx°Ó‰\"<‘G\021M(ãEŽ\003pa2¸¬`\aGýtÈFíî.úÏîAQÙ?\032ÉNDpBÎ\002Â"
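
The first element R prints above, "\037‹\b", appears to correspond to the bytes 0x1f 0x8b 0x08 (assuming ‹ is byte 0x8b rendered in a Windows-1252 locale). That is how a gzip stream starts, which suggests the response is compressed rather than encrypted. The following is a minimal offline sketch of checking for that signature and decoding it in PHP. The helper names looks_gzipped and decode_if_gzipped are made up for illustration, and the sample data is generated locally with gzencode instead of being fetched from the site.

```php
<?php
// Hypothetical helper (not from the original post): true when the data
// starts with the two-byte gzip signature 0x1f 0x8b.
function looks_gzipped(string $data): bool
{
    return strncmp($data, "\x1f\x8b", 2) === 0;
}

// Hypothetical helper: decode gzip data, or return the input unchanged
// when it is not gzip or cannot be decoded.
function decode_if_gzipped(string $data): string
{
    if (!looks_gzipped($data)) {
        return $data;
    }
    $decoded = gzdecode($data); // requires the zlib extension
    return $decoded === false ? $data : $decoded;
}

// Offline stand-in for a compressed HTTP response body.
$sample = gzencode('<html><body>life of pi</body></html>');

echo bin2hex(substr($sample, 0, 3)), "\n"; // prints 1f8b08
echo decode_if_gzipped($sample), "\n";     // prints the original HTML
```

If this assumption holds, the raw bytes would need to be decoded before any text conversion such as utf8_encode, because re-encoding changes every byte at or above 0x80 and the result is no longer a valid gzip stream.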

Approach 9 - using BeautifulSoup (python) - returns encrypted characters

import urllib

htmltext = urllib.urlopen("https://kat.cr/usearch/life%20of%20pi/").read()
print htmltext

Approach 10 - using wget on the Linux terminal - gets a page with encrypted characters

wget -O page https://kat.cr/usearch/Monsoon%20Mangoes%20malayalam/
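
If the file saved by wget is gzip-compressed (an assumption, consistent with the R output in Approach 8), it could be read through PHP's zlib functions. The sketch below is offline only: it writes a small gzip file to a temporary path as a stand-in for the saved "page" file, and the function name read_gzip_file is hypothetical.

```php
<?php
// Hypothetical helper: read a gzip-compressed file and return its
// decompressed contents as a string.
function read_gzip_file(string $path): string
{
    $fp = gzopen($path, 'rb');
    if ($fp === false) {
        return '';
    }
    $contents = '';
    while (!gzeof($fp)) {
        $chunk = gzread($fp, 8192);
        if ($chunk === false) {
            break; // stop on a read error
        }
        $contents .= $chunk;
    }
    gzclose($fp);
    return $contents;
}

// Stand-in for the file that wget wrote; created locally for the demo.
$path = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($path, gzencode('<html><body>example</body></html>'));

$contents = read_gzip_file($path);
unlink($path);

echo $contents, "\n";
```

gzopen also reads files that are not in gzip format, in which case gzread returns the bytes unchanged, so the same helper can be pointed at a file whose compression is uncertain.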

Approach 11 - pasting the URL manually into the service below - works

https://www.hurl.it/

Approach 12 - pasting the URL manually into the service below - works

https://www.import.io/
