douren9077 2017-12-26 20:02

NodeJS - File download URL redirects and retrieves an HTML page

I have a small script that retrieves URLs for downloading PDFs from http://gen.lib.rus.ec/. Once a URL is retrieved, I use the following code to download the PDF:

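    const download = require('download');

    // Download `url` into the local `dist` directory.
    const downloadFile = (url) => {
      // Return the promise so callers can wait for completion or handle errors.
      return download(url, 'dist').then(() => {
        console.log('done!');
      });
    };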

The problem is that what gets saved is a .php page with HTML content, not the actual PDF. In the browser, the same link sometimes redirects to a page with ads and a download link for the PDF, and at other times downloads the PDF directly.

example link:

http://libgen.io/get.php?md5=bb6aa2785236bbbd575f98a6a8c942fc

This sometimes redirects to:

http://libgen.io/ads.php?md5=bb6aa2785236bbbd575f98a6a8c942fc
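Since the same URL can return either the PDF or an HTML page, one way to tell the two apart is to inspect the first bytes of the response body. The following is a minimal sketch, assuming the body is available as a Buffer; the helper names `looksLikePdf` and `looksLikeHtml` are hypothetical, and the HTML check is only a rough heuristic.

```javascript
// PDF files begin with the ASCII signature "%PDF-".
const PDF_SIGNATURE = Buffer.from('%PDF-', 'ascii');

// Hypothetical helper: true when `body` (a Buffer) starts with the PDF signature.
function looksLikePdf(body) {
  return body.length >= PDF_SIGNATURE.length &&
    body.subarray(0, PDF_SIGNATURE.length).equals(PDF_SIGNATURE);
}

// Hypothetical helper: rough check for an HTML document, based only on
// how the first 256 bytes of the body begin.
function looksLikeHtml(body) {
  const head = body.subarray(0, 256).toString('utf8').trimStart().toLowerCase();
  return head.startsWith('<!doctype html') || head.startsWith('<html');
}

// Example with in-memory data instead of a real download.
console.log(looksLikePdf(Buffer.from('%PDF-1.4\n')));
console.log(looksLikeHtml(Buffer.from('<!doctype html>\n<html lang="en">')));
```

A check like this could run before the file is written to disk, so that an HTML response is not saved as if it were the PDF.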

So from the looks of it, my app seems to always retrieve the ads.php page, even though the get.php URL is passed to the downloadFile function. I can parse the retrieved page, find the download URL (shown below), and download the PDF, but is there a better way to initiate the PDF download directly and programmatically?

URL in the retrieved HTML file that downloads the PDF:

http://libgen.io/get.php?md5=f49b28272e405bf000fe72e5d93c4ea0&key=MBJ0NVLFPDGELI3R
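The parsing approach mentioned above could look roughly like the sketch below. It pulls the first `get.php` link that carries both `md5` and `key` parameters out of an HTML string with a regular expression. The function name `extractKeyedLink` and the pattern are assumptions based on this one sample page, not a documented interface of the site, and a proper HTML parser would be more robust than a regex.

```javascript
// Matches an href that points at get.php with both md5 and key parameters.
// The pattern is an assumption derived from a single sample page.
const KEYED_LINK =
  /href=['"]([^'"]*get\.php\?md5=[0-9a-f]{32}&(?:amp;)?key=[0-9A-Z]+)['"]/i;

// Hypothetical helper: returns the keyed download URL found in `html`,
// or null when the page contains no such link.
function extractKeyedLink(html) {
  const match = KEYED_LINK.exec(html);
  // Undo HTML entity escaping of the ampersand, if present.
  return match ? match[1].replace('&amp;', '&') : null;
}

// Example using a fragment of the retrieved page.
const sample =
  "<td align='center'><a href='http://libgen.io/get.php?" +
  "md5=f49b28272e405bf000fe72e5d93c4ea0&key=ON32H85E8S3NRX3D'>" +
  "<h2>GET</h2></a></td>";
console.log(extractKeyedLink(sample));
```

The extracted URL could then be passed to `downloadFile`. The key differs between the two samples in this post for the same md5, so it is probably generated per request; if so, the link has to be extracted from a freshly retrieved page each time rather than reused.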

Edit

Contents of the retrieved get.php file:

<!doctype html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Library Genesis</title>

<style>
#message {
    width: 400px;
    margin: 0px auto;
    padding: 10px 20px;
    text-align: center; 
    background-color: #0f9d58;
    color: #fff;
    border-radius: 3px;
}
</style>
<script src="/jquery-latest.min.js"></script>
<script src="/clipboard.min.js"></script>
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script></head><body><script>
$(document).ready(function() {
    function appendMessage(argument) {
        var div = $('<div>').attr('id', 'message').text('Ad block is installed and active. Please support us by disabling it.');
        var add = $('table').before(div);
    }
    setTimeout(function(){
        if($("ins").css('display') == "none") {
            appendMessage();
        }
    }, 500);
});
</script><table width=1000 align="center" border=0>
<tr>
<td align="left" valign="top" bgcolor="#F5F6CE" rowspan=2><font size=2 color="grey">Advertising:</font><br><div id="ads10">
<!-- gen_sky3 -->
<ins class="adsbygoogle"
     style="display:inline-block;width:300px;height:600px"
     data-ad-client="ca-pub-1513624324396300"
     data-ad-slot="9137517849"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div></td>
<td rowspan=2 valign="top" bgcolor="#F5F6CE"><font size=2 color="grey">BitCoin: <a href="bitcoin:1PiZNj7uhejkdMg5ycbqdGduKKafy8eubm?label=libgen">1PiZNj7uhejkdMg5ycbqdGduKKafy8eubm</a></font><br><div id="ads9">
<!-- gen_sky2 -->
<ins class="adsbygoogle"
     style="display:inline-block;width:300px;height:600px"
     data-ad-client="ca-pub-1513624324396300"
     data-ad-slot="3422388247"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div></td>
<td align='center' rowspan=2 valign='top'><a href='http://libgen.io/get.php?md5=f49b28272e405bf000fe72e5d93c4ea0&key=ON32H85E8S3NRX3D'><h2>GET</h2></a></br><a href='/book/index.php?md5=F49B28272E405BF000FE72E5D93C4EA0&oftorrent='>Download via torrent </a><input id="textarea-example" value="John E. Jackson-The Biology of Apples and Pears (The Biology of Horticultural Crops)-Cambridge University Press (2003).pdf" type="text" size="9"><button class="btn-clipboard" data-clipboard-target="#textarea-example"> (need rename file)</button><script>new Clipboard(".btn-clipboard");</script></br><textarea rows='13' name='bibtext' id='bibtext' readonly cols='40'>@book{book:222949,
   title =     {The Biology of Apples and Pears (The Biology of Horticultural Crops)},
   author =    {John E. Jackson},
   publisher = {Cambridge University Press},
   isbn =      {0521380189,9780521380188,9780511067464,0511067461},
   year =      {2003},
   series =    {},
   edition =   {},
   volume =    {},
   url =       {http://gen.lib.rus.ec/book/index.php?md5=f49b28272e405bf000fe72e5d93c4ea0}}</textarea>
<br><img src='https://chart.googleapis.com/chart?chs=300x300&cht=qr&chl=http%3A%2F%2Fgen.lib.rus.ec%2Fbook%2Findex.php%3Fmd5%3Df49b28272e405bf000fe72e5d93c4ea0&choe=UTF-8' title='Link to Libgen' /></td>
<td align="left" valign="top"  bgcolor="#F5F6CE" rowspan=2><div id="ads18">
<!-- gen_sky_160x600_2 -->
<ins class="adsbygoogle"
     style="display:inline-block;width:160px;height:600px"
     data-ad-client="ca-pub-1513624324396300"
     data-ad-slot="4896944640"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div></td>
</tr>
</table></body></html>