douren9077 2017-12-26 20:02

NodeJS - File download URL redirects and retrieves an HTML page

I have a small script that retrieves URLs for downloading PDFs from http://gen.lib.rus.ec/. Once the URL is retrieved, I use the following code to download the PDF:

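    // Uses the third-party "download" package (npm install download).
    const download = require('download');

    // Save the resource at `url` into the ./dist directory.
    const downloadFile = (url) => {
      return download(url, 'dist')
        .then(() => console.log('done!'))
        .catch((err) => console.error('download failed:', err.message));
    };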
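One way to catch the problem early is to check what was actually received before treating it as a PDF. The sketch below is a minimal illustration, assuming that a genuine PDF starts with the `%PDF-` signature and that the ads page is served as `text/html`; the helper names `looksLikePdf` and `looksLikeHtml` are hypothetical, not part of any library.

```javascript
// Hypothetical helpers: decide whether a response body / Content-Type
// looks like a real PDF or like an HTML interstitial page.

// Assumption: every genuine PDF file begins with these five bytes.
const PDF_MAGIC = Buffer.from('%PDF-', 'ascii');

function looksLikePdf(data) {
  // Compare the first bytes of the body with the PDF signature.
  return Buffer.isBuffer(data) &&
    data.length >= PDF_MAGIC.length &&
    data.subarray(0, PDF_MAGIC.length).equals(PDF_MAGIC);
}

function looksLikeHtml(contentType) {
  // Content-Type may carry parameters, e.g. "text/html; charset=utf-8",
  // so compare only the media type before the first ";".
  return typeof contentType === 'string' &&
    contentType.split(';')[0].trim().toLowerCase() === 'text/html';
}
```

With the `download` package, omitting the destination (`download(url)`) resolves to a Buffer, so a check like `looksLikePdf` can run before anything is written to disk with `fs.writeFile`.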

The problem is that what gets saved is a .php page with HTML content, not the actual PDF. In a browser, the same link sometimes redirects to a page with ads and a download link for the PDF, and at other times downloads the PDF directly.

Example link:

http://libgen.io/get.php?md5=bb6aa2785236bbbd575f98a6a8c942fc

Sometimes this redirects to:

http://libgen.io/ads.php?md5=bb6aa2785236bbbd575f98a6a8c942fc
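To see whether get.php answers with an HTTP redirect (3xx plus a `Location` header) to ads.php, or serves the HTML page directly with a 200, the request can be made without automatic redirect following and each hop logged. This is a minimal sketch using only Node's built-in `http`/`https` modules; `nextHop` and `traceRedirects` are hypothetical names, and the site may behave differently for a script than for a browser.

```javascript
const http = require('http');
const https = require('https');

// Pure helper: return the next URL to visit, or null if the response
// is not a redirect. Location may be relative ("/ads.php?md5=..."),
// so it is resolved against the current URL.
function nextHop(currentUrl, statusCode, location) {
  const isRedirect = [301, 302, 303, 307, 308].includes(statusCode);
  if (!isRedirect || !location) return null;
  return new URL(location, currentUrl).toString();
}

// Follow redirects manually (at most maxHops) and record every hop.
function traceRedirects(url, maxHops = 5) {
  return new Promise((resolve, reject) => {
    const hops = [];
    const visit = (current, remaining) => {
      const client = current.startsWith('https:') ? https : http;
      client.get(current, (res) => {
        res.resume(); // discard the body; only status and headers matter here
        hops.push({
          url: current,
          status: res.statusCode,
          contentType: res.headers['content-type'],
        });
        let next;
        try {
          next = nextHop(current, res.statusCode, res.headers.location);
        } catch (err) {
          return reject(err); // malformed Location header
        }
        if (next && remaining > 0) visit(next, remaining - 1);
        else resolve(hops);
      }).on('error', reject);
    };
    visit(url, maxHops);
  });
}
```

For example, `traceRedirects('http://libgen.io/get.php?md5=bb6aa2785236bbbd575f98a6a8c942fc').then(console.log)` prints the status and Content-Type of each hop. If the first hop is already a 200 with `text/html`, there is no redirect to avoid and the page itself has to be parsed.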

So from the looks of it, my app always seems to retrieve the ads.php page, even though the get.php URL is what gets passed to downloadFile. I could parse the retrieved page, find the download URL (see below), and download the PDF from there, but is there a better way to start the PDF download directly from code?

URL in the retrieved HTML file that downloads the PDF:

http://libgen.io/get.php?md5=f49b28272e405bf000fe72e5d93c4ea0&key=MBJ0NVLFPDGELI3R

Edit:

Contents of the retrieved get.php file:

<!doctype html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Library Genesis</title>

<style>
#message {
    width: 400px;
    margin: 0px auto;
    padding: 10px 20px;
    text-align: center; 
    background-color: #0f9d58;
    color: #fff;
    border-radius: 3px;
}
</style>
<script src="/jquery-latest.min.js"></script>
<script src="/clipboard.min.js"></script>
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script></head><body><script>
$(document).ready(function() {
    function appendMessage(argument) {
        var div = $('<div>').attr('id', 'message').text('Ad block is installed and active. Please support us by disabling it.');
        var add = $('table').before(div);
    }
    setTimeout(function(){
        if($("ins").css('display') == "none") {
            appendMessage();
        }
    }, 500);
});
</script><table width=1000 align="center" border=0>
<tr>
<td align="left" valign="top" bgcolor="#F5F6CE" rowspan=2><font size=2 color="grey">Advertising:</font><br><div id="ads10">
<!-- gen_sky3 -->
<ins class="adsbygoogle"
     style="display:inline-block;width:300px;height:600px"
     data-ad-client="ca-pub-1513624324396300"
     data-ad-slot="9137517849"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div></td>
<td rowspan=2 valign="top" bgcolor="#F5F6CE"><font size=2 color="grey">BitCoin: <a href="bitcoin:1PiZNj7uhejkdMg5ycbqdGduKKafy8eubm?label=libgen">1PiZNj7uhejkdMg5ycbqdGduKKafy8eubm</a></font><br><div id="ads9">
<!-- gen_sky2 -->
<ins class="adsbygoogle"
     style="display:inline-block;width:300px;height:600px"
     data-ad-client="ca-pub-1513624324396300"
     data-ad-slot="3422388247"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div></td>
<td align='center' rowspan=2 valign='top'><a href='http://libgen.io/get.php?md5=f49b28272e405bf000fe72e5d93c4ea0&key=ON32H85E8S3NRX3D'><h2>GET</h2></a></br><a href='/book/index.php?md5=F49B28272E405BF000FE72E5D93C4EA0&oftorrent='>Download via torrent </a><input id="textarea-example" value="John E. Jackson-The Biology of Apples and Pears (The Biology of Horticultural Crops)-Cambridge University Press (2003).pdf" type="text" size="9"><button class="btn-clipboard" data-clipboard-target="#textarea-example"> (need rename file)</button><script>new Clipboard(".btn-clipboard");</script></br><textarea rows='13' name='bibtext' id='bibtext' readonly cols='40'>@book{book:222949,
   title =     {The Biology of Apples and Pears (The Biology of Horticultural Crops)},
   author =    {John E. Jackson},
   publisher = {Cambridge University Press},
   isbn =      {0521380189,9780521380188,9780511067464,0511067461},
   year =      {2003},
   series =    {},
   edition =   {},
   volume =    {},
   url =       {http://gen.lib.rus.ec/book/index.php?md5=f49b28272e405bf000fe72e5d93c4ea0}}</textarea>
<br><img src='https://chart.googleapis.com/chart?chs=300x300&cht=qr&chl=http%3A%2F%2Fgen.lib.rus.ec%2Fbook%2Findex.php%3Fmd5%3Df49b28272e405bf000fe72e5d93c4ea0&choe=UTF-8' title='Link to Libgen' /></td>
<td align="left" valign="top"  bgcolor="#F5F6CE" rowspan=2><div id="ads18">
<!-- gen_sky_160x600_2 -->
<ins class="adsbygoogle"
     style="display:inline-block;width:160px;height:600px"
     data-ad-client="ca-pub-1513624324396300"
     data-ad-slot="4896944640"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div></td>
</tr>
</table></body></html>
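If parsing turns out to be necessary, the direct link is the `<a href='...get.php?md5=...&key=...'>` anchor (the "GET" heading) in the page above. The sketch below pulls it out with a regular expression, assuming the markup keeps that shape; `extractGetLink` is a hypothetical helper, and a real HTML parser would be more robust if the page layout changes.

```javascript
// Hypothetical helper: find the first href pointing at get.php that
// carries a key= parameter, and return it as an absolute URL (or null).
function extractGetLink(html, pageUrl) {
  // Quoted href value (single or double quotes) containing get.php?...key=...
  const re = /href=(['"])([^'"]*get\.php\?[^'"]*\bkey=[^'"]*)\1/i;
  const match = html.match(re);
  if (!match) return null;
  // Undo &amp; escaping, then resolve relative hrefs against the page URL.
  const href = match[2].replace(/&amp;/g, '&');
  return new URL(href, pageUrl).toString();
}
```

The resulting URL can then be passed to the existing `downloadFile(url)`. The `key` value differs between the link quoted earlier in this question and the one in this page, which suggests it changes per request, so it is probably safer to extract it fresh each time than to hard-code it.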