I have a small script that retrieves URLs for downloading PDFs from http://gen.lib.rus.ec/. Once a URL is retrieved, I use the following code to download the PDF:
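const download = require('download');

// Save whatever the URL returns into the dist/ directory.
const downloadFile = (url) =>
  download(url, 'dist')
    .then(() => console.log('done!'))
    .catch((err) => console.error('download failed:', err));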
The problem is that what gets saved is a .php page with HTML content, not the actual PDF. In the browser, the same link sometimes redirects to a page with ads and a download link for the PDF, and at other times downloads the PDF directly.
Example link:
http://libgen.io/get.php?md5=bb6aa2785236bbbd575f98a6a8c942fc
This sometimes redirects to:
http://libgen.io/ads.php?md5=bb6aa2785236bbbd575f98a6a8c942fc
So from the looks of it, my app seems to always retrieve the ads.php link, even though get.php is what is passed to the downloadFile function. I can parse the retrieved PHP page, find the download URL (see the URL below), and download the PDF from there, but is there a better way to initiate the PDF download directly and programmatically?
URL in the retrieved HTML file that downloads the PDF:
http://libgen.io/get.php?md5=f49b28272e405bf000fe72e5d93c4ea0&key=MBJ0NVLFPDGELI3R
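For reference, the parse-and-follow fallback mentioned above could look roughly like the sketch below. It is only a minimal illustration: `extractDownloadUrl` and `downloadPdf` are hypothetical names, the regular expression assumes the keyed link always has the `get.php?md5=...&key=...` shape seen in the returned page, and it assumes the first request returns that HTML page rather than the PDF itself.

```javascript
// Hypothetical helper: pull the keyed "GET" link out of the HTML page
// that get.php/ads.php returns. Returns null when no such link is found.
function extractDownloadUrl(html) {
  // Matches href='...get.php?md5=<32 hex chars>&key=<token>' with either
  // quote style; "&amp;" is accepted in case the ampersand is HTML-escaped.
  const match = html.match(
    /href=(['"])([^'"]*get\.php\?md5=[0-9a-f]{32}&(?:amp;)?key=[A-Z0-9]+)\1/i
  );
  return match ? match[2].replace('&amp;', '&') : null;
}

// Sketch of the two-step download, using the same `download` package as above.
async function downloadPdf(pageUrl, dest = 'dist') {
  // Loaded lazily so the helper above has no dependencies of its own.
  const download = require('download');
  // Called without a destination, download() resolves to a Buffer.
  const html = (await download(pageUrl)).toString('utf8');
  const pdfUrl = extractDownloadUrl(html);
  if (!pdfUrl) {
    throw new Error('No keyed download link found in the returned page');
  }
  // Second request: fetch the keyed URL and save the result into dest.
  return download(pdfUrl, dest);
}
```

Calling `downloadPdf('http://libgen.io/get.php?md5=bb6aa2785236bbbd575f98a6a8c942fc')` would then replace the `downloadFile(url)` call. This still costs two requests per file, which is the part I would like to avoid.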
Edit: this is the HTML page returned for get.php:
<!doctype html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Library Genesis</title>
<style>
#message {
width: 400px;
margin: 0px auto;
padding: 10px 20px;
text-align: center;
background-color: #0f9d58;
color: #fff;
border-radius: 3px;
}
</style>
<script src="/jquery-latest.min.js"></script>
<script src="/clipboard.min.js"></script>
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script></head><body><script>
$(document).ready(function() {
function appendMessage(argument) {
var div = $('<div>').attr('id', 'message').text('Ad block is installed and active. Please support us by disabling it.');
var add = $('table').before(div);
}
setTimeout(function(){
if($("ins").css('display') == "none") {
appendMessage();
}
}, 500);
});
</script><table width=1000 align="center" border=0>
<tr>
<td align="left" valign="top" bgcolor="#F5F6CE" rowspan=2><font size=2 color="grey">Advertising:</font><br><div id="ads10">
<!-- gen_sky3 -->
<ins class="adsbygoogle"
style="display:inline-block;width:300px;height:600px"
data-ad-client="ca-pub-1513624324396300"
data-ad-slot="9137517849"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div></td>
<td rowspan=2 valign="top" bgcolor="#F5F6CE"><font size=2 color="grey">BitCoin: <a href="bitcoin:1PiZNj7uhejkdMg5ycbqdGduKKafy8eubm?label=libgen">1PiZNj7uhejkdMg5ycbqdGduKKafy8eubm</a></font><br><div id="ads9">
<!-- gen_sky2 -->
<ins class="adsbygoogle"
style="display:inline-block;width:300px;height:600px"
data-ad-client="ca-pub-1513624324396300"
data-ad-slot="3422388247"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div></td>
<td align='center' rowspan=2 valign='top'><a href='http://libgen.io/get.php?md5=f49b28272e405bf000fe72e5d93c4ea0&key=ON32H85E8S3NRX3D'><h2>GET</h2></a></br><a href='/book/index.php?md5=F49B28272E405BF000FE72E5D93C4EA0&oftorrent='>Download via torrent </a><input id="textarea-example" value="John E. Jackson-The Biology of Apples and Pears (The Biology of Horticultural Crops)-Cambridge University Press (2003).pdf" type="text" size="9"><button class="btn-clipboard" data-clipboard-target="#textarea-example"> (need rename file)</button><script>new Clipboard(".btn-clipboard");</script></br><textarea rows='13' name='bibtext' id='bibtext' readonly cols='40'>@book{book:222949,
title = {The Biology of Apples and Pears (The Biology of Horticultural Crops)},
author = {John E. Jackson},
publisher = {Cambridge University Press},
isbn = {0521380189,9780521380188,9780511067464,0511067461},
year = {2003},
series = {},
edition = {},
volume = {},
url = {http://gen.lib.rus.ec/book/index.php?md5=f49b28272e405bf000fe72e5d93c4ea0}}</textarea>
<br><img src='https://chart.googleapis.com/chart?chs=300x300&cht=qr&chl=http%3A%2F%2Fgen.lib.rus.ec%2Fbook%2Findex.php%3Fmd5%3Df49b28272e405bf000fe72e5d93c4ea0&choe=UTF-8' title='Link to Libgen' /></td>
<td align="left" valign="top" bgcolor="#F5F6CE" rowspan=2><div id="ads18">
<!-- gen_sky_160x600_2 -->
<ins class="adsbygoogle"
style="display:inline-block;width:160px;height:600px"
data-ad-client="ca-pub-1513624324396300"
data-ad-slot="4896944640"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div></td>
</tr>
</table></body></html>