I'm trying to get the text from this URL: https://html2-f.scribdassets.com/55ssxtbbb45pk2eg/pages/319-42c28ee981.jsonp which, upon inspection, is a JSONP callback function that inserts HTML. I want to scrape the page, make the text comprehensible, and actually render the HTML instead of showing it as plain text.
PHP:
echo file_get_contents("https://html2-f.scribdassets.com/55ssxtbbb45pk2eg/pages/319-42c28ee981.jsonp");
The returned text is a complete mess:
����X321-5db7e88872.jsonp�Y]n�6���E�ıH�;��E�@���b�PM��%�f#K�H��}�;�z���:�eG"e��:@�E����j��XޖdJ���$�&$~����>a�8#��p�ӥy��X��8�r��(#kZ���85�j�A�%��������Ȇ�...
Whereas it should look like this:
"<div class=\"newpage\" id=\"page319\" style=\"width: 902px; height:1167px\">
<div class=text_layer style=\"z-index:2\"><div class=ie_fix>
<div class=\"ff81\" style=\"font-size:114px\">
<span class=a style=\"left:331px;top:75px;color:#ffffff\">1<span class=w9></span>3</span></div>...
Although I could manually copy and paste the text from the webpage into a text editor for later use, I would like to eliminate this step, as I'll need to do this for 320 pages.
Is there some workaround for .jsonp URLs? Or is the data encrypted by the server? (I just don't know.)
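One possibility worth checking is that the response is gzip-compressed rather than encrypted. This is only an assumption, but the readable file name near the start of the garbled output is consistent with a gzip header. The sketch below tests for the gzip magic bytes, decompresses with gzdecode() (which needs the zlib extension), and then strips the JSONP wrapper. The helper names maybe_gunzip and unwrap_jsonp are hypothetical, and the wrapper shape (a callback around a JSON string, or an array holding one) is a guess.

```php
<?php
// Hypothetical helper: return $raw decompressed if it starts with the
// gzip magic bytes 0x1f 0x8b, otherwise return it unchanged.
function maybe_gunzip(string $raw): string
{
    if (strncmp($raw, "\x1f\x8b", 2) === 0) {
        $out = gzdecode($raw); // provided by the zlib extension
        if ($out !== false) {
            return $out;
        }
    }
    return $raw;
}

// Hypothetical helper: strip a wrapper of the form callback(<json>); and
// return the HTML string inside it, or null if the shape is unexpected.
// Assumption: <json> is a JSON string, or an array whose first item is one.
function unwrap_jsonp(string $body): ?string
{
    if (!preg_match('/^[^(]*\((.*)\)\s*;?\s*$/s', $body, $m)) {
        return null;
    }
    $decoded = json_decode($m[1]);
    if (is_array($decoded) && isset($decoded[0])) {
        $decoded = $decoded[0];
    }
    return is_string($decoded) ? $decoded : null;
}

// Usage (not executed here, since it needs network access):
// $raw  = file_get_contents("https://html2-f.scribdassets.com/55ssxtbbb45pk2eg/pages/319-42c28ee981.jsonp");
// $html = unwrap_jsonp(maybe_gunzip($raw));
// echo $html ?? "Unexpected response format";
```

If the first two bytes are not 0x1f 0x8b, the body passes through unchanged, so the same code can be looped over all 320 page URLs without knowing in advance which responses are compressed.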