douqujin2767 2014-10-16 19:44
Accepted

Get the XML of the GESTIS database

I am trying to get the content (apparently not XML) of this website: http://gestis.itrust.de/nxt/gateway.dll/gestis_de/010520.xml?f=templates$fn=default-doc.htm$3.0 via cURL or file_get_contents in PHP.

You can open the page in any browser, but whenever I request it from PHP to fetch the content automatically, it returns a 500 error.

Here is the code I used:

<?php

/* gets the data from a URL */
function get_data($url) {
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

$returned_content = get_data('http://gestis.itrust.de/nxt/gateway.dll/gestis_de/010520.xml?f=templates$fn=default-doc.htm$3.0');

echo $returned_content;

?>

Does anybody have an idea how to get the XML from this website via PHP?


1 answer

  • douding_1073 2014-10-17 01:14

    The website you want to open needs the value vid=gestisdeu:sdbdeu, sent in the form of a cookie, to work:

    Cookie: nxt/gateway.dll/vid=gestisdeu%3Asdbdeu;
    

    Please consult the curl documentation on how to set cookies, or take a look at the existing material already on this website, for example "Is it possible to set the cookie content with CURL?" and the like.

    Be aware that this may change, depending on the website and changes to its configuration. So technically your question can't really be answered, because the website doesn't document its HTTP request requirements. You need to find those out on your own and provide them when you ask such a question.

    PHP Example:

    $url = 'http://gestis.itrust.de/nxt/gateway.dll/gestis_de/010520.xml?f=templates$fn=default-doc.htm$3.0';
    $options['http'] = ['header' => 'Cookie: nxt/gateway.dll/vid=gestisdeu%3Asdbdeu;'];
    stream_context_set_default($options);
    $content = file_get_contents($url);
    var_dump($content);
    

    Output:

    string(104975) "<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
    <html>
    <head>
    <title>DGUV-IFA GESTIS</title>
        <meta http-equiv="content-type" content="text/html;charset=utf-8">
    </head>
    <body>
        <html>
    <head>
    <META http-equiv="Content-Type" content="text/html">
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <link rel="stylesheet" href="/nxt/gateway.dll/gestis_de/010520.xml?f=stylesheets$fn=gestis-doc.css$up=1$3.0" type="text/css">
    <"...
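
    Since the question used cURL rather than file_get_contents, here is a minimal sketch of how the same cookie could be sent with cURL. The helper names build_cookie and fetch_with_cookie are made up for this illustration and are not part of any library. The sketch also returns the HTTP status code, so a request with and without the cookie can be compared. It assumes the server still expects the cookie shown above, which may no longer be the case.

```php
<?php

/**
 * Builds a "name=value" cookie string (hypothetical helper).
 * The value is URL-encoded, so "gestisdeu:sdbdeu" becomes "gestisdeu%3Asdbdeu".
 */
function build_cookie(string $name, string $value): string
{
    return $name . '=' . rawurlencode($value);
}

/**
 * Fetches a URL with cURL, optionally sending a cookie (hypothetical helper).
 * Returns the HTTP status code and the body; the body is false if the
 * transfer itself failed.
 */
function fetch_with_cookie(string $url, ?string $cookie = null): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    if ($cookie !== null) {
        // CURLOPT_COOKIE sets the content of the "Cookie:" request header.
        curl_setopt($ch, CURLOPT_COOKIE, $cookie);
    }
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return ['status' => $status, 'body' => $body];
}

// Usage (performs a network request, so it is left commented out):
// $url    = 'http://gestis.itrust.de/nxt/gateway.dll/gestis_de/010520.xml?f=templates$fn=default-doc.htm$3.0';
// $cookie = build_cookie('nxt/gateway.dll/vid', 'gestisdeu:sdbdeu');
// $result = fetch_with_cookie($url, $cookie);
// var_dump($result['status']);
```

    Calling fetch_with_cookie($url) without the second argument sends no cookie, which makes it easy to check whether the cookie is what decides between the 500 error and a successful response.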
    
    This answer was selected by the asker as the best answer.
