dos3018 2014-05-20 16:17
浏览 43
已采纳

我无法从usaspending.gov api中获取工作

I'm getting errors while scraping data from usaspending.gov can I can't figure out why. I've checked that my php settings are all open and even setup a test scrape of another random site url.

I took another step to include options with the method and useragent.

I suspect it's timing out, but if that's not it, I'm not sure what else to try to get this to work. Every other url I try, I have no problem getting into. If anyone has any suggestions, I'd love to read them!!

Here's my sample code.

$opts = array(
      'http'=>array(
        'method'=>"GET",
        'user_agent'=>"Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8",
        'timeout'=>60
      )
    );

    $context = stream_context_create($opts);
    $test = file_get_contents('http://www.usaspending.gov/fpds/fpds.php?state=MI&detail=c&fiscal_year=2013',false,$context);

I'll also add, I've tried this with fopen, file_get_contents, and simplexml_load_file with no luck. I've tried it with the extended options on fopen and file_get_contents, no change. I'm sure I'm missing something small, just can't figure out what it is.

Edit: Here's the error message

Warning: file_get_contents(http://www.usaspending.gov/fpds/fpds.php?state=MI&detail=c&fiscal_year=2013) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in...

Additionally, the link works I'm trying to open, if you copy/paste it into your browser, you should get the download.

  • 写回答

1条回答 默认 最新

  • duanji8615 2014-11-14 03:18
    关注

    After beating my head against this same wall for a while, I used a curl method (How to get the real URL after file_get_contents if redirection happens?) to find where the basic API URL was redirecting and that seems to be working now!

    Instead of getting your same error message with:

    file_get_contents(http://www.usaspending.gov/fpds/fpds.php?detail=c&fiscal_year=2013&state=AL&max_records=1000&records_from=0)
    

    It is now working for me with:

    file_get_contents(http://www.usaspending.gov/api/fpds_api_complete.php?fiscal_year=2013&vendor_state=AL&Contracts=c&sortby=OBLIGATED_AMOUNT%2Bdesc&records_from=0&max_records=20&sortby=OBLIGATED_AMOUNT+desc)
    

    So pretty much using this as my base URL to access the API with more parameters added on (with the "Contracts" parameter replacing the original "detail" parameter):

    http://www.usaspending.gov/api/fpds_api_complete.php?Contracts=c&sortby=OBLIGATED_AMOUNT%2Bdesc&sortby=OBLIGATED_AMOUNT+desc

    I hope this helps, and works for you too!

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 delphi webbrowser组件网页下拉菜单自动选择问题
  • ¥15 wpf界面一直接收PLC给过来的信号,导致UI界面操作起来会卡顿
  • ¥15 init i2c:2 freq:100000[MAIXPY]: find ov2640[MAIXPY]: find ov sensor是main文件哪里有问题吗
  • ¥15 运动想象脑电信号数据集.vhdr
  • ¥15 三因素重复测量数据R语句编写,不存在交互作用
  • ¥15 微信会员卡等级和折扣规则
  • ¥15 微信公众平台自制会员卡可以通过收款码收款码收款进行自动积分吗
  • ¥15 随身WiFi网络灯亮但是没有网络,如何解决?
  • ¥15 gdf格式的脑电数据如何处理matlab
  • ¥20 重新写的代码替换了之后运行hbuliderx就这样了