dquv73115 2010-12-28 06:07

Scraping IMDB's Top 250 list gives some foreign-language results?

I'm having my server grab this page to download the full list for a movie analysis I'm doing: http://www.imdb.com/chart/top

But when it does, a lot of the movie titles appear in another language. For example, instead of The Shawshank Redemption it gives me: Побег из Шоушенка

A simple file_get_contents call in PHP is the fastest way to reproduce it, though I'm actually using cURL.

Does anyone have any idea what's going on, or how to fix it?

UPDATE: For some strange reason, IMDB might be treating my server as if it were in another country. Is there any way to force it to treat the server as being in the US?
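
One possible cause (an assumption, not a confirmed diagnosis) is that IMDB localizes titles based on the request's Accept-Language header; it may also geolocate by IP address, which no header can override. PHP's cURL and file_get_contents do not send an Accept-Language header by default, so one thing to try is sending it explicitly: with cURL, curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept-Language: en-US')); with file_get_contents, pass a context built by stream_context_create(array('http' => array('header' => "Accept-Language: en-US\r\n"))). The Python sketch below shows the same idea using only the standard library. The function names, the TOP_CHART_URL constant, and the User-Agent string are hypothetical.

```python
import urllib.request

# Hypothetical constant: the chart URL from the question.
TOP_CHART_URL = "http://www.imdb.com/chart/top"


def build_english_request(url: str, language: str = "en-US,en;q=0.8") -> urllib.request.Request:
    """Build a GET request that asks the server for English content.

    Assumption: the site chooses the title language from the
    Accept-Language header rather than (only) from the client's IP.
    """
    headers = {
        "Accept-Language": language,
        # Hypothetical User-Agent; some sites treat header-less clients differently.
        "User-Agent": "Mozilla/5.0 (movie-analysis script)",
    }
    return urllib.request.Request(url, headers=headers)


def fetch(url: str) -> str:
    """Download the page and decode it with the charset the server declares."""
    req = build_english_request(url)
    with urllib.request.urlopen(req, timeout=30) as resp:
        # Fall back to UTF-8 if the response declares no charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling fetch(TOP_CHART_URL) makes a real network request. If the titles still come back localized with the header set, the server may be going by IP address instead, and the request would have to originate from a US-based host or proxy.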

  • doujianguang5506 2010-12-30 14:21

    I know how to deal with this in a Windows environment; you may be able to borrow the same idea for your server's OS.

    In Windows, with a WebBrowser control, you can use the menu View -> Encoding to select whichever encoding displays the text properly. When you then grab the page source from the browser control, it will be in the correct encoding.

    You may find the IRobotSoft web scraper easy to use for your movie analysis, though it runs only on Windows.
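
    Outside a browser, the nearest analogue to choosing View -> Encoding is to decode the downloaded bytes with an explicit character set, such as the one named in the response's Content-Type header, instead of guessing. The sketch below assumes the header value is available as a string; decode_body is a hypothetical helper. This only repairs garbled characters. It cannot turn a correctly decoded Russian title into its English one.

    ```python
    from email.message import Message


    def decode_body(raw: bytes, content_type: str, fallback: str = "utf-8") -> str:
        """Decode page bytes using the charset named in a Content-Type value."""
        # Reuse the standard library's MIME header parser to extract charset=...
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset() or fallback
        try:
            return raw.decode(charset, errors="replace")
        except LookupError:
            # The header named a charset Python does not know; use the fallback.
            return raw.decode(fallback, errors="replace")
    ```

    For example, bytes served as "text/html; charset=windows-1251" are decoded as Windows-1251 rather than misread as UTF-8.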
