dongsi8812 2015-02-16 16:23
Views: 21
Accepted

Copying images from a list of URLs to my server in one go with PHP

I have a big list of image URLs in an HTML file, something like this:

<a href="http://example.com/image1.jpg">image1</a>
<a href="http://example.com/image2.jpg">image2</a>
<a href="http://example.com/image3.jpg">image3</a>
<a href="http://example.com/image4.jpg">image4</a>
<a href="http://example.com/image5.jpg">image5</a>
<a href="http://example.com/image6.jpg">image6</a>
<a href="http://example.com/image7.jpg">image7</a>

There are around 50,000 images.

I want to write a small script that copies all the images to my server, so that I have them at:

http://Mywebsite.com/images/image1.jpg
http://Mywebsite.com/images/image2.jpg
http://Mywebsite.com/images/image3.jpg
...

I want to loop over the list, and each URL must be deleted from the list once its image has been copied successfully. That way, if the page crashes while loading or something else goes wrong, I can continue the loop without overwriting images or reading URLs again. If there is a better way to avoid overwriting and re-reading URLs, please tell me.

2 answers

  • drzyeetvt41077335 2015-02-16 17:27

    I would write a script that reads your HTML file line by line.
    You can do that with fopen and fgets:

    $handle = fopen( "path/to/some/file", "r" );
    while ( ( $line = fgets( $handle ) ) !== false ) 
    {
        // do something with $line
    }
    fclose( $handle );
    

    This way the file is never loaded into memory as a whole, so you don't have to worry about its size.

    Then, after processing each line, I would write a lock file containing the current line number (index). If your script crashes and you restart it, the loop simply skips every line until its current index is higher than the index stored in the lock file.
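
    The lock-file idea can be tried in isolation before wiring it into a download loop. The following is only a sketch: the helper names readCheckpoint and writeCheckpoint are made up for illustration, and a temporary lock file and a five-element array stand in for the real lock file and the lines of the HTML file.

    ```php
    <?php
    // Hypothetical helpers for illustration; they are not part of any library.

    // Return the number of the last line that was fully processed (0 if none).
    function readCheckpoint( string $lockFile ): int
    {
        return is_file( $lockFile ) ? (int) file_get_contents( $lockFile ) : 0;
    }

    // Record that every line up to and including $index is done.
    function writeCheckpoint( string $lockFile, int $index ): void
    {
        file_put_contents( $lockFile, (string) $index, LOCK_EX );
    }

    // Demo on a temporary lock file: pretend an earlier run finished lines 1 to 3.
    $lockFile = sys_get_temp_dir().'/demo-index-'.uniqid().'.lock';
    writeCheckpoint( $lockFile, 3 );

    // A restarted run skips lines 1 to 3 and picks up at line 4.
    $lastDone  = readCheckpoint( $lockFile );
    $processed = [];
    foreach ( [ 'a', 'b', 'c', 'd', 'e' ] as $i => $line )
    {
        $index = $i + 1; // 1-based line number
        if ( $index <= $lastDone )
        {
            continue; // already handled before the restart
        }
        $processed[] = $line;
        writeCheckpoint( $lockFile, $index );
    }

    $final = readCheckpoint( $lockFile );
    unlink( $lockFile );
    ```

    Here only the last two lines are processed, and the stored index ends at 5. Comparing with `<=` means a line whose index equals the stored value is treated as done and is not fetched a second time.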

    The script

    It might work, but you should not simply copy and paste it. I hope it helps you find your solution.

    #!/usr/bin/env php
    <?php
    // I DID NOT TEST THIS! 
    // but it should work.
    
    $handle = fopen("path/to/the/html/file/containing/the/urls.html", "r");
    $storage = "path/where/you/want/your/images/";
    $lockFile = __DIR__.'/index.lock';
    $index = 0;
    
    // get the lock index
    if ( !file_exists( $lockFile ) )
    {
        file_put_contents( $lockFile, 0 );
    }
    
    // load the index of the last processed line
    $start = (int) file_get_contents( $lockFile );
    
    if ( $handle ) 
    {
        // line by line step by step
        while ( ( $line = fgets( $handle ) ) !== false ) 
        {
            // update the current line number
            $index++;
    
            // skip lines that were already processed in a previous run
            if ( $index <= $start )
            {
                continue;
            }
    
            // match the url from the element, skipping lines without a link
            if ( !preg_match( '/<a href="([^"]+)">/', $line, $match ) )
            {
                continue;
            }
            $url = $match[1];
    
            $file = basename( $url );
    
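            // download only if the file does not exist yet
            if ( !file_exists( $storage.$file ) )
            {
                $data = file_get_contents( $url );
    
                if ( $data === false )
                {
                    // report the failure instead of saving an empty file
                    error_log( 'Could not download '.$url );
                }
                else
                {
                    file_put_contents( $storage.$file, $data );
                }
            }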
    
            // update the lock file
            file_put_contents( $lockFile, $index );
        }
    
        fclose($handle);
    } 
    else 
    {
        throw new Exception( 'Could not open file.' );
    } 
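
    One gap worth noting: if the script dies while an image is being written, a partial file stays on disk, and the file_exists check skips it on the next run. A possible way around this, sketched below, is to write to a temporary name first and rename it afterwards. The helper name saveAtomically and the ".part" suffix are assumptions made for this example, and the rename is only expected to be atomic when both paths are on the same filesystem.

    ```php
    <?php
    // Hypothetical helper for illustration; it is not part of any library.
    // Writes $data to a temporary ".part" file and renames it into place,
    // so $target only ever appears once the write has completed.
    function saveAtomically( string $target, string $data ): bool
    {
        $temp = $target.'.part';

        // file_put_contents returns the number of bytes written, or false
        if ( file_put_contents( $temp, $data ) !== strlen( $data ) )
        {
            @unlink( $temp ); // discard the incomplete write
            return false;
        }

        return rename( $temp, $target );
    }

    // Demo: a temporary directory stands in for the image folder.
    $dir = sys_get_temp_dir().'/images-demo-'.uniqid();
    mkdir( $dir );
    $ok = saveAtomically( $dir.'/image1.jpg', 'fake image bytes' );
    ```

    In the script, the downloaded bytes could be passed to saveAtomically( $storage.$file, ... ) in place of the direct file_put_contents call. Leftover ".part" files are then easy to spot and delete.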
    
    This answer was selected as the best answer by the asker.