Untangling directory separator madness with string manipulation?

I'm working on converting a website. It involves standardizing the directory structure of images and media files. I'm parsing path information from various tags, standardizing the paths, checking whether the media exists in the new standardized location, and putting it there if it doesn't. I'm using string manipulation to do so.

This is a little open-ended, but is there a class, tool, or concept out there I can use to save myself some headaches? For instance, I'm running into problems where, say, a page in a subdirectory (website.com/subdir/dir/page.php) has relative image paths (../images/image.png), and other cases like this. It's not like there's one overarching problem, just a lot of little things that add up.

When I think I've got my script covering most cases, I get errors like "Could not find file at export/standardized_folder/proper_image_folderimage.png", where it should be export/standardized_folder/proper_image_folder/image.png. It's kind of driving me mad, doing string parsing and checks to make sure that directory separators are in the proper places.

I feel like I'm putting too much work into making a one-off import script very robust. Perhaps someone's already untangled this mess in a re-useable way, one which I can take advantage of?

Post Script: So here's a more in-depth scoop. I write my script so that it parses one "type" of page and pulls content from pages of the same kind. Then I turn my script to parse another type of page, get all kinds of errors, and learn that all my assumptions about how paths are referenced must be thrown out the window. Wash, rinse, repeat.

So I'm looking at doing some major re-factoring of my script, throwing out all assumptions, and checking, re-checking, and double-checking path information. Since I'm really trying to build a robust path building script, hopefully I can avoid re-inventing the wheel. Is there a wheel out there?

dth42345 2011-09-16 16:35
If your problems are rooted in resolving the relative links in a document to absolute ones (which should be half the job of mapping the linked image paths onto the file system), I normally use Net_URL2 from PEAR. It's a simple class that just does the job.

To install, as root just call

# pear install channel://pear.php.net/Net_URL2-0.3.1

Although it's a beta package, it's really stable.

A little example: let's say there is an array with all the image srcs in question, and there is a base URL for the document:

    require_once('Net/URL2.php');

    $baseUrl = 'http://www.example.com/test/images.html';

    $docSrcs = array(...);

    $baseUrl = new Net_URL2($baseUrl);

    foreach($docSrcs as $href)
    {
        $url = $baseUrl->resolve($href);
        echo ' * ', $href, ' -> ', $url->getURL(), "\n";
        // or
        echo " $href -> $url\n"; # Net_URL2 supports string context
    }

This converts any relative links into absolute ones based on your base URL. The base URL is, first of all, the document's address. The document can override it by specifying another one with the base element, so you could look that up with the HTML parser you're already using (along with the src and href values).

Net_URL2 follows the current URL standard, RFC 3986, when resolving URLs.
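To make the resolving step less of a black box, here is a minimal plain-PHP sketch of what resolving a relative path against a document path involves. It is not Net_URL2's implementation: `removeDotSegments` and `resolvePath` are hypothetical helper names, the code handles path-only references (no scheme, host, query, or fragment), and it also collapses empty segments such as `//`, which RFC 3986 itself does not do.

```php
<?php
// Hypothetical helper: collapse "." and ".." segments in a URL path,
// in the spirit of RFC 3986 section 5.2.4 (simplified).
function removeDotSegments(string $path): string
{
    $absolute = $path !== '' && $path[0] === '/';
    $segments = explode('/', $path);
    $last     = end($segments);
    $output   = array();

    foreach ($segments as $segment) {
        if ($segment === '' || $segment === '.') {
            continue;               // skip empty and "current directory" segments
        }
        if ($segment === '..') {
            array_pop($output);     // step up one level (no-op at the root)
            continue;
        }
        $output[] = $segment;
    }

    $result = ($absolute ? '/' : '') . implode('/', $output);

    // Keep the trailing slash when the input ended in a directory reference.
    if (in_array($last, array('', '.', '..'), true) && $result !== '' && $result !== '/') {
        $result .= '/';
    }
    return $result;
}

// Hypothetical helper: resolve a path-only reference against the path
// of the document that contains it.
function resolvePath(string $basePath, string $ref): string
{
    if ($ref !== '' && $ref[0] === '/') {
        return removeDotSegments($ref);   // already root-relative
    }
    // Drop the document's file name and keep its directory.
    $pos = strrpos($basePath, '/');
    $dir = ($pos === false) ? '/' : substr($basePath, 0, $pos + 1);
    return removeDotSegments($dir . $ref);
}

echo resolvePath('/subdir/dir/page.php', '../images/image.png'), "\n";
```

With the page and image paths from the question, the last line prints `/subdir/images/image.png`. For anything beyond a one-off script, a library that implements the full RFC is the safer choice.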

Another thing that might be handy for your URL handling is the getNormalizedURL function. It removes some potential error cases such as needless dot segments, which is useful if you need to compare one URL with another and, naturally, for mapping the URL to a path:

    foreach($docSrcs as $href)
    {
        $url = $baseUrl->resolve($href);
        $url = $url->getNormalizedURL();
        echo " $href -> $url\n";
    }

Once you have resolved all URLs to absolute ones and normalized them, you can decide whether they are relevant to your site. As long as the URL is still a Net_URL2 instance, you can use one of its many functions to do that:

    $host = strtolower($url->getHost());
    if (in_array($host, array('example.com', 'www.example.com')))
    {
        # URL is on my server, process it further
    }
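If the URL is only available as a plain string, roughly the same check can be sketched with PHP's built-in `parse_url`. The function name `isOwnHost` and the host list are assumptions made for illustration, not part of Net_URL2.

```php
<?php
// Hypothetical helper: true when an absolute URL points at one of our own hosts.
function isOwnHost(string $url, array $ownHosts): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;   // no host component (relative URL) or unparseable input
    }
    // Host names are case-insensitive, so compare in lower case.
    return in_array(strtolower($host), $ownHosts, true);
}

$ownHosts = array('example.com', 'www.example.com');

var_dump(isOwnHost('http://WWW.Example.com/test/pic.png', $ownHosts));
var_dump(isOwnHost('http://cdn.example.org/pic.png', $ownHosts));
```

Relative references have no host at all, so they need to be resolved against the base URL before a check like this makes sense.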

What's left is the concrete path to the file in the URL:

    $path = $url->getPath();

Assuming you're comparing against a UNIX file system, that path should be easy to prefix with a concrete base directory:

    $filesystemImagePath = '/var/www/site-new/images';
    $newPath = $filesystemImagePath . $path;
    if (is_file($newPath))
    {
        # new image already exists.
    }

If you have trouble combining the base path with the image path, keep in mind that the image path will always have a slash at the beginning.
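For the missing-separator error described in the question, one defensive option is to normalize the slashes at the join itself, so it no longer matters whether either side brings its own. This is a small sketch; `joinPath` is a hypothetical helper name, and it assumes `/` as the directory separator.

```php
<?php
// Hypothetical helper: join a base directory and a path with exactly one
// slash between them, whether or not either side already carries one.
function joinPath(string $baseDir, string $path): string
{
    return rtrim($baseDir, '/') . '/' . ltrim($path, '/');
}

echo joinPath('export/standardized_folder/proper_image_folder', 'image.png'), "\n";
echo joinPath('/var/www/site-new/images/', '/test/pic.png'), "\n";
```

The first call yields `export/standardized_folder/proper_image_folder/image.png` and the second `/var/www/site-new/images/test/pic.png`. Routing every concatenation through one function like this keeps the separator logic in a single place.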

Hope this helps.