I'm working on converting a website. It involved standardizing the directory structure of images and media files. I'm parsing path information from various tags, standardizing them, checking to see if the media exists in the new standardized location, and putting it there if it doesn't. I'm using string manipulation to do so.
This is a little open-ended, but is there a class, tool, or concept out there I can use to save myself some headaches? For instance, I'm running into problems where, say, a page in a sudirectory (website.com/subdir/dir/page.php
) has relative image paths (../images/image.png
), or other kinds of things like this. It's not like there's one overarching problem, but just a lot of little things that add up.
When I think I've got my script covering most cases, then I get errors like Could not find file at export/standardized_folder/proper_image_folderimage.png
where it should be export/standardized_folder/proper_image_folder/image.png
. It's kind of driving me mad, doing string parsing and checks to make sure that directory separators are in the proper places.
I feel like I'm putting too much work into making a one-off import script very robust. Perhaps someone's already untangled this mess in a re-useable way, one which I can take advantage of?
Post Script: So here's a more in-depth scoop. I write my script that parses one "type" of page and pulls content from the same of its kind. Then I turn my script to parse another type of page, get all knids of errors, and learn that all my assumptions about how paths are referenced must be thrown out the window. Wash, rinse, repeat.
So I'm looking at doing some major re-factoring of my script, throwing out all assumptions, and checking, re-checking, and double-checking path information. Since I'm really trying to build a robust path building script, hopefully I can avoid re-inventing the wheel. Is there a wheel out there?