Sorry for the confusing title, but I can't think of another one.
I have a text file in this format (just a few lines taken out of context):
# Google_Product_Taxonomy_Version: 2015-02-19
1 - Animals & Pet Supplies
3237 - Animals & Pet Supplies > Live Animals
2 - Animals & Pet Supplies > Pet Supplies
3 - Animals & Pet Supplies > Pet Supplies > Bird Supplies
7385 - Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Cage Accessories
499954 - Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Cage Accessories > Bird Cage Bird Baths
7386 - Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Cage Accessories > Bird Cage Food & Water Dishes
4989 - Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Cages & Stands
4990 - Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Food
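To make the line format concrete, here is a minimal sketch of how a single line could be split into its ID and its path segments. The function name `parseTaxonomyLine` is made up for illustration, and the sketch assumes the separator between ID and path is always " - " and the separator between levels is always " > ", as in the sample lines.

```php
<?php
// Hypothetical helper: split one taxonomy line into its ID and path segments.
function parseTaxonomyLine($line)
{
    $line = trim($line);
    // Skip blank lines and the "# ..." version header.
    if ($line === '' || $line[0] === '#') {
        return null;
    }
    // Limit 2: split only on the first " - ", so hyphens inside titles are kept.
    $parts = explode(' - ', $line, 2);
    if (count($parts) !== 2) {
        return null; // not an "ID - path" line
    }
    return array(
        'id'   => (int) $parts[0],
        'path' => explode(' > ', $parts[1]),
    );
}

$parsed = parseTaxonomyLine('3 - Animals & Pet Supplies > Pet Supplies > Bird Supplies');
```

The number of path segments gives the depth, and the last segment is the title of the category itself.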
So far, so good. I want to write a parser that collects all the information for each category. Once parsing is done, the result has to be written to a MySQL database.
Each line contains exactly:
1 unique ID
1 main category
n subcategories
The tricky part (for me) is how to hold this information in an array while keeping performance in mind.
The final table in my DB must look like this:
ID   | parent | title
1    |        | Animals & Pet Supplies
3237 | 1      | Live Animals
2    | 1      | Pet Supplies
3    | 2      | Bird Supplies
In other words, I must be able to reconstruct the full "crumb" purely from my DB entries.
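As an illustration of what "reconstructing the crumb" means, here is a small sketch that walks up the parent chain. It assumes the rows have already been loaded into an array keyed by ID and that the parent chain contains no cycles. The name `buildCrumb` and the array layout are placeholders, not an existing API.

```php
<?php
// Rows as they might come back from the DB, keyed by ID.
$rows = array(
    1    => array('parent' => null, 'title' => 'Animals & Pet Supplies'),
    3237 => array('parent' => 1,    'title' => 'Live Animals'),
    2    => array('parent' => 1,    'title' => 'Pet Supplies'),
    3    => array('parent' => 2,    'title' => 'Bird Supplies'),
);

// Hypothetical helper: walk up the parent chain and join the titles.
function buildCrumb(array $rows, $id)
{
    $titles = array();
    while ($id !== null && isset($rows[$id])) {
        // Prepend, so the top-level category ends up first.
        array_unshift($titles, $rows[$id]['title']);
        $id = $rows[$id]['parent'];
    }
    return implode(' > ', $titles);
}

$crumb = buildCrumb($rows, 3);
// $crumb is "Animals & Pet Supplies > Pet Supplies > Bird Supplies"
```

So the table only needs the ID, the parent ID and the category's own title; the full path does not have to be stored.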
I started with my parser like this:
public function enrichTaxonomy($handle)
{
    // $handle: an open, readable handle on the taxonomy file (e.g. from fopen())
    $aOutput = array();
    // ignore first line (the version comment)
    fgets($handle);
    // iterate through the remaining lines
    while (($line = fgets($handle)) !== false)
    {
        // split on the first " - " only, so hyphens inside titles survive
        $splitted = explode(" - ", $line, 2);
        // build first level
        if (strpos($splitted[1], '>') === false)
        {
            $aOutput['id'][] = trim($splitted[0]);
            $aOutput['title'][] = trim($splitted[1]);
        } else
        {
            // recursive?
            if (substr_count($splitted[1], " > ") == 1)
            {
                $splitted2ndLevel = explode(" > ", $splitted[1]);
                $aOutput['id'][] = trim($splitted[0]);
                $aOutput['title'][] = trim($splitted2ndLevel[1]);
            }
        }
    }
    echo "<pre>";
    var_dump($aOutput);
    echo "</pre>";
}
But I realized that this isn't a very good approach, since my next step would have been:
if (substr_count($splitted[1], " > ") == 2)
{
    $splitted3rdLevel = explode(" > ", $splitted[1]);
    $aOutput['id'][] = trim($splitted[0]);
    $aOutput['title'][] = trim($splitted3rdLevel[2]);
}
if (substr_count($splitted[1], " > ") == 3)
{
    $splitted4thLevel = explode(" > ", $splitted[1]);
    $aOutput['id'][] = trim($splitted[0]);
    $aOutput['title'][] = trim($splitted4thLevel[3]);
}
This also gets very complicated later on, when I need a final array that I can iterate through to insert the data into my DB.
An important note: each subcategory has to know its parent, so that I can insert the parent ID as well.
My question now: what is a good, relatively short and performant way to achieve this?
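To make the parent requirement concrete: the parent of a line is the line whose path equals the current path minus its last segment. Below is a rough sketch of that idea, not a finished solution. It assumes the whole file fits in memory as an array of lines (e.g. from `file()`) and that every parent path has its own line somewhere in the file. `buildRows` and the row layout are names I made up.

```php
<?php
// Hypothetical helper: turn taxonomy lines into rows of id / parent / title.
function buildRows(array $lines)
{
    // Pass 1: map every full path ("A > B > C") to its ID.
    $idByPath = array();
    $entries  = array();
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue; // skip blanks and the version header
        }
        $parts = explode(' - ', $line, 2);
        if (count($parts) !== 2) {
            continue; // not an "ID - path" line
        }
        $idByPath[$parts[1]] = (int) $parts[0];
        $entries[] = $parts;
    }

    // Pass 2: the parent's path is the line's own path minus its last segment.
    $rows = array();
    foreach ($entries as $parts) {
        $segments   = explode(' > ', $parts[1]);
        $title      = array_pop($segments);
        $parentPath = implode(' > ', $segments);
        $rows[] = array(
            'id'     => (int) $parts[0],
            // Top-level categories have an empty parent path, hence null.
            'parent' => isset($idByPath[$parentPath]) ? $idByPath[$parentPath] : null,
            'title'  => $title,
        );
    }
    return $rows;
}

$rows = buildRows(array(
    '# Google_Product_Taxonomy_Version: 2015-02-19',
    '1 - Animals & Pet Supplies',
    '3237 - Animals & Pet Supplies > Live Animals',
    '2 - Animals & Pet Supplies > Pet Supplies',
    '3 - Animals & Pet Supplies > Pet Supplies > Bird Supplies',
));
```

This works for any depth without a separate branch per level, and each row can then go into one INSERT. Whether two passes with an array lookup are fast enough for the full file is something I have not measured.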