I have a multidimensional array that looks like this:
Array
(
[0] => Array
(
[0] => Title 1
[1] => Some text ... US5801351017 ...
)
[1] => Array
(
[0] => Title 2
[1] => Some text ... US0378331005 ...
)
[2] => Array
(
[0] => Title 3
[1] => Some text ... //Note here that it does not contain an ISIN Code
)
...
I am trying to keep only the arrays whose text matches my regex for an ISIN code. The array above was produced by the following code:
$title = $html->find("h3.r a");
$titlearray = array_map(function($value){
    return trim($value->plaintext);
}, $title);
$description = $html->find("span.st");
$descriptionarray = array_map(function($value){
    $string = strip_tags($value);
    return $string;
}, $description);
$result1 = array();
foreach($titlearray as $key => $value) {
    $tmp = array($value);
    if (isset($descriptionarray[$key])) {
        $tmp[] = $descriptionarray[$key];
    }
    $result1[] = $tmp;
}
print_r($result1);
I have written some code that comes very close, but it does not actually remove the arrays that do not contain an ISIN code. The code I have is this:
$title = $html->find("h3.r a");
$titlearray = array_map(function($value){
    return trim($value->plaintext);
}, $title);
$description = $html->find("span.st");
$descriptionarray = array_map(function($value){
    $match = array();
    $string = strip_tags($value);
    $pattern = "/[BE|BM|FR|BG|VE|DK|HR|DE|JP|HU|HK|JO|US|BR|XS|FI|GR|IS|RU|LB|"
        . "PT|NO|TW|UA|TR|LK|LV|LU|TH|NL|PK|PH|RO|EG|PL|AA|CH|CN|CL|EE|CA|"
        . "IR|IT|ZA|CZ|CY|AR|AU|AT|IN|CS|CR|IE|ID|ES|PE|TN|PA|SG|IL|US|MX|"
        . "SK|KRSI|KW|MY|MO|SE|GB|GG|KY|JE|VG|NG|SA|MU]{2}[A-Z0-9]{10}/";
    preg_match($pattern, $string, $match);
    return $match;
}, $description);
$merged = array();
$i=0;
foreach($descriptionarray as $value){
    $merged[$i] = $value;
    $merged[$i][] = $titlearray[$i];
    $i++;
}
print_r($merged);
which gives me these arrays:
Array
(
[0] => Array
(
[0] => US5801351017
[1] => Title 1
)
[1] => Array
(
[0] => US0378331005
[1] => Title 2
)
[2] => Array
(
[0] => Title 3
)
...
How can I get rid of the arrays that do not match my Regex? What I am looking for is this output:
Array
(
[0] => Array
(
[0] => Title 1
[1] => US5801351017
)
[1] => Array
(
[0] => Title 2
[1] => US0378331005
)
...
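To make the goal concrete, here is a minimal sketch of one way the non-matching rows could be dropped from `$merged` after the fact. It assumes every matching row holds exactly two elements (ISIN, then title), as in the output above; the hard-coded `$merged` and the names `$filtered` and `$wanted` are only illustrative and not part of my real code.

```php
<?php
// Hypothetical copy of the $merged array printed above.
$merged = [
    ['US5801351017', 'Title 1'],
    ['US0378331005', 'Title 2'],
    ['Title 3'],
];

// Keep only rows that hold both an ISIN and a title, then reindex from 0.
$filtered = array_values(array_filter($merged, function ($row) {
    return count($row) === 2;
}));

// Swap each row into [title, ISIN] order to match the desired output.
$wanted = array_map(function ($row) {
    return [$row[1], $row[0]];
}, $filtered);

print_r($wanted);
```

This filters after merging, so it still runs the regex on every description and only hides the misses afterwards.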
EDIT
Following @CasimiretHippolyte's answer, I now have this code:
$titles = $html->find("h3.r a");
$descriptions = $html->find("span.st");
$ISIN_PATTERN = "/[BE|BM|FR|BG|VE|DK|HR|DE|JP|HU|HK|JO|US|BR|XS|FI|GR|IS|RU|LB|"
    . "PT|NO|TW|UA|TR|LK|LV|LU|TH|NL|PK|PH|RO|EG|PL|AA|CH|CN|CL|EE|CA|"
    . "IR|IT|ZA|CZ|CY|AR|AU|AT|IN|CS|CR|IE|ID|ES|PE|TN|PA|SG|IL|US|MX|"
    . "SK|KRSI|KW|MY|MO|SE|GB|GG|KY|JE|VG|NG|SA|MU]{2}[A-Z0-9]{10}/";
$results = [];
foreach ($descriptions as $k => $v) {
    if (preg_match($ISIN_PATTERN, strip_tags($v), $m)) {
        $results[] = ['Title' => trim($titles[$k]->plaintext), 'ISIN' => $m[1]];
    }
}
print_r($results);
This narrows my array down to only the elements that match the regex, but it does not display the matches under 'ISIN' => $m[1]. It outputs this:
Array
(
[0] => Array
(
[Title] => Title 1
[ISIN] =>
)
[1] => Array
(
[Title] => Title 2
[ISIN] =>
)
...
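The empty ISIN values are consistent with how `preg_match()` fills its third argument: index 0 holds the text matched by the whole pattern, and index 1 exists only when the pattern contains a parenthesised capture group. My pattern has no such group. A minimal sketch of that behaviour, using a shortened stand-in pattern and a hard-coded string (both are assumptions for illustration, not my real data):

```php
<?php
// Shortened, hypothetical stand-in for the full country-code pattern above.
$pattern = '/[A-Z]{2}[A-Z0-9]{10}/';
$text = 'Some text ... US5801351017 ...';

$matched = preg_match($pattern, $text, $m);

// $m[0] is the full match; $m[1] is not set because the pattern has no (...) group.
$fullMatch = $m[0];
$hasGroup1 = isset($m[1]);

var_dump($fullMatch);  // string(12) "US5801351017"
var_dump($hasGroup1);  // bool(false)
```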
FURTHER EDIT
This code solves the issue:
$titles = $html->find("h3.r a");
$descriptions = $html->find("span.st");
$ISIN_PATTERN = "/[BE|BM|FR|BG|VE|DK|HR|DE|JP|HU|HK|JO|US|BR|XS|FI|GR|IS|RU|LB|"
    . "PT|NO|TW|UA|TR|LK|LV|LU|TH|NL|PK|PH|RO|EG|PL|AA|CH|CN|CL|EE|CA|"
    . "IR|IT|ZA|CZ|CY|AR|AU|AT|IN|CS|CR|IE|ID|ES|PE|TN|PA|SG|IL|US|MX|"
    . "SK|KRSI|KW|MY|MO|SE|GB|GG|KY|JE|VG|NG|SA|MU]{2}[A-Z0-9]{10}/";
$results1 = [];
foreach ($descriptions as $k => $v) {
    if (preg_match($ISIN_PATTERN, strip_tags($v), $m)) {
        $results1[] = ['Title' => trim($titles[$k]->plaintext), 'ISIN' => $m[1]];
    }
}
$titlesarray = array_column($results1, 'Title');
$results2 = array_map(function($value){
    $match = array();
    $string = strip_tags($value);
    $pattern = "/[BE|BM|FR|BG|VE|DK|HR|DE|JP|HU|HK|JO|US|BR|XS|FI|GR|IS|RU|LB|"
        . "PT|NO|TW|UA|TR|LK|LV|LU|TH|NL|PK|PH|RO|EG|PL|AA|CH|CN|CL|EE|CA|"
        . "IR|IT|ZA|CZ|CY|AR|AU|AT|IN|CS|CR|IE|ID|ES|PE|TN|PA|SG|IL|US|MX|"
        . "SK|KRSI|KW|MY|MO|SE|GB|GG|KY|JE|VG|NG|SA|MU]{2}[A-Z0-9]{10}/";
    preg_match($pattern, $string, $match);
    return $match;
}, $descriptions);
$descriptionarray = array_column($results2, 0);
$result3 = array();
foreach($titlesarray as $key => $value) {
    $tmp = array($value);
    if (isset($descriptionarray[$key])) {
        $tmp[] = $descriptionarray[$key];
    }
    $result3[] = $tmp;
}
print_r($result3);
I scraped something together very fast as I needed a quick solution. It is highly inefficient: I use an extra array_map(), flatten the results into simple arrays with array_column(), and then join them back together. On top of that, I repeat my regex.
LAST EDIT
@CasimiretHippolyte's answer is the most efficient solution; it works with either his pattern and $m[1] or my pattern and $m[0].
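For anyone landing here, a minimal sketch of the single-pass approach with `$m[0]`. It uses plain arrays as hypothetical stand-ins for the scraped `$titles` and `$descriptions`, and a shortened country list. Note that the square brackets in my original pattern form a character class, so `[BE|BM|...]{2}` accepts any two of the listed letters (or `|`); the sketch assumes a non-capturing alternation `(?:...)` is what was actually intended.

```php
<?php
// Hypothetical stand-ins for the scraped titles and descriptions.
$titles = ['Title 1', 'Title 2', 'Title 3'];
$descriptions = [
    'Some text ... US5801351017 ...',
    'Some text ... US0378331005 ...',
    'Some text ... no code here',
];

// Shortened country list for illustration; (?:...) groups the alternation
// without creating a capture group, so the ISIN is still found in $m[0].
$isinPattern = '/\b(?:US|DE|FR|GB|JP)[A-Z0-9]{10}\b/';

$results = [];
foreach ($descriptions as $k => $text) {
    // Only keep entries that have a title and whose text contains an ISIN.
    if (isset($titles[$k]) && preg_match($isinPattern, strip_tags($text), $m)) {
        $results[] = [$titles[$k], $m[0]];
    }
}

print_r($results);
```

With this sample data only the first two entries are kept, each as a title followed by its ISIN, and the regex is written once.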