I would like to capture each of the following in its own group with preg_match_all in PHP:
- The chapter, section, or page
- The number of the specified chapter, section, or page, plus its letter if it has one. A single optional space between them should be allowed for
- The words "and" and "or"
Keeping in mind that the number of items in the string may vary, the regex should work on all of the examples below:
- Ch1 and Sect2b
- Ch 4 x blahunwantedtext and Sect 5y and Sect6 z and Ch7 or Ch8
This is what I have managed to come up with so far, followed by its output:
<?php
$str = 'Ch 1 a and Sect 2b and Pg3';
preg_match_all('/([a-z]+)([\s]?[0-9]+)([\s]?[a-z]*)([\s]?and*[\s]?)/is', $str, $matches);
print_r($matches);
Array
(
    [0] => Array
        (
            [0] => Ch 1 a and
            [1] => Sect 2b and
        )
    [1] => Array
        (
            [0] => Ch
            [1] => Sect
        )
    [2] => Array
        (
            [0] => 1
            [1] => 2
        )
    [3] => Array
        (
            [0] => a
            [1] => b
        )
    [4] => Array
        (
            [0] => and
            [1] => and
        )
)
I'm unable to match the last portion of the string (Pg3), so it never appears in my array.
The expected result is:
Array
(
    [0] => Array
        (
            [0] => Ch 1 a and
            [1] => Sect 2b and
            [2] => Pg3
        )
    [1] => Array
        (
            [0] => Ch
            [1] => Sect
            [2] => Pg
        )
    [2] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 3
        )
    [3] => Array
        (
            [0] => a
            [1] => b
            [2] =>
        )
    [4] => Array
        (
            [0] => and
            [1] => and
            [2] =>
        )
)
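
One likely cause, offered as a sketch rather than a definitive fix: the last group in the attempt above, ([\s]?and*[\s]?), always requires the literal letters "an", so an item with no conjunction after it (Pg3) can never match. The version below makes both the letter and the conjunction optional, and keeps the optional spaces outside the capture groups. The function name parseReferences is hypothetical, and treating the letter as a single standalone character (so that the "a" in "and" is not mistaken for it) is an assumption about the requirements.

```php
<?php
// Hypothetical helper; the pattern is one possible reading of the requirements.
function parseReferences(string $str): array
{
    $pattern = '/\b([a-z]+)'      // 1: label such as Ch, Sect, Pg
             . '\s?([0-9]+)'      // 2: number, optional space before it
             . '(?:\s?([a-z])\b)?' // 3: optional single letter (assumed standalone)
             . '(?:\s?(and|or)\b)?' // 4: optional conjunction
             . '/i';
    preg_match_all($pattern, $str, $matches);
    return $matches; // PREG_PATTERN_ORDER pads unmatched groups with ''
}

$matches = parseReferences('Ch 1 a and Sect 2b and Pg3');
print_r($matches);
```

On the sample string this should give Ch, Sect, Pg in group 1; 1, 2, 3 in group 2; a, b and an empty string in group 3; and, and and an empty string in group 4. Unwanted words such as "blahunwantedtext" are skipped because they are not followed by a number.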