Perhaps this is worth a shot (note: untested)
$desc = preg_replace('/\<br\b[^>]*>/i', ' ', $this->getDescription());
The expression explained:
- \<br is a literal match for the string <br
- \b is a word boundary, i.e. the position where a word begins or ends: preg_match('/foo\b/', 'foobar') will not match, but preg_match('/foo\b/', 'foo bar') will match.
- [^>]* matches any character except a literal >. The asterisk states that this character class may occur zero or more times: with <br />, for example, it will match " /" (the space and the forward slash). Given <br>, this part matches nothing (it occurs zero times).
- > is a literal match for the closing > character of the tag
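To check the pieces described above, the pattern can be run against a few variants of the tag. This is only a minimal sketch: the sample strings and the $pattern, $samples and $results variables are made up for illustration, and plain strings stand in for $this->getDescription().

```php
<?php
// Same pattern as in the answer; the inputs below are illustrative only.
$pattern = '/\<br\b[^>]*>/i';

$samples = [
    'one<br>two',            // [^>]* matches nothing (zero occurrences)
    'one<br />two',          // [^>]* matches " /"
    'one<BR class="x"/>two', // the i flag makes the match case-insensitive
    'one<broken>two',        // \b fails between "r" and "o", so nothing is replaced
];

$results = [];
foreach ($samples as $sample) {
    $results[] = preg_replace($pattern, ' ', $sample);
}

// The first three become "one two"; the last one is left untouched.
print_r($results);
```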
If your markup is valid (i.e. not malformed), this expression will remove nothing you don't want to remove. But given a string like this: <br data-string="<b>Don't include markup here</b>"/>
this expression will fail, because the attribute value contains markup and the match ends at the first > inside it. That is something I, personally, find revolting: you don't include markup in an attribute of a tag, IMO.
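What "fail" looks like here can be sketched as follows; the surrounding "a" and "b" text and the $input and $output names are just placeholders for illustration.

```php
<?php
// Illustrative input: an attribute value that itself contains a ">".
$input  = 'a<br data-string="<b>Don\'t include markup here</b>"/>b';
$output = preg_replace('/\<br\b[^>]*>/i', ' ', $input);

// The match stops at the first ">", the one that closes <b>,
// so the rest of the attribute leaks into the text.
echo $output, PHP_EOL; // prints: a Don't include markup here</b>"/>b
```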
Another case where the regex lets its guard down is malformed markup:
<br/The closing bracket was omitted</p>
The regex will match the opening <br, then [^>]* will match:
/The closing bracket was omitted</p
Only then does the final > match, and it is the > of </p> that gets treated as the end of the br tag. But that's just the "fault" of whoever wrote the markup...
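To see the over-match concretely, here is a small sketch with a made-up paragraph in which the br tag never gets its closing bracket (the input string and variable names are hypothetical):

```php
<?php
// Illustrative input: the <br tag is never closed, and the only ">"
// after it belongs to </p>.
$input  = '<p>first line<br/second line</p>';
$output = preg_replace('/\<br\b[^>]*>/i', ' ', $input);

// [^>]* runs all the way to the ">" of </p>, so the trailing text and
// the closing tag are swallowed along with the broken <br.
echo $output, PHP_EOL; // prints: <p>first line
```

Everything after the broken tag is lost, which is worth keeping in mind if the input comes from an untrusted or hand-edited source.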