PCRE and newlines
PCRE has a superfluity of newline-related escape sequences and alternatives.
Well, a nifty escape sequence that you can use here is \R. By default \R will match Unicode newline sequences, but it can be configured using different alternatives.
To match any Unicode newline sequence whose characters lie below code point 256 (the ASCII newlines plus NEL, \x85), use \R without the u modifier:
preg_match('~\R~', $string);
This is equivalent to the following group:
(?>\r\n|\n|\r|\f|\x0b|\x85)
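A minimal sketch of the default \R at work: splitting a string that mixes line-ending styles. The sample text is hypothetical, and the sketch assumes a PCRE build with the usual default \R behaviour (the group above), which is what PHP's bundled library uses.

```php
<?php
// Hypothetical sample text mixing CRLF, LF, CR and a form feed.
$text = "one\r\ntwo\nthree\rfour\ffive";

// \R behaves like an atomic group that tries "\r\n" first, so a CRLF pair is
// consumed as one break and never leaves an empty string between "\r" and "\n".
$lines = preg_split('~\R~', $text);

echo json_encode($lines), "\n"; // ["one","two","three","four","five"]
```

Because the CRLF alternative comes first and the group is atomic, \R is safer for splitting than a character class such as [\r\n], which would produce an empty element for every CRLF.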
To match any Unicode newline sequence, including those outside that range such as the line separator (U+2028) and paragraph separator (U+2029), you want to turn on the u (unicode) flag.
preg_match('~\R~u', $string);
The u (unicode) modifier turns on additional functionality of PCRE that is incompatible with Perl, and pattern and subject strings are treated as UTF-8.
This is equivalent to the following group:
(?>\r\n|\n|\r|\f|\x0b|\x85|\x{2028}|\x{2029})
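A small sketch showing why the u flag matters for U+2028. The subject string is hypothetical, and the "\u{...}" string escape assumes PHP 7.0 or later.

```php
<?php
// U+2028 LINE SEPARATOR is stored as the three UTF-8 bytes E2 80 A8.
$text = "left\u{2028}right";

// Without /u the subject is scanned byte by byte; none of E2, 80, A8 is a newline byte.
var_dump(preg_match('~\R~', $text));  // int(0)

// With /u the subject is decoded as UTF-8, so U+2028 is recognised as a newline.
var_dump(preg_match('~\R~u', $text)); // int(1)
```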
It is possible to restrict \R to match CR, LF, or CRLF only:
preg_match('~(*BSR_ANYCRLF)\R~', $string);
This is equivalent to the following group:
(?>\r\n|\n|\r)
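A quick comparison, assuming the same default \R behaviour as above: a form feed counts as a line break for plain \R but not once (*BSR_ANYCRLF) is in effect. The subject string is hypothetical.

```php
<?php
// Hypothetical subject containing a form feed and a CRLF pair.
$text = "page1\fpage2\r\nline2";

var_dump(preg_match_all('~\R~', $text));               // int(2): "\f" and "\r\n"
var_dump(preg_match_all('~(*BSR_ANYCRLF)\R~', $text)); // int(1): "\r\n" only
```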
Additional
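PCRE supports five different conventions for indicating line breaks in strings. One of them is selected by placing it at the very start of the pattern, and it controls what ^ and $ (in multiline mode) and . treat as a newline; \R is configured separately, as shown above: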
(*CR) carriage return
(*LF) linefeed
(*CRLF) carriage return, followed by linefeed
(*ANYCRLF) any of the three above
(*ANY) all Unicode newline sequences
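A sketch of how the choice of convention changes line matching under the m modifier. The subject is hypothetical; each pattern names its convention explicitly so the result does not depend on the library's compiled-in default.

```php
<?php
// Hypothetical subject mixing a bare CR and a bare LF.
$text = "a\rb\nc";

// With /m, ^ and $ match at line boundaries, and . (without /s) refuses to match
// whatever the selected convention calls a newline.
$lf      = preg_match_all('~(*LF)^.+$~m', $text);      // lines: "a\rb", "c"
$cr      = preg_match_all('~(*CR)^.+$~m', $text);      // lines: "a", "b\nc"
$anycrlf = preg_match_all('~(*ANYCRLF)^.+$~m', $text); // lines: "a", "b", "c"

echo "$lf $cr $anycrlf\n"; // 2 2 3
```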
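Note: \R does not have special meaning inside a character class. In the original PCRE library, like other unrecognized escape sequences, it is treated as the literal character "R" by default; PCRE2, which PHP bundles since 7.3, rejects it with a compilation error instead.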