Tim Pietzcker gave the .NET counting version.
It has the same elements as the PCRE (PHP) version below.
All the caveats are the same. In particular, non-array parentheses must
be balanced, because they use the same closing parenthesis as the array delimiter.
All text must be parsed (or should be).
The outer groups 1, 2, 3, 4 let you get at the parts:
1: CONTENT
2: CORE-1 array()
3: CORE-2 any ()
4: EXCEPTIONS
Each match gives you exactly one of these parts; they are mutually exclusive.
The trick is to define a PHP function parse( core ) that parses the CORE.
Inside that function is the while ( regex.search( core ) ) { .. } loop.
Each time either the CORE-1 or the CORE-2 group matches, call parse( core )
again, passing it the contents of that group.
Inside the loop, just take off the content and assign it to the hash.
Obviously, the group 1 construct, which calls (?&content), should be replaced
with constructs that capture your hash-like variable data.
On a detailed scale, this can be very tedious.
Usually, you'd have to account for every single character to parse
the entire thing correctly.
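As a rough illustration of the loop just described, here is a minimal sketch in Perl (Perl rather than PHP, since the pattern from this answer runs there unchanged). The names parse_core, type, text, kids and pos are made up for this example. It only builds a tree of nodes; it does not pull the key => value pairs out of the content, which is the tedious part.

```perl
use strict;
use warnings;

# The pattern from this answer, written with the x flag so it can span lines.
my $rx = qr/
    (?is)
    (?:
        ( (?&content) )                                   # (1) CONTENT
      | (?> \b array \s* \( ) ( (?= . ) (?&core) | ) \)   # (2) CORE-1
      | \( ( (?= . ) (?&core) | ) \)                      # (3) CORE-2
      | ( \b array \s* \( | [()] )                        # (4) EXCEPTIONS
    )
    (?(DEFINE)
        (?<core>
            (?>
                (?&content)
              | (?> \b array \s* \( ) (?: (?= . ) (?&core) | ) \)
              | \( (?: (?= . ) (?&core) | ) \)
            )+
        )
        (?<content> (?> (?! \b array \s* \( | [()] ) . )+ )
    )
/x;

# Hypothetical helper: returns an array ref of nodes for one CORE string.
sub parse_core {
    my ($core) = @_;
    my @nodes;
    while ( $core =~ /$rx/g ) {
        # Copy the groups first; the recursive call below runs its own matches.
        my ( $content, $array_core, $paren_core, $stray ) = ( $1, $2, $3, $4 );
        my $pos = $-[0];    # offset of this match within $core
        if ( defined $content ) {
            push @nodes, { type => 'content', text => $content };
        }
        elsif ( defined $array_core ) {
            push @nodes, { type => 'array', kids => parse_core($array_core) };
        }
        elsif ( defined $paren_core ) {
            push @nodes, { type => 'paren', kids => parse_core($paren_core) };
        }
        else {
            # Unbalanced 'array(' or a stray parenthesis.
            push @nodes, { type => 'error', text => $stray, pos => $pos };
        }
    }
    return \@nodes;
}

my $tree = parse_core("x = array('a' => (1 + 2), 'b' => array())");
print join( ' ', map { $_->{type} } @$tree ), "\n";    # top-level node types
```

For this input the top level comes out as one content node followed by one array node; the nesting lives under kids. Each recursive call works on its own copy of the string, so the position of the outer /g loop is not disturbed.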
(?is)(?:((?&content))|(?>\barray\s*\()((?=.)(?&core)|)\)|\(((?=.)(?&core)|)\)|(\barray\s*\(|[()]))(?(DEFINE)(?<core>(?>(?&content)|(?>\barray\s*\()(?:(?=.)(?&core)|)\)|\((?:(?=.)(?&core)|)\))+)(?<content>(?>(?!\barray\s*\(|[()]).)+))
Expanded
# 1: CONTENT
# 2: CORE-1
# 3: CORE-2
# 4: EXCEPTIONS
(?is)
(?:
     ( # (1), Take off CONTENT
          (?&content)
     )
  | # OR -----------------------------
     (?> # Start 'array('
          \b array \s* \(
     )
     ( # (2), Take off 'array( CORE-1 )'
          (?= . )
          (?&core)
       |
     )
     \) # End ')'
  | # OR -----------------------------
     \( # Start '('
     ( # (3), Take off '( any CORE-2 )'
          (?= . )
          (?&core)
       |
     )
     \) # End ')'
  | # OR -----------------------------
     ( # (4), Take off Unbalanced or Exceptions
          \b array \s* \(
       |  [()]
     )
)
# Subroutines
# ---------------
(?(DEFINE)
     # core
     (?<core>
          (?>
               (?&content)
            |
               (?> \b array \s* \( )
               # recurse core of array()
               (?:
                    (?= . )
                    (?&core)
                 |
               )
               \)
            |
               \(
               # recurse core of any ()
               (?:
                    (?= . )
                    (?&core)
                 |
               )
               \)
          )+
     )
     # content
     (?<content>
          (?>
               (?!
                    \b array \s* \(
                 |  [()]
               )
               .
          )+
     )
)
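The content subroutine decides where plain text ends: it eats one character at a time, but only while neither an 'array(' opener nor a bare parenthesis starts at the current position. A small Perl sketch of that piece on its own, using a sample string chosen for this example:

```perl
use strict;
use warnings;

# The 'content' subroutine in isolation: a negative lookahead guards each
# single character, and the atomic group stops the engine from giving
# characters back once they are consumed.
my $content = qr/ (?is) (?> (?! \b array \s* \( | [()] ) . )+ /x;

my $src = "some_var = array('id' => nextId())";

# Anchor at the start so only the leading run of content is taken.
my ($run) = $src =~ /\A($content)/;

printf "[%s] len %d\n", $run, length $run;    # prints: [some_var = ] len 11
```

The run stops right in front of 'array(', which is where the CORE-1 alternative of the full pattern takes over.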
Output
** Grp 0 - ( pos 0 , len 11 )
some_var =
** Grp 1 - ( pos 0 , len 11 )
some_var =
** Grp 2 - NULL
** Grp 3 - NULL
** Grp 4 [core] - NULL
** Grp 5 [content] - NULL
-----------------------
** Grp 0 - ( pos 11 , len 153 )
array(
'id' => nextId(),
'profile' => array(
'name' => 'Hugo Hurley',
'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
)
)
** Grp 1 - NULL
** Grp 2 - ( pos 17 , len 146 )
'id' => nextId(),
'profile' => array(
'name' => 'Hugo Hurley',
'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
)
** Grp 3 - NULL
** Grp 4 [core] - NULL
** Grp 5 [content] - NULL
-------------------------------------
** Grp 0 - ( pos 164 , len 3 )
;
** Grp 1 - ( pos 164 , len 3 )
;
** Grp 2 - NULL
** Grp 3 - NULL
** Grp 4 [core] - NULL
** Grp 5 [content] - NULL
A previous incarnation of this, written for something else (nested <!--block:name--> ... <!--endblock--> markers), to give an idea of usage:
# Perl code:
#
# use strict;
# use warnings;
#
# use Data::Dumper;
#
# $/ = undef;
# my $content = <DATA>;
#
# # Set the error mode on/off here ..
# my $BailOnError = 1;
# my $IsError = 0;
#
# my $href = {};
#
# ParseCore( $href, $content );
#
# #print Dumper($href);
#
# print "\n";
# print "\nBase======================\n";
# print $href->{content};
# print "\nFirst======================\n";
# print $href->{first}->{content};
# print "\nSecond======================\n";
# print $href->{first}->{second}->{content};
# print "\nThird======================\n";
# print $href->{first}->{second}->{third}->{content};
# print "\nFourth======================\n";
# print $href->{first}->{second}->{third}->{fourth}->{content};
# print "\nFifth======================\n";
# print $href->{first}->{second}->{third}->{fourth}->{fifth}->{content};
# print "\nSix======================\n";
# print $href->{six}->{content};
# print "\nSeven======================\n";
# print $href->{six}->{seven}->{content};
# print "\nEight======================\n";
# print $href->{six}->{seven}->{eight}->{content};
#
# exit;
#
#
# sub ParseCore
# {
#     my ($aref, $core) = @_;
#     my ($k, $v);
#     while ( $core =~ /(?is)(?:((?&content))|(?><!--block:(.*?)-->)((?&core)|)<!--endblock-->|(<!--(?:block:.*?|endblock)-->))(?(DEFINE)(?<core>(?>(?&content)|(?><!--block:.*?-->)(?:(?&core)|)<!--endblock-->)+)(?<content>(?>(?!<!--(?:block:.*?|endblock)-->).)+))/g )
#     {
#         if (defined $1)
#         {
#             # CONTENT
#             $aref->{content} .= $1;
#         }
#         elsif (defined $2)
#         {
#             # CORE
#             $k = $2; $v = $3;
#             $aref->{$k} = {};
#             # $aref->{$k}->{content} = $v;
#             # $aref->{$k}->{match} = $&;
#
#             my $curraref = $aref->{$k};
#             my $ret = ParseCore($aref->{$k}, $v);
#             if ( $BailOnError && $IsError ) {
#                 last;
#             }
#             if (defined $ret) {
#                 $curraref->{'#next'} = $ret;
#             }
#         }
#         else
#         {
#             # ERRORS
#             print "Unbalanced '$4' at position = ", $-[0];
#             $IsError = 1;
#
#             # Decide to continue here ..
#             # If BailOnError is set, just unwind recursion.
#             # -------------------------------------------------
#             if ( $BailOnError ) {
#                 last;
#             }
#         }
#     }
#     return $k;
# }
#
# #================================================
# __DATA__
# some html content here top base
# <!--block:first-->
# <table border="1" style="color:red;">
# <tr class="lines">
# <td align="left" valign="<--valign-->">
# <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
# <!--hello--> <--again--><!--world-->
# some html content here 1 top
# <!--block:second-->
# some html content here 2 top
# <!--block:third-->
# some html content here 3 top
# <!--block:fourth-->
# some html content here 4 top
# <!--block:fifth-->
# some html content here 5a
# some html content here 5b
# <!--endblock-->
# <!--endblock-->
# some html content here 3a
# some html content here 3b
# <!--endblock-->
# some html content here 2 bottom
# <!--endblock-->
# some html content here 1 bottom
# <!--endblock-->
# some html content here1-5 bottom base
#
# some html content here 6-8 top base
# <!--block:six-->
# some html content here 6 top
# <!--block:seven-->
# some html content here 7 top
# <!--block:eight-->
# some html content here 8a
# some html content here 8b
# <!--endblock-->
# some html content here 7 bottom
# <!--endblock-->
# some html content here 6 bottom
# <!--endblock-->
# some html content here 6-8 bottom base
#
# Output >>
#
# Base======================
# some html content here top base
#
# some html content here1-5 bottom base
#
# some html content here 6-8 top base
#
# some html content here 6-8 bottom base
#
# First======================
#
# <table border="1" style="color:red;">
# <tr class="lines">
# <td align="left" valign="<--valign-->">
# <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
# <!--hello--> <--again--><!--world-->
# some html content here 1 top
#
# some html content here 1 bottom
#
# Second======================
#
# some html content here 2 top
#
# some html content here 2 bottom
#
# Third======================
#
# some html content here 3 top
#
# some html content here 3a
# some html content here 3b
#
# Fourth======================
#
# some html content here 4 top
#
#
# Fifth======================
#
# some html content here 5a
# some html content here 5b
#
# Six======================
#
# some html content here 6 top
#
# some html content here 6 bottom
#
# Seven======================
#
# some html content here 7 top
#
# some html content here 7 bottom
#
# Eight======================
#
# some html content here 8a
# some html content here 8b
#