I am creating my own language.
The goal is to "compile" it to PHP or JavaScript and, ultimately, to interpret and run it in that same language, so that it looks like a "middle-level" language.
Right now, I'm focusing on interpreting it in PHP and running it.
At the moment, I'm using regex to split the string and extract the multiple tokens.
This is the regex I have:
/\:((?:cons@(?:\d+(?:\.\d+)?|(?:"(?:(?:\\\\)+"|[^"]|(?:\r\n|\r|\n))*")))|(?:[a-z]+(?:@[a-z]+)?|\^?[\~\&](?:[a-z]+|\d+|\-1)))/g
This is quite hard to read and maintain, even though it works.
Is there a better way of doing this?
Here is an example of the code for my language:
:define:&0:factorial
:param:~0:static
:case
:lower@equal:cons@1
:case:end
:scope
:return:cons@1
:scope:end
:scope
:define:~0:static
:define:~1:static
:require:static
:call:static@sub:^~0:~1 :store:~0
:call:&-1:~0 :store:~1
:call:static@sum:^~0:~1 :store:~0
:return:~0
:scope:end
:define:end
This defines a recursive function to calculate the factorial (it isn't well written, but that isn't important).
The goal is to get what comes after the :, including the @. For example, :static@sub is a whole token, and it is saved without the leading :.
Every token follows the same rule, except for :cons, which can take a value after it. The value is either a number (an integer or a float, called static or dynamic in the language, respectively) or a string, which must start and end with " and supports escaping like \". Multi-line strings aren't supported.
Variables are written like ~0. Putting ^ in front of one gets the value from the :scope above.
Functions are similar, but use &0 instead, and &-1 points to the current function (no need for ^&-1 here).
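To make the rules above concrete, here is a minimal sketch in JavaScript (one of the two target languages) of how the same pattern could be assembled from named pieces instead of being written as one long literal. The names NUMBER, STRING, CONS, WORD, REF, TOKEN and tokenize are made up for illustration. The string rule is an assumption: it accepts any backslash-escaped character and rejects raw line breaks, which is slightly different from what the regex above does.

```javascript
// Sketch only: every name here is hypothetical, not part of the language.

// Value after cons@: an integer or a float ...
const NUMBER = String.raw`\d+(?:\.\d+)?`;
// ... or a double-quoted string. Assumption: a backslash escapes the next
// character, and raw line breaks are rejected (no multi-line strings).
const STRING = String.raw`"(?:\\.|[^"\\\r\n])*"`;
const CONS = `cons@(?:${NUMBER}|${STRING})`;

// Plain words, optionally qualified with @: case, scope, static@sub.
const WORD = String.raw`[a-z]+(?:@[a-z]+)?`;

// Variable (~) or function (&) references, optionally prefixed with ^.
const REF = String.raw`\^?[~&](?:[a-z]+|\d+|-1)`;

// CONS is tried before WORD so that "cons@1" is not cut short at "cons".
const TOKEN = new RegExp(`:(${CONS}|${WORD}|${REF})`, 'g');

// Returns every token in the source, without its leading colon.
function tokenize(source) {
  return Array.from(source.matchAll(TOKEN), (m) => m[1]);
}

console.log(tokenize(':call:static@sub:^~0:~1 :store:~0'));
// [ 'call', 'static@sub', '^~0', '~1', 'store', '~0' ]
```

Each piece can be read and tested on its own, and a new token kind becomes one more named constant rather than another nested group.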
That said, is there a better way to get the tokens?
Here you can see it in action: http://regex101.com/r/nF7oF9/2