I will strive to be objective in the question.
I have a database with millions of mathematical formulas created by an application, these are formulas that respect a logic:
1) Only 5 mathematical operations involved: addition, subtraction, multiplication, division and power (+ - * / ^)
2) Each operation is delimited/grouped by parentheses.
The "parameters" can be simple, such as a constant or a variable.
Example: (x + 15)
In the compound case, I may have:
(x * (x + 15))
It is important to remember that the correct order of operations was guaranteed with the use of parentheses.
Example:
((x * (x + 15)) / 15)
My challenge is to reduce these formulas.
They are not useful or repeat themselves a lot when the operations are the same and no matter the order of the factors.
Example:
((((x + 4) + 8) - x) / 12)
That equals 1, it's not a formula, it's a constant, and I have to ignore it.
And duplicities like ((x + 4) + 8) and ((8 + x) + 4) need to be eliminated.
That's why I need to reduce.
Just to be aware, all formulas are standardized with the parentheses and spaces between operations.
I created a routine in PHP using regular expressions to make substitutions, I managed to make huge progress reducing in a specific formula from 8 thousand possibilities to less than three hundred.
However, as the size (not complexity because they are not complex) of the formulas increases, my routine can no longer be efficient.
I need an algorithm, not a routine, to apply the mathematical reduction we learned in school.
I think it is possible since the formulas are standardized and limited to 5 basic mathematical operations.
I am using the EvalMath class obtained in GitHub to aid in the reduction and execution of the formulas.
To give you a better idea, the formulas are abstract and each "@" is replaced in real time by constants and variables.
(@ + @)
(@ / @)
(@ ** @)
((@ + @) + @)
((@ + @) ** @)
((@ - @) + @)
((@ - @) / @)
((@ - @) ** @)
((@ * @) + @)
((@ * @) / @)
((@ * @) ** @)
((@ / @) / @)
Here is a snippet of PHP code that is part of my reduction routine.
Within a WHILE (TRUE) I've grouped rules that repeat themselves until no substitutions are made.
In the end I have an array with numerous duplicities obtained with some reduction and reordering of the elements of the formula, something that an array_unique () solves.
I really need help, my brain is exploding with this problem.
Thank you.
<?php
$Math_Formulas = array(
'(((x + 7) ** 9) - 9)',
'(((x ^ 3) - 9) - 5)',
'(((2 + x) + x) * x)',
'(((x + 3) / 6) / 8)',
'(((x - 5) + 6) ** 2)',
'(1024 ^ (x / 5))',
'((3 - (x + 6)) + 3)',
'(((x ^ 3) + 9) * 6)',
);
while (TRUE)
{
$changed = FALSE;
// Rule 1: (x - x) = 0
for ($i = 0; $i < count($Math_Formulas); $i++)
{
$Formula = trim(preg_replace_callback('/([^-]?)([a-z]+?) [-] ([a-z]+?)/', function($matches) {
$Expression = $matches[0];
if ($matches[2] == $matches[3])
{
$Expression = $matches[1] . '0';
}
return($Expression);
}, $Math_Formulas[$i]));
if ($Formula != $Math_Formulas[$i])
{
$changed = TRUE;
$Math_Formulas[$i] = $Formula;
}
}
// Rule 2: ...
if (!$changed)
{
break;
}
}
$Math_Formulas = array_values(array_unique($Math_Formulas));
?>
UPDATE 1:
I think if the "reverse polish notation" had been used in the creation of the formulas everything would be much easier, but with the formulas that I have, I need to rearrange the parameters (in ascending or descending alphabetical order) to be able to compare them.
In RPN:
(x + 4) + 5) becomes "x 4 5 +"
(X + 5) + 4) becomes "x 5 4 +"
How to compare both? And what about larger functions?
I think I made a mistake by not detailing the technique used in the 14 "regular expressions" that I am applying to simplify, as much as possible, these formulas. There is more than regular expression in the process:
Original Formula: (((4 - 5) + x) + 8)
Step 1: Addition (or subtraction or multiplication) of the two-to-two constants and reduction of the expression, without the parentheses.
Formula: ((-1 + x) + 8)
Step 2: Delete the parentheses for ((n +- n) +- n) or (n +- (n +- n)).
Formula: (-1 + x + 8)
Step 3: Reorder the parameters in descending alphabetical order.
Formula: (x + 8 - 1)
Step 4: In the loop, Step 1 is executed again.
Final Formula: (x + 7)
There are more transformations, for example (x + x + x) becomes (3 * x), (-x + x) becomes 0.
All of this was getting beautiful, but when I came across functions such as ((x * 9) * (x * 5)) / 9), that logic lost efficiency. I would have to create at least one more 14 other nested rules.