I currently need to parse a lot of .phtml files, get specific html tags and add a custom data attribute to them. I'm using python beautifulsoup to parse the entire document and add the tags, and this part works just fine.
The problem is that on the view files (phtml) there are tags that get parsed too. Below is an example of input-output
INPUT
<?php
$stars = $this->getData('sideBarCoStars', []);
if (!$stars) return;
$sideBarCoStarsCount = $this->getData('sideBarCoStarsCount');
$title = $this->getData('sideBarCoStarsTitle');
$viewAllUrl = $this->getData('sideBarCoStarsViewAllUrl');
$isDomain = $this->getData('isDomain');
$lazy_load = $lazy_load ?? 0;
$imageSrc = $this->getData('emptyImageData');
?>
<header>
<h3>
<a href="<?php echo $viewAllUrl; ?>" class="noContentLink white">
<?php echo "{$title} ({$sideBarCoStarsCount})"; ?>
</a>
</h3>
OUTPUT
<?php
$stars = $this->
getData('sideBarCoStars', []);
if (!$stars) return;
$sideBarCoStarsCount = $this->getData('sideBarCoStarsCount');
$title = $this->getData('sideBarCoStarsTitle');
$viewAllUrl = $this->getData('sideBarCoStarsViewAllUrl');
$isDomain = $this->getData('isDomain');
$lazy_load = $lazy_load ?? 0;
$imageSrc = $this->getData('emptyImageData');
?>
<header>
<h3>
<a class="noContentLink white" href="<?php echo $viewAllUrl; ?>">
<?php echo "{$title} ({$sideBarCoStarsCount})"; ?>
</a>
</h3>
I tried different ways, but didn't succeed on making beautifulsoup to ignore the PHP tags. Is it possible to get html.parser custom rules to ignore , or to beautifulsoup? Thanks!