I need to save a "plain" version of the HTML content coming from a textarea with a WYSIWYG editor. Right now I'm running the following code just before saving to the database:
    public function preUpdate(PreUpdateEventArgs $event)
    {
        if (($resource = $event->getEntity()) instanceof Resource) {
            $resource->setPlainContent($this->computePlainContent($resource));
        }
    }

    protected function computePlainContent(Resource $resource)
    {
        return preg_replace(
            '/\s+/',
            ' ',
            html_entity_decode(
                strip_tags($resource->getContent()),
                ENT_QUOTES | ENT_HTML401
            )
        );
    }
The plain text will be used for searching across pages.
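To make the behaviour concrete, here is a minimal standalone sketch of the same pipeline outside the listener. The function name plainText and the sample HTML are made up for illustration; only the strip_tags / html_entity_decode / preg_replace chain comes from my code above.

```php
<?php
// Hypothetical standalone helper mirroring computePlainContent() above:
// strip the tags, decode entities, then collapse whitespace runs to one space.
function plainText(string $html): string
{
    return preg_replace(
        '/\s+/',
        ' ',
        html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML401)
    );
}

// Made-up sample input; note the newline between the two paragraphs.
$html = "<p>Tom &amp; Jerry</p>\n<p>It&#039;s   a <strong>test</strong>.</p>";

echo plainText($html), PHP_EOL; // prints: Tom & Jerry It's a test.
```

With this input the tags are dropped, &amp; and &#039; are decoded, and the newline plus the run of spaces each collapse to a single space.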
Questions:
- Is this good/safe**, assuming the editor will always produce valid HTML?
- Would you remove punctuation marks, and if so, how?
- Should I use ENT_HTML401 or ENT_XHTML with CKEditor (default configuration; I don't know the output quality)?
** By "safe" I mean safe to produce good output. Users of this system are trusted.