I am building an app for a local online newspaper company.
They have an existing WordPress website where they publish news stories (as WordPress posts).
The only people uploading the news stories are journalists within the company.
In one of the main sections of the app I'm building, I connect to this WordPress database (with a PHP file on the same server) and retrieve news story content to display within the app. I built this service myself in PHP and use JavaScript to insert the content into the HTML on the client side.
I have been reading up on security (including the OWASP cheat sheet for XSS prevention) and have been taking the steps I can to secure the app, including encoding the data before inserting it into the HTML. However, some of the content coming from the database contains HTML, and this is where my concern lies (more details below).
Here is the flow of the app:
Establish a PDO connection with the WordPress database, setting the charset to UTF-8 and calling setAttribute(PDO::ATTR_EMULATE_PREPARES, false), as stated here for protection against SQL injection.
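<?php
include_once 'wp_psl_config.php';
// Initiate a PDO connection
$pdoConnection = new PDO(HOSTDBNAME, USER, PASSWORD);
// Use real server-side prepared statements rather than emulated ones
$pdoConnection->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
// Throw exceptions on database errors
$pdoConnection->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// Set the connection character set to UTF-8
$pdoConnection->exec("SET CHARACTER SET utf8");
?>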
I am using parameterized queries and prepared statements to retrieve news stories as follows:
function getStoryData($story_id, $pdoConnection){
    $query = 'SELECT * FROM wp_posts WHERE ID = :story_id';
    $statement = $pdoConnection->prepare($query);
    // Bind the ID as an integer so it is never interpreted as SQL
    $statement->bindValue(':story_id', $story_id, PDO::PARAM_INT);
    $statement->execute();
    // Store the matching row in $data (empty array if no post has this ID)
    $data = $statement->fetch(PDO::FETCH_ASSOC);
    return $data !== false ? $data : array();
}
On the client side, I use the OWASP ESAPI JavaScript library to encode content before inserting it into the HTML. I call encodeForHTML() on post_title, post_excerpt, post_date, etc. before inserting them into my HTML, as these fields do not contain any HTML that needs to be rendered.
Here is an example of my JavaScript/jQuery code for generating and inserting the HTML:
var safe_post_title = $ESAPI.encoder().encodeForHTML(post_title);
var safe_story_html = '<h3 class="story_headline">' + safe_post_title + '</h3>';
$('#story_area').html(safe_story_html);
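To illustrate what this encoding step achieves, here is a minimal sketch that runs without ESAPI or a browser. The escapeHtml helper is hypothetical and much narrower than ESAPI's encodeForHTML(); it only escapes the five characters that matter in element content and quoted attributes, which is enough to show that a hostile title ends up as inert text.

```javascript
// Hypothetical helper, NOT part of ESAPI: escapes the characters that
// are significant in HTML element content and quoted attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;') // must run first so later entities are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#x27;');
}

// A made-up hostile value standing in for post_title.
var hostileTitle = '<script>alert(1)</script>';
var safeHeadline = '<h3 class="story_headline">' + escapeHtml(hostileTitle) + '</h3>';

console.log(safeHeadline);
// prints: <h3 class="story_headline">&lt;script&gt;alert(1)&lt;/script&gt;</h3>
```

The same approach cannot be applied to a field whose markup is supposed to render, because every tag in it would be displayed as literal text.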
However, the WordPress post_content field (which contains the main story content) contains many different HTML elements, including script tags, and this is where my concern is.
Here is an example of the data in the WordPress post_content field:
Line of text... more text... more text.
more text...
If you're not sure who represents you, you can find out
<a href="http://example.com/">here</a>.
<h5>Search here:</h5>
<div id="ragic_webview"></div>
<script type="text/javascript">// <![CDATA[
var ragic_url = 'www.ragic.com/companyname/sheets/3';
var ragic_feature= 'fts';
var exactMatch = true;
/* * * DON'T EDIT BELOW THIS LINE * * */
(function() {
var rq = document.createElement('script');
rq.type = 'text/javascript';
rq.async = true;
rq.src = window.location.protocol == "https:" ? "https://www.ragic.com/intl/common/loadfts.js" : "http://www.ragic.com/intl/common/loadfts.js";
(document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(rq);
})();
// ]]>
</script>
<noscript>Please enable JavaScript to view the <a href="http://www.ragic.com/?ref_noscript">Online database form by Ragic.</a></noscript>
<a id="ragic-link" href="http://www.ragic.com">online database form by <span class="logo-ragic">Ragic</span></a>
Another example of post_content data:
Line of text... more text... more text.
more text...
<script id="infogram_0_housing_list_by_area" src="//e.infogr.am/js/embed.js?c5h" type="text/javascript"></script>
<div style="width: 100%; padding: 8px 0; font-family: Arial; font-size: 13px; line-height: 15px; text-align: center;">
<a style="color: #989898; text-decoration: none;" href="https://infogr.am/housing_list_by_area" target="_blank">Housing List, by Area</a> <span class="break_between_paragraphs"></span>
<a style="color: #989898; text-decoration: none;" href="https://infogr.am" target="_blank">
Create your own infographics</a>
</div>
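To get a picture of how many posts are affected, something like the following rough sketch could list the script tags found in a post_content string. The auditScripts helper and the shortened sample string are assumptions made for illustration. Regex matching like this is only suitable for a quick audit and is not a sanitizer; real filtering would need a proper HTML parser.

```javascript
// Hypothetical audit helper: reports the <script> tags in an HTML string.
// Regex matching is good enough for counting, NOT for sanitizing.
function auditScripts(html) {
  var report = { external: [], inlineCount: 0 };
  var tagPattern = /<script\b([^>]*)>/gi;
  var match;
  while ((match = tagPattern.exec(html)) !== null) {
    // match[1] holds the attribute text of the opening tag
    var src = /\bsrc\s*=\s*["']([^"']*)["']/i.exec(match[1]);
    if (src) {
      report.external.push(src[1]); // script loaded from another URL
    } else {
      report.inlineCount += 1; // script body is embedded in the post
    }
  }
  return report;
}

// Shortened sample modelled on the two post_content examples above.
var sample =
  'more text...\n' +
  '<script id="infogram_0_housing_list_by_area" src="//e.infogr.am/js/embed.js?c5h" type="text/javascript"></script>\n' +
  '<script type="text/javascript">var ragic_feature = \'fts\';</script>';

var report = auditScripts(sample);
console.log(report.external);    // prints: [ '//e.infogr.am/js/embed.js?c5h' ]
console.log(report.inlineCount); // prints: 1
```

Note that an inline script can load further scripts at run time, as the Ragic snippet does, so the list of external sources found this way is a lower bound.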
Some main questions I have:
- The company has anti-spam protection on its WordPress site. Does this lessen the security concern for me when displaying this content in the app?
- Should I allow the script tags at all?
- Overall, what is the most secure way to display this data? I have looked into HTML Purifier. Is this a good option?