How To Use Htmlpurifier To Allow Entire Document To Be Passed Including Html,head,title,body
Solution 1:
HTML Purifier by default only knows tags that are valid within a <body> context, because that's its intended use-case. Basically, it doesn't actually know what a <meta>, <html>, <head> or <title> tag is - and that's a big deal, because most of its security relies on understanding the semantic underpinnings of the HTML!
There are some older Stack Overflow questions on this topic:
...but they don't currently have very useful answers, so after some contemplation, I think your question still has merit and am going to answer here.
Generally, this has been discussed a few times on the HTML Purifier forums (e.g. in Allow HTML, HEAD, STYLE and BODY tags), but in a nutshell you can't do this without a significant amount of work, and unfortunately I'm not currently familiar with any snippet of code that solves this problem with a simple copy and paste.
So you're going to have to dig into the guts of HTML Purifier.
You can teach HTML Purifier most tags and associated behaviour using the instructions on the Customize! documentation page. The part most interesting for you would be near the bottom, an example where <form> is taught to HTML Purifier. Quoting from there for some posterity:
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config->set('HTML.DefinitionRev', 1);
$config->set('Cache.DefinitionImpl', null); // remove this later!
$def = $config->getHTMLDefinition(true);
$def->addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum(
    array('_blank','_self','_target','_top')
));
$form = $def->addElement(
    'form',   // name
    'Block',  // content set
    'Flow',   // allowed children
    'Common', // attribute collection
    array(    // attributes
        'action*' => 'URI',
        'method'  => 'Enum#get|post',
        'name'    => 'ID'
    )
);
$form->excludes = array('form' => true);
Each of the parameters corresponds to one of the questions we asked. Notice that we added an asterisk to the end of the action attribute to indicate that it is required. If someone specifies a form without that attribute, the tag will be axed. Also, the extra line at the end is a special extra declaration that prevents forms from being nested within each other.
You would have to do similar things with all tags outside of the <body> tag that you want to support (all the way up to <html>).
Note: Even if you add all these tags to HTML Purifier, the setting Core.ConvertDocumentToFragment that you discovered needs to be set to false (as you have done).
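For reference, that setting is toggled like any other configuration directive. This is only a minimal sketch: the include path is an assumption about how the library is installed, and on its own it does not teach HTML Purifier any of the missing tags.

```php
<?php
// Assumption: HTML Purifier's bundled autoloader is on the include path.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// Ask HTML Purifier not to reduce a full document to the contents of <body>.
$config->set('Core.ConvertDocumentToFragment', false);

$purifier = new HTMLPurifier($config);
```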
Alternative
If this looks like too much work, and you have other ways to sanitise the header section and body attributes of your document, you can also cut your document into pieces, sanitise the pieces separately, then carefully stick them back together.
(Or, of course, just use the alternative for the entire document.)
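The cut-and-reassemble idea could look roughly like the sketch below. It is an illustration under stated assumptions, not a vetted implementation: sanitizeDocument is a hypothetical helper name, PHP's DOM extension is used to split the document, everything in the head except the title text is discarded, and all attributes on <html> and <body> are dropped. The body sanitiser is passed in as a callable so the splitting logic does not depend on any particular library.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: split a full HTML document, sanitise the pieces
 * separately, and rebuild a fixed document skeleton around them.
 *
 * $purifyBody is any callable that takes a body fragment and returns
 * sanitised HTML, e.g. [$purifier, 'purify'] for an HTMLPurifier instance.
 */
function sanitizeDocument(string $html, callable $purifyBody): string
{
    $doc = new DOMDocument();

    // Collect parse warnings internally instead of emitting them,
    // since untrusted markup is often malformed.
    $previous = libxml_use_internal_errors(true);
    if ($html !== '') {
        // Note: loadHTML() assumes ISO-8859-1 unless the document
        // declares a charset itself.
        $doc->loadHTML($html);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Piece 1: the title, kept as plain text only.
    $titleNode = $doc->getElementsByTagName('title')->item(0);
    $title = $titleNode !== null ? $titleNode->textContent : '';

    // Piece 2: the children of <body>, serialised back to an HTML fragment.
    // Attributes on <body> itself (onload and friends) are not carried over.
    $bodyHtml = '';
    $bodyNode = $doc->getElementsByTagName('body')->item(0);
    if ($bodyNode !== null) {
        foreach ($bodyNode->childNodes as $child) {
            $bodyHtml .= $doc->saveHTML($child);
        }
    }

    // Reassemble: escaped title, sanitised body, nothing else from the input.
    return "<!DOCTYPE html>\n<html>\n<head>\n<title>"
        . htmlspecialchars($title, ENT_QUOTES, 'UTF-8')
        . "</title>\n</head>\n<body>\n"
        . $purifyBody($bodyHtml)
        . "\n</body>\n</html>\n";
}
```

With an HTML Purifier instance in $purifier, a call would then be sanitizeDocument($dirty, [$purifier, 'purify']). If you need to keep more of the head (meta tags, stylesheets), each of those pieces needs its own whitelist-style handling before it goes back into the skeleton.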
Solution 2:
Quick workaround: edit the extractBody() function in Lexer.php so that it returns its input unchanged instead of extracting the body:
public function extractBody($html)
{
    return $html;
}