Generated by JDiff

com.itextpdf.styledxmlparser.jsoup.safety Documentation Differences

This file contains all the changes in documentation in the package com.itextpdf.styledxmlparser.jsoup.safety as marked differences. Deletions are shown [-like this-], and additions are shown {+like this+}.
If no deletions or additions are shown in an entry, the HTML tags will be what has changed. The new HTML tags are shown in the differences. If no documentation existed, and then some was added in a later version, this change is noted in the appropriate class pages of differences, but the change is not shown on this page. Only changes in existing text are shown here. Similarly, documentation which was inherited from another class or interface is not shown here.
Note that an HTML error in the new documentation may cause the display of other documentation changes to be presented incorrectly. For instance, failure to close a tag will cause all subsequent paragraphs to be displayed differently.

Class Cleaner

The [-whitelist-]{+safelist+} based HTML cleaner. Use to ensure that end-user provided HTML contains only the elements and attributes that you are expecting; no junk, and no cross-site scripting attacks!

The HTML cleaner parses the input as HTML and then runs it through a [-white-]{+safe+}-list, so the output HTML can only contain HTML that is allowed by the [-whitelist-]{+safelist+}.

It is assumed that the input HTML is a body fragment; the clean methods only pull from the source's body, and the canned [-white-]{+safe+}-lists only allow body contained tags.

Rather than interacting directly with a Cleaner object, generally see the {@code clean} methods in com.itextpdf.styledxmlparser.jsoup.Jsoup.

Class Cleaner, constructor Cleaner(Whitelist)

[-Create a new cleaner, that sanitizes documents using the supplied whitelist. @param whitelist white-list to clean with-]
{+Use .Cleaner(Safelist) instead. @deprecated as of 1.14.1.+}
Class Cleaner, Document clean(Document)

Creates a new, clean document, from the original dirty document, containing only elements allowed by the [-whitelist-]{+safelist+}. The original document is not modified. Only elements from the [-dirt-]{+dirty+} document's body are used. The OutputSettings of the original document are cloned into the clean document. @param dirtyDocument Untrusted base document to clean. @return cleaned document.
Class Cleaner, boolean isValid(Document)

Determines if the input document body is valid, against the [-whitelist-]{+safelist+}. It is considered valid if all the tags and attributes in the input HTML are allowed by the [-whitelist-]{+safelist+}, and that there is no content in the head.

This method can be used as a validator for user input [-forms-]. An invalid document will still be cleaned successfully using the .clean(Document) document. If using as a validator, it is recommended to still clean the document to ensure enforced attributes are set correctly, and that the output is tidied. @param dirtyDocument document to test @return true if no tags or attributes need to be removed; false if they do
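
The validate-then-clean recommendation above can be sketched as follows. This is a minimal illustration, not taken from the library's own documentation: it assumes an iText styledxmlparser version that ships Safelist is on the classpath, and the class name CleanerSketch, its helper methods, and the choice of Safelist.basic() are hypothetical.

```java
import com.itextpdf.styledxmlparser.jsoup.Jsoup;
import com.itextpdf.styledxmlparser.jsoup.nodes.Document;
import com.itextpdf.styledxmlparser.jsoup.safety.Cleaner;
import com.itextpdf.styledxmlparser.jsoup.safety.Safelist;

// Hypothetical helper class, for illustration only.
public class CleanerSketch {
    // Assumption: the basic safelist is strict enough for this example.
    private static final Cleaner CLEANER = new Cleaner(Safelist.basic());

    // Reports whether the fragment passes the safelist without any removals.
    static boolean isValidBody(String bodyHtml) {
        Document dirty = Jsoup.parseBodyFragment(bodyHtml);
        return CLEANER.isValid(dirty);
    }

    // Always cleans, even when the input validated, so that enforced
    // attributes are applied and the output is tidied.
    static String cleanBody(String bodyHtml) {
        Document dirty = Jsoup.parseBodyFragment(bodyHtml);
        Document clean = CLEANER.clean(dirty); // the dirty document is not modified
        return clean.body().html();
    }

    public static void main(String[] args) {
        String input = "<p>Hello <b>world</b></p><script>alert(1)</script>";
        System.out.println("valid:   " + isValidBody(input));
        System.out.println("cleaned: " + cleanBody(input));
    }
}
```

Keeping validation and cleaning as separate calls lets a form handler reject or flag invalid input while still storing only the cleaned output.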


Class Whitelist

[-Whitelists define what HTML (elements and attributes) to allow through the cleaner. Everything else is removed. Start with one of the defaults: .none, .simpleText, .basic, .basicWithImages, .relaxed. If you need to allow more through (please be careful!), tweak a base whitelist with: .addTags, .addAttributes, .addEnforcedAttribute, .addProtocols. You can remove any setting from an existing whitelist with: .removeTags, .removeAttributes, .removeEnforcedAttribute, .removeProtocols. The cleaner and these whitelists assume that you want to clean a body fragment of HTML (to add user supplied HTML into a templated page), and not to clean a full HTML document. If the latter is the case, either wrap the document HTML around the cleaned body HTML, or create a whitelist that allows html and head elements as appropriate.-]

[-If you are going to extend a whitelist, please be very careful. Make sure you understand what attributes may lead to XSS attack vectors. URL attributes are particularly vulnerable and require careful validation. See http://ha.ckers.org/xss.html for some XSS attack examples. @author Jonathan Hedley-]

{+@deprecated As of release v1.14.1, this class is deprecated in favour of Safelist. The name has been changed with the intent of promoting more inclusive language. Safelist is a drop-in replacement, and no further changes other than updating the name in your code are required to cleanly migrate. This class will be removed in v1.15.1. Until that release, this class acts as a shim to maintain code compatibility (source and binary).+}

{+For a clear rationale of the removal of this change, please see Terminology, Power, and Inclusive Language in Internet-Drafts and RFCs+}
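
Since the deprecation note describes Safelist as a drop-in replacement, a migration would typically amount to renaming the type. The following is a minimal sketch under that assumption; it also assumes the Safelist builder methods mirror the Whitelist ones listed in this entry (.addTags, .addAttributes) and that Jsoup.clean accepts a Safelist. The class name SafelistMigrationSketch and the extra tag and attribute are hypothetical.

```java
import com.itextpdf.styledxmlparser.jsoup.Jsoup;
import com.itextpdf.styledxmlparser.jsoup.safety.Safelist;

// Hypothetical example class, for illustration only.
public class SafelistMigrationSketch {
    // Previously this would have read Whitelist.basic().addTags(...);
    // only the type name changes.
    static Safelist commentSafelist() {
        return Safelist.basic()
                .addTags("h2")                // allow one extra element
                .addAttributes("a", "title"); // allow one extra attribute on links
    }

    // Cleans a body fragment with the customised safelist.
    static String clean(String bodyHtml) {
        return Jsoup.clean(bodyHtml, commentSafelist());
    }

    public static void main(String[] args) {
        System.out.println(clean("<h2>Title</h2><iframe src=\"https://example.com\"></iframe>"));
    }
}
```

Starting from a canned safelist and adding only what is needed keeps the allowed surface small, in line with the caution the removed text gave about extending a list.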