The core public access point to the jsoup functionality.

Static Public Member Functions
static Document	Parse (String html, String baseUri)
	Parse HTML into a Document.

static Document	Parse (String html, String baseUri, iText.StyledXmlParser.Jsoup.Parser.Parser parser)
	Parse HTML into a Document, using the provided Parser.

static Document	Parse (String html)
	Parse HTML into a Document.

static Document	ParseXML (String xml, String baseUri)
	Parse XML into a Document.

static Document	ParseXML (String xml)
	Parse XML into a Document.

static Document	ParseXML (Stream @in, String charsetName, String baseUri)
	Parse XML into a Document.

static Document	ParseXML (Stream @in, String charsetName)
	Parse XML into a Document.

static Document	Parse (FileInfo @in, String charsetName, String baseUri)
	Parse the contents of a file as HTML.

static Document	Parse (FileInfo @in, String charsetName)
	Parse the contents of a file as HTML.

static Document	Parse (Stream @in, String charsetName, String baseUri)
	Read an input stream, and parse it to a Document.

static Document	Parse (Stream @in, String charsetName, String baseUri, iText.StyledXmlParser.Jsoup.Parser.Parser parser)
	Read an input stream, and parse it to a Document.

static Document	ParseBodyFragment (String bodyHtml, String baseUri)
	Parse a fragment of HTML, with the assumption that it forms the `body` of the HTML.

static Document	ParseBodyFragment (String bodyHtml)
	Parse a fragment of HTML, with the assumption that it forms the `body` of the HTML.

static String	Clean (String bodyHtml, String baseUri, Whitelist whitelist)
	Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.

static String	Clean (String bodyHtml, Whitelist whitelist)
	Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.

static String	Clean (String bodyHtml, String baseUri, Whitelist whitelist, OutputSettings outputSettings)
	Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.

static bool	IsValid (String bodyHtml, Whitelist whitelist)
	Test if the input HTML has only tags and attributes allowed by the Whitelist.

Detailed Description

The core public access point to the jsoup functionality.

Author: Jonathan Hedley

Member Function Documentation

◆ Clean() [1/3]

static String iText.StyledXmlParser.Jsoup.Jsoup.Clean	(	String	bodyHtml,
		String	baseUri,
		Whitelist	whitelist
	)

inline static

Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.

Parameters

bodyHtml	input untrusted HTML (body fragment)
baseUri	URL to resolve relative URLs against
whitelist	white-list of permitted HTML elements

Returns: safe HTML (body fragment)

See also: iText.StyledXmlParser.Jsoup.Safety.Cleaner.Clean(iText.StyledXmlParser.Jsoup.Nodes.Document)

◆ Clean() [2/3]

static String iText.StyledXmlParser.Jsoup.Jsoup.Clean	(	String	bodyHtml,
		String	baseUri,
		Whitelist	whitelist,
		OutputSettings	outputSettings
	)

inline static

Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.

Parameters

bodyHtml	input untrusted HTML (body fragment)
baseUri	URL to resolve relative URLs against
whitelist	white-list of permitted HTML elements
outputSettings	document output settings; use to control pretty-printing and entity escape modes

Returns: safe HTML (body fragment)

See also: iText.StyledXmlParser.Jsoup.Safety.Cleaner.Clean(iText.StyledXmlParser.Jsoup.Nodes.Document)

◆ Clean() [3/3]

static String iText.StyledXmlParser.Jsoup.Jsoup.Clean	(	String	bodyHtml,
		Whitelist	whitelist
	)

inline static

Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.

Parameters

bodyHtml	input untrusted HTML (body fragment)
whitelist	white-list of permitted HTML elements

Returns: safe HTML (body fragment)

See also: iText.StyledXmlParser.Jsoup.Safety.Cleaner.Clean(iText.StyledXmlParser.Jsoup.Nodes.Document)

◆ IsValid()

static bool iText.StyledXmlParser.Jsoup.Jsoup.IsValid	(	String	bodyHtml,
		Whitelist	whitelist
	)

inline static

Test if the input HTML has only tags and attributes allowed by the Whitelist. Useful for form validation. The input HTML should still be run through the cleaner to set up enforced attributes, and to tidy the output.

Parameters

bodyHtml	HTML to test
whitelist	whitelist to test against

Returns: true if no tags or attributes were removed; false otherwise

See also: Clean(System.String, iText.StyledXmlParser.Jsoup.Safety.Whitelist)
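
The validate-then-clean flow described above can be sketched as follows. This is a minimal illustration, not the library's prescribed usage: it assumes the iText package is referenced and that this version exposes a `Whitelist.Basic()` factory in `iText.StyledXmlParser.Jsoup.Safety` (as in upstream jsoup). The `CleanExample` class, the `Sanitize` helper, and the `JsoupApi` alias are hypothetical names introduced only for this example.

```csharp
using System;
using iText.StyledXmlParser.Jsoup.Safety;
// Alias the Jsoup class so it is not confused with the Jsoup namespace.
using JsoupApi = iText.StyledXmlParser.Jsoup.Jsoup;

public static class CleanExample
{
    // Hypothetical helper: reports whether the input already conforms to the
    // whitelist, then returns the cleaned body fragment either way.
    public static string Sanitize(string untrustedHtml, out bool wasValid)
    {
        // Assumption: Whitelist.Basic() permits simple text formatting tags only.
        Whitelist whitelist = Whitelist.Basic();

        // IsValid only tests the input; it does not modify it.
        wasValid = JsoupApi.IsValid(untrustedHtml, whitelist);

        // Clean is still applied so enforced attributes are set and output is tidied.
        return JsoupApi.Clean(untrustedHtml, whitelist);
    }

    public static void Main()
    {
        string input = "<p>Hello <b>world</b><script>alert(1)</script></p>";
        bool wasValid;
        string safe = Sanitize(input, out wasValid);
        Console.WriteLine("Valid before cleaning: " + wasValid);
        Console.WriteLine(safe);
    }
}
```

Calling `Clean` regardless of the `IsValid` result mirrors the note above that valid input should still be run through the cleaner.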

◆ Parse() [1/7]

static Document iText.StyledXmlParser.Jsoup.Jsoup.Parse	(	FileInfo	@in,
		String	charsetName
	)

inline static

Parse the contents of a file as HTML. The location of the file is used as the base URI to qualify relative URLs.

Parameters

in	file to load HTML from
charsetName	(optional) character set of file contents. Set to `null` to determine from `http-equiv` meta tag, if present, or fall back to `UTF-8` (which is often safe to do).

Returns: sane HTML

Exceptions

System.IO.IOException if the file could not be found, or read, or if the charsetName is invalid.

See also: Parse(System.IO.FileInfo, System.String, System.String)

◆ Parse() [2/7]

static Document iText.StyledXmlParser.Jsoup.Jsoup.Parse	(	FileInfo	@in,
		String	charsetName,
		String	baseUri
	)

inline static

Parse the contents of a file as HTML.

Parameters

in	file to load HTML from
charsetName	(optional) character set of file contents. Set to `null` to determine from `http-equiv` meta tag, if present, or fall back to `UTF-8` (which is often safe to do).
baseUri	The URL where the HTML was retrieved from, to resolve relative links against.

Returns: sane HTML

Exceptions

System.IO.IOException if the file could not be found, or read, or if the charsetName is invalid.

◆ Parse() [3/7]

static Document iText.StyledXmlParser.Jsoup.Jsoup.Parse	(	Stream	@in,
		String	charsetName,
		String	baseUri
	)

inline static

Read an input stream, and parse it to a Document.

Parameters

in	input stream to read. Make sure to close it after parsing.
charsetName	(optional) character set of file contents. Set to `null` to determine from `http-equiv` meta tag, if present, or fall back to `UTF-8` (which is often safe to do).
baseUri	The URL where the HTML was retrieved from, to resolve relative links against.

Returns: sane HTML

Exceptions

System.IO.IOException if the file could not be found, or read, or if the charsetName is invalid.

◆ Parse() [4/7]

static Document iText.StyledXmlParser.Jsoup.Jsoup.Parse	(	Stream	@in,
		String	charsetName,
		String	baseUri,
		iText.StyledXmlParser.Jsoup.Parser.Parser	parser
	)

inline static

Read an input stream, and parse it to a Document. You can provide an alternate parser, such as a simple XML (non-HTML) parser.

Parameters

in	input stream to read. Make sure to close it after parsing.
charsetName	(optional) character set of file contents. Set to `null` to determine from `http-equiv` meta tag, if present, or fall back to `UTF-8` (which is often safe to do).
baseUri	The URL where the HTML was retrieved from, to resolve relative links against.
parser	alternate parser to use.

Returns: sane HTML

Exceptions

System.IO.IOException if the file could not be found, or read, or if the charsetName is invalid.

◆ Parse() [5/7]

static Document iText.StyledXmlParser.Jsoup.Jsoup.Parse ( String html )

inline static

Parse HTML into a Document. As no base URI is specified, absolute URL detection relies on the HTML including a `<base href>` tag.

Parameters

html	HTML to parse

Returns: sane HTML

See also: Parse(System.String, System.String)
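
As a rough usage sketch of the string overload: the snippet below parses a small HTML string and reads back the title and body text. It assumes the iText package is referenced and that `Document.Title()`, `Document.Body()` and `Element.Text()` are available in this port, as they are in upstream jsoup. `ParseExample`, `ReadTitle`, and the `JsoupApi` alias are hypothetical names used only here.

```csharp
using System;
using iText.StyledXmlParser.Jsoup.Nodes;
// Alias the Jsoup class so it is not confused with the Jsoup namespace.
using JsoupApi = iText.StyledXmlParser.Jsoup.Jsoup;

public static class ParseExample
{
    // Hypothetical helper: parse an HTML string and return its <title> text.
    public static string ReadTitle(string html)
    {
        // No base URI is passed, so relative URLs stay relative unless the
        // markup itself declares a <base href> tag.
        Document doc = JsoupApi.Parse(html);
        return doc.Title();
    }

    public static void Main()
    {
        string html = "<html><head><title>Demo</title></head>"
                    + "<body><p>Hello</p></body></html>";
        Console.WriteLine(ReadTitle(html));

        // The parsed tree can also be inspected through the body element.
        Document doc = JsoupApi.Parse(html);
        Console.WriteLine(doc.Body().Text());
    }
}
```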

◆ Parse() [6/7]

static Document iText.StyledXmlParser.Jsoup.Jsoup.Parse	(	String	html,
		String	baseUri
	)

inline static

Parse HTML into a Document. The parser will make a sensible, balanced document tree out of any HTML.

Parameters

html	HTML to parse
baseUri	The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs that occur before the HTML declares a `<base href>` tag.

Returns: sane HTML

◆ Parse() [7/7]

static Document iText.StyledXmlParser.Jsoup.Jsoup.Parse	(	String	html,
		String	baseUri,
		iText.StyledXmlParser.Jsoup.Parser.Parser	parser
	)

inline static

Parse HTML into a Document, using the provided Parser. You can provide an alternate parser, such as a simple XML (non-HTML) parser.

Parameters

html	HTML to parse
baseUri	The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs that occur before the HTML declares a `<base href>` tag.
parser	alternate parser to use.

Returns: sane HTML

◆ ParseBodyFragment() [1/2]

static Document iText.StyledXmlParser.Jsoup.Jsoup.ParseBodyFragment ( String bodyHtml )

inline static

Parse a fragment of HTML, with the assumption that it forms the body of the HTML.

Parameters

bodyHtml	body HTML fragment

Returns: sane HTML document

See also: iText.StyledXmlParser.Jsoup.Nodes.Document.Body()

◆ ParseBodyFragment() [2/2]

static Document iText.StyledXmlParser.Jsoup.Jsoup.ParseBodyFragment	(	String	bodyHtml,
		String	baseUri
	)

inline static

Parse a fragment of HTML, with the assumption that it forms the body of the HTML.

Parameters

bodyHtml	body HTML fragment
baseUri	URL to resolve relative URLs against.

Returns: sane HTML document

See also: iText.StyledXmlParser.Jsoup.Nodes.Document.Body()
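
To illustrate how a fragment ends up under the document body, here is a small sketch. It assumes the iText package is referenced and that `Element.Text()` behaves as in upstream jsoup; `FragmentExample`, `FragmentText`, and the `JsoupApi` alias are hypothetical names introduced for this example.

```csharp
using System;
using iText.StyledXmlParser.Jsoup.Nodes;
// Alias the Jsoup class so it is not confused with the Jsoup namespace.
using JsoupApi = iText.StyledXmlParser.Jsoup.Jsoup;

public static class FragmentExample
{
    // Hypothetical helper: parse a body fragment and return the combined text
    // of everything that was placed under the generated body element.
    public static string FragmentText(string bodyHtml)
    {
        // The fragment is wrapped in a document shell; Body() returns the
        // element that holds the parsed fragment.
        Document doc = JsoupApi.ParseBodyFragment(bodyHtml);
        Element body = doc.Body();
        return body.Text();
    }

    public static void Main()
    {
        Console.WriteLine(FragmentText("<p>One</p><p>Two</p>"));
    }
}
```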

◆ ParseXML() [1/4]

static Document iText.StyledXmlParser.Jsoup.Jsoup.ParseXML	(	Stream	@in,
		String	charsetName
	)

inline static

Parse XML into a Document. The parser will make a sensible, balanced document tree out of the XML input.

Parameters

in	input stream to read. Make sure to close it after parsing.
charsetName	(optional) character set of file contents. Set to `null` to determine from `http-equiv` meta tag, if present, or fall back to `UTF-8` (which is often safe to do).

Exceptions

System.IO.IOException if the file could not be found, or read, or if the charsetName is invalid.

Returns: sane XML

◆ ParseXML() [2/4]

static Document iText.StyledXmlParser.Jsoup.Jsoup.ParseXML	(	Stream	@in,
		String	charsetName,
		String	baseUri
	)

inline static

Parse XML into a Document. The parser will make a sensible, balanced document tree out of the XML input.

Parameters

in	input stream to read. Make sure to close it after parsing.
charsetName	(optional) character set of file contents. Set to `null` to determine from `http-equiv` meta tag, if present, or fall back to `UTF-8` (which is often safe to do).
baseUri	The URL where the HTML was retrieved from, to resolve relative links against.

Exceptions

System.IO.IOException if the file could not be found, or read, or if the charsetName is invalid.

Returns: sane XML

◆ ParseXML() [3/4]

static Document iText.StyledXmlParser.Jsoup.Jsoup.ParseXML ( String xml )

inline static

Parse XML into a Document. The parser will make a sensible, balanced document tree out of the XML input.

Parameters

xml	XML to parse

Returns: sane XML

◆ ParseXML() [4/4]

static Document iText.StyledXmlParser.Jsoup.Jsoup.ParseXML	(	String	xml,
		String	baseUri
	)

inline static

Parse XML into a Document. The parser will make a sensible, balanced document tree out of the XML input.

Parameters

xml	XML to parse
baseUri	The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs that occur before the HTML declares a `<base href>` tag.

Returns: sane XML
