iText 7 7.1.8 API
iText.StyledXmlParser.Jsoup.Jsoup Class Reference

The core public access point to the jsoup functionality.

Static Public Member Functions

static Document Parse (String html, String baseUri)
  Parse HTML into a Document.

static Document Parse (String html, String baseUri, iText.StyledXmlParser.Jsoup.Parser.Parser parser)
  Parse HTML into a Document, using the provided Parser.

static Document Parse (String html)
  Parse HTML into a Document.

static Document ParseXML (String xml, String baseUri)
  Parse XML into a Document.

static Document ParseXML (String xml)
  Parse XML into a Document.

static Document ParseXML (Stream @in, String charsetName, String baseUri)
  Parse XML into a Document.

static Document ParseXML (Stream @in, String charsetName)
  Parse XML into a Document.

static Document Parse (FileInfo @in, String charsetName, String baseUri)
  Parse the contents of a file as HTML.

static Document Parse (FileInfo @in, String charsetName)
  Parse the contents of a file as HTML.

static Document Parse (Stream @in, String charsetName, String baseUri)
  Read an input stream and parse it to a Document.

static Document Parse (Stream @in, String charsetName, String baseUri, iText.StyledXmlParser.Jsoup.Parser.Parser parser)
  Read an input stream and parse it to a Document.

static Document ParseBodyFragment (String bodyHtml, String baseUri)
  Parse a fragment of HTML, with the assumption that it forms the body of the HTML.

static Document ParseBodyFragment (String bodyHtml)
  Parse a fragment of HTML, with the assumption that it forms the body of the HTML.

static String Clean (String bodyHtml, String baseUri, Whitelist whitelist)
  Get safe HTML from untrusted input HTML by parsing it and filtering it through a whitelist of permitted tags and attributes.

static String Clean (String bodyHtml, Whitelist whitelist)
  Get safe HTML from untrusted input HTML by parsing it and filtering it through a whitelist of permitted tags and attributes.

static String Clean (String bodyHtml, String baseUri, Whitelist whitelist, OutputSettings outputSettings)
  Get safe HTML from untrusted input HTML by parsing it and filtering it through a whitelist of permitted tags and attributes.

static bool IsValid (String bodyHtml, Whitelist whitelist)
  Test if the input HTML has only tags and attributes allowed by the Whitelist.
 

Detailed Description

The core public access point to the jsoup functionality.

Author
Jonathan Hedley

Member Function Documentation

◆ Clean() [1/3]

static String iText.StyledXmlParser.Jsoup.Jsoup.Clean (String bodyHtml, String baseUri, Whitelist whitelist)
inline static

Get safe HTML from untrusted input HTML by parsing it and filtering it through a whitelist of permitted tags and attributes.

Parameters
bodyHtml input untrusted HTML (body fragment)
baseUri URL to resolve relative URLs against
whitelist white-list of permitted HTML elements
Returns
safe HTML (body fragment)
See also
iText.StyledXmlParser.Jsoup.Safety.Cleaner.Clean(iText.StyledXmlParser.Jsoup.Nodes.Document)
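A minimal sketch of what `baseUri` changes, written as C# 9 top-level statements and assuming the iText 7.1.x for .NET assemblies that contain `iText.StyledXmlParser` are referenced. `Whitelist.Basic()` is assumed to mirror upstream jsoup's `Whitelist.basic()`, which allows `http`/`https` links and does not preserve relative links by default; the markup and URL are illustrative.

```csharp
// Sketch only: assumes iText 7.1.x for .NET is referenced and that
// Whitelist.Basic() behaves like upstream jsoup's Whitelist.basic().
using System;
using iText.StyledXmlParser.Jsoup;
using iText.StyledXmlParser.Jsoup.Safety;

string dirty = "<a href='/docs'>Docs</a>";

// With a base URI, the relative href resolves to an absolute https URL,
// which passes the whitelist's protocol check and is kept in absolute form.
string withBase = Jsoup.Clean(dirty, "https://example.com/", Whitelist.Basic());

// Without a base URI, "/docs" cannot be resolved, fails the protocol check,
// and the href attribute is dropped; the link text survives.
string withoutBase = Jsoup.Clean(dirty, Whitelist.Basic());

Console.WriteLine(withBase);
Console.WriteLine(withoutBase);
```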

◆ Clean() [2/3]

static String iText.StyledXmlParser.Jsoup.Jsoup.Clean (String bodyHtml, String baseUri, Whitelist whitelist, OutputSettings outputSettings)
inline static

Get safe HTML from untrusted input HTML by parsing it and filtering it through a whitelist of permitted tags and attributes.

Parameters
bodyHtml input untrusted HTML (body fragment)
baseUri URL to resolve relative URLs against
whitelist white-list of permitted HTML elements
outputSettings document output settings; use to control pretty-printing and entity escape modes
Returns
safe HTML (body fragment)
See also
iText.StyledXmlParser.Jsoup.Safety.Cleaner.Clean(iText.StyledXmlParser.Jsoup.Nodes.Document)
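A sketch of passing output settings, written as C# 9 top-level statements and assuming iText 7.1.x for .NET is referenced. It also assumes `OutputSettings` is the class in `iText.StyledXmlParser.Jsoup.Nodes` (the port of jsoup's `Document.OutputSettings`) with a public constructor and a fluent `PrettyPrint(bool)` setter.

```csharp
// Sketch only: assumes OutputSettings lives in iText.StyledXmlParser.Jsoup.Nodes
// and that PrettyPrint(bool) returns the settings object, as in upstream jsoup.
using System;
using iText.StyledXmlParser.Jsoup;
using iText.StyledXmlParser.Jsoup.Nodes;
using iText.StyledXmlParser.Jsoup.Safety;

string dirty = "<div><p>One</p><p>Two</p></div>";

// Disable pretty-printing so the cleaned fragment has no added
// newlines or indentation.
OutputSettings compact = new OutputSettings().PrettyPrint(false);

// <div> is not in the Basic whitelist, so it is dropped while its
// permitted <p> children are copied into the clean output.
string cleaned = Jsoup.Clean(dirty, "", Whitelist.Basic(), compact);
Console.WriteLine(cleaned);
```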

◆ Clean() [3/3]

static String iText.StyledXmlParser.Jsoup.Jsoup.Clean (String bodyHtml, Whitelist whitelist)
inline static

Get safe HTML from untrusted input HTML by parsing it and filtering it through a whitelist of permitted tags and attributes.

Parameters
bodyHtml input untrusted HTML (body fragment)
whitelist white-list of permitted HTML elements
Returns
safe HTML (body fragment)
See also
iText.StyledXmlParser.Jsoup.Safety.Cleaner.Clean(iText.StyledXmlParser.Jsoup.Nodes.Document)
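A minimal sanitizing sketch, written as C# 9 top-level statements and assuming iText 7.1.x for .NET is referenced and that `Whitelist.Basic()` mirrors upstream jsoup's `Whitelist.basic()` (which enforces `rel="nofollow"` on links). The input is the familiar jsoup case of a link carrying an `onclick` handler.

```csharp
// Sketch only: assumes iText 7.1.x for .NET and a jsoup-style Basic whitelist.
using System;
using iText.StyledXmlParser.Jsoup;
using iText.StyledXmlParser.Jsoup.Safety;

string unsafeHtml =
    "<p><a href='https://example.com/' onclick='stealCookies()'>Link</a></p>";

// Basic keeps <p> and <a href>, strips the onclick handler,
// and adds the enforced rel="nofollow" attribute to the link.
string safe = Jsoup.Clean(unsafeHtml, Whitelist.Basic());
Console.WriteLine(safe);
```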

◆ IsValid()

static bool iText.StyledXmlParser.Jsoup.Jsoup.IsValid (String bodyHtml, Whitelist whitelist)
inline static

Test if the input HTML has only tags and attributes allowed by the Whitelist. Useful for form validation. The input HTML should still be run through the cleaner to set up enforced attributes, and to tidy the output.

Parameters
bodyHtml HTML to test
whitelist whitelist to test against
Returns
true if no tags or attributes were removed; false otherwise
See also
Clean(System.String, iText.StyledXmlParser.Jsoup.Safety.Whitelist)
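A sketch of IsValid as a form-validation check, written as C# 9 top-level statements and assuming iText 7.1.x for .NET is referenced and that `Whitelist.Basic()` mirrors upstream jsoup. Since IsValid only reports whether the cleaner would remove anything, accepted input is still worth running through Clean, as recommended above.

```csharp
// Sketch only: assumes iText 7.1.x for .NET and a jsoup-style Basic whitelist.
using System;
using iText.StyledXmlParser.Jsoup;
using iText.StyledXmlParser.Jsoup.Safety;

// Only whitelisted tags (<p>, <b>): nothing would be removed.
bool plainOk = Jsoup.IsValid("<p>Hello <b>world</b></p>", Whitelist.Basic());

// <script> is not in the Basic whitelist, so the cleaner would drop it.
bool scriptOk = Jsoup.IsValid("<p>Hi</p><script>alert(1)</script>", Whitelist.Basic());

Console.WriteLine($"{plainOk} {scriptOk}");
```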

◆ Parse() [1/7]

static Document iText.StyledXmlParser.Jsoup.Jsoup.Parse (FileInfo @in, String charsetName)
inline static

Parse the contents of a file as HTML. The location of the file is used as the base URI to qualify relative URLs.

Parameters
in file to load HTML from
charsetName (optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).
Returns
the parsed Document
Exceptions
System.IO.IOException if the file could not be found or read, or if charsetName is invalid.
See also
Parse(System.IO.FileInfo, System.String, System.String)
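A sketch of parsing a file with charset detection, written as C# 9 top-level statements and assuming iText 7.1.x for .NET is referenced. The temporary file name is made up for the example; passing `null` for `charsetName` lets the parser look for a charset declared in a meta tag, falling back to UTF-8, and `Title()`/`Text()` are assumed to be the ports of jsoup's `title()`/`text()`.

```csharp
// Sketch only: assumes iText 7.1.x for .NET; the file name is hypothetical.
using System;
using System.IO;
using System.Text;
using iText.StyledXmlParser.Jsoup;
using iText.StyledXmlParser.Jsoup.Nodes;

// Write a small HTML file to the temp directory as UTF-8 without a BOM.
string path = Path.Combine(Path.GetTempPath(), "jsoup-file-sample.html");
File.WriteAllText(path,
    "<html><head><meta charset='utf-8'><title>From file</title></head>" +
    "<body><p>Hello from disk</p></body></html>",
    new UTF8Encoding(false));

// charsetName = null: detect from the meta tag, else fall back to UTF-8.
// The file's location is used as the base URI.
Document doc = Jsoup.Parse(new FileInfo(path), null);
File.Delete(path);

string title = doc.Title();
string text = doc.Body().Text();
Console.WriteLine($"{title}: {text}");
```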

◆ Parse() [2/7]

static Document iText.StyledXmlParser.Jsoup.Jsoup.Parse (FileInfo @in, String charsetName, String baseUri)
inline static

Parse the contents of a file as HTML.

Parameters
in file to load HTML from
charsetName (optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).
baseUri The URL where the HTML was retrieved from, to resolve relative links against.
Returns
the parsed Document
Exceptions
System.IO.IOException if the file could not be found or read, or if charsetName is invalid.

◆ Parse() [3/7]

static Document iText.StyledXmlParser.Jsoup.Jsoup.Parse (Stream @in, String charsetName, String baseUri)
inline static

Read an input stream, and parse it to a Document.

Parameters
in input stream to read. Make sure to close it after parsing.
charsetName (optional) character set of the stream contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).
baseUri The URL where the HTML was retrieved from, to resolve relative links against.
Returns
the parsed Document
Exceptions
System.IO.IOException if the stream could not be read, or if charsetName is invalid.
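A sketch of parsing from a Stream, written as C# 9 top-level statements and assuming iText 7.1.x for .NET is referenced. A MemoryStream stands in for a network or file stream; the caller disposes it, and `Select`, `First` and `AbsUrl` are assumed to be the ports of jsoup's `select`, `first` and `absUrl`.

```csharp
// Sketch only: assumes iText 7.1.x for .NET; URLs are illustrative.
using System;
using System.IO;
using System.Text;
using iText.StyledXmlParser.Jsoup;
using iText.StyledXmlParser.Jsoup.Nodes;

byte[] bytes = Encoding.UTF8.GetBytes("<p>Caf\u00e9 <a href='menu.html'>menu</a></p>");

Document doc;
// The caller owns the stream: the using block closes it after parsing.
using (Stream input = new MemoryStream(bytes))
{
    doc = Jsoup.Parse(input, "UTF-8", "https://example.com/bistro/");
}

string text = doc.Body().Text();
// The relative href is resolved against the base URI given above.
string menuUrl = doc.Select("a").First().AbsUrl("href");
Console.WriteLine(text);
Console.WriteLine(menuUrl);
```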

◆ Parse() [4/7]

static Document iText.StyledXmlParser.Jsoup.Jsoup.Parse (Stream @in, String charsetName, String baseUri, iText.StyledXmlParser.Jsoup.Parser.Parser parser)
inline static

Read an input stream, and parse it to a Document. You can provide an alternate parser, such as a simple XML (non-HTML) parser.

Parameters
in input stream to read. Make sure to close it after parsing.
charsetName (optional) character set of the stream contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).
baseUri The URL where the HTML was retrieved from, to resolve relative links against.
parser alternate parser to use.
Returns
the parsed Document
Exceptions
System.IO.IOException if the stream could not be read, or if charsetName is invalid.

◆ Parse() [5/7]

static Document iText.StyledXmlParser.Jsoup.Jsoup.Parse (String html)
inline static

Parse HTML into a Document. As no base URI is specified, absolute URL detection relies on the HTML including a <base href> tag.

Parameters
html HTML to parse
Returns
the parsed Document
See also
Parse(System.String, System.String)
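A minimal sketch of parsing deliberately malformed markup, written as C# 9 top-level statements and assuming iText 7.1.x for .NET is referenced. `Title()` and `Text()` are assumed to be the ports of jsoup's `title()` and `text()`.

```csharp
// Sketch only: assumes iText 7.1.x for .NET.
using System;
using iText.StyledXmlParser.Jsoup;
using iText.StyledXmlParser.Jsoup.Nodes;

// Sloppy input: no <html>/<head>/<body> and unclosed <p> and <b> tags.
Document doc = Jsoup.Parse("<title>Menu</title><p>Soup of the <b>day");

// The parser builds a balanced tree: <title> goes into <head>,
// the paragraph into <body>, and open tags are closed implicitly.
string title = doc.Title();
string text = doc.Body().Text();
Console.WriteLine($"{title}: {text}");
```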

◆ Parse() [6/7]

static Document iText.StyledXmlParser.Jsoup.Jsoup.Parse (String html, String baseUri)
inline static

Parse HTML into a Document. The parser will make a sensible, balanced document tree out of any HTML.

Parameters
html HTML to parse
baseUri The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs that occur before the HTML declares a <base href> tag.
Returns
the parsed Document
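A sketch of relative-link resolution against `baseUri`, written as C# 9 top-level statements and assuming iText 7.1.x for .NET is referenced. `Select`, `First`, `Attr` and `AbsUrl` are assumed to be the ports of jsoup's `select`, `first`, `attr` and `absUrl`; the URLs are illustrative.

```csharp
// Sketch only: assumes iText 7.1.x for .NET; URLs are illustrative.
using System;
using iText.StyledXmlParser.Jsoup;
using iText.StyledXmlParser.Jsoup.Nodes;

string html = "<p><a href='../images/logo.png'>Logo</a></p>";

// No <base href> in the markup, so the baseUri argument is used.
Document doc = Jsoup.Parse(html, "https://example.com/docs/guide/");
Element link = doc.Select("a").First();

string raw = link.Attr("href");        // attribute value as written
string absolute = link.AbsUrl("href"); // resolved against the base URI
Console.WriteLine($"{raw} -> {absolute}");
```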

◆ Parse() [7/7]

static Document iText.StyledXmlParser.Jsoup.Jsoup.Parse (String html, String baseUri, iText.StyledXmlParser.Jsoup.Parser.Parser parser)
inline static

Parse HTML into a Document, using the provided Parser. You can provide an alternate parser, such as a simple XML (non-HTML) parser.

Parameters
html HTML to parse
baseUri The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs that occur before the HTML declares a <base href> tag.
parser alternate parser to use.
Returns
the parsed Document
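A sketch of swapping in the XML parser, written as C# 9 top-level statements and assuming iText 7.1.x for .NET is referenced. `Parser.XmlParser()` is assumed to be the port of jsoup's `Parser.xmlParser()`, a simple XML tree builder that does not add HTML's `<html>`/`<head>`/`<body>` structure.

```csharp
// Sketch only: assumes iText 7.1.x for .NET and a jsoup-style XmlParser().
using System;
using iText.StyledXmlParser.Jsoup;
using iText.StyledXmlParser.Jsoup.Nodes;

string xml = "<catalog><book id='b1'>Dune</book></catalog>";

// Fully qualified to avoid confusing the Parser class with its namespace.
var xmlParser = iText.StyledXmlParser.Jsoup.Parser.Parser.XmlParser();
Document doc = Jsoup.Parse(xml, "", xmlParser);

Element book = doc.Select("book").First();
string id = book.Attr("id");
string title = book.Text();
Console.WriteLine($"{id}: {title}");
```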

◆ ParseBodyFragment() [1/2]

static Document iText.StyledXmlParser.Jsoup.Jsoup.ParseBodyFragment (String bodyHtml)
inline static

Parse a fragment of HTML, with the assumption that it forms the body of the HTML.

Parameters
bodyHtml body HTML fragment
Returns
a Document whose body contains the parsed fragment
See also
iText.StyledXmlParser.Jsoup.Nodes.Document.Body()
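A sketch of parsing a body fragment, written as C# 9 top-level statements and assuming iText 7.1.x for .NET is referenced. The fragment stands in for something like user-supplied comment markup; `Body()`, `Select`, `First` and `Text()` are assumed to behave as their upstream jsoup counterparts.

```csharp
// Sketch only: assumes iText 7.1.x for .NET.
using System;
using iText.StyledXmlParser.Jsoup;
using iText.StyledXmlParser.Jsoup.Nodes;

// A fragment with no <html> or <body>, and an unclosed second paragraph.
Document doc = Jsoup.ParseBodyFragment("<p>First</p><p>Second");

// The fragment's nodes are placed inside the generated document's <body>.
Element body = doc.Body();
string firstParagraph = body.Select("p").First().Text();
string allText = body.Text();
Console.WriteLine(firstParagraph);
Console.WriteLine(allText);
```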

◆ ParseBodyFragment() [2/2]

static Document iText.StyledXmlParser.Jsoup.Jsoup.ParseBodyFragment (String bodyHtml, String baseUri)
inline static

Parse a fragment of HTML, with the assumption that it forms the body of the HTML.

Parameters
bodyHtml body HTML fragment
baseUri URL to resolve relative URLs against.
Returns
a Document whose body contains the parsed fragment
See also
iText.StyledXmlParser.Jsoup.Nodes.Document.Body()

◆ ParseXML() [1/4]

static Document iText.StyledXmlParser.Jsoup.Jsoup.ParseXML (Stream @in, String charsetName)
inline static

Parse XML into a Document. The parser will make a sensible, balanced document tree out of any XML.

Parameters
in input stream to read. Make sure to close it after parsing.
charsetName (optional) character set of the stream contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).
Returns
the parsed Document
Exceptions
System.IO.IOException if the stream could not be read, or if charsetName is invalid.
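A sketch of ParseXML on a stream, written as C# 9 top-level statements and assuming iText 7.1.x for .NET is referenced. A MemoryStream stands in for a real source and the caller disposes it; the `config`/`entry` element names are illustrative.

```csharp
// Sketch only: assumes iText 7.1.x for .NET; element names are illustrative.
using System;
using System.IO;
using System.Text;
using iText.StyledXmlParser.Jsoup;
using iText.StyledXmlParser.Jsoup.Nodes;

byte[] bytes = Encoding.UTF8.GetBytes(
    "<config><entry key='mode'>fast</entry></config>");

Document doc;
// The caller is responsible for closing the stream after parsing.
using (Stream input = new MemoryStream(bytes))
{
    doc = Jsoup.ParseXML(input, "UTF-8");
}

Element entry = doc.Select("entry").First();
string key = entry.Attr("key");
string value = entry.Text();
Console.WriteLine($"{key} = {value}");
```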

◆ ParseXML() [2/4]

static Document iText.StyledXmlParser.Jsoup.Jsoup.ParseXML (Stream @in, String charsetName, String baseUri)
inline static

Parse XML into a Document. The parser will make a sensible, balanced document tree out of any XML.

Parameters
in input stream to read. Make sure to close it after parsing.
charsetName (optional) character set of the stream contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).
baseUri The URL where the XML was retrieved from, to resolve relative links against.
Returns
the parsed Document
Exceptions
System.IO.IOException if the stream could not be read, or if charsetName is invalid.

◆ ParseXML() [3/4]

static Document iText.StyledXmlParser.Jsoup.Jsoup.ParseXML (String xml)
inline static

Parse XML into a Document. The parser will make a sensible, balanced document tree out of any XML.

Parameters
xml XML to parse
Returns
the parsed Document
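A minimal sketch of ParseXML on a string, written as C# 9 top-level statements and assuming iText 7.1.x for .NET is referenced. `Elements.Text()` is assumed to join the text of all matched elements with single spaces, as upstream jsoup's `Elements.text()` does.

```csharp
// Sketch only: assumes iText 7.1.x for .NET.
using System;
using iText.StyledXmlParser.Jsoup;
using iText.StyledXmlParser.Jsoup.Nodes;

Document doc = Jsoup.ParseXML("<feed><entry>A</entry><entry>B</entry></feed>");

// Select every <entry> and join their text content.
string joined = doc.Select("entry").Text();
Console.WriteLine(joined);
```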

◆ ParseXML() [4/4]

static Document iText.StyledXmlParser.Jsoup.Jsoup.ParseXML (String xml, String baseUri)
inline static

Parse XML into a Document. The parser will make a sensible, balanced document tree out of any XML.

Parameters
xml XML to parse
baseUri The URL where the XML was retrieved from. Used to resolve relative URLs to absolute URLs.
Returns
the parsed Document