|
iText 7 7.1.8 API
|
The core public access point to the jsoup functionality.
Static Public Member Functions
|
| static Document | Parse (String html, String baseUri) |
| Parse HTML into a Document. |
|
| static Document | Parse (String html, String baseUri, iText.StyledXmlParser.Jsoup.Parser.Parser parser) |
| Parse HTML into a Document, using the provided Parser. |
|
| static Document | Parse (String html) |
| Parse HTML into a Document. |
|
| static Document | ParseXML (String xml, String baseUri) |
| Parse XML into a Document. |
|
| static Document | ParseXML (String xml) |
| Parse XML into a Document. |
|
| static Document | ParseXML (Stream @in, String charsetName, String baseUri) |
| Parse XML into a Document. |
|
| static Document | ParseXML (Stream @in, String charsetName) |
| Parse XML into a Document. |
|
| static Document | Parse (FileInfo @in, String charsetName, String baseUri) |
| Parse the contents of a file as HTML. |
|
| static Document | Parse (FileInfo @in, String charsetName) |
| Parse the contents of a file as HTML. |
|
| static Document | Parse (Stream @in, String charsetName, String baseUri) |
| Read an input stream, and parse it to a Document. |
|
| static Document | Parse (Stream @in, String charsetName, String baseUri, iText.StyledXmlParser.Jsoup.Parser.Parser parser) |
| Read an input stream, and parse it to a Document. |
|
| static Document | ParseBodyFragment (String bodyHtml, String baseUri) |
| Parse a fragment of HTML, with the assumption that it forms the body of the HTML. |
|
| static Document | ParseBodyFragment (String bodyHtml) |
| Parse a fragment of HTML, with the assumption that it forms the body of the HTML. |
|
| static String | Clean (String bodyHtml, String baseUri, Whitelist whitelist) |
| Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes. |
|
| static String | Clean (String bodyHtml, Whitelist whitelist) |
| Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes. |
|
| static String | Clean (String bodyHtml, String baseUri, Whitelist whitelist, OutputSettings outputSettings) |
| Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes. |
|
| static bool | IsValid (String bodyHtml, Whitelist whitelist) |
| Test if the input HTML has only tags and attributes allowed by the Whitelist. |
|
| static String | Clean (String bodyHtml, String baseUri, Whitelist whitelist) | inline static |
Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.
| bodyHtml | input untrusted HTML (body fragment) |
| baseUri | URL to resolve relative URLs against |
| whitelist | white-list of permitted HTML elements |
|
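| static String | Clean (String bodyHtml, String baseUri, Whitelist whitelist, OutputSettings outputSettings) | inline static |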
Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.
| bodyHtml | input untrusted HTML (body fragment) |
| baseUri | URL to resolve relative URLs against |
| whitelist | white-list of permitted HTML elements |
| outputSettings | document output settings; use to control pretty-printing and entity escape modes |
|
| static String | Clean (String bodyHtml, Whitelist whitelist) | inline static |
Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes.
| bodyHtml | input untrusted HTML (body fragment) |
| whitelist | white-list of permitted HTML elements |
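As a rough illustration of the Clean overloads above, the sketch below filters a user-supplied fragment through a predefined whitelist. It assumes the class documented here is reachable as iText.StyledXmlParser.Jsoup.Jsoup and that the port exposes jsoup's predefined whitelists as Whitelist.Basic() in the iText.StyledXmlParser.Jsoup.Safety namespace. Neither name appears on this page, so check both against your iText version.

```csharp
using System;
using iText.StyledXmlParser.Jsoup.Safety; // assumed namespace of Whitelist

public static class CleanExample
{
    public static void Main()
    {
        // Untrusted body fragment, for example a user-submitted comment.
        string untrusted = "<p>Hello <b>world</b><script>alert('x')</script></p>";

        // Assumption: Whitelist.Basic() mirrors jsoup's basic whitelist,
        // which permits simple text formatting and links but no script elements.
        Whitelist whitelist = Whitelist.Basic();

        // Clean (String bodyHtml, Whitelist whitelist) returns the filtered HTML.
        string safe = iText.StyledXmlParser.Jsoup.Jsoup.Clean(untrusted, whitelist);

        Console.WriteLine(safe);
    }
}
```

The overloads that take a baseUri are the ones to reach for when the fragment contains relative links that should be resolved before filtering.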
|
| static bool | IsValid (String bodyHtml, Whitelist whitelist) | inline static |
Test if the input HTML has only tags and attributes allowed by the Whitelist. Useful for form validation. The input HTML should still be run through the cleaner to set up enforced attributes, and to tidy the output.
| bodyHtml | HTML to test |
| whitelist | whitelist to test against |
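The validate-then-clean flow described above might look like the following sketch. As before, the fully qualified class name iText.StyledXmlParser.Jsoup.Jsoup and the Whitelist.Basic() factory are assumptions carried over from jsoup's naming, not something stated on this page.

```csharp
using System;
using iText.StyledXmlParser.Jsoup.Safety; // assumed namespace of Whitelist

public static class IsValidExample
{
    public static void Main()
    {
        // Hypothetical form input.
        string submitted = "<p>Nice <a href=\"http://example.com/\">link</a></p>";

        // Assumption: predefined whitelist factory named as in jsoup.
        Whitelist whitelist = Whitelist.Basic();

        if (iText.StyledXmlParser.Jsoup.Jsoup.IsValid(submitted, whitelist))
        {
            // Even valid input is still cleaned, so that enforced attributes
            // are applied and the output is tidied.
            string stored = iText.StyledXmlParser.Jsoup.Jsoup.Clean(submitted, whitelist);
            Console.WriteLine(stored);
        }
        else
        {
            Console.WriteLine("Rejected: the markup contains disallowed tags or attributes.");
        }
    }
}
```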
|
| static Document | Parse (FileInfo @in, String charsetName) | inline static |
Parse the contents of a file as HTML. The location of the file is used as the base URI to qualify relative URLs.
| in | file to load HTML from |
| charsetName | (optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do). |
| System.IO.IOException | if the file could not be found, or read, or if the charsetName is invalid. |
|
| static Document | Parse (FileInfo @in, String charsetName, String baseUri) | inline static |
Parse the contents of a file as HTML.
| in | file to load HTML from |
| charsetName | (optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do). |
| baseUri | The URL where the HTML was retrieved from, to resolve relative links against. |
| System.IO.IOException | if the file could not be found, or read, or if the charsetName is invalid. |
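A minimal sketch of the file-based overloads, assuming the class is reachable as iText.StyledXmlParser.Jsoup.Jsoup and that the returned Document keeps jsoup's Title() accessor under that name. The file path and base URL are placeholders.

```csharp
using System;
using System.IO;

public static class ParseFileExample
{
    public static void Main(string[] args)
    {
        // Hypothetical path; pass a real file as the first argument.
        FileInfo file = new FileInfo(args.Length > 0 ? args[0] : "page.html");

        try
        {
            // A null charsetName asks the parser to look for an http-equiv
            // meta tag and to fall back to UTF-8 otherwise.
            var doc = iText.StyledXmlParser.Jsoup.Jsoup.Parse(file, null, "http://example.com/");

            // Assumption: Title() exists on the port's Document, as in jsoup.
            Console.WriteLine(doc.Title());
        }
        catch (IOException e)
        {
            // Raised when the file is missing or unreadable, or the charset is invalid.
            Console.Error.WriteLine("Could not parse file: " + e.Message);
        }
    }
}
```

Dropping the third argument selects the two-parameter overload, which uses the file's own location as the base URI.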
|
| static Document | Parse (Stream @in, String charsetName, String baseUri) | inline static |
Read an input stream, and parse it to a Document.
| in | input stream to read. Make sure to close it after parsing. |
| charsetName | (optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do). |
| baseUri | The URL where the HTML was retrieved from, to resolve relative links against. |
| System.IO.IOException | if the file could not be found, or read, or if the charsetName is invalid. |
|
| static Document | Parse (Stream @in, String charsetName, String baseUri, iText.StyledXmlParser.Jsoup.Parser.Parser parser) | inline static |
Read an input stream, and parse it to a Document. You can provide an alternate parser, such as a simple XML (non-HTML) parser.
| in | input stream to read. Make sure to close it after parsing. |
| charsetName | (optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do). |
| baseUri | The URL where the HTML was retrieved from, to resolve relative links against. |
| parser | alternate parser to use. |
| System.IO.IOException | if the file could not be found, or read, or if the charsetName is invalid. |
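The stream overloads leave closing the stream to the caller, which a using block handles naturally. The sketch below also passes an alternate parser. It assumes that iText.StyledXmlParser.Jsoup.Parser.Parser offers a static XmlParser() factory and that Select() and Text() are available on the result, all of which follow jsoup's naming rather than anything stated on this page.

```csharp
using System;
using System.IO;
using System.Text;

public static class ParseStreamExample
{
    public static void Main()
    {
        // In-memory stand-in for a network or file stream.
        byte[] bytes = Encoding.UTF8.GetBytes("<feed><entry id=\"1\">First</entry></feed>");

        // The using block closes the stream once parsing is done.
        using (Stream input = new MemoryStream(bytes))
        {
            var doc = iText.StyledXmlParser.Jsoup.Jsoup.Parse(
                input,
                "UTF-8",
                "http://example.com/", // placeholder base URI
                // Assumption: static factory for the simple XML parser.
                iText.StyledXmlParser.Jsoup.Parser.Parser.XmlParser());

            // Assumption: Select() and Text() behave as in jsoup.
            Console.WriteLine(doc.Select("entry").Text());
        }
    }
}
```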
|
| static Document | Parse (String html) | inline static |
Parse HTML into a Document. As no base URI is specified, absolute URL detection relies on the HTML including a <base href> tag.
| html | HTML to parse |
|
| static Document | Parse (String html, String baseUri) | inline static |
Parse HTML into a Document. The parser will make a sensible, balanced document tree out of any HTML.
| html | HTML to parse |
| baseUri | The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs that occur before the HTML declares a <base href> tag. |
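To show what the baseUri parameter is for, the following sketch parses a string containing a relative link and asks for its absolute form. It assumes the class is reachable as iText.StyledXmlParser.Jsoup.Jsoup and that Select(), First() and AbsUrl() are exposed under those names, mirroring jsoup. The base URL is a placeholder.

```csharp
using System;

public static class ParseWithBaseUriExample
{
    public static void Main()
    {
        string html = "<html><body><a href=\"page.html\">Next</a></body></html>";

        // The base URI stands in for the location the HTML was fetched from.
        var doc = iText.StyledXmlParser.Jsoup.Jsoup.Parse(html, "http://example.com/docs/");

        // Assumption: Select(), First() and AbsUrl() are named as in jsoup.
        var link = doc.Select("a").First();

        // Attr() returns the attribute as written; AbsUrl() resolves it
        // against the base URI supplied above.
        Console.WriteLine(link.Attr("href"));
        Console.WriteLine(link.AbsUrl("href"));
    }
}
```

With the single-argument Parse (String html) overload there is no base URI, so the same lookup depends on the document itself declaring one.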
|
| static Document | Parse (String html, String baseUri, iText.StyledXmlParser.Jsoup.Parser.Parser parser) | inline static |
Parse HTML into a Document, using the provided Parser. You can provide an alternate parser, such as a simple XML (non-HTML) parser.
| html | HTML to parse |
| baseUri | The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs that occur before the HTML declares a <base href> tag. |
| parser | alternate parser to use. |
|
| static Document | ParseBodyFragment (String bodyHtml) | inline static |
Parse a fragment of HTML, with the assumption that it forms the body of the HTML.
| bodyHtml | body HTML fragment |
|
| static Document | ParseBodyFragment (String bodyHtml, String baseUri) | inline static |
Parse a fragment of HTML, with the assumption that it forms the body of the HTML.
| bodyHtml | body HTML fragment |
| baseUri | URL to resolve relative URLs against. |
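A short sketch of fragment parsing, under the same naming assumptions as elsewhere on this page (class reachable as iText.StyledXmlParser.Jsoup.Jsoup, with Body() and Html() on the result as in jsoup). The fragment is treated as the content of the body element rather than as a complete page.

```csharp
using System;

public static class ParseBodyFragmentExample
{
    public static void Main()
    {
        // A fragment with no html, head or body wrapper of its own.
        string fragment = "<p>First paragraph</p><p>Second paragraph</p>";

        var doc = iText.StyledXmlParser.Jsoup.Jsoup.ParseBodyFragment(fragment);

        // Assumption: Body() returns the body element and Html() its inner markup.
        Console.WriteLine(doc.Body().Html());
    }
}
```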
|
| static Document | ParseXML (Stream @in, String charsetName) | inline static |
Parse XML into a Document. The parser will make a sensible, balanced document tree out of the XML input.
| in | input stream to read. Make sure to close it after parsing. |
| charsetName | (optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do). |
| System.IO.IOException | if the file could not be found, or read, or if the charsetName is invalid. |
|
| static Document | ParseXML (Stream @in, String charsetName, String baseUri) | inline static |
Parse XML into a Document. The parser will make a sensible, balanced document tree out of the XML input.
| in | input stream to read. Make sure to close it after parsing. |
| charsetName | (optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do). |
| baseUri | The URL where the HTML was retrieved from, to resolve relative links against. |
| System.IO.IOException | if the file could not be found, or read, or if the charsetName is invalid. |
|
| static Document | ParseXML (String xml) | inline static |
Parse XML into a Document. The parser will make a sensible, balanced document tree out of the XML input.
| xml | XML to parse |
|
| static Document | ParseXML (String xml, String baseUri) | inline static |
Parse XML into a Document. The parser will make a sensible, balanced document tree out of the XML input.
| xml | XML to parse |
| baseUri | The URL where the document was retrieved from. Used to resolve relative URLs to absolute URLs that occur before the document declares a <base href> tag. |
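The string-based ParseXML overloads can be exercised with a sketch like the one below. It assumes the class is reachable as iText.StyledXmlParser.Jsoup.Jsoup and that the selection result can be enumerated with foreach and offers Attr() and Text() as in jsoup. The element names and base URL are made up for the example.

```csharp
using System;

public static class ParseXmlExample
{
    public static void Main()
    {
        // Hypothetical XML payload.
        string xml = "<catalog><item sku=\"a1\">Pen</item><item sku=\"b2\">Ink</item></catalog>";

        var doc = iText.StyledXmlParser.Jsoup.Jsoup.ParseXML(xml, "http://example.com/");

        // Assumption: Select() returns an enumerable collection of elements.
        foreach (var item in doc.Select("item"))
        {
            Console.WriteLine(item.Attr("sku") + ": " + item.Text());
        }
    }
}
```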