public class Jsoup extends Object
Modifier and Type | Method and Description |
---|---|
static String | clean(String bodyHtml, String baseUri, Whitelist whitelist): Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes. |
static String | clean(String bodyHtml, String baseUri, Whitelist whitelist, Document.OutputSettings outputSettings): Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes. |
static String | clean(String bodyHtml, Whitelist whitelist): Get safe HTML from untrusted input HTML, by parsing input HTML and filtering it through a white-list of permitted tags and attributes. |
static boolean | isValid(String bodyHtml, Whitelist whitelist): Test if the input HTML has only tags and attributes allowed by the Whitelist. |
static Document | parse(File in, String charsetName): Parse the contents of a file as HTML. |
static Document | parse(File in, String charsetName, String baseUri): Parse the contents of a file as HTML. |
static Document | parse(InputStream in, String charsetName, String baseUri): Read an input stream, and parse it to a Document. |
static Document | parse(InputStream in, String charsetName, String baseUri, Parser parser): Read an input stream, and parse it to a Document. |
static Document | parse(String html): Parse HTML into a Document. |
static Document | parse(String html, String baseUri): Parse HTML into a Document. |
static Document | parse(String html, String baseUri, Parser parser): Parse HTML into a Document, using the provided Parser. |
static Document | parseBodyFragment(String bodyHtml): Parse a fragment of HTML, with the assumption that it forms the body of the HTML. |
static Document | parseBodyFragment(String bodyHtml, String baseUri): Parse a fragment of HTML, with the assumption that it forms the body of the HTML. |
public static Document parse(String html, String baseUri)
html - HTML to parse
baseUri - The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs, that occur before the HTML declares a <base href> tag.
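To make the role of baseUri concrete, the following is a minimal sketch of what resolving a relative URL against a base means. It uses java.net.URI from the standard library rather than the parser's own resolution logic, and the class BaseUriSketch, the helper toAbsolute, and the example.com URLs are hypothetical names chosen for illustration.

```java
import java.net.URI;

// Illustrative sketch only: shows relative-to-absolute URL resolution with
// java.net.URI. It is not the parser's implementation.
final class BaseUriSketch {

    // Hypothetical helper: resolve a possibly relative href against baseUri.
    static String toAbsolute(String baseUri, String href) {
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/docs/index.html";
        // Path-relative link: resolved against the directory of the base.
        System.out.println(toAbsolute(base, "img/logo.png"));  // http://example.com/docs/img/logo.png
        // Root-relative link: resolved against the host of the base.
        System.out.println(toAbsolute(base, "/about"));         // http://example.com/about
        // Already absolute link: returned unchanged.
        System.out.println(toAbsolute(base, "https://other.example/x")); // https://other.example/x
    }
}
```

Without a base URI there is nothing to resolve a link such as img/logo.png against, which is why the overloads that omit baseUri depend on the document declaring its own base.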
public static Document parse(String html, String baseUri, Parser parser)
html - HTML to parse
baseUri - The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs, that occur before the HTML declares a <base href> tag.
parser - alternate parser to use.
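public static Document parse(String html)
Parse HTML into a Document. As no base URI is specified, absolute URL detection relies on the HTML including a <base href> tag.
html - HTML to parse
See also: parse(String, String)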
public static Document parse(File in, String charsetName, String baseUri) throws IOException
in - file to load HTML from
charsetName - (optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).
baseUri - The URL where the HTML was retrieved from, to resolve relative links against.
Throws: IOException - if the file could not be found, or read, or if the charsetName is invalid.
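The charsetName contract described above (null means fall back, an invalid name surfaces as an IOException) can be sketched with the standard library alone. This is an assumption-laden illustration, not the library's code: CharsetFallbackSketch and pickCharset are hypothetical names, and the http-equiv meta tag detection step is deliberately left out, so a null name goes straight to UTF-8.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Illustrative sketch only: mirrors the documented charsetName behaviour
// without the meta-tag sniffing that the real parser performs first.
final class CharsetFallbackSketch {

    // Hypothetical helper: choose the charset used to decode the input.
    static Charset pickCharset(String charsetName) throws IOException {
        if (charsetName == null) {
            // Assumption: no meta tag was found, so fall back to UTF-8.
            return StandardCharsets.UTF_8;
        }
        try {
            return Charset.forName(charsetName);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // Report an invalid or unknown name as an IOException.
            throw new IOException("Invalid charset name: " + charsetName, e);
        }
    }
}
```

Passing an explicit name such as "ISO-8859-1" is the safer choice when the encoding of the file is known in advance, since detection and fallback are then skipped.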
public static Document parse(File in, String charsetName) throws IOException
in - file to load HTML from
charsetName - (optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).
Throws: IOException - if the file could not be found, or read, or if the charsetName is invalid.
See also: parse(File, String, String)
public static Document parse(InputStream in, String charsetName, String baseUri) throws IOException
in - input stream to read. Make sure to close it after parsing.
charsetName - (optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).
baseUri - The URL where the HTML was retrieved from, to resolve relative links against.
Throws: IOException - if the file could not be found, or read, or if the charsetName is invalid.
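Because the caller is responsible for closing the stream, a try-with-resources block is a natural fit. The sketch below shows that pattern with a stand-in for the parse step: StreamCloseSketch, readAll, and readAndClose are hypothetical names, and readAll merely decodes the bytes as UTF-8 instead of building a Document.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: demonstrates caller-side stream closing around
// a consumer that, like the documented method, leaves the stream open.
final class StreamCloseSketch {

    // Stand-in for the parse step: reads the stream fully, does not close it.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // The caller owns the stream: try-with-resources closes it even if
    // reading throws an IOException.
    static String readAndClose(InputStream source) throws IOException {
        try (InputStream in = source) {
            return readAll(in);
        }
    }
}
```

The same shape applies when the stand-in is replaced by a real call to the stream-based parse overload: open the stream in the try header, parse inside the block, and let the block close it.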
public static Document parse(InputStream in, String charsetName, String baseUri, Parser parser) throws IOException
in - input stream to read. Make sure to close it after parsing.
charsetName - (optional) character set of file contents. Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 (which is often safe to do).
baseUri - The URL where the HTML was retrieved from, to resolve relative links against.
parser - alternate parser to use.
Throws: IOException - if the file could not be found, or read, or if the charsetName is invalid.
public static Document parseBodyFragment(String bodyHtml, String baseUri)
Parse a fragment of HTML, with the assumption that it forms the body of the HTML.
bodyHtml - body HTML fragment
baseUri - URL to resolve relative URLs against.
See also: Document.body()
public static Document parseBodyFragment(String bodyHtml)
Parse a fragment of HTML, with the assumption that it forms the body of the HTML.
bodyHtml - body HTML fragment
See also: Document.body()
public static String clean(String bodyHtml, String baseUri, Whitelist whitelist)
bodyHtml - input untrusted HTML (body fragment)
baseUri - URL to resolve relative URLs against
whitelist - white-list of permitted HTML elements
See also: Cleaner.clean(Document)
public static String clean(String bodyHtml, Whitelist whitelist)
bodyHtml - input untrusted HTML (body fragment)
whitelist - white-list of permitted HTML elements
See also: Cleaner.clean(Document)
public static String clean(String bodyHtml, String baseUri, Whitelist whitelist, Document.OutputSettings outputSettings)
bodyHtml - input untrusted HTML (body fragment)
baseUri - URL to resolve relative URLs against
whitelist - white-list of permitted HTML elements
outputSettings - document output settings; use to control pretty-printing and entity escape modes
See also: Cleaner.clean(Document)
public static boolean isValid(String bodyHtml, Whitelist whitelist)
bodyHtml - HTML to test
whitelist - whitelist to test against
See also: clean(String, Whitelist)
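The idea of testing input against an allow-list can be pictured with a deliberately simplified sketch. It checks tag names only, ignores attributes, and scans with a regular expression instead of parsing into a Document, so it is a toy model of the concept and must not be used to vet untrusted HTML. AllowListSketch and onlyAllowedTags are hypothetical names, and the plain Set of tag names stands in for a Whitelist.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: a toy allow-list check on tag names.
// Real validation parses the HTML and also checks attributes.
final class AllowListSketch {

    // Matches the name of an opening or closing tag, e.g. "<p" or "</b".
    private static final Pattern TAG = Pattern.compile("</?([A-Za-z][A-Za-z0-9]*)");

    // Hypothetical helper: true if every tag name found is in allowedTags
    // (allowedTags is expected to hold lower-case names).
    static boolean onlyAllowedTags(String bodyHtml, Set<String> allowedTags) {
        Matcher m = TAG.matcher(bodyHtml);
        while (m.find()) {
            String name = m.group(1).toLowerCase(Locale.ROOT);
            if (!allowedTags.contains(name)) {
                return false;
            }
        }
        return true;
    }
}
```

With an allow-list of p and b, the fragment `<p>Hello <b>there</b></p>` passes, while any fragment containing a script element is rejected. Cleaning is the complementary operation: rather than rejecting such input, it returns the HTML with the disallowed parts filtered out.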
Copyright © 1998–2019 iText Group NV. All rights reserved.