Use CSS selectors to find elements

Problem

You want to find or manipulate elements using CSS selectors.

Solution

Use the Element.select(String cssSelector) and Elements.select(String selector) methods:

import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "https://example.com/");

// a elements that have an href attribute
Elements links = doc.select("a[href]");
// img elements whose src ends with .png
Elements pngs = doc.select("img[src$=.png]");

// first div with class=masthead
Element masthead = doc.select("div.masthead").first();

// div elements that are direct children of an h3 with class=r
Elements resultDivs = doc.select("h3.r > div");
// a elements within resultDivs
Elements resultAs = resultDivs.select("a");

Description

jsoup elements support a CSS selector syntax for finding matching elements, which allows powerful and robust queries.

The select method is available on a Document, an Element, or an Elements collection. It is contextual: you can narrow a search by selecting from a specific element, or by chaining select calls.
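As a minimal sketch of contextual selection, the example below runs the same selector against the whole document, against a single element, and through a chain of select calls. The markup and the class name ContextualSelect are invented for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ContextualSelect {
    // Hypothetical markup, made up for this example.
    static final String HTML =
        "<div id='nav'><a href='/home'>Home</a></div>"
      + "<div id='content'><p><a href='/one'>One</a></p><a href='/two'>Two</a></div>";

    // Selecting from the Document searches the whole tree.
    static int countAllLinks() {
        Document doc = Jsoup.parse(HTML);
        return doc.select("a[href]").size();
    }

    // Selecting from one Element only searches beneath that element.
    static int countContentLinks() {
        Document doc = Jsoup.parse(HTML);
        Element content = doc.getElementById("content");
        return content.select("a[href]").size();
    }

    // Each chained select call narrows the previous result set.
    static int countChained() {
        Document doc = Jsoup.parse(HTML);
        return doc.select("div#content").select("p").select("a").size();
    }

    public static void main(String[] args) {
        System.out.println(countAllLinks());     // 3
        System.out.println(countContentLinks()); // 2
        System.out.println(countChained());      // 1
    }
}
```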

The select method returns a list of Element objects (as an Elements instance), which provides a range of methods to extract and manipulate the results.
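The following sketch shows a few of the methods on Elements that read from or modify every matched element at once. It assumes a small made-up list of links; the class name ElementsResults is hypothetical.

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

public class ElementsResults {
    // Hypothetical markup, made up for this example.
    static final String HTML =
        "<ul><li><a href='/a'>Alpha</a></li><li><a href='/b'>Beta</a></li></ul>";

    // eachAttr collects the named attribute from every match that has it.
    static List<String> hrefs() {
        Elements links = Jsoup.parse(HTML).select("a[href]");
        return links.eachAttr("href");
    }

    // eachText collects the text of every match that has text.
    static List<String> texts() {
        Elements links = Jsoup.parse(HTML).select("a[href]");
        return links.eachText();
    }

    // addClass modifies every matched element in place.
    static boolean markVisited() {
        Elements links = Jsoup.parse(HTML).select("a[href]");
        links.addClass("visited");
        return links.first().hasClass("visited") && links.last().hasClass("visited");
    }

    public static void main(String[] args) {
        System.out.println(hrefs());       // [/a, /b]
        System.out.println(texts());       // [Alpha, Beta]
        System.out.println(markVisited()); // true
    }
}
```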

See the Selector API reference for the full supported list and more details.

You can experiment with different CSS selectors on Try jsoup.

jsoup's Element.select() method functions similarly to the JavaScript DOM method querySelectorAll(), and Element.selectFirst() is equivalent to querySelector().
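A short sketch of the difference: selectFirst returns a single Element, or null when nothing matches, while select always returns an Elements collection (possibly empty). The markup and the class name SelectFirstExample are assumptions for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectFirstExample {
    // Hypothetical markup, made up for this example.
    static final String HTML =
        "<div class='masthead'>Top</div><div class='masthead'>Second</div>";

    // Returns the text of the first match, or null if there is none.
    static String firstMastheadText() {
        Document doc = Jsoup.parse(HTML);
        Element first = doc.selectFirst("div.masthead");
        return first == null ? null : first.text();
    }

    // selectFirst yields null when no element matches the selector.
    static boolean missingIsNull() {
        return Jsoup.parse(HTML).selectFirst("div.footer") == null;
    }

    // select yields an empty collection rather than null.
    static int missingCount() {
        return Jsoup.parse(HTML).select("div.footer").size();
    }

    public static void main(String[] args) {
        System.out.println(firstMastheadText()); // Top
        System.out.println(missingIsNull());     // true
        System.out.println(missingCount());      // 0
    }
}
```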

Selector overview

tagname: find elements by tag, e.g. div
#id: find elements by ID, e.g. #logo
.class: find elements by class name, e.g. .masthead
[attribute]: elements with attribute, e.g. [href]
[^attrPrefix]: elements with an attribute name prefix, e.g. [^data-] finds elements with HTML5 dataset attributes
[attr=value]: elements with attribute value, e.g. [width=500] (also quotable, like [data-name='launch sequence'])
[attr^=value], [attr$=value], [attr*=value]: elements with attributes that start with, end with, or contain the value, e.g. [href*=/path/]
[attr~=regex]: elements with attribute values that match the regular expression; e.g. img[src~=(?i)\.(png|jpe?g)]
*: all elements, e.g. *
ns|tag: find elements by tag in a namespace prefix, e.g. fb|name finds <fb:name> elements
*|tag: find elements by tag in any namespace prefix, e.g. *|name finds <fb:name> and <name> elements
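To make the attribute selectors above concrete, here is a small sketch that counts matches for several of them. The markup, the helper count, and the class name AttributeSelectors are invented for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class AttributeSelectors {
    // Hypothetical markup, made up for this example.
    static final String HTML =
        "<a href='/path/page.html' data-id='7'>Page</a>"
      + "<a href='https://example.org/'>External</a>"
      + "<img src='logo.PNG' width='500'>"
      + "<img src='photo.jpeg'>"
      + "<img src='icon.gif'>";

    // Hypothetical helper: number of elements matching the selector.
    static int count(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        System.out.println(count("[href]"));         // 2: both links
        System.out.println(count("[^data-]"));       // 1: the link with data-id
        System.out.println(count("[width=500]"));    // 1: the first image
        System.out.println(count("[href^=https]"));  // 1: value starts with https
        System.out.println(count("[href*=/path/]")); // 1: value contains /path/
        // The regex is matched against the attribute value; (?i) ignores case.
        System.out.println(count("img[src~=(?i)\\.(png|jpe?g)]")); // 2
    }
}
```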

Selector combinations

el#id: elements with ID, e.g. div#logo
el.class: elements with class, e.g. div.masthead
el[attr]: elements with attribute, e.g. a[href]
Any combination, e.g. a[href].highlight
ancestor child: child elements that descend from ancestor, e.g. .body p finds p elements anywhere under a block with class "body"
parent > child: child elements that descend directly from parent, e.g. div.content > p finds p elements; and body > * finds the direct children of the body tag
siblingA + siblingB: finds sibling B element immediately preceded by sibling A, e.g. div.head + div
siblingA ~ siblingX: finds sibling X element preceded by sibling A, e.g. h1 ~ p
el, el, el: group multiple selectors, find unique elements that match any of the selectors; e.g. div.masthead, div.logo
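The combinators are easiest to compare side by side. The sketch below applies the descendant, child, adjacent sibling, general sibling, and grouping forms to one made-up fragment; the helper texts and the class name Combinators are assumptions.

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Combinators {
    // Hypothetical markup, made up for this example.
    static final String HTML =
        "<div class='content'>"
      + "<h1>Title</h1>"
      + "<p>First</p>"
      + "<blockquote><p>Nested</p></blockquote>"
      + "<p>Last</p>"
      + "</div>";

    // Hypothetical helper: text of each element matching the selector.
    static List<String> texts(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).eachText();
    }

    public static void main(String[] args) {
        // Descendant: p anywhere under div.content.
        System.out.println(texts("div.content p"));   // [First, Nested, Last]
        // Child: only p elements directly under div.content.
        System.out.println(texts("div.content > p")); // [First, Last]
        // Adjacent sibling: the p immediately after h1.
        System.out.println(texts("h1 + p"));          // [First]
        // General sibling: every sibling p that follows h1.
        System.out.println(texts("h1 ~ p"));          // [First, Last]
        // Grouping: elements matching either selector.
        System.out.println(texts("h1, blockquote"));  // [Title, Nested]
    }
}
```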

Pseudo selectors

:has(selector): find elements that contain elements matching the selector; e.g. div:has(p)
:is(selector): find elements that match any of the selectors in the selector list; e.g. :is(h1, h2, h3, h4, h5, h6) finds any heading element
:not(selector): find elements that do not match the selector; e.g. div:not(.logo)
:contains(text): find elements that contain the given text. The search is case-insensitive; e.g. p:contains(jsoup)
:containsOwn(text): find elements that directly contain the given text
:matches(regex): find elements whose text matches the specified regular expression; e.g. div:matches((?i)login)
:matchesOwn(regex): find elements whose own text matches the specified regular expression
:lt(n): find elements whose sibling index (i.e. its position in the DOM tree relative to its parent) is less than n; e.g. td:lt(3)
:gt(n): find elements whose sibling index is greater than n; e.g. div p:gt(2)
:eq(n): find elements whose sibling index is equal to n; e.g. form input:eq(1)
Note that the above indexed pseudo-selectors are 0-based: the first element is at index 0, the second at 1, and so on.
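A brief sketch of several pseudo-selectors, including the 0-based indexed ones, applied to a made-up fragment. The helper count and the class name PseudoSelectors are invented for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class PseudoSelectors {
    // Hypothetical markup, made up for this example.
    static final String HTML =
        "<div class='logo'><p>jsoup logo</p></div>"
      + "<div id='main'><p>Login here</p><span>other</span></div>"
      + "<div id='empty'></div>"
      + "<table><tr><td>a</td><td>b</td><td>c</td><td>d</td></tr></table>";

    // Hypothetical helper: number of elements matching the selector.
    static int count(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        System.out.println(count("div:has(p)"));     // 2: the logo and main divs
        System.out.println(count("div:not(.logo)")); // 2: the main and empty divs
        // :contains is case-insensitive and includes descendant text.
        System.out.println(count("p:contains(JSOUP)"));      // 1
        System.out.println(count("div:contains(login)"));    // 1: text is in a child p
        // :containsOwn only looks at the element's own text.
        System.out.println(count("div:containsOwn(login)")); // 0
        System.out.println(count("p:matches((?i)login)"));   // 1
        // Indexed selectors are 0-based sibling positions.
        System.out.println(count("td:lt(3)")); // 3: cells a, b, c
        System.out.println(count("td:gt(2)")); // 1: cell d
        System.out.println(Jsoup.parse(HTML).select("td:eq(1)").text()); // b
    }
}
```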
