Low-level Nim wrapper for Chrome DevTools Protocol (CDP) v1.3 stable.
Bend Chrome to your will with complete control over your browser. Scrape dynamic webpages, create browser automations, and beyond. Wield responsibly ;)
Chrome v112 or higher recommended. It's better, as the new
headless
flag uses the same browser binary rather than a completely seperate implemenation. It's likely your version of Chrome already has a more recent version.
This library is cross-platform (Windows, Mac, Linux) and supports both the C and C backends.
cdp
is intended to be a low-level wrapper of CDP for use in a webcrawler and a few browser automation projects for my company SEO Science. We only target v1.3 stable version of CDP as we want to reduce the
amount of maintenance overhead, and ensure reliability of the library in
production use.
We may include some of the experimental Domains, methods, and events in the future. PRs welcome.
If you would like to use an experimental feature (like the Animation Domain), you certainly can via the sendCommand
procedure. Details below.
A Chrome browser. Pretty cool right? No webdriver binary. No other dependencies outside of what you probably already have on your system.
In the future, we do plan to add support for Chromium and Edge.
Install from nimble: nimble install cdp
Alternatively clone via git: git clone https://github.com/Niminem/ChromeDevToolsProtocol
Check out the tests directory for more examples.
import std/[json, asyncdispatch]
import cdp
proc main() {.async.} =
let
browser = await launchBrowser() # launch a new browser
tab = await browser.newTab() # open a new tab
await tab.enablePageDomain() # enable the Page domain (for monitoring page events)
discard await tab.navigate("https://github.com/Niminem") # navigate to a page
discard await browser.waitForSessionEvent(tab.sessionId, $Page.domContentEventFired) # wait for page to load
let
resp = await tab.evaluate("document.title", %*{"returnByValue": true}) # evaluate a script on the page
title = resp["result"]["result"]["value"].to(string) # get the title of the page
echo "Title of the page: ", title # Title of the page: Niminem (Leon Lysak) · GitHub
await tab.disablePageDomain() # disable the Page domain
await browser.close() # close the browser / cdp, websocket, delete userdatadir, terminate browser process
waitFor main()
I highly recommend reading Aslushnikov's README on using Chrome DevTools Protocol as a quick primer. I'll try my best to explain the concepts and how they relate to this API.
The Chrome DevTools Protocol allows for tools to instrument, inspect, debug and profile Chromium, Chrome and other Blink-based browsers (like Microsoft Edge).
Even Chrome DevTools uses this protocol and the team maintains its API.
When Chrome is started with a --remote-debugging-port=<number>
flag, it starts a Chrome DevTools Protocol server and creates a WebSocket URL. Clients can create a WebSocket to connect to the URL and start sending CDP commands.
Chrome DevTools protocol is mostly based on JSONRPC- each comand is a JSON object with an id
, a method
, and an optional params
(JSON object).
A few things to keep in mind:
- Every command that is sent over to CDP must have a unique
id
parameter. Message responses will be delivered over websocket and will have the sameid
. - Incoming WebSocket messages without an
id
parameter are protocol events. - Message order is important in CDP. For example, protocol events that occur as the result of a CDP command will be reported before the response to that CDP command.
- There's a top-level "browser" target that always exists. More on this in the next section.
This library provides a level of abstraction by wrapping the various CDP commands (methods) in v1.3 stable:
# Navigates current page (tab) to the given URL.
proc navigate(tab: Tab; url: string; params: JsonNode): Future[JsonNode]
proc navigate(tab: Tab; url: string): Future[JsonNode]
# reference: https://chromedevtools.github.io/devtools-protocol/1-3/Page/#method-navigate
cdp
also abstracts away other things, like the management of the unique id
parameter for commands.
Chrome DevTools protocol has APIs to interact with many different parts of the browser - such as pages (this library refers to them as tabs), serviceworkers and extensions. These parts are called Targets and can be fetched/tracked using the Target domain.
When a client wants to interact with a target using CDP, it has to first attach to the target using Target.attachToTarget
command. The command will establish a protocol session to the given target and return a sessionId
.
In order to submit a CDP command to the target, every message should also include a sessionId
parameter next to the usual JSONRPC’s id
.
This library provides a level of abstraction for a page (tab) target:
...
let tab = await browser.newTab() # opens a new page (tab)
discard await tab.navigate("https://github.com/Niminem") # navigate to a page
...
With browser.newTab()
, the procedure will create a target (page), attach to it, and the sessionId (a field of
the returned Tab
object) will be used in all future commands from the Tab
.
Some commands set state which is stored per session (ex: Runtime.enable
and Targets.setDiscoverTargets
). Each session is initialized with a set of domains, the exact set depends on the attached target. For example, sessions connected to a browser don't have a "Page" domain, but sessions connected to pages do.
We call sessions attached to a Browser target browser sessions. Similarly, there are page sessions, worker sessions and so on. In fact, the WebSocket connection is an implicitly created browser session.
Although this library currently provides an abstraction over a Page target (tab) and the implicit Browser target, you can create other targets with the generic sendCommand
procedure. More on this below.
When a client connects over the WebSocket to the launched Chrome browser, a root browser session is created. This session is the one that receives commands if there's no sessionId
specified. Essentially, this refers to the Browser
object returned from the launchBrowser
procedure. Later on, when the root browser session is used to attach to a page target, a new page session created (the Tab
object).
The page session is created from inside the browser session and thus is a child of the browser session. When a parent session closes (ex: Target.detachFromTarget
), all of its child sessions are closed as well.
As of cdp
0.1.0, the browser.close()
procedure will close the browser session (thus closing all of the child sessions).
More granular closing procedures must be implemented on your end for now.
The Chrome DevTools Protocol has stable and experimental parts. Events, methods, and sometimes whole domains might be marked as experimental. DevTools team doesn't commit to maintaining experimental APIs and changes/removes them regularly.
!!! USE EXPERIMENTAL APIS AT YOUR OWN RISK !!!
As history has shown, experimental APIs do change quite often. If possible, stick to the stable protocol (the wrapped commands and events from this library).
Check out the full API documentation here.
proc launchBrowser*(userDataDir = "";
portNo = 0; headlessMode = HeadlessMode.On;
chromeArguments: seq[string] = @[]): Future[Browser] {.async}
Using the cdp
library begins with Browser
instance.
Use this Browser instance (browser Target / browser session) to interact with the appropriate CDP domains/methods.
launchBrowser
Launches a new Chrome browser instance and returns a Browser
object.
userDataDir
parameter can be used to specify a directory where the browser's user data will be stored. If an empty string is passed, a temporary directory will be created and used.portNo
parameter can be used to specify a port number for the browser to listen on. IfportNo
is 0, Chrome will choose a random port.headlessMode
parameter can be used to specify whether the browser should be launched in headless mode or not.HeadlessMode.On
(the default) will launch the new version of Chrome headless mode (for Chrome >= v112). UseHeadlessMode.Legacy
to launch the browser in headless mode for Chrome < v112. The new headless mode is recommended as the old version of headless mode will be deprecated, and the new version is the actual browser rather than a separate browser implementation.chromeArguments
parameter can be used to pass additional arguments to the Chrome browser instance. For a list of all available arguments, see: https://peter.sh/experiments/chromium-command-line-switches/ or https://github.com/puppeteer/puppeteer/blob/main/packages/puppeteer-core/src/node/ChromeLauncher.ts for a list of arguments used by Puppeteer (a high-level API for CDP using NodeJs).
The following command-line arguments are always passed to Chrome:
--remote-debugging-port=<portNo>
--user-data-dir=<userDataDir>
--no-first-run
--headless=new
or--headless
(ifheadlessMode
isHeadlessMode.On
(default) orHeadlessMode.Legacy
). IfHeadlessMode.Off
is passed, Chrome will open a visible window.
IMPORANT: Make sure you call browser.close()
before quitting or ending your program or else you will have a zombie process.
proc newTab*(browser: Browser): Future[Tab] {.async.}
newTab
procedure creates a new tab (Page) with the browser instance and returns a Tab
object.
Use this Tab
object to interact with a page session. You can monitor and intercept network events, execute javascript, and so much more via enabling the appropriate domains. Page sessions will be used the most.
# Pulled from the 'Basic Usage' example above
...
await tab.enablePageDomain() # enable the Page domain (for monitoring page events)
discard await tab.navigate("https://github.com/Niminem")
discard await browser.waitForSessionEvent(tab.sessionId, $Page.domContentEventFired) # wait for page to load
...
# It's good practice to disable the domain when you are done
await tab.disablePageDomain()
...
Enabling the Page Domain allows you monitor page events, such as the domContentEventFired
event. With this access you
can, for example, scape various parts of a web page after the DOM is ready. Some Domains may need to be enabled in order to use them like
this one. In the API documentation, I've provided direct references to each
domain and each domain's methods/events.
Notice the tab.navigate
call. This is a wrapped method for the Tab object, directly corresponding to the navigate CDP
method. ALL CDP methods and events for browser and page targets have been wrapped for v1.3 stable.
type
SessionId* = string
ProtocolEvent* = string
EventCallback* = proc(data: JsonNode) {.async.}
# Adds a callback function to the global event table for the specified event
proc addGlobalEventCallback*(browser: Browser; event: ProtocolEvent; cb: EventCallback)
# Adds a callback function to the session event table for the specified event.
proc addSessionEventCallback*(browser: Browser; sessionId: SessionId;
event: ProtocolEvent; cb: EventCallback)
# Returns a `Future` that completes when the specified global event is received.
proc waitForGlobalEvent*(browser: Browser; event: ProtocolEvent): Future[JsonNode] {.async.}
# Returns a `Future` that completes when the specified session event is received.
proc waitForSessionEvent*(browser: Browser; sessionId: string;
event: ProtocolEvent): Future[JsonNode] {.async.}
As of cdp
0.1.0, there are two kinds of callbacks- one for handling Global events (those without an id
parameter)
and the other for Session events (those corresponding to a session, like a Tab/Page, which have a sessionId
parameter).
There are two ways you can use them.
Either you register a callback procedure that executes each time the CDP event occurs via addGlobalEventCallback
or addSessionEventCallback
, or you can use the waitFor
variant. waitFor
should be used to suspend execution until the event occurs, like the example above where we waited for Page.domContentEventFired
before interacting with the tab.
NOTE: Currently, there can only be one callback per Global event, and only one callback per event for each session for Session events. If you try adding another callback for the same global/session event, the current callback will be overwritten.
deleteGlobalEventCallback
and deleteSessionEventCallback
can be called to delete the event callbacks.
import std/[json, asyncdispatch]
import cdp
proc logGlobalEvent(event: JsonNode) {.async.} = echo "Logging Global Event: " & event["method"].to(string)
proc logSessionEvent(event: JsonNode) {.async.} = echo "Logging Session Event: " & event["method"].to(string)
proc main() {.async.} =
let experimentalGlobalEvent = "Target.attachedToTarget"
let browser = await launchBrowser()
browser.addGlobalEventCallback(experimentalGlobalEvent, logGlobalEvent)
let tab = await browser.newTab()
browser.addSessionEventCallback(tab.sessionId, $Page.frameNavigated, logSessionEvent)
await tab.enablePageDomain()
discard await tab.navigate("https://github.com/Niminem")
await tab.disablePageDomain()
browser.deleteGlobalEventCallback(experimentalGlobalEvent)
browser.deleteSessionEventCallback(tab.sessionId, $Page.frameNavigated)
await browser.close()
waitFor main()
cdp
wraps all CDP methods via the sendCommand
procedure.
proc sendCommand*(browser: Browser; mthd: string; params: JsonNode): Future[JsonNode] {.async.}
proc sendCommand*(browser: Browser; mthd: string): Future[JsonNode] {.async.}
proc sendCommand*(tab: Tab; mthd: string; params: JsonNode): Future[JsonNode] {.async.}
proc sendCommand*(tab: Tab; mthd: string): Future[JsonNode] {.async.}
Wrapped CDP method example:
proc deleteCookies*(tab: Tab; name: string; params: JsonNode) {.async.} = # optional params exist
params["name"] = newJString(name)
discard await tab.sendCommand("Network.deleteCookies", params)
proc deleteCookies*(tab: Tab; name: string) {.async.} = # only 'name' parameter is required
discard await tab.sendCommand("Network.deleteCookies", %*{"name": name})
You can easily use any experimental method with this generic procedure.
Similarly, you can easily use any experimental CDP event with the callback functions as they are just strings.
cdp
provides enums for stable events as a convenience so I don't accidently shoot myself in the foot:
type
Network* {.pure.} = enum
dataReceived = "Network.dataReceived",
eventSourceMessageReceived = "Network.eventSourceMessageReceived",
loadingFailed = "Network.loadingFailed",
...
# used like this:
...
browser.addGlobalEventCallback($Network.dataReceived, procName)
...
# alternatively for experimental events:
...
browser.addGlobalEventCallback("Target.attachedToTarget", procName)
..
- ALL commands provide a generic
Future[Json]
response containing theid
of the method called. This is how we can map sent commands to the appropriate responses as CDP is a single multiplexed web socket connection (I think I said that right). Anyways, use thediscard
statement as necessary. - If any commands should have a non-generic response (aka something you need to parse and use), it is still of
the same
Future[Json]
type. This is because some CDPreturn objects
return optional parameters. Example: DOM.getNodeForLocation. There is no efficient way to directly map them all into their return objects. - Exception handling is basic, accounting mostly for setup and Chrome process stuff. For now, you have to roll your own for CDP methods. I'm certain that CDP provides error message/description in the responses you receive.
- There is no getting around it. You'll need to become familiar with CDP itself and what the different domains/events provide (depending on what you're trying to accomplish of course), but when you do- you will have complete control over your browser. The only limit will be your imagination.
- Look into basic exception handling for errors in CDP responses
- Add support for Chromium (Windows and Mac as this exists for Linux) and Microsoft Edge (Windows)
- Add
close
procedure forTab
(I think we should destroy the object after it's dettached from the target) - Add convenience functions for handling global/session events with the
Tab
object - Investigate if there's a need for supporting multiple global and session events (my inclination is no)