Dodo
Dodo is a pure PHP implementation of the HTML DOM, based on a rigorous PHP binding of the WHATWG WebIDL specification of the DOM API. It aims to be modern and correct, and where possible complete. Like the domino JavaScript library it was inspired by, it is intended for server-side DOM manipulation and so deliberately avoids implementing the portions of the DOM dealing with dynamic loading, browser layout, and other similar features. It does aim to be fast.
Installation
In MediaWiki
Dodo will become available in MediaWiki as a Composer dependency of wikimedia/parsoid when it is mature.
Over time, it is expected to gradually replace all usage of the native PHP DOM extension, which is buggy, poorly maintained, and out of date.
Everywhere else
Install the wikimedia/dodo package from Packagist:
composer require wikimedia/dodo
Semantic versioning is used.
The major version number will be incremented for every change that breaks backwards compatibility.
Architecture overview
For full reference documentation, please see the documentation generated from the source (or the source itself).
The API implemented by Dodo is mostly defined by the PHP binding for WebIDL. This is described in the IDLeDOM documentation.
Examples
use Wikimedia\Dodo\DOMImplementation;

function demo( DOMImplementation $impl ) {
    $doc = $impl->createHTMLDocument( "Test document" );
    $doc->getBody()->setInnerHTML( "<p>Look at me!</p>" );
    // Direct property access style is also supported,
    // although it will be slower
    return $doc->body->innerHTML;
}
In the above code sample, we first construct an HTML document with the given title, then parse an HTML string to populate the <body> element of the document.
To demonstrate property-style access, we then re-serialize the body and return it.
Performance
Dodo has not yet been fully benchmarked, but we hope it will be competitive with the native PHP DOM extension.
There are two aspects of performance: memory usage and speed.
In order to minimize memory usage, the number of fields in each DOM Node has been minimized wherever possible.
Fields like Node::nodeType and Node::nodeName are not actually stored in the DOM Node, but implemented via dynamic dispatch based on the type of the object.
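The idea of answering nodeType and nodeName through dynamic dispatch rather than stored fields can be sketched as follows. This is only an illustration: the SketchNode, SketchElement, and SketchText classes are hypothetical names invented here and are not taken from Dodo's source.

```php
<?php
// Hypothetical sketch; these classes are not part of Dodo.
abstract class SketchNode {
    // No $nodeType or $nodeName field is stored on the instance.
    // Each subclass answers through an overridden method instead.
    abstract public function getNodeType(): int;
    abstract public function getNodeName(): string;
}

final class SketchElement extends SketchNode {
    private string $localName;

    public function __construct( string $localName ) {
        $this->localName = $localName;
    }

    public function getNodeType(): int {
        return 1; // ELEMENT_NODE in the DOM spec
    }

    public function getNodeName(): string {
        // HTML elements report their name in upper case.
        return strtoupper( $this->localName );
    }
}

final class SketchText extends SketchNode {
    public function getNodeType(): int {
        return 3; // TEXT_NODE in the DOM spec
    }

    public function getNodeName(): string {
        return '#text';
    }
}

$element = new SketchElement( 'p' );
$text = new SketchText();
```

In this sketch a text node carries no per-instance fields at all, and an element stores only its local name; the constant answers live in the class, not in each object.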
For speed, Dodo uses a fairly common optimization that represents node children in a linked list (a circular linked list, in particular) and avoids creating the backing arrays required by the spec (e.g., to implement Node::childNodes) unless they are requested.
Writing your code to iterate using Node::firstChild and Node::nextSibling instead of iterating over the Node::childNodes array will be fastest (as it is in most browser DOM implementations as well).
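A minimal sketch of the circular sibling list and the lazily built child array described above, assuming a simplified node that only supports appending detached children. The ListNode class and its field names are hypothetical and do not reflect Dodo's internals.

```php
<?php
// Hypothetical sketch of the technique; not Dodo's actual classes.
class ListNode {
    public string $label;
    public ?ListNode $parent = null;
    // Head of the circular sibling list, or null when there are no children.
    public ?ListNode $firstChild = null;
    // Raw circular links: the last child's $next is the first child.
    public ?ListNode $next = null;
    public ?ListNode $prev = null;
    // Backing array for childNodes, built only when requested.
    private ?array $childArray = null;

    public function __construct( string $label ) {
        $this->label = $label;
    }

    // Assumes $child is not currently attached to any parent.
    public function appendChild( ListNode $child ): void {
        $child->parent = $this;
        if ( $this->firstChild === null ) {
            // A single child links to itself.
            $child->next = $child->prev = $child;
            $this->firstChild = $child;
        } else {
            // The tail is always first->prev, so appending is O(1).
            $first = $this->firstChild;
            $last = $first->prev;
            $last->next = $child;
            $child->prev = $last;
            $child->next = $first;
            $first->prev = $child;
        }
        $this->childArray = null; // invalidate any cached array
    }

    public function getFirstChild(): ?ListNode {
        return $this->firstChild;
    }

    public function getNextSibling(): ?ListNode {
        // Wrapping around to the head means this was the last child.
        if ( $this->parent === null || $this->next === $this->parent->firstChild ) {
            return null;
        }
        return $this->next;
    }

    public function getChildNodes(): array {
        if ( $this->childArray === null ) {
            $this->childArray = [];
            for ( $c = $this->firstChild; $c !== null; $c = $c->getNextSibling() ) {
                $this->childArray[] = $c;
            }
        }
        return $this->childArray;
    }
}

// Sibling-pointer iteration never touches the backing array.
$ul = new ListNode( 'ul' );
foreach ( [ 'a', 'b', 'c' ] as $label ) {
    $ul->appendChild( new ListNode( $label ) );
}
$labels = [];
for ( $n = $ul->getFirstChild(); $n !== null; $n = $n->getNextSibling() ) {
    $labels[] = $n->label;
}
```

The loop at the end follows the recommended firstChild/nextSibling pattern; in this sketch the array is allocated only if getChildNodes() is actually called.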
To achieve maximum performance, the getters and setters required by the DOM spec are implemented as explicit methods; for example, the nodeType property is accessed by Node::getNodeType().
The complex getter/setter behavior required by the DOM can't be implemented directly with PHP properties, but we do support property-style access (e.g., $node->nodeType) via the magic methods __get and __set.
These impose a performance penalty, however, so for best performance client code should use explicit calls to the getter and setter methods rather than property-style access.
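One way property-style access can be layered over explicit accessors is sketched below. It is an assumption-laden illustration: the MagicNode class and its name-based lookup are hypothetical, and Dodo's own __get and __set may be implemented quite differently.

```php
<?php
// Hypothetical sketch; MagicNode is not a Dodo class.
class MagicNode {
    private ?string $text = null;

    // Explicit accessors: the fast path.
    public function getNodeType(): int {
        return 1;
    }

    public function getTextContent(): ?string {
        return $this->text;
    }

    public function setTextContent( ?string $value ): void {
        $this->text = $value;
    }

    // PHP calls __get only when the named property is inaccessible,
    // so every property-style read pays for this extra lookup.
    public function __get( string $name ) {
        $getter = 'get' . ucfirst( $name );
        if ( method_exists( $this, $getter ) ) {
            return $this->$getter();
        }
        throw new LogicException( "No such property: $name" );
    }

    // Likewise, property-style writes are routed to the setter.
    public function __set( string $name, $value ): void {
        $setter = 'set' . ucfirst( $name );
        if ( method_exists( $this, $setter ) ) {
            $this->$setter( $value );
            return;
        }
        throw new LogicException( "Cannot set property: $name" );
    }
}

$node = new MagicNode();
$node->textContent = 'hello';       // routed through __set
$viaMethod = $node->getTextContent(); // direct call, no magic involved
$viaProperty = $node->textContent;    // routed through __get
```

Both access styles return the same value here; the property style simply adds a magic-method call and a name lookup on top of the explicit accessor.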