#readability #html #content #extracting #pages #dom #web

dom_smoothie

A Rust crate for extracting relevant content from web pages

3 unstable releases

new 0.2.0 Dec 30, 2024
0.1.1 Dec 18, 2024
0.1.0 Dec 17, 2024

#1319 in Web programming

330 downloads per month

MIT license

105KB
2.5K SLoC

DOM_SMOOTHIE

dom_smoothie closely follows the implementation of readability.js, bringing its functionality to Rust.

Examples

Basic Example

use std::error::Error;

use dom_smoothie::Readability;

fn main() -> Result<(), Box<dyn Error>> {
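    // Keep the `caption` class in the extracted content;
    // every other option stays at its default.
    let cfg = dom_smoothie::Config {
        classes_to_preserve: vec!["caption".into()],
        ..Default::default()
    };

    // Embed the sample page at compile time (path is relative to this source file).
    let html = include_str!("../test-pages/ok/001/source.html");

    // The second argument is the page URL; a placeholder host is used here.
    let mut readability = Readability::new(html, Some("http://fakehost/test/"), Some(cfg))?;
    let article = readability.parse()?;

    println!("Title: {}", article.title);
    println!("Content:\n {}", article.content);
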
    Ok(())
}
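
The `..Default::default()` line in the example above is Rust's struct update syntax: fields named explicitly are overridden, and the rest are taken from the `Default` value. The sketch below shows that pattern in isolation. `DemoConfig` is a hypothetical stand-in, not the crate's real `Config`; apart from `classes_to_preserve`, its fields and default values are made up for illustration.

```rust
// Hypothetical stand-in for a library config type.
// This is NOT dom_smoothie's real `Config`; the extra fields are invented.
#[derive(Debug, Clone, PartialEq)]
struct DemoConfig {
    classes_to_preserve: Vec<String>,
    keep_classes: bool,
    char_threshold: usize,
}

impl Default for DemoConfig {
    fn default() -> Self {
        // Arbitrary illustrative defaults.
        DemoConfig {
            classes_to_preserve: Vec::new(),
            keep_classes: false,
            char_threshold: 500,
        }
    }
}

// Override a single field and take everything else from `Default`.
fn caption_config() -> DemoConfig {
    DemoConfig {
        classes_to_preserve: vec!["caption".into()],
        ..Default::default()
    }
}

fn main() {
    let cfg = caption_config();
    // Only `classes_to_preserve` differs from the default value.
    println!("{:?}", cfg);
}
```

Writing a config this way means that code which sets only the fields it cares about picks up the default for any option it does not name.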

License

Licensed under MIT (LICENSE or http://opensource.org/licenses/MIT).

Contribution

Any contribution intentionally submitted for inclusion in this project will be licensed under the MIT license, without any additional terms or conditions.

Dependencies

~10–17MB
~188K SLoC