3 unstable releases

| Version | Date |
|---|---|
| 0.2.0 (new) | Dec 30, 2024 |
| 0.1.1 | Dec 18, 2024 |
| 0.1.0 | Dec 17, 2024 |
DOM_SMOOTHIE
A Rust crate for extracting relevant content from web pages.
dom_smoothie closely follows the implementation of readability.js, bringing its functionality to Rust.
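To give a rough idea of the kind of heuristic that readability.js-style extraction relies on, here is a minimal, standard-library-only sketch of paragraph scoring. It is an illustration under stated assumptions, not dom_smoothie's actual code: the function names `paragraph_score` and `best_candidate` are hypothetical, and the real algorithm works on DOM nodes and applies many more rules (class-name weights, link density, score propagation to ancestors).

```rust
/// Hypothetical helper: a simplified paragraph score in the spirit of
/// readability.js. Paragraphs shorter than 25 characters are ignored;
/// otherwise the score is one base point, plus one point per
/// comma-separated segment, plus up to three points for text length
/// (one per 100 characters).
fn paragraph_score(text: &str) -> usize {
    let trimmed = text.trim();
    let length = trimmed.chars().count();

    // Very short blocks (navigation labels, buttons) carry no weight.
    if length < 25 {
        return 0;
    }

    let base = 1;
    let segments = trimmed.split(',').count();
    let length_bonus = (length / 100).min(3);

    base + segments + length_bonus
}

/// Hypothetical helper: returns the highest-scoring paragraph,
/// or `None` if the slice is empty.
fn best_candidate<'a>(paragraphs: &[&'a str]) -> Option<&'a str> {
    paragraphs
        .iter()
        .copied()
        .max_by_key(|p| paragraph_score(p))
}
```

With this sketch, a short navigation label such as "Home" scores zero, while a sentence with several commas outranks it, which is the intuition behind preferring prose-like blocks over page chrome.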
Examples
Basic Example
use std::error::Error;

use dom_smoothie::Readability;

fn main() -> Result<(), Box<dyn Error>> {
    // Override one option and keep the defaults for the rest.
    let cfg = dom_smoothie::Config {
        classes_to_preserve: vec!["caption".into()],
        ..Default::default()
    };

    let html = include_str!("../test-pages/ok/001/source.html");

    // Arguments: the HTML source, an optional document URL, an optional config.
    let mut readability = Readability::new(html, Some("http://fakehost/test/"), Some(cfg))?;

    let article = readability.parse()?;

    println!("Title: {}", &article.title);
    println!("Content:\n {}", &article.content);

    Ok(())
}
License
Licensed under MIT (LICENSE or http://opensource.org/licenses/MIT).
Contribution
Any contribution intentionally submitted for inclusion in this project will be licensed under the MIT license, without any additional terms or conditions.