Feedstock provides its functionality through one nested class and two class-level methods.
Feedstock::Extract.new(selector:, absolute: nil, content: nil, processor: nil, prefix: nil, suffix: nil, type: nil, filter: nil)
An Extract is a subclass of Struct. Its initialiser takes the following parameters.
The selector: parameter is a String representing the path to the node in the document, expressed in CSS selector syntax.
Feedstock extracts the content of the node the path points to and then performs additional transformations if certain parameters are provided to the initialiser. For a given rule, the order of transformation is: (1) extract; (2) if processor: is provided, process; (3) if prefix: or suffix: is provided, wrap; and (4) if type: is provided, format.
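The ordering can be pictured with a small, self-contained sketch. The apply_rule helper below is hypothetical and is not Feedstock's implementation; it only mirrors the documented sequence (process, then wrap, then format) on a String that has already been extracted, and it handles only the "cdata" type.

```ruby
# Hypothetical helper illustrating the documented order of transformations.
# It is NOT part of Feedstock; step (1), extraction, is assumed to be done.
def apply_rule(extracted, processor: nil, prefix: nil, suffix: nil, type: nil)
  content = extracted

  # (2) process: the lambda receives the content and the rule.
  # A plain Hash stands in for the rule in this sketch.
  rule = { processor: processor, prefix: prefix, suffix: suffix, type: type }
  content = processor.call(content, rule) if processor

  # (3) wrap: a missing prefix or suffix interpolates as an empty string.
  content = "#{prefix}#{content}#{suffix}" if prefix || suffix

  # (4) format: only "cdata" is sketched here; "datetime" is omitted.
  content = "<![CDATA[#{content}]]>" if type == "cdata"

  content
end

result = apply_rule(
  "  /posts/1 ",
  processor: ->(content, _rule) { content.strip },
  prefix: "https://example.com",
  type: "cdata"
)
puts result
```

Because wrapping happens after processing, the processor sees the raw extracted text and never the prefix or suffix.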
The absolute: parameter is a Boolean indicating whether the selector should search from the root of the document.
The content: parameter is a value indicating how to extract the content from the selected node. It can be "inner_html", "html", "xml" or a Hash of the form {attribute: "<attribute>"}.
If the value is "inner_html", Feedstock extracts the content of the node as HTML. If the value is "html" or "xml", the HTML (or XML) tag and its contents are converted to a String. If the value is an attribute hash, Feedstock extracts the value of that attribute. This matters for links, where the link itself is typically the value of the href attribute rather than the content of the <a> element.
If content: is not provided, Feedstock concatenates the text nodes in the selected node's subtree.
The processor: parameter is a Lambda that takes two arguments: the extracted content and the rule being processed. The Lambda must return a String.
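A processor might look like the following sketch. The lambda name and the sample input are hypothetical; the only requirements taken from the description are the two arguments and the String return value. In use, the lambda would be passed as processor: when building an Extract.

```ruby
# A processor lambda: first argument is the extracted content, second is the
# rule being processed (unused here, hence the underscore prefix).
# This hypothetical example trims the content and collapses runs of whitespace.
squeeze_whitespace = ->(content, _rule) { content.strip.gsub(/\s+/, " ") }

# Exercising the lambda directly; nil stands in for the rule.
cleaned = squeeze_whitespace.call("  Hello \n  world  ", nil)
puts cleaned
```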
The prefix: parameter is a String to prepend to the extracted content.
The suffix: parameter is a String to append to the extracted content.
The type: parameter is a String representing the type of the content. Valid values are "datetime" and "cdata". If the value is "datetime", the content is parsed by the Timeliness library and returned as a String. If the value is "cdata", the content is wrapped between <![CDATA[ and ]]>.
The filter: parameter is a Lambda that takes one argument, a Hash containing the values extracted for the entry. The Lambda decides whether to keep or reject the content: it must return a truthy value to keep it.
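As a minimal sketch, a filter that rejects entries without a usable title could be written as below. The :title and :link keys and the sample entries are hypothetical; in practice the Hash holds whatever keys the entry rules define.

```ruby
# A filter lambda: receives a Hash of the values extracted for one entry and
# returns a truthy value to keep it. The :title key is a hypothetical example.
keep_titled = ->(entry) { !entry[:title].to_s.strip.empty? }

# Hypothetical extracted values, used only to exercise the lambda.
entries = [
  { title: "First post", link: "https://example.com/1" },
  { title: "   ",        link: "https://example.com/2" },
  { link: "https://example.com/3" }
]

kept = entries.select { |entry| keep_titled.call(entry) }
puts kept.length
```

Calling to_s before strip keeps the lambda safe when the key is missing and the lookup returns nil.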
Feedstock.data(url, rules, format = :html)
The data method takes up to three parameters and returns a Hash with the keys :info and :entries. Each parameter works the same as in Feedstock.feed and is explained in more detail below.
Feedstock.feed(url, rules, format = :html, template = "#{__dir__}/../default.xml")
The feed method takes up to four parameters and returns a String. Each parameter is explained in more detail below.
The url parameter is a String and must resolve to either an HTML or XML document.
The rules parameter is a Hash representing a collection of rules. rules has two mandatory keys and one optional key.
- :info
  The :info key is mandatory and must be associated with a Hash (called the info hash). The keys of the info hash must be symbols, not strings. When used with the default template, Feedstock will use the key as the name of the XML element in the resulting feed. For example, if the key is :id, the XML element in the resulting feed will be <id>. The values in the info hash can be either a String or an Extract.
- :entry
  The :entry key is mandatory and must be associated with a Hash (called the entry hash). The keys of the entry hash must be symbols, not strings. When used with the default template, Feedstock will use the key as the name of the XML element in the resulting feed. For example, if the key is :id, the XML element in the resulting feed will be <id>. The values in the entry hash can be either a String or an Extract.
- :entries
  The :entries key is optional and may be associated with an Extract. The Extract represents a node within the document to which the selectors in the :entry rules will be relative.
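A rules Hash using all three keys might look like the sketch below. So that the snippet runs without the gem installed, Extract here is a stand-in Struct whose keyword parameters mirror the Feedstock::Extract.new signature; with the gem loaded you would use Feedstock::Extract instead. The selectors and feed values are hypothetical.

```ruby
# Stand-in for Feedstock::Extract so this sketch runs on its own.
# It mirrors the documented keyword parameters but performs no extraction.
Extract = Struct.new(
  :selector, :absolute, :content, :processor, :prefix, :suffix, :type, :filter,
  keyword_init: true
)

rules = {
  # Mandatory: feed-level values. Keys are symbols; values are Strings or Extracts.
  info: {
    id: "https://example.com/blog",
    title: Extract.new(selector: "title")
  },
  # Mandatory: per-entry values, selected relative to each :entries node.
  entry: {
    id: Extract.new(selector: "a", content: { attribute: "href" }),
    title: Extract.new(selector: "h2"),
    updated: Extract.new(selector: "time", type: "datetime")
  },
  # Optional: the node within the document that the :entry selectors are relative to.
  entries: Extract.new(selector: "article")
}

puts rules.keys.inspect
```

The :id rule under :entry reads the href attribute because, for links, the useful value is usually the attribute and not the text of the <a> element.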
The format parameter can be either :html or :xml. The default is :html.
The template parameter should be a path to an ERB template into which the information and entries extracted from the document will be inserted. The ERB template will be passed a Hash containing an :info key and an :entries key.
A default template is included with Feedstock but a user can also specify their own template.
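A custom template could be as small as the sketch below. How Feedstock exposes the Hash to the template is not stated here, so this example assumes the :info and :entries keys are available as local variables and renders with ERB#result_with_hash from the standard library. The template text and the data Hash are hypothetical, and the entries are assumed to be an Array of Hashes.

```ruby
require "erb"

# Hypothetical template, inlined as a String for the sake of a runnable sketch;
# Feedstock itself expects a path to a template file.
# trim_mode "-" lets the <%- ... -%> control lines leave no blank lines behind.
template = ERB.new(<<~XML, trim_mode: "-")
  <feed>
    <title><%= info[:title] %></title>
  <%- entries.each do |entry| -%>
    <entry><id><%= entry[:id] %></id></entry>
  <%- end -%>
  </feed>
XML

# Hypothetical data with the documented :info and :entries keys.
data = {
  info: { title: "Example" },
  entries: [
    { id: "https://example.com/1" },
    { id: "https://example.com/2" }
  ]
}

# Assumption: each top-level key becomes a local variable in the template.
xml = template.result_with_hash(data)
puts xml
```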