+++
title = "Documentation"
description = "Documentation"
weight = 2
+++

## How it Works

`rsspls` fetches each page specified by the configuration and extracts elements
from the page using [CSS selectors][selectors]. For example, selectors determine
which elements supply the title and content of each feed entry. The generated
feeds are written to an output directory. HTTP caching is used so that a feed is
only updated when its source page changes.

## Supported Platforms

`rsspls` should work on all [platforms supported by the Rust compiler][platforms],
including Linux, macOS, Windows, and BSD. Pre-compiled binaries are available
for common platforms. See [the install page](@/install.md) for details.

## Usage

```
rsspls [OPTIONS] -o OUTPUT_DIR

OPTIONS:
    -h, --help
        Prints this help information

    -c, --config
        Specify the path to the configuration file.
        $XDG_CONFIG_HOME/rsspls/feeds.toml is used if not supplied.

    -o, --output
        Directory to write generated feeds to.

    -V, --version
        Prints version information

FILES:
     $XDG_CONFIG_HOME/rsspls/feeds.toml    rsspls configuration file.
     $XDG_CONFIG_HOME/rsspls               Configuration directory.
     $XDG_CACHE_HOME/rsspls                Cache directory.

     Note: XDG_CONFIG_HOME defaults to ~/.config, XDG_CACHE_HOME
     defaults to ~/.cache.
```

## Configuration

Unless specified via the `--config` command line option, `rsspls` reads its
configuration from one of the following paths:

* UNIX-like systems:
  * `$XDG_CONFIG_HOME/rsspls/feeds.toml`
  * `~/.config/rsspls/feeds.toml` if `XDG_CONFIG_HOME` is unset.
* Windows:
  * `C:\Users\You\AppData\Roaming\rsspls\feeds.toml`

The configuration file is in [TOML][toml] format.

The parts of the page to extract for the feed are specified using [CSS
selectors][selectors].

### Annotated Sample Configuration

The sample file below demonstrates all the parts of the configuration.

```toml
# The configuration must start with the [rsspls] section
[rsspls]
# Optional output directory to write the feeds to. If not specified it must be supplied via
# the --output command line option.
output = "/tmp"
# Optional proxy address. If specified, all requests will be routed through it.
# The address needs to be in the format: protocol://ip_address:port
# The supported protocols are: http, https, socks and socks5h.
# It can also be specified as environment variable `http_proxy` or `HTTPS_PROXY`.
# The config file takes precedence, then the env vars in the above order.
# proxy = "socks5://10.64.0.1:1080"

# Next is the array of feeds, each one starts with [[feed]]
[[feed]]
# The title of the channel in the feed
title = "My Great RSS Feed"

# The output filename without the output directory to write this feed to.
# Note: this is a filename only, not a path. It should not contain slashes.
filename = "wezm.rss"

# Optional User-Agent header to be set for the HTTP request.
# user_agent = "Mozilla/5.0"

# The configuration for the feed
[feed.config]
# The URL of the web page to generate the feed from.
url = "https://www.wezm.net/"

# A CSS selector to select elements on the page that represent items in the feed.
item = "article"

# A CSS selector relative to `item` to an element that will supply the title for the item.
heading = "h3"

# A CSS selector relative to `item` to an element that will supply the link for the item.
# Note: This element must have an `href` attribute.
# Note: If not supplied rsspls will attempt to use the heading selector for the link for
# backwards compatibility with earlier versions. A message will be emitted in this case.
link = "h3 a"

# Optional CSS selector relative to `item` that will supply the content of the RSS item.
summary = ".post-body"

# Optional CSS selector relative to `item` that supplies media content (audio, video, image)
# to be added as an RSS enclosure.
# Note: The media URL must be given by the `src` or `href` attribute of the selected element.
# Note: Currently if the item does not match the media selector then it will be skipped.
# media = "figure img"

# Optional CSS selector relative to `item` that supplies the publication date of the RSS item.
date = "time"

# Alternatively for more control `date` can be specified as a table:
# [feed.config.date]
# selector = "time"
# # Optional type of value being parsed.
# # Defaults to DateTime, can also be Date if you're parsing a value without a time.
# type = "Date"
# # Format of the date to parse. See the following for the syntax
# # https://time-rs.github.io/book/api/format-description.html
# format = "[day padding:none]/[month padding:none]/[year]" # will parse 1/2/1934 style dates

# A second example feed
[[feed]]
title = "Example Site"
filename = "example.rss"

[feed.config]
url = "https://example.com/"
item = "div"
heading = "a"
```

The first example above (for my blog WezM.net) matches HTML that looks like this:

```html
<section class="posts-section">
  <h2>Recent Posts</h2>

  <article id="garage-door-monitor">
    <h3><a href="https://www.wezm.net/v2/posts/2022/garage-door-monitor/">Monitoring My Garage Door With a Raspberry Pi, Rust, and a 13Mb Linux System</a></h3>
    <div class="post-metadata">
      <div class="date-published">
        <time datetime="2022-04-20T06:38:27+10:00">20 April 2022</time>
      </div>
    </div>

    <div class="post-body">
      <p>I’ve accidentally left our garage door open a few times. To combat this I built
      a monitor that sends an alert via Mattermost when the door has been left open
      for more than 5 minutes. This turned out to be a super fun project. I used
      parts on hand as much as possible, implemented the monitoring application in
      Rust, and then built a stripped down Linux image to run it.
      </p>
    </div>

    <a href="https://www.wezm.net/v2/posts/2022/garage-door-monitor/">Continue Reading →</a>
  </article>

  <article id="monospace-kobo-ereader">
    <!-- another article -->
  </article>

  <!-- more articles -->

  <a href="https://www.wezm.net/v2/posts/">View more posts →</a>
</section>
```

### output

Optional output directory to write the feeds to. If not specified, it must be
supplied via the `--output` command line option. The directory will be created
if it does not exist.

Tilde expansion is performed on the path in the config file. This allows you to
refer to the home directory of the user running `rsspls`. For example,
`~/Documents/rsspls` could be used to place the output in your `Documents`
folder.

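As a minimal sketch, the `[rsspls]` section below sets the output directory using tilde expansion. The `~/Documents/rsspls` path is only an illustration; use whatever directory suits you.

```toml
[rsspls]
# "~" is expanded to the home directory of the user running rsspls.
# The directory is created if it does not already exist.
output = "~/Documents/rsspls"
```

Because `output` is set in the config file, `rsspls` can be run without the `--output` option.
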
### proxy

Optional proxy address. If specified, all requests will be routed through it.
The address needs to be in the format `protocol://ip_address:port`. The
supported protocols are: http, https, socks and socks5h.

The proxy for http and https requests can also be specified with the
environment variables `http_proxy` and `HTTPS_PROXY` respectively.
The config file takes precedence over environment variables.

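A sketch of routing all requests through a proxy. The address `10.0.0.1:3128` is a placeholder assumption, not a real endpoint; substitute the address of your own proxy. Note that the value must be a quoted TOML string.

```toml
[rsspls]
# Placeholder address in the protocol://ip_address:port format.
# The supported protocols are http, https, socks and socks5h.
proxy = "http://10.0.0.1:3128"
```

Since the config file takes precedence, this value is used even if `http_proxy` or `HTTPS_PROXY` is set in the environment.
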
### feed.title

The title of the channel in the generated feed.

### feed.filename

The output filename to write this feed to. Note: this is a filename only, not a
path. It should not contain slashes. It will be written to the [output](#output)
directory.

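To illustrate how `filename` combines with the output directory, the sketch below assumes a hypothetical web root of `/var/www/feeds` and a hypothetical page at `https://example.com/news/`; the selectors are illustrative only.

```toml
[rsspls]
output = "/var/www/feeds"

[[feed]]
title = "Example News"
# A bare filename with no slashes.
# The feed is written to /var/www/feeds/news.rss
filename = "news.rss"

[feed.config]
url = "https://example.com/news/"
item = "article"
heading = "h2"
link = "h2 a"
```
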
### feed.config.url

The URL of the web page to generate the feed from. The page at this address
will be fetched and processed to turn it into a feed.

### feed.config.item

A CSS selector to select elements on the page that represent items in the feed.
The other CSS selectors are applied within each of the elements that this
selector matches.

### feed.config.heading

A CSS selector, relative to `item`, for the element that will supply the title
of the item in the feed.

### feed.config.link

A CSS selector, relative to `item`, for the element that will supply the link
for the item in the feed.

**Note:** This element must have an `href` attribute.

**Note:** If not supplied, `rsspls` will attempt to use the
`feed.config.heading` selector as the `link` element for backwards compatibility
with earlier versions. A warning message will be emitted in this case. It is
recommended to specify the `link` selector explicitly.

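When the title text and the link live in different elements, `heading` and `link` can use different selectors. The fragment below is a sketch against hypothetical markup in which each `li.story` element holds an `h2` title and a separate `a.permalink` anchor; the URL and class names are assumptions for illustration.

```toml
[feed.config]
url = "https://example.com/news/"
item = "li.story"
# The text of the h2 becomes the item title
heading = "h2"
# The href attribute of this anchor becomes the item link
link = "a.permalink"
```
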
### feed.config.summary

Optional CSS selector relative to `item` that will supply the content of the
RSS item. This value may be a single CSS selector, or an array of CSS
selectors.

The CSS selectors may also include a comma-separated list of elements to match.
For example: `summary = "p, blockquote"` will match `p` or `blockquote`
elements, adding them to the RSS feed in the order they are encountered in the
HTML document.

The array form of `summary` allows the order of the matched elements to be
controlled, enabling elements to be added to the feed in a different order to
the source HTML document. For example, `summary = ["p", "blockquote"]` causes
`rsspls` to make a pass over the source HTML document, adding `p` elements to
the feed, followed by a pass adding `blockquote` elements to the feed.

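The forms described above can be compared side by side. This fragment is a sketch that reuses the selectors from the WezM.net example; the alternatives are commented out because a key may appear only once in a TOML table.

```toml
[feed.config]
url = "https://www.wezm.net/"
item = "article"
heading = "h3"
link = "h3 a"

# Single selector: the whole .post-body element
# summary = ".post-body"

# Comma-separated list: p and blockquote elements, in document order
# summary = "p, blockquote"

# Array: all p elements first, then all blockquote elements
summary = ["p", "blockquote"]
```
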
### feed.config.date

The optional `date` key in the configuration can be a string or a table. If it's
a string, it's used as a CSS selector relative to `item` to find the element
containing the date, and `rsspls` will attempt to automatically parse the value.

If automatic parsing fails, you can manually specify the format using the table
form of `date`, which looks like this:

```toml
[feed.config.date]
selector = "time" # required
type = "Date"
format = "[day padding:none]/[month padding:none]/[year]" # will parse 1/2/1934 style dates
```

If the element matched by the `date` selector is a `<time>` element, `rsspls`
will first try to parse the value in its `datetime` attribute, if present. If
the attribute is missing or the element is not a `<time>` element, `rsspls`
will use the supplied format or attempt automatic parsing of the text content
of the element.

#### feed.config.date.selector

CSS selector relative to `item` that supplies the publication date of
the RSS item.

#### feed.config.date.type

Optional type of value being parsed. Either `Date` or `DateTime`.

Use `Date` when the value contains only a date, and `DateTime` when it contains
both a date and a time. Defaults to `DateTime`.

#### feed.config.date.format

A format description specifying how to parse the date, using the syntax
described on this page:
<https://time-rs.github.io/book/api/format-description.html>

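As an illustration, suppose a page shows publication dates as plain text such as `20 April 2022` inside a hypothetical `<span class="published">` element, rather than in a `<time>` element with a `datetime` attribute. The table below is a sketch of a matching configuration, assuming the time crate's `[month repr:long]` component, which matches full English month names.

```toml
[feed.config.date]
# Hypothetical element containing text like "20 April 2022"
selector = "span.published"
# The value has no time component
type = "Date"
# Day without padding, full month name, four-digit year
format = "[day padding:none] [month repr:long] [year]"
```
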
### feed.config.media

Optional CSS selector relative to `item` that supplies media content (audio,
video, image) to be added as an RSS enclosure.

**Note:** The media URL must be given by the `src` or `href` attribute of the
selected element.

**Note:** Currently if the item does not match the media selector then it will
be skipped.

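For example, a podcast-style page might embed an audio player in each item. The fragment below is a sketch assuming hypothetical markup where every `article` contains `<audio><source src="..."></audio>`, so the `src` attribute of the `source` element supplies the enclosure URL.

```toml
[feed.config]
url = "https://example.com/episodes/"
item = "article"
heading = "h2"
link = "h2 a"
# The src attribute of the matched element becomes the RSS enclosure URL.
# Items without a matching element are currently skipped.
media = "audio source"
```
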
## Hosting, Updating, and Subscribing

In order to have the feeds update, you will need to arrange for `rsspls` to be
run periodically. You might do this with [cron], [systemd timers][timers], or
the Windows equivalent.

To subscribe to feeds you can run `rsspls` locally and use a feed reader that
supports local file feeds. More likely, though, `rsspls` will be run on a web
server that serves the directory the feeds are written to.

## Logging

`rsspls` logs messages to `stderr`. Logging can be controlled by the
`RSSPLS_LOG` environment variable. The log level and target module can be
controlled according to the [env_logger documentation][env_logger]. For example,
to enable debug logging for `rsspls` you would use:

`RSSPLS_LOG=rsspls=debug`

The supported log levels are:

* `error`
* `warn`
* `info`
* `debug`
* `trace`
* `off` (disable logging)

The default log level is `info`.

## Caveats & Error Handling

`rsspls` just fetches and parses the HTML of the web page you specify. It does
not run JavaScript. If the website is entirely generated by JavaScript (such as
Twitter), then `rsspls` will not work.

If errors are encountered processing the page due to invalid selectors or
missing elements, an error message will be logged. If the error is
non-recoverable, `rsspls` will exit with a non-zero exit status.

If an error is encountered processing an item for the feed, a warning will be
logged and processing will continue with the next item. `rsspls` will still
exit with success (0) in this case.

## Caching

When websites respond with cache headers, `rsspls` will make a conditional
request on subsequent runs and will not regenerate the feed if the server
responds with 304 Not Modified. Cache data is stored in
`$XDG_CACHE_HOME/rsspls`, which defaults to `~/.cache/rsspls` on UNIX-like
systems, or in `C:\Users\You\AppData\Local\rsspls` on Windows.

[cron]: https://en.wikipedia.org/wiki/Cron
[env_logger]: https://docs.rs/env_logger/latest/env_logger/#enabling-logging
[platforms]: https://doc.rust-lang.org/stable/rustc/platform-support.html
[selectors]: https://developer.mozilla.org/en-US/docs/Learn/CSS/Building_blocks/Selectors
[timers]: https://wiki.archlinux.org/title/Systemd/Timers
[toml]: https://toml.io/