+++
title = "Documentation"
description = "Documentation"
weight = 2
+++

### Configuration

Unless specified via the `--config` command line option, `rsspls` reads its
configuration from one of the following paths:

* UNIX-like systems:
    * `$XDG_CONFIG_HOME/rsspls/feeds.toml`
    * `~/.config/rsspls/feeds.toml` if `XDG_CONFIG_HOME` is unset.
* Windows:
    * `C:\Users\You\AppData\Roaming\rsspls\feeds.toml`

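The lookup order above can be sketched in a few lines of Python. This is only an illustration of the documented paths, not how `rsspls` itself is implemented: `default_config_path` and its parameters are hypothetical, and using the `APPDATA` environment variable to find the roaming profile directory on Windows is an assumption.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath


def default_config_path(env=None, platform=None, home=None):
    """Return the default feeds.toml location for the given environment.

    Hypothetical helper: `env`, `platform`, and `home` are parameters so the
    lookup can be tried out without touching the real environment.
    """
    env = os.environ if env is None else env
    platform = os.name if platform is None else platform

    if platform == "nt":
        # Assumption: %APPDATA% points at C:\Users\<you>\AppData\Roaming.
        return PureWindowsPath(env["APPDATA"]) / "rsspls" / "feeds.toml"

    xdg = env.get("XDG_CONFIG_HOME")
    if xdg:
        base = PurePosixPath(xdg)
    else:
        # Fall back to ~/.config when XDG_CONFIG_HOME is unset.
        base = PurePosixPath(home or os.path.expanduser("~")) / ".config"
    return base / "rsspls" / "feeds.toml"
```

Passing `--config` bypasses this lookup entirely, so a sketch like this only matters when the option is omitted.
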
The configuration file is in [TOML][toml] format.

The parts of the page to extract for the feed are specified using [CSS
selectors][selectors].

#### Annotated Sample Configuration

The sample file below demonstrates all the parts of the configuration.

```toml
# The configuration must start with the [rsspls] section
[rsspls]
# Optional output directory to write the feeds to. If not specified it must be supplied via
# the --output command line option.
output = "/tmp"
# Optional proxy address. If specified, all requests will be routed through it.
# The address needs to be in the format: protocol://ip_address:port
# The supported protocols are: http, https, socks and socks5h.
# It can also be specified as the environment variable `http_proxy` or `HTTPS_PROXY`.
# The config file takes precedence, then the env vars in the above order.
# proxy = "socks5://10.64.0.1:1080"

# Next is the array of feeds, each one starts with [[feed]]
[[feed]]
# The title of the channel in the feed
title = "My Great RSS Feed"

# The output filename without the output directory to write this feed to.
# Note: this is a filename only, not a path. It should not contain slashes.
filename = "wezm.rss"

# Optional User-Agent header to be set for the HTTP request.
# user_agent = "Mozilla/5.0"

# The configuration for the feed
[feed.config]
# The URL of the web page to generate the feed from.
url = "https://www.wezm.net/"

# A CSS selector to select elements on the page that represent items in the feed.
item = "article"

# A CSS selector relative to `item` to an element that will supply the title for the item.
heading = "h3"

# A CSS selector relative to `item` to an element that will supply the link for the item.
# Note: This element must have a `href` attribute.
# Note: If not supplied rsspls will attempt to use the heading selector for the link, for
# backwards compatibility with earlier versions. A message will be emitted in this case.
link = "h3 a"

# Optional CSS selector relative to `item` that will supply the content of the RSS item.
summary = ".post-body"

# Optional CSS selector relative to `item` that supplies media content (audio, video, image)
# to be added as an RSS enclosure.
# Note: The media URL must be given by the `src` or `href` attribute of the selected element.
# Note: Currently if the item does not match the media selector then it will be skipped.
# media = "figure img"

# Optional CSS selector relative to `item` that supplies the publication date of the RSS item.
date = "time"

# Alternatively for more control `date` can be specified as a table:
# [feed.config.date]
# selector = "time"
# # Optional type of value being parsed.
# # Defaults to DateTime, can also be Date if you're parsing a value without a time.
# type = "Date"
# # Format of the date to parse. See the following for the syntax
# # https://time-rs.github.io/book/api/format-description.html
# format = "[day padding:none]/[month padding:none]/[year]" # will parse 1/2/1934 style dates

# A second example feed
[[feed]]
title = "Example Site"
filename = "example.rss"

[feed.config]
url = "https://example.com/"
item = "div"
heading = "a"
```

The first example above (for my blog WezM.net) matches HTML that looks like this:

```html
<section class="posts-section">
  <h2>Recent Posts</h2>

  <article id="garage-door-monitor">
    <h3><a href="https://www.wezm.net/v2/posts/2022/garage-door-monitor/">Monitoring My Garage Door With a Raspberry Pi, Rust, and a 13Mb Linux System</a></h3>
    <div class="post-metadata">
      <div class="date-published">
        <time datetime="2022-04-20T06:38:27+10:00">20 April 2022</time>
      </div>
    </div>

    <div class="post-body">
      <p>I’ve accidentally left our garage door open a few times. To combat this I built
        a monitor that sends an alert via Mattermost when the door has been left open
        for more than 5 minutes. This turned out to be a super fun project. I used
        parts on hand as much as possible, implemented the monitoring application in
        Rust, and then built a stripped down Linux image to run it.
      </p>
    </div>

    <a href="https://www.wezm.net/v2/posts/2022/garage-door-monitor/">Continue Reading →</a>
  </article>

  <article id="monospace-kobo-ereader">
    <!-- another article -->
  </article>

  <!-- more articles -->

  <a href="https://www.wezm.net/v2/posts/">View more posts →</a>
</section>
```

#### More Detail on Date Handling

The `date` key in the configuration can be a string or a table. If it's a
string then it's used as a selector to find the element containing the date and
`rsspls` will attempt to automatically parse the value. If automatic parsing
fails you can manually specify the format using the table form of `date`:

```toml
[feed.config.date]
selector = "time" # required
type = "Date"
format = "[day padding:none]/[month padding:none]/[year]"
```

* `type` is `Date` when you want to parse just a date. Use `DateTime` if you're
  parsing a date and time with the format. Defaults to `DateTime`.
* `format` is a format description using the syntax described on this page:
  <https://time-rs.github.io/book/api/format-description.html>.

If the element matched by the `date` selector is a `<time>` element then
`rsspls` will first try to parse the value in the `datetime` attribute if
present. If the attribute is missing or the element is not a `<time>` element
then `rsspls` will use the supplied format or attempt automatic parsing of the
text content of the element.

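The fallback order described above can be illustrated with a small Python sketch. It is not the `rsspls` implementation: `parse_item_date` is a hypothetical function, `fmt` is a `strptime` pattern standing in for the time-rs format description that `rsspls` actually accepts, and the short list of layouts is only a stand-in for automatic parsing. Falling through to the text content when the `datetime` attribute fails to parse is also an assumption.

```python
from datetime import datetime


def parse_item_date(tag, attrs, text, fmt=None):
    """Apply the lookup order to one element matched by the `date` selector.

    `tag`, `attrs`, and `text` are the element name, its attributes, and its
    text content. Returns a datetime, or None if nothing could be parsed.
    """
    # 1. A <time> element with a datetime attribute is tried first.
    if tag == "time" and "datetime" in attrs:
        try:
            return datetime.fromisoformat(attrs["datetime"])
        except ValueError:
            pass  # assumption: fall through to the text content

    value = text.strip()

    # 2. An explicitly supplied format is used on the text content.
    if fmt is not None:
        return datetime.strptime(value, fmt)

    # 3. Otherwise guess. These two layouts are illustrative only.
    for candidate in ("%d %B %Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, candidate)
        except ValueError:
            continue
    return None
```

For the `<time>` element in the sample HTML, step 1 succeeds and the visible text `20 April 2022` is never consulted. A page that prints `1/2/1934` in a plain `<span>` would need the table form of `date` with an explicit format.
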
### Hosting

It is expected that `rsspls` will be run on a web server that is serving the
directory the feeds are written to. `rsspls` only generates the feeds; it is not
a server. To keep the feeds up to date you will need to arrange for
`rsspls` to be run periodically. You might do this with [cron], [systemd
timers][timers], or the Windows equivalent (Task Scheduler).

### Caveats

`rsspls` just fetches and parses the HTML of the web page you specify. It does
not run JavaScript. If the website is entirely generated by JavaScript (such as
Twitter) then `rsspls` will not work.

### Caching

When websites respond with cache headers, `rsspls` will make a conditional
request on subsequent runs and will not regenerate the feed if the server
responds with 304 Not Modified. Cache data is stored in
`$XDG_CACHE_HOME/rsspls`, which defaults to `~/.cache/rsspls` on UNIX-like
systems or `C:\Users\You\AppData\Local\rsspls` on Windows.

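For readers unfamiliar with conditional requests, the following Python sketch shows the general HTTP mechanism using only the standard library. It assumes the stored validators are the `ETag` and `Last-Modified` response headers, which is the usual approach but is not confirmed here as what `rsspls` stores. The function names and the `cached` dictionary are hypothetical.

```python
import urllib.error
import urllib.request


def conditional_request(url, cached):
    """Build a GET request carrying validators saved from a previous run.

    `cached` is a hypothetical dict such as
    {"etag": '"abc"', "last_modified": "Wed, 20 Apr 2022 06:38:27 GMT"}.
    """
    request = urllib.request.Request(url)
    if cached.get("etag"):
        request.add_header("If-None-Match", cached["etag"])
    if cached.get("last_modified"):
        request.add_header("If-Modified-Since", cached["last_modified"])
    return request


def fetch_if_changed(url, cached):
    """Return (body, validators), or None when the server answers 304."""
    try:
        with urllib.request.urlopen(conditional_request(url, cached)) as response:
            validators = {
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
            }
            return response.read(), validators
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return None  # page unchanged: keep the existing feed
        raise
```

A server that sends neither header gives the client nothing to validate against, so every run would fetch and regenerate the feed in full.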