+++
title = "Documentation"
description = "Documentation"
weight = 2
+++

Configuration

Unless a path is given with the --config command line option, rsspls reads its configuration from one of the following paths:

  • UNIX-like systems:
    • $XDG_CONFIG_HOME/rsspls/feeds.toml
    • ~/.config/rsspls/feeds.toml if XDG_CONFIG_HOME is unset.
  • Windows:
    • C:\Users\You\AppData\Roaming\rsspls\feeds.toml
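
As a rough illustration of the lookup order above, the following Python sketch resolves the default path. It is not rsspls's own code: the function name `default_config_path` is hypothetical, and it assumes the Windows `AppData\Roaming` directory is located via the `APPDATA` environment variable and that an empty `XDG_CONFIG_HOME` is treated the same as an unset one.

```python
from pathlib import PurePosixPath, PureWindowsPath


def default_config_path(env, is_windows, home):
    """Return the documented default config path when --config is not given.

    env is a mapping of environment variables, home is the user's home
    directory (only used on UNIX-like systems).
    """
    if is_windows:
        # Assumption: AppData\Roaming is taken from %APPDATA%.
        return PureWindowsPath(env["APPDATA"]) / "rsspls" / "feeds.toml"
    base = env.get("XDG_CONFIG_HOME")
    if base:
        return PurePosixPath(base) / "rsspls" / "feeds.toml"
    # XDG_CONFIG_HOME is unset (or empty): fall back to ~/.config
    return PurePosixPath(home) / ".config" / "rsspls" / "feeds.toml"
```

On a real system you would pass `os.environ` as `env` and `str(Path.home())` as `home`.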

The configuration file is in [TOML][toml] format.

The parts of the page to extract for the feed are specified using [CSS selectors][selectors].

Annotated Sample Configuration

The sample file below demonstrates all the parts of the configuration.

# The configuration must start with the [rsspls] section
[rsspls]
# Optional output directory to write the feeds to. If not specified it must be supplied via
# the --output command line option.
output = "/tmp"
# Optional proxy address. If specified, all requests will be routed through it.
# The address needs to be in the format: protocol://ip_address:port
# The supported protocols are: http, https, socks and socks5h.
# It can also be specified via the environment variables `http_proxy` or `HTTPS_PROXY`.
# The config file takes precedence, then the env vars in the above order.
# proxy = "socks5://10.64.0.1:1080"

# Next is the array of feeds, each one starts with [[feed]]
[[feed]]
# The title of the channel in the feed
title = "My Great RSS Feed"

# The output filename without the output directory to write this feed to.
# Note: this is a filename only, not a path. It should not contain slashes.
filename = "wezm.rss"

# Optional User-Agent header to be set for the HTTP request.
# user_agent = "Mozilla/5.0"

# The configuration for the feed
[feed.config]
# The URL of the web page to generate the feed from.
url = "https://www.wezm.net/"

# A CSS selector to select elements on the page that represent items in the feed.
item = "article"

# A CSS selector relative to `item` to an element that will supply the title for the item.
heading = "h3"

# A CSS selector relative to `item` to an element that will supply the link for the item.
# Note: This element must have a `href` attribute.
# Note: If not supplied, rsspls will attempt to use the heading selector for the link for backwards
#       compatibility with earlier versions. A message will be emitted in this case.
link = "h3 a"

# Optional CSS selector relative to `item` that will supply the content of the RSS item.
summary = ".post-body"

# Optional CSS selector relative to `item` that supplies media content (audio, video, image)
# to be added as an RSS enclosure.
# Note: The media URL must be given by the `src` or `href` attribute of the selected element.
# Note: Currently if the item does not match the media selector then it will be skipped.
# media = "figure img"

# Optional CSS selector relative to `item` that supplies the publication date of the RSS item.
date = "time"

# Alternatively for more control `date` can be specified as a table:
# [feed.config.date]
# selector = "time"
# # Optional type of value being parsed.
# # Defaults to DateTime, can also be Date if you're parsing a value without a time.
# type = "Date"
# # Format of the date to parse. See the following for the syntax:
# # https://time-rs.github.io/book/api/format-description.html
# format = "[day padding:none]/[month padding:none]/[year]" # will parse 1/2/1934 style dates

# A second example feed
[[feed]]
title = "Example Site"
filename = "example.rss"

[feed.config]
url = "https://example.com/"
item = "div"
heading = "a"

The first example above (for my blog WezM.net) matches HTML that looks like this:

<section class="posts-section">
  <h2>Recent Posts</h2>

  <article id="garage-door-monitor">
    <h3><a href="https://www.wezm.net/v2/posts/2022/garage-door-monitor/">Monitoring My Garage Door With a Raspberry Pi, Rust, and a 13Mb Linux System</a></h3>
    <div class="post-metadata">
      <div class="date-published">
        <time datetime="2022-04-20T06:38:27+10:00">20 April 2022</time>
      </div>
    </div>

    <div class="post-body">
      <p>I've accidentally left our garage door open a few times. To combat this I built
        a monitor that sends an alert via Mattermost when the door has been left open
        for more than 5 minutes. This turned out to be a super fun project. I used
        parts on hand as much as possible, implemented the monitoring application in
        Rust, and then built a stripped down Linux image to run it.
      </p>
    </div>

    <a href="https://www.wezm.net/v2/posts/2022/garage-door-monitor/">Continue Reading →</a>
  </article>

  <article id="monospace-kobo-ereader">
    <!-- another article -->
  </article>

  <!-- more articles -->

  <a href="https://www.wezm.net/v2/posts/">View more posts →</a>
</section>
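
To make the mapping between the selectors and the HTML concrete, here is a small Python sketch that pulls the same fields out of a trimmed copy of the markup above. It is only an approximation: rsspls uses CSS selectors, whereas this sketch substitutes the limited XPath subset in Python's standard `xml.etree.ElementTree`, so it only works on well-formed markup and matches the `class` attribute exactly. The function name `extract_items` and the shortened sample text are made up for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the sample HTML (text shortened for brevity).
PAGE = """
<section class="posts-section">
  <article id="garage-door-monitor">
    <h3><a href="https://www.wezm.net/v2/posts/2022/garage-door-monitor/">Monitoring My Garage Door</a></h3>
    <div class="post-metadata">
      <div class="date-published">
        <time datetime="2022-04-20T06:38:27+10:00">20 April 2022</time>
      </div>
    </div>
    <div class="post-body"><p>I built a monitor.</p></div>
  </article>
</section>
"""


def extract_items(page):
    """Collect title, link, summary and date for each matching item."""
    root = ET.fromstring(page)
    items = []
    for article in root.iter("article"):                      # item = "article"
        heading = article.find(".//h3")                       # heading = "h3"
        link = article.find(".//h3//a")                       # link = "h3 a"
        summary = article.find(".//div[@class='post-body']")  # summary = ".post-body"
        date = article.find(".//time")                        # date = "time"
        items.append({
            "title": "".join(heading.itertext()).strip(),
            "link": link.get("href"),
            "summary": "".join(summary.itertext()).strip(),
            "date": date.get("datetime"),
        })
    return items
```

Each selector other than `item` is evaluated relative to the matched `article`, which is why the lookups start from `article` rather than from the document root.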

More Detail on Date Handling

The date key in the configuration can be a string or a table. If it's a string, it's used as a selector to find the element containing the date, and rsspls will attempt to parse the value automatically. If automatic parsing fails, you can specify the format manually using the table form of date:

[feed.config.date]
selector = "time" # required
type = "Date"
format = "[day padding:none]/[month padding:none]/[year]"

If the element matched by the date selector is a <time> element, rsspls will first try to parse the value of its datetime attribute, if present. If the attribute is missing or the element is not a <time> element, rsspls will use the supplied format, or attempt automatic parsing, on the text content of the element.
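
The order of attempts described above can be sketched in Python as follows. This is an illustration rather than rsspls's implementation: it uses Python's `strptime` format strings in place of the `time` crate's format descriptions, the `AUTO_FORMATS` list is a hypothetical stand-in for automatic parsing, and falling back to the text content when the datetime attribute fails to parse is an assumption.

```python
from datetime import datetime

# Hypothetical candidates standing in for rsspls's automatic parsing.
AUTO_FORMATS = ("%Y-%m-%d", "%d %B %Y")


def parse_item_date(tag, attrs, text, fmt=None):
    """Return a datetime for the matched element, or None if nothing parses."""
    # 1. A <time> element with a datetime attribute is tried first.
    if tag == "time" and attrs.get("datetime"):
        try:
            return datetime.fromisoformat(attrs["datetime"])
        except ValueError:
            pass  # Assumption: fall through to the text content.
    # 2. Otherwise use the supplied format, or try the automatic candidates.
    text = text.strip()
    for candidate in ((fmt,) if fmt else AUTO_FORMATS):
        try:
            return datetime.strptime(text, candidate)
        except ValueError:
            continue
    return None
```

For example, `parse_item_date("span", {}, "1/2/1934", fmt="%d/%m/%Y")` corresponds to the table form of date shown earlier, with `%d/%m/%Y` playing the role of `[day padding:none]/[month padding:none]/[year]`.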

Hosting

It is expected that rsspls will be run on a web server that is serving the directory the feeds are written to. rsspls only generates the feeds; it is not a server. To keep the feeds up to date you will need to arrange for rsspls to be run periodically. You might do this with [cron], [systemd timers][timers], or the Windows equivalent.

Caveats

rsspls just fetches and parses the HTML of the web page you specify. It does not run JavaScript. If the website is entirely generated by JavaScript (such as Twitter) then rsspls will not work.

Caching

When a website responds with cache headers, rsspls will make a conditional request on subsequent runs and will not regenerate the feed if the server responds with 304 Not Modified. Cache data is stored in $XDG_CACHE_HOME/rsspls, which defaults to ~/.cache/rsspls on UNIX-like systems or C:\Users\You\AppData\Local\rsspls on Windows.
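
For readers unfamiliar with conditional requests, the general HTTP mechanism looks roughly like the Python sketch below. Which headers rsspls actually stores is not stated here, so the use of `ETag` and `Last-Modified` validators is an assumption based on standard HTTP caching, and the function names and the shape of the `cached` dictionary are hypothetical.

```python
def conditional_headers(cached):
    """Build request headers from validators saved on a previous run.

    cached is a dict that may hold "etag" and "last_modified" values
    copied from the earlier response's ETag and Last-Modified headers.
    """
    headers = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers


def needs_regeneration(status_code):
    """A 304 Not Modified response means the existing feed can be kept."""
    return status_code != 304
```

On the first run there is nothing cached, so `conditional_headers({})` returns no extra headers and the page is fetched unconditionally.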