+++
title = "Documentation"
description = "Documentation"
weight = 2
+++

## How it Works

`rsspls` fetches each page specified by the configuration and extracts elements
from the page using [CSS selectors][selectors]. For example, elements are
matched to determine the title and content of each feed entry. The generated
feeds are written to an output directory. HTTP caching is used so that a feed
is only updated when the source page changes.

## Supported Platforms

`rsspls` should work on all [platforms supported by the Rust compiler][platforms]
including Linux, macOS, Windows, and BSD. Pre-compiled binaries are available
for common platforms. See [the install page](@/install.md) for details.

## Usage

```
rsspls [OPTIONS] -o OUTPUT_DIR

OPTIONS:
    -h, --help
            Prints this help information

    -c, --config
            Specify the path to the configuration file.
            $XDG_CONFIG_HOME/rsspls/feeds.toml is used if not supplied.

    -o, --output
            Directory to write generated feeds to.

    -V, --version
            Prints version information

FILES:
     ~/$XDG_CONFIG_HOME/rsspls/feeds.toml    rsspls configuration file.
     ~/$XDG_CONFIG_HOME/rsspls               Configuration directory.
     ~/XDG_CACHE_HOME/rsspls                 Cache directory.

     Note: XDG_CONFIG_HOME defaults to ~/.config, XDG_CACHE_HOME defaults to ~/.cache.
```

## Configuration

Unless specified via the `--config` command line option, `rsspls` reads its
configuration from one of the following paths:

* UNIX-like systems:
  * `$XDG_CONFIG_HOME/rsspls/feeds.toml`
  * `~/.config/rsspls/feeds.toml` if `XDG_CONFIG_HOME` is unset.
* Windows:
  * `C:\Users\You\AppData\Roaming\rsspls\feeds.toml`

The configuration file is in [TOML][toml] format. The parts of the page to
extract for the feed are specified using [CSS selectors][selectors].

### Annotated Sample Configuration

The sample file below demonstrates all the parts of the configuration.

```toml
# The configuration must start with the [rsspls] section
[rsspls]
# Optional output directory to write the feeds to. If not specified it must be supplied via
# the --output command line option.
output = "/tmp"
# Optional proxy address. If specified, all requests will be routed through it.
# The address needs to be in the format: protocol://ip_address:port
# The supported protocols are: http, https, socks and socks5h.
# It can also be specified as environment variable `http_proxy` or `HTTPS_PROXY`.
# The config file takes precedence, then the env vars in the above order.
# proxy = "socks5://10.64.0.1:1080"
# Optionally enable reading web pages from local files through file:// URLs.
# file_urls = false

# Next is the array of feeds, each one starts with [[feed]]
[[feed]]
# The title of the channel in the feed
title = "My Great RSS Feed"

# The output filename without the output directory to write this feed to.
# Note: this is a filename only, not a path. It should not contain slashes.
filename = "wezm.rss"

# Optional User-Agent header to be set for the HTTP request.
# user_agent = "Mozilla/5.0"

# The configuration for the feed
[feed.config]
# The URL of the web page to generate the feed from.
url = "https://www.wezm.net/"

# A CSS selector to select elements on the page that represent items in the feed.
item = "article"

# A CSS selector relative to `item` to an element that will supply the title for the item.
heading = "h3"

# A CSS selector relative to `item` to an element that will supply the link for the item.
# Note: This element must have an `href` attribute.
# Note: If not supplied rsspls will attempt to use the heading selector for link for backwards
#       compatibility with earlier versions. A message will be emitted in this case.
link = "h3 a"

# Optional CSS selector relative to `item` that will supply the content of the RSS item.
summary = ".post-body"

# Optional CSS selector relative to `item` that supplies media content (audio, video, image)
# to be added as an RSS enclosure.
# Note: The media URL must be given by the `src` or `href` attribute of the selected element.
# Note: Currently if the item does not match the media selector then it will be skipped.
# media = "figure img"

# Optional CSS selector relative to `item` that supplies the publication date of the RSS item.
date = "time"

# Alternatively for more control `date` can be specified as a table:
# [feed.config.date]
# selector = "time"
# # Optional type of value being parsed.
# # Defaults to DateTime, can also be Date if you're parsing a value without a time.
# type = "Date"
# # format of the date to parse. See the following for the syntax
# # https://time-rs.github.io/book/api/format-description.html
# format = "[day padding:none]/[month padding:none]/[year]" # will parse 1/2/1934 style dates

# A second example feed
[[feed]]
title = "Example Site"
filename = "example.rss"

[feed.config]
url = "https://example.com/"
item = "div"
heading = "a"
```

The first example above (for my blog WezM.net) matches HTML that looks like this:

```html

<h2>Recent Posts</h2>

<article>
  <h3><a href="…">Monitoring My Garage Door With a Raspberry Pi, Rust, and a 13Mb Linux System</a></h3>

  <div class="post-body">
    <p>I’ve accidentally left our garage door open a few times. To combat this I built a monitor that sends an alert via Mattermost when the door has been left open for more than 5 minutes. This turned out to be a super fun project. I used parts on hand as much as possible, implemented the monitoring application in Rust, and then built a stripped down Linux image to run it.</p>

    <a href="…">Continue Reading →</a>
  </div>
</article>

<a href="…">View more posts →</a>
```

### output

Optional output directory to write the feeds to. If not specified it must be
supplied via the `--output` command line option. The directory will be created
if it does not exist.

Tilde expansion is performed on the path in the config file. This allows you to
refer to the home directory of the user running `rsspls`. For example,
`~/Documents/rsspls` could be used to place the output in your `Documents`
folder.

### proxy

Optional proxy address. If specified, all requests will be routed through it.
The address needs to be in the format: `protocol://ip_address:port`.
The supported protocols are: http, https, socks and socks5h.

The proxy for http and https requests can also be specified with the
environment variables `http_proxy` and `HTTPS_PROXY` respectively. The config
file takes precedence over environment variables.

### file_urls

Since: 0.10.0

Optional boolean value (default `false`) indicating whether to allow fetching
web pages from `file` URLs. When set to `true`,
[feed.config.url](#feed-config-url) can be a URL using the `file` scheme to a
local HTML file like: `file:///home/wmoore/Documents/example.html`. The path
must be absolute.

### feed.title

The title of the channel in the generated feed.

### feed.filename

The output filename to write this feed to. Note: this is a filename only, not a
path. It should not contain slashes. It will be written to the
[output](#output) directory.

### feed.config.url

The URL of the web page to generate the feed from. The page at this address
will be fetched and processed to turn it into a feed.

### feed.config.item

A CSS selector to select elements on the page that represent items in the feed.
The other CSS selectors match elements inside the elements that this selector
matches.

### feed.config.heading

A CSS selector relative to `item` to an element that will supply the title for
the item in the feed.

### feed.config.link

A CSS selector relative to `item` to an element that will supply the link for
the item in the feed.

**Note:** This element must have an `href` attribute.

**Note:** If not supplied `rsspls` will attempt to use the
`feed.config.heading` selector as the `link` element for backwards
compatibility with earlier versions. A warning message will be emitted in this
case. It is recommended to specify the `link` selector explicitly.

### feed.config.summary

Optional CSS selector relative to `item` that will supply the content of the
RSS item.

This value may be a single CSS selector, or an array of CSS selectors. The CSS
selectors may also include a comma separated list of elements to match. For
example: `summary = "p, blockquote"` will match `p` or `blockquote` elements,
adding them to the RSS feed in the order they are encountered in the HTML
document.

The array form of `summary` allows the order of the matched elements to be
controlled, enabling elements to be added to the feed in a different order to
the source HTML document. For example, `summary = ["p", "blockquote"]` causes
`rsspls` to make a pass over the source HTML document, adding `p` elements to
the feed, followed by a pass adding `blockquote` elements to the feed.

### feed.config.date

The optional `date` key in the configuration can be a string or a table. If
it's a string then it's used as a CSS selector relative to `item` to find the
element containing the date, and `rsspls` will attempt to automatically parse
the value. If automatic parsing fails you can manually specify the format using
the table form of `date`, which looks like this:

```toml
[feed.config.date]
selector = "time" # required
type = "Date"
format = "[day padding:none]/[month padding:none]/[year]" # will parse 1/2/1934 style dates
```

If the element matched by the `date` selector is a `