diff --git a/v2/content/posts/2024/youtube-subscriptions-opml/feedme-settings.png b/v2/content/posts/2024/youtube-subscriptions-opml/feedme-settings.png
new file mode 100644
index 0000000..560794d
Binary files /dev/null and b/v2/content/posts/2024/youtube-subscriptions-opml/feedme-settings.png differ
diff --git a/v2/content/posts/2024/youtube-subscriptions-opml/feedme.png b/v2/content/posts/2024/youtube-subscriptions-opml/feedme.png
new file mode 100644
index 0000000..a91069f
Binary files /dev/null and b/v2/content/posts/2024/youtube-subscriptions-opml/feedme.png differ
diff --git a/v2/content/posts/2024/youtube-subscriptions-opml/index.md b/v2/content/posts/2024/youtube-subscriptions-opml/index.md
new file mode 100644
index 0000000..d77a1a6
--- /dev/null
+++ b/v2/content/posts/2024/youtube-subscriptions-opml/index.md
@@ -0,0 +1,282 @@
++++
+title = "Exporting YouTube Subscriptions to OPML and Watching via RSS"
+date = 2024-05-06T10:38:22+10:00
+
+#[extra]
+#updated = 2024-02-21T10:05:19+10:00
++++
+
+This post describes how I exported my 500+ YouTube subscriptions to an OPML
+file so that I could import them into my RSS reader. I go into fine detail
+about the scripts and tools I used. If you just want to see the end result,
+the code is in [this repository][repo], which describes the steps needed to
+run it.
+
+I was previously a YouTube Premium subscriber but I cancelled it when they
+jacked up the already high prices. Since then I've been watching videos in
+[NewPipe] on my Android tablet, or via an [Invidious] instance on real
+computers.
+
+To import my subscriptions into NewPipe I was able to use the
+`subscriptions.csv` file included in the Google Takeout dump of my YouTube
+data. This worked fine initially but imposed some friction when adding new
+subscriptions.
+
+If I subscribed to new channels only in NewPipe they were only accessible on
+my tablet. If I added them on YouTube then I had to remember to also add
+them in NewPipe, which was inconvenient if I wasn't using the tablet at the
+time. Inevitably the subscriptions would drift out of sync and I would have
+to periodically re-import them from YouTube into NewPipe. This was
+cumbersome as NewPipe doesn't seem to have a way to do it incrementally:
+last time I had to nuke all its data in order to re-import.
+
+To solve these problems I wanted to manage my subscriptions in my RSS
+reader, [Feedbin]. This way Feedbin would track my subscriptions and
+new/viewed videos in a way that would sync between all my devices. Notably
+this is possible because Google actually publishes an RSS feed for each
+YouTube channel.
+
+To do that I needed to export all my subscriptions to an OPML file that
+Feedbin could import. I opted to do that without requesting another Google
+Takeout dump, as those take a long time to generate and result in multiple
+gigabytes of archives to download (the dump includes all the videos I've
+uploaded to my personal account), just to get at the `subscriptions.csv`
+file within.
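+
+As an aside, those per-channel feeds are Atom documents served from a
+predictable URL. Here's a minimal sketch that fetches one and prints the
+most recent video titles, assuming you already know a channel id (the one
+below is a placeholder):
+
+```python
+#!/usr/bin/env python
+
+import urllib.request
+import xml.etree.ElementTree as ET
+
+# Placeholder channel id; replace with a real one
+CHANNEL_ID = "UCxxxxxxxxxxxxxxxxxxxxxx"
+URL = f"https://www.youtube.com/feeds/videos.xml?channel_id={CHANNEL_ID}"
+
+with urllib.request.urlopen(URL) as response:
+    feed = ET.parse(response)
+
+# The feed is an Atom document with one <entry> per video
+ATOM = {"atom": "http://www.w3.org/2005/Atom"}
+for entry in feed.findall("atom:entry", ATOM):
+    print(entry.findtext("atom:title", namespaces=ATOM))
+```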
+
+### Generating OPML
+
+I started by visiting my [subscriptions page][subscriptions] and using some
+JavaScript to generate a JSON array of all the channels I am subscribed to:
+
+```javascript
+copy(JSON.stringify(Array.from(new Set(Array.prototype.map.call(document.querySelectorAll('a.channel-link'), (link) => link.href))).filter((x) => !x.includes('/channel/')), null, 2))
+```
+
+This snippet:
+
+- queries the page for all channel links
+- gets the link URL of each matching element
+- creates a `Set` from them to de-duplicate them
+- turns the set back into an `Array`
+- filters out ones that contain `/channel/` to exclude some links like
+  Trending that also appear on that page
+- turns the array into pretty printed JSON
+- copies it to the clipboard
+
+With the list of channel URLs on my clipboard I pasted it into a
+`subscriptions.json` file. The challenge now was that these URLs were for
+channel pages like:
+
+`https://www.youtube.com/@mooretech`
+
+but the RSS URL of a channel is like:
+
+`https://www.youtube.com/feeds/videos.xml?channel_id=`
+
+This means I needed to determine the channel id for each page. To do that
+without futzing around with Google APIs and API keys I needed to download
+the HTML of each channel page.
+
+To do that I generated a config file for `curl` from the JSON file:
+
+    jaq --raw-output '.[] | (split("/") | last) as $name | "url \(.)\noutput \($name).html"' subscriptions.json > subscriptions.curl
+
+[jaq] is an alternative implementation of [jq] that I use. This `jaq`
+expression does the following:
+
+- `.[]` iterates over each element of the `subscriptions.json` array.
+- `(split("/") | last) as $name` splits the URL on `/` and takes the last
+  element, storing it in a variable called `$name`.
+  - for a URL like `https://www.youtube.com/@mooretech` this stores
+    `@mooretech` in `$name`.
+- `"url \(.)\noutput \($name).html"` generates the output text,
+  interpolating the channel page URL and channel name.
+
+This results in lines like this for each entry in `subscriptions.json`,
+output to `subscriptions.curl`:
+
+    url https://www.youtube.com/@mooretech
+    output @mooretech.html
+
+I then ran curl against this file to download all the pages:
+
+    curl --location --output-dir html --create-dirs --rate 1/s --config subscriptions.curl
+
+- `--location` tells curl to follow redirects; for some reason three of my
+  subscriptions redirected to alternate names when accessed.
+- `--output-dir` tells curl to output the files into the `html` directory.
+- `--create-dirs` tells curl to create output directories if they don't
+  exist (just the `html` one in this case).
+- `--rate 1/s` tells curl to download at a rate of one page per second, as I
+  was concerned YouTube might block me if I requested the pages too quickly.
+- `--config subscriptions.curl` tells curl to read additional command line
+  arguments from the `subscriptions.curl` file generated above.
+
+Now that I had the HTML for each channel I needed to extract the channel id
+from it. While I was at it I also extracted the channel title for use later.
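+
+Both pieces of information live in each page's [OpenGraph][OpenGraph]
+`<meta>` tags. Roughly (values illustrative), each downloaded page contains
+tags like:
+
+    <meta property="og:title" content="MooreTech">
+    <meta property="og:url" content="https://www.youtube.com/channel/UCxxxxxxxxxxxxxxxxxxxxxx">
+
+where the last path segment of the `og:url` value is the channel id.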
+
+For each HTML file I ran the following script, which I called
+`generate-json-opml`:
+
+```sh
+#!/bin/sh
+
+set -eu
+
+URL="$1"
+NAME=$(echo "$URL" | awk -F / '{ print $NF }')
+HTML="html/${NAME}.html"
+CHANNEL_ID=$(scraper -a content 'meta[property="og:url"]' < "$HTML" | awk -F / '{ print $NF }')
+TITLE=$(scraper -a content 'meta[property="og:title"]' < "$HTML")
+XML_URL="https://www.youtube.com/feeds/videos.xml?channel_id=${CHANNEL_ID}"
+
+json_escape() {
+    echo "$1" | jaq --raw-input .
+}
+
+JSON_TITLE=$(json_escape "$TITLE")
+JSON_XML_URL=$(json_escape "$XML_URL")
+JSON_URL=$(json_escape "$URL")
+
+printf '{"title": %s, "xmlUrl": %s, "htmlUrl": %s}\n' "$JSON_TITLE" "$JSON_XML_URL" "$JSON_URL" > json/"$NAME".json
+```
+
+Let's break that down:
+
+- The channel URL is stored in `URL`.
+- The channel name is determined by using `awk` to split the URL on `/` and
+  take the last element.
+- The path to the downloaded HTML page is stored in `HTML`.
+- The channel id is determined by finding the `<meta>` tag in the HTML with
+  a `property` attribute of `og:url` (the [OpenGraph metadata][OpenGraph]
+  URL property). This URL is again split on `/` and the last element stored
+  in `CHANNEL_ID`.
+  - Querying the HTML is done with a tool called [scraper] that allows you
+    to use CSS selectors to extract parts of an HTML document.
+- The channel title is extracted similarly, from the value of the `og:title`
+  metadata.
+- The URL of the RSS feed for the channel is stored in `XML_URL`, built
+  using `CHANNEL_ID`.
+- A function to escape strings destined for JSON is defined. This makes use
+  of `jaq`.
+- `TITLE`, `XML_URL`, and `URL` are escaped.
+- Finally we generate a JSON object with the title, URL, and RSS URL and
+  write it into a `json` directory under the name of the channel.
+
+Ok, almost there. That script had to be run for each of the channel URLs.
+First I generated a file with a plain text list of the channel URLs:
+
+    jaq --raw-output '.[]' subscriptions.json > subscriptions.txt
+
+Then I used `xargs` to process them in parallel:
+
+    xargs -n1 --max-procs=$(nproc) --arg-file subscriptions.txt --verbose ./generate-json-opml
+
+This does the following:
+
+- `-n1` passes one line from `subscriptions.txt` as the argument to each
+  invocation of `generate-json-opml`.
+- `--max-procs=$(nproc)` runs up to the number of cores my machine has in
+  parallel.
+- `--arg-file subscriptions.txt` reads the arguments for
+  `generate-json-opml` from `subscriptions.txt`.
+- `--verbose` shows the commands being run.
+- `./generate-json-opml` is the command to run (this is the script above).
+
+Finally all those JSON files need to be turned into an OPML file. For this I
+used Python:
+
+```python
+#!/usr/bin/env python
+
+import email.utils
+import glob
+import json
+import xml.etree.ElementTree as ET
+
+opml = ET.Element("opml")
+
+head = ET.SubElement(opml, "head")
+title = ET.SubElement(head, "title")
+title.text = "YouTube Subscriptions"
+dateCreated = ET.SubElement(head, "dateCreated")
+dateCreated.text = email.utils.formatdate(timeval=None, localtime=True)
+
+body = ET.SubElement(opml, "body")
+youtube = ET.SubElement(body, "outline", {"title": "YouTube", "text": "YouTube"})
+
+for path in glob.glob("json/*.json"):
+    with open(path) as f:
+        info = json.load(f)
+    ET.SubElement(youtube, "outline", info, type="rss", text=info["title"])
+
+ET.indent(opml)
+print(ET.tostring(opml, encoding="unicode", xml_declaration=True))
+```
+
+This generates an OPML file (which is XML) using the ElementTree library.
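+
+As input, each file in the `json` directory holds a single line shaped by
+the `printf` in `generate-json-opml`, something like this (values
+illustrative):
+
+    {"title": "MooreTech", "xmlUrl": "https://www.youtube.com/feeds/videos.xml?channel_id=UCxxxxxxxxxxxxxxxxxxxxxx", "htmlUrl": "https://www.youtube.com/@mooretech"}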
+
+The generated OPML file has this structure:
+
+```xml
+<?xml version='1.0' encoding='utf-8'?>
+<opml>
+  <head>
+    <title>YouTube Subscriptions</title>
+    <dateCreated>Sun, 05 May 2024 15:57:23 +1000</dateCreated>
+  </head>
+  <body>
+    <outline title="YouTube" text="YouTube">
+      <outline title="MooreTech" xmlUrl="https://www.youtube.com/feeds/videos.xml?channel_id=UCxxxxxxxxxxxxxxxxxxxxxx" htmlUrl="https://www.youtube.com/@mooretech" type="rss" text="MooreTech" />
+      <!-- one outline element per channel -->
+    </outline>
+  </body>
+</opml>
+```
+
+It does the following:
+
+- Generates the top level OPML structure.
+- For each JSON file, reads and parses the JSON, then uses that to generate
+  an `outline` entry for that channel.
+- Indents the OPML document.
+- Writes it to stdout using a Unicode encoding, with an XML declaration
+  (`<?xml version='1.0' encoding='utf-8'?>`).
+
+Whew, that was a lot! With the OPML file generated I was finally able to
+import all my subscriptions into Feedbin.
+
+All the code is available in [this
+repository](https://forge.wezm.net/wezm/youtube-to-opml). In practice I used
+a `Makefile` to run the various commands so that I didn't have to remember
+them.
+
+### Watching videos from Feedbin
+
+Now that Feedbin is the source of truth for subscriptions, how do I actually
+watch them? I set up the [FeedMe] app on my Android tablet. In the settings
+I enabled the NewPipe integration and set it to open the video page when
+tapped:
+
+{{ figure(image="posts/2024/youtube-subscriptions-opml/feedme-settings.png", link="posts/2024/youtube-subscriptions-opml/feedme-settings.png", alt='Screenshot of the FeedMe integration settings. There are lots of apps listed. The entry for NewPipe is turned on.', caption="Screenshot of the FeedMe integration settings") }}
+
+Now when viewing an item in FeedMe there is a NewPipe button that I can tap
+to watch it:
+
+{{ figure(image="posts/2024/youtube-subscriptions-opml/feedme.png", link="posts/2024/youtube-subscriptions-opml/feedme.png", alt='Screenshot of FeedMe viewing a video item. In the top left there is a NewPipe button, which when tapped opens the video in NewPipe.', caption="Screenshot of FeedMe viewing a video item") }}
+
+### Closing Thoughts
+
+Could I have done all the processing to generate the OPML file with a single
+Python file? Yes, but I rarely write Python so I preferred to cobble things
+together from tools I already knew.
+
+Should I ever become a YouTube Premium subscriber again I can continue to
+use this workflow and watch the videos from the YouTube embeds that Feedbin
+generates, or open the item in the YouTube app instead of NewPipe.
+
+Lastly, what about desktop usage? When I'm on a real computer I read my RSS
+via the Feedbin web app. It supports [custom sharing
+integrations][feedbin-sharing]. In order to open a video on an Invidious
+instance I need to rewrite a URL like:
+
+    https://www.youtube.com/watch?v=VIDEO_ID
+
+to one like:
+
+    https://invidious.example/watch?v=VIDEO_ID
+
+I can't do that directly with a Feedbin custom sharing service definition,
+but it would be trivial to set up a little redirector application to do it.
+I even published [a video on building a very similar thing][url-shortener]
+last year. Alternatively I could install a [redirector browser
+plugin](https://docs.invidious.io/redirector/), although that would require
+setup on each of the computers and OS installs I use, so I prefer the former
+option.
+
+[url-shortener]: https://www.youtube.com/watch?v=d-tsfUVg4II
+[Invidious]: https://invidious.io/
+[Feedbin]: https://feedbin.com/
+[scraper]: https://github.com/causal-agent/scraper
+[repo]: https://forge.wezm.net/wezm/youtube-to-opml
+[NewPipe]: https://newpipe.net/
+[subscriptions]: https://www.youtube.com/feed/channels
+[jaq]: https://github.com/01mf02/jaq
+[jq]: https://jqlang.github.io/jq/
+[FeedMe]: https://play.google.com/store/apps/details?id=com.seazon.feedme
+[feedbin-sharing]: https://feedbin.com/help/sharing-read-it-later-services/
+[OpenGraph]: https://ogp.me/
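+
+As a rough sketch of that redirector idea, using only the Python standard
+library: the Invidious hostname and listen port below are placeholders, and
+it assumes the sharing service is configured to pass the video URL in a
+`url` query parameter (a name I've made up for this example).
+
+```python
+#!/usr/bin/env python
+
+from http.server import BaseHTTPRequestHandler, HTTPServer
+from urllib.parse import parse_qs, urlparse
+
+INVIDIOUS = "invidious.example"  # replace with your preferred instance
+
+
+class Redirector(BaseHTTPRequestHandler):
+    def do_GET(self):
+        # The YouTube URL arrives as a query parameter, e.g.
+        # /?url=https://www.youtube.com/watch?v=VIDEO_ID
+        query = parse_qs(urlparse(self.path).query)
+        target = urlparse(query.get("url", [""])[0])
+        # Keep the path and query (/watch?v=...) but swap the host
+        location = f"https://{INVIDIOUS}{target.path}"
+        if target.query:
+            location += f"?{target.query}"
+        self.send_response(302)
+        self.send_header("Location", location)
+        self.end_headers()
+
+
+HTTPServer(("localhost", 8000), Redirector).serve_forever()
+```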