Add YouTube subscription to OPML tooling
parent b10a7442c9
commit 1fb35e8048
5 changed files with 121 additions and 0 deletions
.gitignore (vendored) +1

@@ -1,2 +1,3 @@
 /html
 /json
+/subscriptions.*
Makefile (new file) +20

@@ -0,0 +1,20 @@
+JAQ?=jaq
+
+subscriptions.txt: subscriptions.json
+	$(JAQ) --raw-output '.[]' subscriptions.json > $@
+
+subscriptions.curl: subscriptions.json
+	$(JAQ) --raw-output '.[] | (split("/") | last) as $$name | . | "url \(.)\noutput \($$name).html"' subscriptions.json > $@
+
+fetch: subscriptions.curl
+	curl --location --output-dir html --create-dirs --rate 1/s --config subscriptions.curl
+
+channel-json: subscriptions.txt
+	# nproc is not portable :-/
+	xargs -n1 --max-procs=$$(nproc) --arg-file subscriptions.txt --verbose ./generate-json-opml
+
+# turn all the channel json files into an OPML file
+subscriptions.opml:
+	./generate-opml > $@
+
+.PHONY: channel-json subscriptions.opml
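The `subscriptions.curl` rule above uses a jaq filter to turn each channel URL into a pair of curl config lines (`url …` / `output <name>.html`), where the name is the last path segment. A minimal Python sketch of that same transformation, using a made-up channel URL:

```python
# Sketch of what the jaq filter in the subscriptions.curl rule emits:
# for each channel URL, a "url ..." line and an "output <name>.html" line.
# The example URL below is hypothetical.

def curl_config_lines(urls):
    lines = []
    for url in urls:
        name = url.split("/")[-1]  # jaq: (split("/") | last) as $name
        lines.append(f"url {url}")
        lines.append(f"output {name}.html")
    return "\n".join(lines)

print(curl_config_lines(["https://www.youtube.com/@ExampleChannel"]))
```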
README.md (new file) +39

@@ -0,0 +1,39 @@
+# YouTube Subscriptions to OPML
+
+This repo contains a small collection of scripts that I used to turn my YouTube subscriptions into an OPML file for import into [Feedbin].
+
+## Dependencies
+
+The scripts have only been run on a Linux system using GNU coreutils. They will
+probably need some tweaking to run on other UNIX-like systems.
+
+- [Scraper](https://lib.rs/crates/scraper)
+- [jaq](https://github.com/01mf02/jaq)
+- curl
+- Python
+- awk
+- GNU make (I haven't tested non-GNU make)
+
+## Usage
+
+1. Visit your [subscriptions page](https://www.youtube.com/feed/channels)
+2. Repeatedly scroll to the end of the page to make them all load
+3. Run the following in the JavaScript console to copy the list of subscriptions to your clipboard as a JSON array:
+
+```javascript
+copy(JSON.stringify(Array.from(new Set(Array.prototype.map.call(document.querySelectorAll('a.channel-link'), (link) => link.href))).filter((x) => !x.includes('/channel/')), null, 2))
+```
+
+**Note:** I only tested the above on Firefox.
+
+Why do this instead of processing the subscriptions.csv from Google Takeout?
+
+1. Takeout generates multiple gigabytes of archives I have to download to get the CSV file.
+2. It's slow to generate. This process can be done whenever you want.
+
+4. Paste the list of subscriptions into `subscriptions.json`.
+5. Run `make fetch` to fetch the channel pages of all the subscriptions. This only needs to be run once.
+6. Run `make channel-json` to extract info from each channel page.
+7. Run `make subscriptions.opml` to generate the OPML file.
+
+[Feedbin]: https://feedbin.com/
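The README's JavaScript one-liner dedupes the scraped channel links and drops bare `/channel/` URLs before copying them. The same clean-up step, sketched in Python with a hypothetical input list:

```python
# Mirror of the README's JS console snippet: unique hrefs, excluding
# /channel/ URLs. The input links below are made up for illustration.
def clean_subscriptions(hrefs):
    # dict.fromkeys dedupes while preserving insertion order,
    # matching Array.from(new Set(...)) in the JS version.
    unique = list(dict.fromkeys(hrefs))
    return [h for h in unique if "/channel/" not in h]

links = [
    "https://www.youtube.com/@a",
    "https://www.youtube.com/@a",               # duplicate, removed
    "https://www.youtube.com/channel/UCxxxx",   # filtered out
    "https://www.youtube.com/@b",
]
print(clean_subscriptions(links))
```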
generate-json-opml (new executable file) +20

@@ -0,0 +1,20 @@
+#!/bin/sh
+
+set -eu
+
+URL="$1"
+NAME=$(echo "$URL" | awk -F / '{ print $NF }')
+HTML="html/${NAME}.html"
+CHANNEL_ID=$(scraper -a content 'meta[property="og:url"]' < "$HTML" | awk -F / '{ print $NF }')
+TITLE=$(scraper -a content 'meta[property="og:title"]' < "$HTML")
+XML_URL="https://www.youtube.com/feeds/videos.xml?channel_id=${CHANNEL_ID}"
+
+json_escape() {
+	echo "$1" | jaq --raw-input .
+}
+
+JSON_TITLE=$(json_escape "$TITLE")
+JSON_XML_URL=$(json_escape "$XML_URL")
+JSON_URL=$(json_escape "$URL")
+
+printf '{"title": %s, "xmlUrl": %s, "htmlUrl": %s}\n' "$JSON_TITLE" "$JSON_XML_URL" "$JSON_URL" > json/"$NAME".json
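`generate-json-opml` leans on `jaq --raw-input .` to JSON-escape each shell variable before assembling the record with `printf`. In Python the equivalent escaping is `json.dumps`; a sketch with a hypothetical title and URLs:

```python
import json

def json_escape(s):
    # Equivalent of the script's: echo "$s" | jaq --raw-input .
    # (produces a quoted, escaped JSON string)
    return json.dumps(s)

# Hypothetical values standing in for the scraped TITLE/XML_URL/URL.
title = 'A "quoted" title'
record = '{"title": %s, "xmlUrl": %s, "htmlUrl": %s}' % (
    json_escape(title),
    json_escape("https://www.youtube.com/feeds/videos.xml?channel_id=UCexample"),
    json_escape("https://www.youtube.com/@example"),
)
print(record)
```

Escaping each value this way keeps quotes and backslashes in channel titles from corrupting the per-channel JSON file.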
generate-opml (new executable file) +41

@@ -0,0 +1,41 @@
+#!/usr/bin/env python
+
+import email.utils
+import glob
+import json
+import xml.etree.ElementTree as ET
+
+# This is what we're aiming to generate:
+#
+# <?xml version="1.0" encoding="UTF-8"?>
+# <opml version="1.0">
+#   <head>
+#     <title>RSS subscriptions for wes@wezm.net</title>
+#     <dateCreated>Sun, 05 May 2024 02:54:31 +0000</dateCreated>
+#     <ownerEmail>wes@wezm.net</ownerEmail>
+#   </head>
+#   <body>
+#     <outline text="3D Printing" title="3D Printing">
+#       <outline text="CadHub Blog" title="CadHub Blog" type="rss" xmlUrl="https://learn.cadhub.xyz/blog/rss.xml" htmlUrl="https://learn.cadhub.xyz/blog"/>
+#     </outline>
+#   </body>
+# </opml>
+
+opml = ET.Element("opml", {"version": "1.0"})
+
+head = ET.SubElement(opml, "head")
+title = ET.SubElement(head, "title")
+title.text = "YouTube Subscription"
+dateCreated = ET.SubElement(head, "dateCreated")
+dateCreated.text = email.utils.formatdate(timeval=None, localtime=True)
+
+body = ET.SubElement(opml, "body")
+youtube = ET.SubElement(body, "outline", {"title": "YouTube", "text": "YouTube"})
+
+for path in glob.glob("json/*.json"):
+    with open(path) as f:
+        info = json.load(f)
+    ET.SubElement(youtube, "outline", info, type="rss", text=info["title"])
+
+ET.indent(opml)
+print(ET.tostring(opml, encoding="unicode", xml_declaration=True))
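The OPML that `generate-opml` prints can be sanity-checked by parsing it back with the same `ElementTree` module and pulling out the feed URLs. A sketch using an inline sample shaped like the script's output (the channel and URLs are made up, not the script's actual output):

```python
import xml.etree.ElementTree as ET

# Hypothetical OPML snippet with the same shape as generate-opml's output.
sample = """<opml version="1.0">
  <head><title>YouTube Subscription</title></head>
  <body>
    <outline title="YouTube" text="YouTube">
      <outline title="Example" text="Example" type="rss"
               xmlUrl="https://www.youtube.com/feeds/videos.xml?channel_id=UCexample"
               htmlUrl="https://www.youtube.com/@example"/>
    </outline>
  </body>
</opml>"""

root = ET.fromstring(sample)
# Only leaf outlines carry type="rss"; the grouping outline does not.
feeds = [o.get("xmlUrl") for o in root.iter("outline") if o.get("type") == "rss"]
print(feeds)
```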