diff --git a/v2/content/posts/2020/turning-one-hundred-tweets-into-a-blog-post.md b/v2/content/posts/2020/turning-one-hundred-tweets-into-a-blog-post.md
new file mode 100644
index 0000000..67a4256
--- /dev/null
+++ b/v2/content/posts/2020/turning-one-hundred-tweets-into-a-blog-post.md
@@ -0,0 +1,318 @@
+++
title = "Turning One Hundred Tweets Into a Blog Post"
date = 2020-11-03T11:40:00+11:00

[extra]
#updated = 2020-06-19T09:30:00+10:00
+++

Near the conclusion of my [#100binaries] Twitter series I started working on
[the blog post that contained all the tweets](@/posts/2020/100-rust-binaries/index.md).
It ended up posing a number of interesting challenges and design decisions, as
well as producing a couple of Rust binaries. Whilst I don't think my process
was necessarily optimal, I thought I'd share it to show how I approached the
problem. Perhaps the tools and approach are interesting to others.

My initial plan was to use Twitter embeds. Given a tweet URL it's relatively
easy to turn it into some HTML markup, and by including Twitter's embed
JavaScript on the page the markup turns into a rich Twitter embed. However,
there were a few things I didn't like about this option:

* The page was going to end up massive, even split across a couple of pages,
  because the Twitter JavaScript loads all the images for each tweet up front.
* I didn't like relying on JavaScript for the page to render media.
* I didn't really want to include Twitter's JavaScript (it would likely be
  blocked by visitors with an ad blocker anyway).

So I decided I'd render the content myself. I also decided that I'd host the
original screenshots and videos instead of saving them from the tweets. This
was relatively time consuming, as they were spread across a couple of
computers and not named well, but I found them all in the end.

To ensure the page wasn't enormous I used the [`loading="lazy"`][lazy-loading]
attribute on images. This is a relatively new attribute that tells the browser
to delay loading images until they're within some threshold of the viewport.
It currently works in Firefox and Chrome.

I used `preload="none"` on videos to ensure video data is only loaded if the
visitor attempts to play it.

To prevent the blog post from being too long/heavy I split it across two pages.

### Collecting All the Tweet URLs

With the plan in mind, the first step was getting the full list of tweets. For
better or worse I decided to avoid using any of Twitter's APIs that require
authentication. Instead I turned to [nitter] (an alternative Twitter
front-end) for its simple markup and JS-free rendering.

For each page of [search results for '#100binaries from:@wezm'][search] I ran
the following in the JavaScript console in Firefox:

```javascript
// Collect the permalink of each tweet on the page and copy them to the clipboard
tweets = []
document.querySelectorAll('.tweet-date a').forEach(a => tweets.push(a.href))
copy(tweets.join("\n"))
```

and pasted the result into [tweets.txt] in Neovim.

When all pages had been processed I turned the nitter.net URLs into
twitter.com URLs: `:%s/nitter\.net/twitter.com/`.

This tells Neovim: for every line (`%`) substitute (`s`) `nitter.net` with
`twitter.com`.
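For what it's worth, this step didn't really need a browser at all. A small
Rust program could scrape the same links from saved copies of the nitter
search pages and emit twitter.com URLs directly. A rough sketch, untested,
assuming the scraper crate and search pages saved locally (not something I
actually wrote):

```rust
// Hypothetical alternative to the JS console: extract tweet links from saved
// nitter search result pages and print them as twitter.com URLs.
use scraper::{Html, Selector};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Same selector the JS console snippet used
    let selector = Selector::parse(".tweet-date a").expect("valid selector");
    // Pass the saved search pages as arguments, e.g. search-1.html search-2.html
    for path in std::env::args().skip(1) {
        let html = std::fs::read_to_string(&path)?;
        let doc = Html::parse_document(&html);
        for link in doc.select(&selector) {
            if let Some(href) = link.value().attr("href") {
                // nitter hrefs are relative paths like /wezm/status/1322855912076386304#m
                println!("https://twitter.com{}", href.trim_end_matches("#m"));
            }
        }
    }
    Ok(())
}
```

The browser console was quicker for a one-off, but this would have collapsed
the copy-paste and the nitter.net to twitter.com substitution into one step.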
### Turning Tweet URLs Into Tweet Content

Now I needed to turn the tweet URLs into tweet content. In hindsight it may
have been better to use [Twitter's GET statuses/show/:id][get-status] API to
do this (possibly via [twurl]), but that is not what I did. Onwards!

I used the unauthenticated [oEmbed API][oembed] to get some markup for each
tweet. `xargs` was used to take a line from `tweets.txt` and make the API
(HTTP) request with `curl`:

```
xargs -I '{url}' -a tweets.txt -n 1 curl https://api.twitter.com/1/statuses/oembed.json\?omit_script\=true\&dnt\=true\&lang\=en\&url\=\{url\} > tweets.json
```

This tells `xargs` to replace occurrences of `{url}` in the command with a
line (`-n 1`) read from `tweets.txt` (`-a tweets.txt`).

The result of one of these API requests is JSON like this (formatted with
[`jq`][jq] for readability; the `html` value is abridged here):

```json
{
  "url": "https://twitter.com/wezm/status/1322855912076386304",
  "author_name": "Wesley Moore",
  "author_url": "https://twitter.com/wezm",
  "html": "<blockquote class=\"twitter-tweet\" data-dnt=\"true\"><p lang=\"en\" dir=\"ltr\">Day 100 of #100binaries…</p>&mdash; Wesley Moore (@wezm) <a href=\"https://twitter.com/wezm/status/1322855912076386304\">November 1, 2020</a></blockquote>\n",
  "width": 550,
  "height": null,
  "type": "rich",
  "cache_age": "3153600000",
  "provider_name": "Twitter",
  "provider_url": "https://twitter.com",
  "version": "1.0"
}
```

The output from `xargs` is lots of these JSON objects concatenated together.
I needed to turn [tweets.json] into an array of objects to make it valid JSON.
I opened up the file in Neovim and:

* Added commas between the JSON objects: `%s/}{/},\r{/g`.
  * This is: substitute (`s`) `}{` with `},` followed by a newline (`\r`) and
    `{`, for every occurrence (`/g`).
* Added `[` and `]` to the start and end of the file.

I then reversed the order of the objects and formatted the document with `jq`
(from within Neovim): `%!jq '.|reverse' -`.

This filters the whole file through a command (`%!`). The command is `jq` and
it filters the entire document (`.`), read from stdin (`-`), through the
`reverse` filter to reverse the order of the array. `jq` automatically
pretty-prints the output.

It would have been better to reverse `tweets.txt` in the first place, but I
didn't realise the tweets were in reverse chronological order until this
point, and doing it this way avoided making another 100 HTTP requests.
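Looking back, this fetch-and-fix-up step could also have been a single
throwaway Rust program that wrote a valid, chronologically ordered JSON array
in the first place. A rough sketch of that idea, untested, assuming the ureq
(2.x) and serde_json crates rather than anything I actually ran:

```rust
// Hypothetical one-shot version of the xargs/curl command plus the manual clean-up.
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut tweets: Vec<serde_json::Value> = Vec::new();
    for url in fs::read_to_string("tweets.txt")?.lines() {
        // Same unauthenticated oEmbed endpoint and parameters as the curl command above
        let body = ureq::get("https://api.twitter.com/1/statuses/oembed.json")
            .query("omit_script", "true")
            .query("dnt", "true")
            .query("lang", "en")
            .query("url", url)
            .call()?
            .into_string()?;
        tweets.push(serde_json::from_str(&body)?);
    }
    // tweets.txt was in reverse chronological order, so flip it before writing
    tweets.reverse();
    fs::write("tweets.json", serde_json::to_string_pretty(&tweets)?)?;
    Ok(())
}
```

At the time the shell pipeline was quicker to throw together, but it is a nice
illustration of how small the "proper" version would have been.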
### Rendering tweets.json

I created a custom [Zola shortcode][shortcode], [tweet_list], that reads
`tweets.json` and renders each item in an ordered list. It evolved over time
as I kept adding more information to the JSON file, and it allowed me to see
how the blog post looked as I implemented the following improvements.

### Expanding t.co Links

{% aside(title="You used Rust for this!?", float="right") %}
This is the sort of thing that would be well suited to a scripting language
too. These days I tend to reach for Rust, even for little tasks like this.
It's what I'm most familiar with nowadays and I can mostly write a "script"
like this off the cuff with little need to refer to API docs.
{% end %}

The markup Twitter returns is full of `t.co` redirect links. I wanted to avoid
sending my visitors through the Twitter redirect, so I needed to expand these
links to their targets. I whipped up a little Rust program to do this:
[expand-t-co]. It finds all `t.co` links with a regex
(`https://t\.co/[a-zA-Z0-9]+`) and replaces each occurrence with the target
of the link.

The target URL is determined by making an HTTP HEAD request for the `t.co`
URL and noting the value of the `Location` header. The tool caches the result
in a `HashMap` to avoid repeating a request for the same `t.co` URL if it's
encountered again.

I used the [ureq] crate to make the HTTP requests. Arguably it would have been
better to use an async client so that more requests could be made in parallel,
but that was added complexity I didn't want to deal with for a mostly one-off
program.
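The real code is in the [expand-t-co] repository; the core idea, condensed
into a sketch here (not the actual source, and assuming the regex and ureq 2.x
crates), looks roughly like this:

```rust
use std::collections::HashMap;

/// Sketch of the approach described above: find t.co links, resolve each one
/// once with a HEAD request, and substitute the Location header's value.
fn expand_t_co_links(html: &str) -> String {
    // Disable redirect following so the Location header is visible to us
    let agent = ureq::AgentBuilder::new().redirects(0).build();
    let re = regex::Regex::new(r"https://t\.co/[a-zA-Z0-9]+").unwrap();
    let mut cache: HashMap<String, String> = HashMap::new();

    re.replace_all(html, |caps: &regex::Captures| {
        let short = caps[0].to_string();
        cache
            .entry(short.clone())
            .or_insert_with(|| {
                agent
                    .head(&short)
                    .call()
                    .ok()
                    .and_then(|resp| resp.header("Location").map(str::to_string))
                    // Fall back to the original link if expansion fails
                    .unwrap_or(short)
            })
            .clone()
    })
    .into_owned()
}
```

A real version would want better error handling than silently falling back to
the unexpanded link, but the regex-plus-cached-HEAD-request core is the idea.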
### Adding the Media

At this point I did a lot of manual work to find all the screenshots and
videos that I shared in the tweets and [added them to my blog][media-files].
I also renamed them after the tool they depicted. As part of this process I
noted the source of media files that I didn't create in a `"media_source"`
key in `tweets.json` so that I could attribute them. I also added a `"media"`
key with the name of the media file for each binary.

Some of the externally sourced images were animated GIFs, which lack playback
controls and are very inefficient in terms of file size. Whenever I
encountered an animated GIF I converted it to an MP4 with `ffmpeg`, resulting
in large space savings:

```
ffmpeg -i ~/Downloads/so.gif -movflags faststart -pix_fmt yuv420p -vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" so.mp4
```

This converts `so.gif` to `so.mp4` and ensures the dimensions are divisible
by 2, which is apparently a requirement of H.264 streams encapsulated in MP4.
I worked out how to do this from: