mirror of
https://github.com/wezm/wezm.net.git
synced 2024-11-18 04:42:47 +00:00
Add turning-one-hundred-tweets-into-a-blog-post post
This commit is contained in:
parent
59c8c21b14
commit
8acfbbcba6
2 changed files with 319 additions and 1 deletions
|
@@ -0,0 +1,318 @@
|
|||
+++
|
||||
title = "Turning One Hundred Tweets Into a Blog Post"
|
||||
date = 2020-11-03T11:40:00+11:00
|
||||
|
||||
[extra]
|
||||
#updated = 2020-06-19T09:30:00+10:00
|
||||
+++
|
||||
|
||||
Near the conclusion of my [#100binaries] Twitter series I started working on
|
||||
[the blog post that contained all the tweets](@/posts/2020/100-rust-binaries/index.md).
|
||||
It ended up posing a number of interesting challenges and design decisions, as
|
||||
well as a couple of Rust binaries. Whilst I don't think the process was
|
||||
necessarily optimal I thought I'd share it to show my approach to
|
||||
solving the problem. Perhaps the tools used and approach taken are
|
||||
interesting to others.
|
||||
|
||||
<!-- more -->
|
||||
|
||||
My initial plan was to use Twitter embeds. Given a tweet URL it's relatively
|
||||
easy to turn it into some HTML markup. By including Twitter's embed JavaScript
|
||||
on the page the markup turns into a rich Twitter embed. However there were a few
|
||||
things I didn't like about this option:
|
||||
|
||||
* The page was going to end up massive, even split across a couple of pages
|
||||
because the Twitter JS was loading all the images for each tweet up front.
|
||||
* I didn't like relying on JavaScript for the page to render media.
|
||||
* I didn't really want to include Twitter's JavaScript (it's likely it would be
|
||||
blocked by visitors with an ad blocker anyway).
|
||||
|
||||
So I decided I'd render the content myself. I also decided that I'd host the
|
||||
original screenshots and videos instead of saving them from the tweets. This
|
||||
was relatively time consuming as they were across a couple of computers and
|
||||
not named well but I found them all in the end.
|
||||
|
||||
To ensure the page wasn't enormous I used the [`loading="lazy"`][lazy-loading]
|
||||
attribute on images. This is a relatively new attribute that tells the browser
|
||||
to delay loading of images until they're within some threshold of the
|
||||
viewport. It currently works in Firefox and Chrome.
|
||||
|
||||
I used `preload="none"` on videos to ensure video data was only loaded if the
|
||||
visitor attempted to play it.
|
||||
|
||||
To prevent the blog post from being too long/heavy I split it across two pages.
|
||||
|
||||
### Collecting All the Tweet URLs
|
||||
|
||||
With the plan in mind the first step was getting the full list of tweets. For
|
||||
better or worse I decided to avoid using any of Twitter's APIs that require
|
||||
authentication. Instead I turned to [nitter] (an alternative Twitter
|
||||
front-end) for its simple markup and JS free rendering.
|
||||
|
||||
For each page of [search results for '#100binaries from:@wezm'][search] I ran
|
||||
the following in the JS Console in Firefox:
|
||||
|
||||
```javascript
|
||||
tweets = []
|
||||
document.querySelectorAll('.tweet-date a').forEach(a => tweets.push(a.href))
|
||||
copy(tweets.join("\n"))
|
||||
```
|
||||
|
||||
and pasted the result into [tweets.txt] in Neovim.
|
||||
|
||||
When all pages had been processed I turned the nitter.net URLs into twitter.com URLs:
|
||||
`:%s/nitter\.net/twitter.com/`.
|
||||
|
||||
This tells Neovim: for every line (`%`) substitute (`s`) `nitter.net` with `twitter.com`.
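
For comparison, the same host swap can be sketched in Rust. This is only an illustration, not part of the original workflow; `to_twitter_url` is a hypothetical helper name. Like the `:s` command without a `g` flag, it replaces only the first match on each line.

```rust
/// Rewrite a nitter.net tweet URL so that it points at twitter.com instead.
/// `replacen(.., 1)` mirrors `:s` without the `g` flag: first match only.
fn to_twitter_url(url: &str) -> String {
    url.replacen("nitter.net", "twitter.com", 1)
}

fn main() {
    // One URL per line, as in tweets.txt.
    let input = "https://nitter.net/wezm/status/1322855912076386304\n";
    for line in input.lines() {
        println!("{}", to_twitter_url(line));
    }
}
```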
|
||||
|
||||
### Turning Tweet URLs Into Tweet Content
|
||||
|
||||
Now I needed to turn the tweet URLs into tweet content. In hindsight it may
|
||||
have been better to use [Twitter's GET statuses/show/:id][get-status] API to do
|
||||
this (possibly via [twurl]) but that is not what I did. Onwards!
|
||||
|
||||
I used the unauthenticated [oEmbed API][oembed] to get some markup for each
|
||||
tweet. `xargs` was used to take a line from `tweets.txt` and make the API
|
||||
(HTTP) request with `curl`:
|
||||
|
||||
```
|
||||
xargs -I '{url}' -a tweets.txt -n 1 curl https://api.twitter.com/1/statuses/oembed.json\?omit_script\=true\&dnt\=true\&lang\=en\&url\=\{url\} > tweets.json
|
||||
```
|
||||
|
||||
This tells `xargs` to replace occurrences of `{url}` in the command with a line
|
||||
(`-n 1`) read from `tweets.txt` (`-a tweets.txt`).
|
||||
|
||||
The result of one of these API requests is JSON like this (formatted with
|
||||
[`jq`][jq] for readability):
|
||||
|
||||
```json
|
||||
{
|
||||
"url": "https://twitter.com/wezm/status/1322855912076386304",
|
||||
"author_name": "Wesley Moore",
|
||||
"author_url": "https://twitter.com/wezm",
|
||||
"html": "<blockquote class=\"twitter-tweet\" data-lang=\"en\" data-dnt=\"true\"><p lang=\"en\" dir=\"ltr\">Day 100 of <a href=\"https://twitter.com/hashtag/100binaries?src=hash&ref_src=twsrc%5Etfw\">#100binaries</a><br><br>Today I'm featuring the Rust compiler — the binary that made the previous 99 fast, efficient, user-friendly, easy-to-build, and reliable binaries possible.<br><br>Thanks to all the people that have worked on it past, present, and future. <a href=\"https://t.co/aBEdLE87eq\">https://t.co/aBEdLE87eq</a> <a href=\"https://t.co/jzyJtIMGn1\">pic.twitter.com/jzyJtIMGn1</a></p>— Wesley Moore (@wezm) <a href=\"https://twitter.com/wezm/status/1322855912076386304?ref_src=twsrc%5Etfw\">November 1, 2020</a></blockquote>\n",
|
||||
"width": 550,
|
||||
"height": null,
|
||||
"type": "rich",
|
||||
"cache_age": "3153600000",
|
||||
"provider_name": "Twitter",
|
||||
"provider_url": "https://twitter.com",
|
||||
"version": "1.0"
|
||||
}
|
||||
```
|
||||
|
||||
The output from `xargs` is lots of these JSON objects all concatenated
|
||||
together. I needed to turn [tweets.json] into an array of objects to make it
|
||||
valid JSON. I opened up the file in Neovim and:
|
||||
|
||||
* Added commas between the JSON objects: `%s/}{/},\r{/g`.
|
||||
* That is, substitute `}{` with `},`, a newline (`\r`) and `{`, for every match on a line (`/g`).
|
||||
* Added `[` and `]` to start and end of the file.
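
The two fix-up steps above can also be sketched programmatically. This is a minimal Rust illustration using only the standard library, not the approach used for the post; `concatenated_to_array` is a hypothetical name. It is the same naive textual substitution as the Neovim command, so it would also rewrite a `}{` that happened to appear inside a string value.

```rust
/// Turn concatenated JSON objects (`{...}{...}`) into a JSON array by
/// inserting a comma and newline between objects and wrapping in brackets.
/// Purely textual: it does not parse the JSON.
fn concatenated_to_array(input: &str) -> String {
    let separated = input.trim().replace("}{", "},\n{");
    format!("[{}]", separated)
}

fn main() {
    let raw = r#"{"version":"1.0"}{"version":"1.0"}"#;
    println!("{}", concatenated_to_array(raw));
}
```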
|
||||
|
||||
I then reversed the order of the objects and formatted the document with `jq` (from within Neovim): `%!jq '.|reverse' -`.
|
||||
|
||||
This filters the whole file through a command (`%!`). The command is `jq` and it
|
||||
filters the entire document `.`, read from stdin (`-`), through the `reverse`
|
||||
filter to reverse the order of the array. `jq` automatically pretty prints.
|
||||
|
||||
It would have been better to have reversed `tweets.txt` but I didn't
|
||||
realise they were in reverse chronological ordering until this point and
|
||||
doing it this way avoided making another 100 HTTP requests.
|
||||
|
||||
### Rendering tweets.json
|
||||
|
||||
I created a custom [Zola shortcode][shortcode], [tweet_list] that reads
|
||||
`tweets.json` and renders each item in an ordered list. It evolved over time as
|
||||
I kept adding more information to the JSON file. It allowed me to see how
|
||||
the blog post looked as I implemented the following improvements.
|
||||
|
||||
### Expanding t.co Links
|
||||
|
||||
{% aside(title="You used Rust for this!?", float="right") %}
|
||||
This is the sort of thing that would be well suited to a scripting language
|
||||
too. These days I tend to reach for Rust, even for little tasks like this.
|
||||
It's what I'm most familiar with nowadays and I can mostly write a "script"
|
||||
like this off the cuff with little need to refer to API docs.
|
||||
{% end %}
|
||||
|
||||
The markup Twitter returns is full of `t.co` redirect links. I wanted to avoid
|
||||
sending my visitors through the Twitter redirect so I needed to expand these
|
||||
links to their target. I whipped up a little Rust program to do this:
|
||||
[expand-t-co]. It finds all `t.co` links with a regex
|
||||
(`https://t\.co/[a-zA-Z0-9]+`) and replaces each occurrence with the target
|
||||
of the link.
|
||||
|
||||
The target URL is determined by making an HTTP HEAD request for the
|
||||
`t.co` URL and noting the value of the `Location` header. The tool
|
||||
caches the result in a `HashMap` to avoid repeating a request for
|
||||
the same `t.co` URL if it's encountered again.
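
The find, resolve and cache flow described above can be sketched as follows. This is not the code from `expand-t-co`: to keep it self-contained it scans for links by hand instead of using a regex, and the HTTP HEAD request is replaced by a caller-supplied `resolve` closure. The names `find_t_co_links` and `expand_links`, and the `example.com` target, are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Byte ranges of every `https://t.co/<alphanumerics>` link in `text`.
fn find_t_co_links(text: &str) -> Vec<(usize, usize)> {
    const PREFIX: &str = "https://t.co/";
    let mut links = Vec::new();
    let mut pos = 0;
    while let Some(offset) = text[pos..].find(PREFIX) {
        let start = pos + offset;
        let id_start = start + PREFIX.len();
        // The link id is the run of ASCII letters and digits after the prefix.
        let id_len = text[id_start..]
            .bytes()
            .take_while(|b| b.is_ascii_alphanumeric())
            .count();
        let end = id_start + id_len;
        if id_len > 0 {
            links.push((start, end));
        }
        pos = end;
    }
    links
}

/// Replace each t.co link with its target, calling `resolve` at most once
/// per distinct link thanks to the `HashMap` cache.
fn expand_links<F>(text: &str, cache: &mut HashMap<String, String>, mut resolve: F) -> String
where
    F: FnMut(&str) -> String,
{
    let mut out = String::with_capacity(text.len());
    let mut last = 0;
    for (start, end) in find_t_co_links(text) {
        let link = &text[start..end];
        out.push_str(&text[last..start]);
        let target = cache
            .entry(link.to_string())
            .or_insert_with(|| resolve(link));
        out.push_str(target.as_str());
        last = end;
    }
    out.push_str(&text[last..]);
    out
}

fn main() {
    let mut cache = HashMap::new();
    let mut lookups = 0;
    let html = r#"<a href="https://t.co/aBEdLE87eq">https://t.co/aBEdLE87eq</a>"#;
    // Hypothetical resolver standing in for the HEAD request and `Location` header.
    let expanded = expand_links(html, &mut cache, |link| {
        lookups += 1;
        format!("https://example.com/expanded/{}", &link["https://t.co/".len()..])
    });
    println!("{}", expanded);
    println!("lookups performed: {}", lookups);
}
```

Passing the resolver in as a closure keeps the caching logic testable without network access; a real resolver would make the HEAD request there instead.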
|
||||
|
||||
I used the [ureq] crate to make the HTTP requests. Arguably it would have been
|
||||
better to use an async client so that more requests were made in parallel but
|
||||
that was added complexity I didn't want to deal with for a mostly one-off
|
||||
program.
|
||||
|
||||
### Adding the Media
|
||||
|
||||
At this point I did a lot of manual work to find all the screenshots and videos
|
||||
that I shared in the tweets and [added them to my blog][media-files]. I also
|
||||
renamed them after the tool they depicted. As part of this process I noted the
|
||||
source of media files that I didn't create in a `"media_source"` key in
|
||||
`tweets.json` so that I could attribute them. I also added a `"media"` key with
|
||||
the name of the media file for each binary.
|
||||
|
||||
Some of the externally sourced images were animated GIFs, which lack
|
||||
playback controls and are very inefficient in terms of file size. Whenever I encountered an
|
||||
animated GIF I converted it to an MP4 with `ffmpeg`, resulting in large space savings:
|
||||
|
||||
```
|
||||
ffmpeg -i ~/Downloads/so.gif -movflags faststart -pix_fmt yuv420p -vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" so.mp4
|
||||
```
|
||||
|
||||
This converts `so.gif` to `so.mp4` and ensures the dimensions are divisible
|
||||
by 2, which is apparently a requirement of H.264 streams encapsulated in MP4. I
|
||||
worked out how to do this from: <https://unix.stackexchange.com/a/294892/5444>
|
||||
|
||||
I also wanted to know the media dimensions for each file so that I could have them
|
||||
scaled properly on the page — most images are HiDPI and need to be presented at
|
||||
half their pixel width to appear the right size.
|
||||
|
||||
For this I used `ffprobe`, which is part of `ffmpeg`. I originally planned to
|
||||
use another tool to handle images (as opposed to videos) but it turns out
|
||||
`ffprobe` handles them too.
|
||||
|
||||
Since I wanted to update the values of JSON objects in `tweets.json` I opted to
|
||||
parse the JSON this time. Again I whipped up a little Rust "script":
|
||||
[add-media-dimensions]. It parses `tweets.json` and for each object in the
|
||||
array runs `ffprobe` on the media file, like this:
|
||||
|
||||
```
|
||||
ffprobe -v quiet -print_format json -show_format -show_streams file.mp4
|
||||
```
|
||||
|
||||
I learned how to do this from: <https://stackoverflow.com/a/11236144/38820>
|
||||
|
||||
With this invocation `ffprobe` produces JSON so `add-media-dimensions` also
|
||||
parses that and adds the width and height values to `tweets.json`. At the end
|
||||
the updated JSON document is printed to stdout. This turned out to be a handy
|
||||
sanity check as it detected a couple of copy/paste errors and typos in the
|
||||
manually added `"media"` values.
|
||||
|
||||
### Cleaning Up pic.twitter.com Links
|
||||
|
||||
The oEmbed markup that Twitter returns includes links for each piece of media. Now that
|
||||
I'm handling that myself these can be deleted. I used Neovim for this:
|
||||
|
||||
```
|
||||
:%s/ <a href=\\"https:\/\/twitter\.com[^"]\+\(photo\|video\)[^"]\+">pic.twitter.com[^<]\+<\/a>//
|
||||
```
|
||||
|
||||
For each line of the file (`%`) substitute (`s`) matches with nothing. And that
|
||||
took care of them. Yes I'm matching HTML with a regex, no you shouldn't do this
|
||||
for something that's part of a program. For one-off text editing it's fine
|
||||
though, especially since you can eyeball the differences with `git diff`, or in
|
||||
my case `tig status`.
|
||||
|
||||
### Adding a HiDPI Flag
|
||||
|
||||
I initially tried using a heuristic in `tweet_list` to determine if a media
|
||||
file was HiDPI or not but there were a few exceptions to the rule. I decided to
|
||||
add a `"hidpi"` value to the JSON to indicate if it was HiDPI media or not. A
|
||||
bit of trial and error with [jq] led to this:
|
||||
|
||||
```
|
||||
jq 'map(. + if .width > 776 then {hidpi: true} else {hidpi:false} end)' tweets.json > tweets-hidpi.json
|
||||
```
|
||||
|
||||
If the image is greater than 776 pixels wide then set the `hidpi` property to
|
||||
`true`, otherwise `false`. 776 was picked via visual inspection of the rendered
|
||||
page. Once satisfied with the threshold I examined the rendered page and flipped
|
||||
the `hidpi` value on some items where the heuristic was wrong.
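
To make the heuristic and the manual override concrete, here is a small Rust sketch of the same logic. It is an illustration rather than code from the site: the `Media` type and `display_size` method are hypothetical, and halving the height as well as the width is an assumption that follows from presenting HiDPI media at half its pixel width.

```rust
/// Width threshold from the `jq` filter: wider media is treated as HiDPI.
const HIDPI_THRESHOLD: u32 = 776;

#[derive(Debug, PartialEq)]
struct Media {
    width: u32,
    height: u32,
    hidpi: bool,
}

impl Media {
    /// Apply the width heuristic to pick an initial `hidpi` value.
    fn new(width: u32, height: u32) -> Self {
        Media { width, height, hidpi: width > HIDPI_THRESHOLD }
    }

    /// Size to present on the page: HiDPI media at half its pixel dimensions.
    fn display_size(&self) -> (u32, u32) {
        if self.hidpi {
            (self.width / 2, self.height / 2)
        } else {
            (self.width, self.height)
        }
    }
}

fn main() {
    let mut media = Media::new(1600, 900);
    println!("{:?} displays at {:?}", media, media.display_size());
    // Flip the flag by hand where the heuristic guessed wrong.
    media.hidpi = false;
    println!("{:?} displays at {:?}", media, media.display_size());
}
```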
|
||||
|
||||
### Adding alt Text
|
||||
|
||||
[Di], ever my good conscience when it comes to such things, enquired at one
|
||||
point if I'd added `alt` text to the images. I was on the fence since the
|
||||
images were mostly there to show what the tools looked like — I didn't think
|
||||
they were really essential content — but she made a good argument for including
|
||||
some `alt` text even if it was fairly simplistic.
|
||||
|
||||
I turned to `jq` again to add a basic `"media_description"` to the JSON,
|
||||
which `tweet_list` would include as `alt` text:
|
||||
|
||||
```
|
||||
jq 'map(. + {media_description: ("Screenshot of " + (.media // "????" | sub(".(png|gif|mp4|jpg)$"; "")) + " running in a terminal.")})' tweets.json > tweets-alt.json
|
||||
```
|
||||
|
||||
For each object in the JSON array it adds a `media_description` key with a
|
||||
value derived from the `media` key (the file name with the extension removed).
|
||||
If the object doesn't have a `media` value then it is defaulted to "????"
|
||||
(`.media // "????"`).
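
The derivation can be spelled out in Rust as a rough equivalent of the `jq` filter. This is a sketch under assumptions: `media_description` is a hypothetical function, and it strips a literal `.png`, `.gif`, `.mp4` or `.jpg` suffix rather than applying a regex.

```rust
/// File extensions to strip from the media file name.
const EXTENSIONS: [&str; 4] = [".png", ".gif", ".mp4", ".jpg"];

/// Build placeholder alt text from an optional media file name.
/// A missing name falls back to "????", like `.media // "????"` in jq.
fn media_description(media: Option<&str>) -> String {
    let name = media.unwrap_or("????");
    let stem = EXTENSIONS
        .iter()
        .find_map(|ext| name.strip_suffix(*ext))
        .unwrap_or(name);
    format!("Screenshot of {} running in a terminal.", stem)
}

fn main() {
    println!("{}", media_description(Some("so.mp4")));
    println!("{}", media_description(None));
}
```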
|
||||
|
||||
After these initial descriptions were added I went through the rendered page and
|
||||
updated the text of items where the description was incorrect or inadequate.
|
||||
|
||||
### Video Poster Images
|
||||
|
||||
As it stood all the videos were just white boxes with playback controls since I
|
||||
had used `preload="none"` to limit the data usage of the page. I decided to pay
|
||||
the cost of the larger page weight and add poster images to each of the videos.
|
||||
I used `ffmpeg` to extract the first frame of each video as a PNG:
|
||||
|
||||
```
|
||||
for m in *.mp4; do ffmpeg -i $m -vf "select=1" -vframes 1 $m.png; done
|
||||
```
|
||||
|
||||
I learned how to do this from: <https://superuser.com/a/1010108>
|
||||
|
||||
I then converted the PNGs to JPEGs for smaller files. I could have generated
|
||||
JPEGs directly from `ffmpeg` but I didn't know how to control the quality — I
|
||||
wanted a relatively low quality for smaller files.
|
||||
|
||||
```
|
||||
for f in *.mp4.png; do convert "$f" -quality 60 $f.jpg ; done
|
||||
```
|
||||
|
||||
This produced files named `filename.mp4.png.jpg`. I'm yet to memorise how to
|
||||
manipulate file extensions in `zsh`, despite having [been told how to do
|
||||
it][zsh-ext], so I did a follow up step to rename them:
|
||||
|
||||
```
|
||||
for f in *.mp4; do mv $f.png.jpg $f.jpg ; done
|
||||
```
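
The name mapping performed by that rename loop can be expressed as a small Rust function. This is only a sketch of the string manipulation, not something used for the post, and `poster_name` is a hypothetical helper; it maps `filename.mp4.png.jpg` to `filename.mp4.jpg`, matching what the `mv` command produces.

```rust
/// Map a double-suffixed poster name such as `so.mp4.png.jpg` to `so.mp4.jpg`.
/// Returns `None` when the name does not end in `.png.jpg`.
fn poster_name(converted: &str) -> Option<String> {
    converted
        .strip_suffix(".png.jpg")
        .map(|base| format!("{}.jpg", base))
}

fn main() {
    for name in ["so.mp4.png.jpg", "so.jpg"] {
        match poster_name(name) {
            Some(renamed) => println!("{} -> {}", name, renamed),
            None => println!("{} left unchanged", name),
        }
    }
}
```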
|
||||
|
||||
### Wrapping Up
|
||||
|
||||
Lastly I ran [`pngcrush`][pngcrush] on all of the PNGs. It reliably reduces the file size
|
||||
in a lossless manner:
|
||||
|
||||
```
|
||||
for f in *.png; do pngcrush -reduce -ow $f; done
|
||||
```
|
||||
|
||||
With that I did some styling tweaks, added a little commentary and published
|
||||
[the page](@/posts/2020/100-rust-binaries/index.md).
|
||||
|
||||
If you made it this far, thanks for sticking with it to the end. I'm not sure
|
||||
how interesting or useful this post is but if you liked it let me know and I
|
||||
might do more like it in the future.
|
||||
|
||||
[#100binaries]: https://twitter.com/search?q=%23100binaries%20from%3A%40wezm&src=typed_query&f=live
|
||||
[nitter]: https://nitter.net/about
|
||||
[search]: https://nitter.net/search?f=tweets&q=%23100binaries+from%3A%40wezm
|
||||
[tweets.txt]: https://github.com/wezm/wezm.net/blob/master/v2/content/posts/2020/100-rust-binaries/tweets.txt
|
||||
[tweets.json]: https://github.com/wezm/wezm.net/blob/master/v2/content/posts/2020/100-rust-binaries/tweets.json
|
||||
[get-status]: https://developer.twitter.com/en/docs/twitter-api/v1/tweets/post-and-engage/api-reference/get-statuses-show-id
|
||||
[twurl]: https://github.com/twitter/twurl
|
||||
[oembed]: https://developer.twitter.com/en/docs/twitter-api/v1/tweets/post-and-engage/api-reference/get-statuses-oembed
|
||||
[jq]: https://stedolan.github.io/jq/
|
||||
[expand-t-co]: https://github.com/wezm/expand-t-co
|
||||
[media-files]: https://github.com/wezm/wezm.net/tree/master/v2/content/posts/2020/100-rust-binaries
|
||||
[add-media-dimensions]: https://github.com/wezm/add-media-dimensions
|
||||
[shortcode]: https://www.getzola.org/documentation/content/shortcodes/
|
||||
[tweet_list]: https://github.com/wezm/wezm.net/blob/master/v2/templates/shortcodes/tweet_list.html
|
||||
[Di]: https://didoesdigital.com/
|
||||
[zsh-ext]: https://twitter.com/Sasha_Boyd/status/1300666988608454656
|
||||
[ureq]: https://github.com/algesten/ureq
|
||||
[lazy-loading]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/img#attr-loading
|
||||
[pngcrush]: https://pmt.sourceforge.io/pngcrush/index.html
|
|
@@ -31,7 +31,7 @@ body.home {
|
|||
pre, code {
|
||||
font-family: "Pragmata Pro", "Pragmata Pro Mono", "JetBrains Mono", "Iosevka", "Consolas", monospace;
|
||||
}
|
||||
code {
|
||||
:not(pre) > code {
|
||||
background-color: #ffedf0;
|
||||
padding: 0.1em 0.2em;
|
||||
font-size: 16px;
|
||||
|
|