The power of blogging with plain old versioned text

composition.al 2017-12-29

Something I really like about my blogging setup here on composition.al is that my posts are just plain old version-controlled text files. Years ago, Dave Herman remarked on how nice it is to be able to blog using one’s text editor, and I feel the same way. But there’s so much more to it than just being able to write my posts in my editor of choice. For me, the more important thing is that I can make use of the vast number of tools out there for manipulating text, including and especially versioned text. Because of that, I don’t have to wait for my blogging framework to implement this or that feature. I can just use tools that already exist.

Word counts

Here’s an example: let’s say I want to know what the ten longest posts on this blog are.[1] If I were using, say, WordPress, I’m not sure how I would go about figuring that out. The interface might or might not expose that information; I might have to install a plugin or something, or manually copy and paste the text of my posts into a tool that will show a word count. But with Octopress, since my posts are just text files, it’s just a couple of quick shell commands chained together and run in the Octopress _posts directory:

$ wc -w * | sort -n -r | head
  151604 total
    4899 2016-08-31-experiencing-computing-viscerally-my-pg-podcast-interview-about-bangbangcon.markdown
    4526 2013-12-24-the-lvar-that-wasnt.markdown
    4145 2017-05-31-proving-that-safety-critical-neural-networks-do-what-theyre-supposed-to-where-we-are-where-were-going-part-2-of-2.markdown
    3537 2016-11-17-an-economics-analogy-for-why-adversarial-examples-work.markdown
    3328 2013-09-22-some-example-mvar-ivar-and-lvar-programs-in-haskell.markdown
    3313 2013-05-25-how-to-read-from-an-lvar-an-illustrated-guide.markdown
    3196 2017-05-30-proving-that-safety-critical-neural-networks-do-what-theyre-supposed-to-where-we-are-where-were-going-part-1-of-2.markdown
    3095 2015-07-30-a-month-of-daily-check-ins.markdown
    3090 2017-03-31-scaling-bangbangcon.markdown

From the output of that command, I can see that my longest post is “‘Experiencing computing viscerally’: my PG Podcast interview about !!Con” from August 2016, although that one hardly counts because it’s a transcript of an audio interview. The next longest post is “The LVar that wasn’t”, from December 2013, which weighs in at 4526 words. (The first line of the output is the total across all files, so head actually shows the nine longest posts here.) The word count includes a small amount of overhead for stuff like the front matter at the top of each Markdown file, but it’s mostly accurate.
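The front-matter overhead mentioned above can be trimmed out of the count. What follows is only a minimal sketch, assuming each post opens with a Jekyll-style front matter block delimited by two `---` lines; the function names `body_words` and `longest_posts` are made up for this illustration.

```shell
# Count the words in a post's body, skipping the leading front matter block.
# Assumption: front matter starts with "---" on line 1 and ends at the next "---" line.
body_words() {
  awk '
    FNR == 1 && $0 == "---" { in_fm = 1; next }   # opening delimiter
    in_fm && $0 == "---"    { in_fm = 0; next }   # closing delimiter
    !in_fm                  { n += NF }           # count whitespace-separated words
    END                     { print n + 0 }
  ' "$1"
}

# Print "count filename" for the given posts, longest first, with no "total" line.
longest_posts() {
  for post in "$@"; do
    printf '%s %s\n' "$(body_words "$post")" "$post"
  done | sort -n -r | head
}
```

Run in the _posts directory, `longest_posts *.markdown` would then list up to ten posts by body length alone.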

Spelling checks

Let’s say that I want to spell-check all my posts. Since I have aspell installed, that’s pretty easy, too:

$ for post in *; do aspell -c "$post"; done

This launches the aspell interface, in which I can interactively correct typos that it finds in each file, or add words it doesn’t know to my local dictionary. I tried it just now on the whole blog and found misspellings of ‘heterogeneous’, ‘narrative’, ‘necessarily’, ‘difference’, and ‘which’, all of which are now fixed.
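For a quick overview before sitting down to an interactive session, aspell can also be run non-interactively: `aspell list` reads text on standard input and prints the words it doesn’t recognize, one per line. Here is a minimal sketch, assuming aspell and a suitable dictionary are installed; the function name `misspellings` is hypothetical.

```shell
# Print each unrecognized word across the given posts with its number of
# occurrences, most frequent first.
# Assumption: `aspell list` is available and uses the default dictionary.
misspellings() {
  cat -- "$@" | aspell list | sort | uniq -c | sort -n -r
}
```

Calling `misspellings *.markdown` in the _posts directory would print one line per unknown word. Expect some noise, since things like front matter keys and URLs will probably show up in the list too.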

History of a post

Because I use git to version all my posts, I can see a version history of any post using git log. Let’s say that I want to see all the changes made to the post “Refactoring as a way to understand code” from a couple of years back. The command

git log -p --word-diff 2015-12-29-refactoring-as-a-way-to-understand-code.markdown

shows me the answer. It turns out that I made a small copy edit about an hour after writing the post, correcting the phrase “get a more visceral sense of what it’s doing” to “give me a more visceral sense of what I’m doing”. A couple of days later, I added the sentence “I can sympathize.” to the post. A year and a half later, I removed the tag “programming” and added “refactoring” instead.

Seeing the most-edited posts

With only three edits after the original post, I suspect that “Refactoring as a way to understand code” is actually one of my less heavily edited posts. How can I see which ones are most heavily edited? The git-effort command from the wonderful git-extras collection can help. If I want to see all my posts that have been edited more than twenty times, I can do this:

$ git effort --above 20

  path                                                                                                                                       commits    active days

  2013-03-31-a-new-paper-draft-and-a-debugging-story.markdown............................................................................... 21          7
  2015-11-29-a-better-way-to-add-labels-to-bar-charts-with-matplotlib.markdown.............................................................. 22          10
  2016-07-28-getting-into-a-phd-program-without-previous-research-experience.markdown....................................................... 23          10
  2016-09-29-thoughts-on-adversarial-examples-in-the-physical-world.markdown................................................................ 34          9


  path                                                                                                                                       commits    active days

  2016-09-29-thoughts-on-adversarial-examples-in-the-physical-world.markdown................................................................ 34          9
  2016-07-28-getting-into-a-phd-program-without-previous-research-experience.markdown....................................................... 23          10
  2015-11-29-a-better-way-to-add-labels-to-bar-charts-with-matplotlib.markdown.............................................................. 22          10
  2013-03-31-a-new-paper-draft-and-a-debugging-story.markdown............................................................................... 21          7

git-effort lists those posts for me, first ordered by filename (which, because of the naming convention I use, is also by date) and then ordered by the number of commits. It turns out that I have four posts that have been edited more than twenty times, with “Thoughts on ‘Adversarial examples in the physical world’” being the most edited post by a pretty wide margin. Fascinating!
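If git-extras isn’t installed, a rough equivalent can be put together from plain git and standard text tools. This is a sketch, not a description of how git-effort itself works: it counts, for each path, the commits that touched it and keeps the paths above a threshold. The function name `commits_per_file` is made up here, it assumes filenames without spaces, and unlike git-effort it doesn’t report active days.

```shell
# List "count path" for files touched by more than $1 commits (default 0),
# most-edited first. Run inside a git repository.
commits_per_file() {
  git log --pretty=format: --name-only |   # paths touched by each commit
    grep -v '^$' |                         # drop blank separator lines
    sort | uniq -c |                       # count occurrences of each path
    awk -v min="${1:-0}" '$1 > min { print $1, $2 }' |
    sort -n -r
}
```

With this, `commits_per_file 20` plays roughly the role of `git effort --above 20`, minus the second column.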

The tools I use

I use Octopress to generate this blog. When I started the blog in early 2013, Octopress was still quite popular; by 2015, it had fallen far enough out of fashion that a commenter on Hacker News thought my use of it was strange enough that they saw fit to remark on it in a discussion that should have been about more interesting things. I suppose that person would think I’m a dinosaur for still using Octopress even now, as 2017 is coming to an end. There are any number of other static site generators out there that would probably do everything I want and are probably less janky than Octopress, but I keep using it out of inertia and because it still does the job well enough.

I’ll probably switch from Octopress to plain Jekyll at some point, since Octopress is just a wrapper around Jekyll, and I don’t really use most of the features Octopress adds. The particular blogging framework I use is beside the point, though. What’s more important to me is how easy it is to work with my blog using the wider world of command-line text manipulation tools, and it’s the fact that the posts are stored as versioned text that makes that possible. That, to me, is the real power of static site generation tools — it’s not so much about what those tools themselves do as it is about everything else that the posts-as-versioned-text approach enables.

  1. I periodically check what my longest posts have been because I’m constantly worried that My Best Blogging Days Are Behind Me and that I Never Blog About Anything Substantial Anymore. So it’s reassuring to look back and see that my nine longest posts are actually pretty evenly spread over the years that this blog has existed, and in fact, three of them were written within the last year. (Length isn’t the same thing as substance, of course, but there’s no shell command for quantifying substance yet.)