home

Julia Evans

http://jvns.ca/atom.xml
2019-12-05T10:41:54+00:00
https://jvns.ca/atom.xml


Challenge: find Twitter memes with suffix arrays

This challenge is a mix of data analysis and using fun algorithms! It’s the second challenge in a a short series of programming challenge I’m writing with Julian. (the first one was to write a tiny fun window manager)

Twitter has a lot of memes. For example, if you search Twitter for Flight attendant: is there a doctor on this flight?, you’ll find a bunch of tweets making jokes like this:

Flight Attendant: is there a doctor on board?
Parent: *nudging* That should've been you
Me: Not now, this is serious
Parent: Not asking for a hacker to help, are they?
Me: AAAAAAAA\x00\xd0X?\xfc\x7fBBBBj\x0bX\x99Rfh-p\x89\xe1Rjhh/bash/bin\x89\xe3RQS\x89\xe1\xcd\x80
Parent:~#

or if you search as a kpop fan there are thousands of these:

me as a kpop fan 

- kpop fan age: 10 years
- first group ever stan: super junior
- current ult groups: iKON, X1, Day6
- number of albums: >20 
- concerts attended: 6
- lightsticks owned: 2

So! Suppose you have a million tweets from the last 2 days. How do you find the jokes / quizzes / memes people are playing with on Twitter?

Challenge: find the twitter memes in 1 million tweets

This is a pretty open ended challenge and you can do it any way you want. Here’s a SQLite database with 1.2 million tweets, collected from the twitter streaming api over 2 days. It’s 250MB (70MB compressed), it only has English tweets. It excludes retweets and many tweets that are generated by bots.

The challenge: find at least 5 Twitter memes using that dataset.

memes as common substrings

The idea here is that memes are substrings like “me as a kpop fan” that many different people are using. The tricky thing is that you don’t really know how long those substrings will be, and maybe you’re interested in phrases of different lengths.

You can probably do this challenge without using anything fancy (with a hashmap of phrases or something) but I think it’s a nice opportunity to play with a fun data structure: suffix arrays! So let’s talk about what those are.

suffix arrays: sort all suffixes

Suffix arrays sort all suffixes of a string. For example, here’s the suffix array for “plantain” which has the suffixes plantain, lantain, antain, ntain, tain, ain, in, n.

ain
antain
in
lantain
n
ntain
plantain
tain

Representing this as a list of strings would be very inefficient (quadratic space), so instead we replace each suffix with the index of its first character in the original string – [5,2,6,1,7,3,0,4].

5 (ain)
2 (antain)
6 (in)
1 (lantain)
7 (n)
3 (ntain)
0 (plantain)
4 (tain)

Here’s a real example of what a suffix array of 1 million tweets concatenated looks like. This is an excerpt from the middle of the suffix array, with some of the suffixes that start with A little.

...
 A little distracted for a bit ...what do i do w my life hon.........
 A little exercise I did this afternoon.  #comics #art #clip.........
 A little extra Christmas Cash on me! Good Luck to everyone!.........
 A little girl in Savannah, Ga., appears to be the 38th huma.........
 A little heavy on the smut t… https://t.co/nvoxE7SNjTI wa.........
 A little in state battle tonight. #nova vs #penn. two very .........
 A little kiss...” one more time I’m going to vomit. #TT.........
 A little late catching up on last nights @GoodDoctorABC. On.........
 A little less bling never hurt anyone! Next project...🎄 .........
 A little more intensity to augment their talent and a coupl.........
 A little more time, because I have never lived really  - Os.........
 A little mor… https://t.co/kcq3zf9jgeWe love MX ❤️<F0><9F><A7>.........
 A little over 50k! Can We Guess How Much Is In Your Account.........
 A little ray of joy &amp; light in the midst of these very .........
 A little refreshment… https://t.co/HgX8PmYwPIThank you @L.........
 A little respect goes a long way. .........
 A little salt in d country's troubled legal system“Grant................
 A little snow &amp; people lose all common senseromantic st...............
 A little sun for the soul @realfreewebcams https://t.co/3CB...............
 A little sunkissed moment for y’all. ...............
 ....

Again, this is actually represented by a bunch of integer indexes into a concatenated string of all the tweets, like [18238223, 1921812, ...] so it’s a LOT more memory efficient than actually repeating all those strings.

suffix arrays let you find common substrings!

So what does this have to do with Twitter memes? Well, we can basically

  1. concatenate all tweets into a big string
  2. make a suffix array of that string
  3. iterate through the suffix array and notice when you see a lot of repeated substrings, like here:
me as a kpop fan ✨kpop fan age: 15 y/o ✨first group ever stan: blackpink ✨current ult groups: btxt ✨number of albu… https://t.co/24diHX9sLm
me as a kpop fan ⭐k-pop fan age: 12 y/o ⭐first group ever stan: bts ⭐current ult gps: bts and txt ⭐number of albu… https://t.co/8R95roQXoE
me as a kpop fan ⭐k-pop fan age: 14 y/o ⭐first group ever stan: girls generation ⭐current ult gp: txt ⭐number of a… https://t.co/010hLuJscF
me as a kpop fan ⭐k-pop fan age: 14-16 y/o ⭐first group ever stan: bts ⭐current ult gps: bts txt ⭐number of albums… https://t.co/0fDcxZGRrh
me as a kpop fan ⭐k-pop fan age: 15 y/o ⭐first group ever stan: blackpink ⭐current ult gps: txt ⭐number of albums… https://t.co/d8zZL83TvV
me as a kpop fan 🌸 k-pop fan age: 12 years old 🌸 first group ever stan: bts 🌸 current ult gps: bts &amp; wanna one 🌸 n… https://t.co/22R1nJpwNX
me as a kpop fan 🌸k-pop fan age: 10 🌸first group ever stan: 2pm 🌸current ult gps: skz,got7,itzy,twice, 🌸number of… https://t.co/mAluaP2yxH
me as a kpop fan 🌸k-pop fan age: 11 yo 🌸first group ever stan: beast 🌸current ult gps: ateez 🌸number of albums:  1… https://t.co/qxtFHG9HDg
me as a kpop fan 🌸k-pop fan age: 11 🌸first group ever stan: bts 🌸current ult gps: bts and ateez 🌸number of albums:… https://t.co/mKXlkrBBtC
me as a kpop fan 🌸k-pop fan age: 13 (now im 19) 🌸first group ever stan: snsd 🌸current ult gps: nct day6 aoa mamam… https://t.co/8XyQ3r5hwz
me as a kpop fan 🌸k-pop fan age: 13 years 🌸first group ever stan: 2pm,suju,bigbang 🌸current ult gps: bts,tbz,ateez… https://t.co/Zs1nQQz6Lt
me as a kpop fan 🌸k-pop fan age: 14 (2005) 🌸first group ever stan: super junior 🌸current ult gps: exo, gfriend, rv… https://t.co/vgmhe2vFMY
me as a kpop fan 🌸k-pop fan age: 14 y/o 🌸first group ever stan: nct dream 🌸current ult gps: svt and,,*insert stan… https://t.co/I38Ui69PvL
me as a kpop fan 🌸k-pop fan age: 15 y/o 🌸first group ever stan: 5sos 🌸current ult gps: bts and 5sos also some ggs… https://t.co/61ZmRkzmdl
me as a kpop fan 🌸k-pop fan age: 15 y/o 🌸first group ever stan: bts 🌸current ult gps: SVT, GOT7, Day6 🌸number of… https://t.co/16SWb3mSPg
me as a kpop fan 🌸k-pop fan age: 18 🌸first group ever stan: suju &amp; soshi 🌸current ult gps: snsd &amp; izone 🌸number of… https://t.co/SmSBFqJnGk
me as a kpop fan 🌸k-pop fan age: 19 y/o marupok 🌸first group ever stan: APINK 🌸current ult gps: SEVENTEEN 🌸number… https://t.co/StYjxr6uq9
me as a kpop fan 🌸k-pop fan age: 19 🌸first group ever stan: SuJu 🌸current ult gps: SuJu, SF9, SKZ, VIXX, ONEUS, NO… https://t.co/2o2DulCY5b

As an aside, the reason I got interested in suffix arrays in the first place was actually not for finding Twitter memes at all but for search.

I’ve spent a lot of time using Nelson Elhage’s livegrep at work to search code. It creates a suffix array using the divsufsort library. He has a blog post Regular Expression Search with Suffix Arrays where he talks about some of the implementation details.

The reason suffix arrays work for fast search is basically that if you’re looking for the string A little, you can do a binary search over the suffix array to find every instance of A little in your dataset. Binary searches are extremely fast so every search is guaranteed to run very quickly (in less than a microsecond I believe). What livegrep does is more complicated than that because it does a regular expression search, but that’s the idea to start.

There’s another blog post How to use suffix arrays to combat common limitations of full-text search applying suffix arrays to searching through a patent database. In that example, like with code search, the patent officers want to search patents for exact strings.

How do you make a suffix array?

You can use an existing suffix array library, for example index/suffixarray in Go, which is what I used, or divsufsort. There are Python bindings for divsufsort.

If you’re more excited about the data structures/algorithms aspect of suffix arrays you can also implement a suffix array-building algorithm yourself! I did not do this but you can see an implementation of qsufsort here in Go. That implementation links to a paper. There are lots of algorithms for constructing suffix arrays –sais and divsufsort are a couple of others.

5 or so hours, 100 lines of Go

As always with these challenges, I did this one to make sure that it’s both doable in a reasonable amount of time and fun for at least one person (me).

I did this one in about 5 hours and 100 lines of Go using the suffixarray implementation in the Go standard library, with a bit of bash shell scripting to postprocess the results. This is a messy data analysis challenge – as an example of a messy thing, Spotify released their end-of-2019 results while I was building the dataset and so there are a lot of tweets generated by the Spotify app.

My results ended up looking something like this:

5  an Aries and that’s why I gotta 
5  an Aries and that’s why I am so 
5  an Aquarius and that’s why I 
5  AM SO PROUD OF YOU 
5  am not a fan of 
5  am I the only one who 
5  am going to have to 

Then I sifted through them pretty manually to find the Twitter memes.

suffix arrays are used in bioinformatics

This “find twitter memes using suffix arrays” approach is a silly thing but it does have some relationship to reality – DNA sequences are basically really long strings, and biologists need to find patterns in them, and they sometimes use suffix arrays to do it.

I looked up packages in Debian that use libdivsufsort and I found infernal:

Infernal (“INFERence of RNA ALignment”) is for searching DNA sequence databases for RNA structure and sequence similarities. It is an implementation of a special case of profile stochastic context-free grammars called covariance models (CMs). A CM is like a sequence profile, but it scores a combination of sequence consensus and RNA secondary structure consensus, so in many cases, it is more capable of identifying RNA homologs that conserve their secondary structure more than their primary sequence.

email me the twitter memes you find if you do this!

If you do this exercise, I’d love it if you emailed me (julia@jvns.ca) with the twitter memes you found and/or your code! I found about 8 but I’m sure there are more.

I’ll publish any solutions I get (unless you don’t want me to publish your solution – just let me know!).

Thanks to Julian for discussing suffix arrays and suffix trees and trigram indexes with me at length, and to Kamal who had the idea of using suffix arrays to find Twitter memes.

2019-12-05T10:41:54+00:00


Solutions to the tiny window manager challenge

Hello! Last week I posted a small programming challenge to write a tiny window manager that bounces windows around the screen.

I’ll write a bit about my experience of solving the challenge, or you can just skip to the end to see the solutions.

what’s a window manager?

An X window manager is a program that sends messages to the X server (which is in charge of drawing your windows) to tell it which windows to display and where.

I found out that you can trace those events with xtrace. Here’s some example output from xtrace (for the toy window manager which is just moving windows about)

000:<:02d8: 20: Request(12): ConfigureWindow window=0x004158e5 values={x=560 y=8}
000:<:02da: 20: Request(12): ConfigureWindow window=0x004158e5 values={x=554 y=12}
000:<:02dc: 20: Request(12): ConfigureWindow window=0x004158e5 values={x=548 y=16}
000:<:02de: 20: Request(12): ConfigureWindow window=0x004158e5 values={x=542 y=20}
000:<:02e0: 20: Request(12): ConfigureWindow window=0x004158e5 values={x=536 y=24}
000:<:02e2: 20: Request(12): ConfigureWindow window=0x004158e5 values={x=530 y=28}
000:<:02e4: 20: Request(12): ConfigureWindow window=0x004158e5 values={x=524 y=32}

you can run programs without a window manager

You technically don’t need a window manager to run graphical programs – if you want to start an xterm in a window-manager-less X session you can just run

xterm -display :1

and it’ll start the xterm. Here’s a screenshot of an X session with no window manager open. I even have 2 windows open! (chrome and an xterm). It has some major usability problems, for example I don’t think you can resize or move or switch between windows. Which is where the window manager comes in!

move a window with XMoveWindow

The challenge was to make the window bounce around the screen.

In the tinywm source they use XMoveResizeWindow to move and resize windows, but I found in the docs that there’s also a function called XMoveWindow. Perfect!

Here’s what it looks like. What could be simpler, right? And it works just the way I’d expect!

XMoveWindow(display, windowID, x, y)

Except…

problem: multiple XMoveWindows don’t work

I ran into a problem (which I got stuck on for a couple of hours) where when I ran XMoveWindow twice, it would only apply the last move.

XMoveWindow(display, windowID, 100, 200)
usleep(2000 * 1000); # sleep for 2 seconds
XMoveWindow(display, windowID, 300, 400)

I’d expect this to move the window once, wait 2 seconds, and them move it again. But that was not what happened! Instead, it would pause for 2 seconds and then move the window once (to the second location).

use xtrace to trace window manager events

I used xtrace to trace the events and found out that my ConfigureWindow events that XMoveWindow was sending were all being sent at the same time. So it seemed like X was batching the events. But why?

XSync forces X to process events

I didn’t know why this was happening, but I emailed Julian about it and he pointed me in the direction of XSync, which forces X to process all the events you’ve sent it. Sure enough, I used XSync and everything worked beautifully.

solutions

I asked people to email me if they completed the challenge, and 4 people did! Here are their solutions. All the solutions I got implemented more features than I did, so I’d encourage you to look at all the solutions if you’re interested in how to solve this problem!

Here’s a gif of Alexsey’s solution. Apparently XQuartz on a Mac performs better than Xephyr!

And Aldrin’s solution, with a great use of xeyes:

thanks!

Thanks to everyone who emailed me a solution, and if you write your own implementation I’d love to post it here too, especially if you write one that isn’t in C or Ruby! I’m julia@jvns.ca. (and if you write a solution but don’t want me to post it I’d still love to see it!)

2019-12-03T14:57:10+00:00


Challenge: Write a bouncy window manager

Hello! I’m writing a short series of programming challenges with Julian, and this is the first one!

the challenge

requirements

The goal here is to make a very silly Linux window manager that bounces its windows around the screen, like in the gif above.

anti-requirements

The window manager doesn’t need to do anything else! It doesn’t need to support:

It turns out implementing this kind of toy window manager is surprisingly approachable!

the setup: start with tinywm

All the instructions here only work on Linux (since this is about writing a Linux window manager).

starter kit: tinywm

Writing a window manager from scratch seems intimidating (at first I didn’t even know how to start!). But then I found tinywm, which is a tiny window manager written in only 50 lines of C. This is a GREAT starting point and there’s an annotated version of the source code which explains a lot of the details. There’s a Python version of tinywm too, but I wasn’t able to get it to work.

I did this challenge by modifying tinywm and it worked really well.

tools

documentation

Some useful references:

If you’re not comfortable writing C, there are also libraries that let you work with X in other languages. I personally found C easier to use because a lot of the window manager documentation and examples I found were for the Xlib C library.

my experience: 5 hours, 50 lines of code

To give you a very rough idea of the difficulty of this exercise: I did this in 4 or 5 hours this morning and last night, producing the window manager you see in the gif at the top of the blog post (which is 50 lines of code). I’d never looked at the source code for a window manager before yesterday.

As usual when working with a new library I spent most of that time being confused about various basic things about how X works. (and as a result I learned several new things about X!)

For me this challenge was a fun way to:

send me your solution if you do this!

I’ll post the solution I came up in a week. If you think this window manager challenge sounds fun and end up doing it, I’d love it if you sent me your solution (to julia@jvns.ca)!

I’d be delighted to post any solutions you send me in the solutions blog post.

2019-11-25T18:03:40+00:00


What makes a programming exercise good?

I’ve been thinking about programming exercises lately, because I want to move into teaching people skills. But what makes a good programming exercise? I asked about this on Twitter today and got some useful responses so here are some criteria:

it’s fun

This one is sort of self-explanatory and I think it’s really important. Programming is fun and learning is fun so I can’t see why programming exercises would need to be boring.

it teaches you something you care about

I don’t think this has to strictly mean “relevant to your job right this second” – people don’t just have jobs, we also want to make art and games and fun personal projects and sometimes just understand the world around us. But it’s important to know what goals the exercise can help you with and what it’s related to!

Some arbitrary examples:

it’s a challenge

I don’t know if this is everyone’s experience but I often start programming exercises and get bored quickly (“oh, I know how to do this, this is boring”). For me it’s really important for the exercise to teach me something I really don’t know how to do and that’s a little bit hard for me.

My favourite set of programming exercises is the cryptopals crypto challenges because they get harder pretty fast – by exercise #6, you’re already breaking toy encryption protocols, and by #12 you’re breaking an Actual Encryption Protocol (AES in ECB mode)!

you can tell if you succeeded

It’s easy to write exercises that are too vaguely specified (“write a toy tcp stack!“). But what does that mean? How much of a TCP stack am I supposed to write? Having test cases and clear criteria for “yay! you did it! congratulations!” is really important.

you can do it quickly

In less than 2-3 hours (an evening after work), say. It’s hard to find time to spend like 8 hours on an exercise unless it’s REALLY exciting.

I also think that giving some specific real-world benchmark data seems nice (“I did this from scratch in 97 minutes”).

the author believes in you

This is a bit fuzzier but very lovely – this person on Twitter wrote:

Similar to that, the writing is patient and gives me the impression that it believes in my ability to accomplish the task. … I learned a ton in the early days from Linux HOWTOs. Some gave me the sense that it was impossible to fail. Just follow the steps. It’s all there.

Especially if you’re doing a somewhat challenging exercise like we talked about above, I think it’s nice for the author to believe in your! (and of course it’s crucial that they’ve actually written the exercises so that they’re right and you can likely do the thing!)

it’s been tested

I read the (great) biography Dearie: The Remarkable Life of Julia Child recently and one thing that stood out to me is that she tested all of the recipes in Mastering the Art Of French Cooking. It took her years to write the book and test the recipes and make sure that American home cooks actually had access to all the ingredients and had the.

I don’t think all cookbook authors test their recipes, but I think testing really improves cookbooks.

I started writing some SQL exercises (like this prototype of one on GROUP BY) a while back, and at some point I realized the big thing holding me back was that I didn’t have testers! I couldn’t find out if people were actually learning from them or not!

This is a new thing for me because when I write blog posts I don’t test them (I barely even proofread them!). I just write them and publish and people often like them and that’s it! I said to Amy Hoy (who is amazing) on Twitter that I didn’t understand why you have to test exercises if you don’t have to test blog posts and she pointed out that people have much higher expectations for exercises than for blog posts – with the blog posts you maybe expect to learn 1-2 new facts, but with exercises you expect to actually develop a new skill!

Also, people are often investing a lot more time in exercises (especially if they have to set up a dev environment or something!), so it’s extra important to make sure that they actually work.

you won’t get stuck

It’s SO EASY to get stuck on some random irrelevant point in a programming exercise that’s totally unrelated to the skill you’re trying to learn. For example there might be an easily-avoidable mistake that you can make with the exercise and spend a lot of time debugging but it doesn’t actually teach you a lot.

it’s easy to get help

If you’re doing a challenging exercise, you might want to get help from your friends / colleagues / the internet!

Some things that can go wrong:

One obvious way to accomplish this is by letting people use the programming language they’re most comfortable in, because they probably already know how to Google for help in that environment.

no time-consuming setup required

Installing software is boring, and a lot of programming projects require installing software! A few things that can go wrong with this (though there are a lot more than this!)

This kind of thing is a huge waste of time and super demoralizing. And it’s not trivial to avoid! If you’re trying to teach someone a specific piece of software, often that software

A few options I’ve seen or used to manage this:

it’s easy to extend

@tef has this great talk on Scratch A million things to do with a computer! which explains the 3 ideas of Scratch:

It sucks when you start learning something and then learn that what you can do with the Thing is very limited! It’s exciting when you learn something and see “oh, wow, there are SO MANY POSSIBILITIES, what if I did X instead?”

that’s a lot of things!

The criteria we arrived at:

That seems pretty hard, but it seems like a good goal to aspire to! I’m going to keep very slowly working on exercises!

2019-11-20T20:33:20+00:00


How containers work: overlayfs

I wrote a comic about overlay filesystems for a potential future container zine this morning, and then I got excited about the topic and wanted to write a blog post with more details. Here’s the comic, to start out:

container images are big

Container images can be pretty big (though some are really small, like alpine linux is 2.5MB). Ubuntu 16.04 is about 27MB, and the Anaconda Python distribution is 800MB to 1.5GB.

Every container you start with an image starts out with the same blank slate, as if it made a copy of the image just for that container to use. But for big container images, like that 800MB Anaconda image, making a copy would be both a waste of disk space and pretty slow. So Docker doesn’t make copies – instead it uses an overlay.

how overlays work

Overlay filesystems, also known as “union filesystems” or “union mounts” let you mount a filesystem using 2 directories: a “lower” directory, and an “upper” directory.

Basically:

When a process reads a file, the overlayfs filesystem driver looks in the upper directory and reads the file from there if it’s present. Otherwise, it looks in the lower directory.

When a process writes a file, overlayfs will just write it to the upper directory.

let’s make an overlay with mount!

That was all a little abstract, so let’s make an overlay filesystem and try it out! This is just going to have a few files in it: I’ll make upper and lower directories, and a merged directory to mount the combined filesystem into:

$ mkdir upper lower merged work
$ echo "I'm from lower!" > lower/in_lower.txt 
$ echo "I'm from upper!" > upper/in_upper.txt
$ # `in_both` is in both directories
$ echo "I'm from lower!" > lower/in_both.txt 
$ echo "I'm from upper!" > upper/in_both.txt 

Combining the upper and lower directories is pretty easy: we can just do it with mount!

$ sudo mount -t overlay overlay 
    -o lowerdir=/home/bork/test/lower,upperdir=/home/bork/test/upper,workdir=/home/bork/test/work 
    /home/bork/test/merged

There’s was an extremely annoying error message I kept getting while doing this, that said mount: /home/bork/test/merged: special device overlay does not exist.. This message is a lie, and actually just means that one of the directories I specified was missing (I’d written ~/test/merged but it wasn’t being expanded).

Okay, let’s try to read one of the files from the overlay filesystem! The file in_both.txt exists in both lower/ and upper/, so it should read the file from the upper/ directory.

$ cat merged/in_both.txt 
"I'm from upper!

It worked!

And the contents of our directories are what we’d expect:

find lower/ upper/ merged/
lower/
lower/in_lower.txt
lower/in_both.txt
upper/
upper/in_upper.txt
upper/in_both.txt
merged/
merged/in_lower.txt
merged/in_both.txt
merged/in_upper.txt

what happens when you create a new file?

$ echo 'new file' > merged/new_file
$ ls -l */new_file 
-rw-r--r-- 1 bork bork 9 Nov 18 14:24 merged/new_file
-rw-r--r-- 1 bork bork 9 Nov 18 14:24 upper/new_file

That makes sense, the new file gets created in the upper directory.

what happens when you delete a file?

Reads and writes seem pretty straightforward. But what happens with deletes? Let’s do it!

$ rm merged/in_both.txt

What happened? Let’s look with ls:

ls -l upper/in_both.txt  lower/lower1.txt  merged/lower1.txt
ls: cannot access 'merged/in_both.txt': No such file or directory
-rw-r--r-- 1 bork bork    6 Nov 18 14:09 lower/in_both.txt
c--------- 1 root root 0, 0 Nov 18 14:19 upper/in_both.txt

So:

What happens if we try to copy this weird character device file?

$ sudo cp upper/in_both.txt upper/in_lower.txt
cp: cannot open 'upper/in_both.txt' for reading: No such device or address

Okay, that seems reasonable, being able to copy this weird deletion signal file doesn’t really make sense.

you can mount multiple “lower” directories

Docker images are often composed of like 25 “layers”. Overlayfs supports having multiple lower directories, so you can run

mount -t overlay overlay
      -o lowerdir:/dir1:/dir2:/dir3:...:/dir25,upperdir=...

So I assume that’s how containers with many Docker layers work, it just unpacks each layer into a separate directory and then asks overlayfs to combine them all together together with an empty upper directory that the container will write its changes to it.

docker can also use btrfs snapshots

Right now I’m using ext4, and Docker uses overlayfs snapshots to run containers. But I used to use btrfs, and then Docker would use btrfs copy-on-write snapshots instead. (Here’s a list of when Docker uses which storage drivers)

Using btrfs snapshots this way had some interesting consequences – at some point last year I was running hundreds of short-lived Docker containers on my laptop, and this resulted in me running out of btrfs metadata space (like this person). This was really confusing because I’d never heard of btrfs metadata before and it was tricky to figure out how to clean up my filesystem so I could run Docker containers again. (this docker github issue describes a similar problem with Docker and btrfs)

it’s fun to try out container features in a simple way!

I think containers often seem like they’re doing “complicated” things and I think it’s fun to break them down like this – you can just run one mount incantation without actually doing anything else related to containers at all and see how overlays work!

2019-11-18T13:34:29+00:00


Some notes on vector drawing apps

For the last year and a half I’ve been using the iPad Notability app to draw my zines. Last week I decided I wanted more features, did a bit of research, and decided to switch to Affinity Designer (a much more complicated program). So here are a few quick notes about it.

The main difference between them is that Notability is a simple note taking app (aimed at regular people), and Affinity Designer is a vector graphics app (aimed at illustrators / graphic designers), like Adobe Illustrator.

I’ve never used a serious vector graphics program before, so it’s been cool to learn what kinds of features are available!

Notability is super simple

This is what the Notability UI looks like. There’s a pencil, an eraser, a text tool, and a selection tool. That’s basically it. I LOVED this simplicity when I started using Notability, and I made 4 zines using it (help! i have a manager!, oh shit, git!, bite size networking!, and http: use your browser’s language).

Recently though, I’ve had a couple of problems with it, the main one being that text boxes and things drawn with the pencil tool don’t mix well. (In general Notability has been GREAT though and their support team has always been incredibly helpful when I’ve had questions.)

Affinity Designer is really complicated

Affinity Designer, by comparison, is WAY more complicated. Here’s what the UI looks like:

There are

I still don’t understand what all the tools do (what’s the difference between Pencil and Vector Brush? I don’t know!). But I’m pretty excited about this because (unlike with Notability) there are so many options that if I’m frustrated about something, 90% of the time there’s a way to do the thing I want!

switching from Notability to Affinity Designer is really easy

Switching to Notability wasn’t the best: I reverse engineered the file format to transfer some files over but the quality was never the best (probably because of problems with my script) and I ended up having to redraw a lot of them in practice.

With Affinity Designer, I can just

It’s not perfect – the vector paths it comes up with are kind of weird, probably because of the way the PDF is – but it’s very good! It makes me feel confident that if I need to make a small edit to something I made in the past I can just import the PDF!

what can a vector drawing app do?

here are a few things Affinity Designer can do that Notability can’t:

There are also a LOT more features that I’m not interested in but I’m pretty excited about those 6 things and it feels like an app that I won’t grow out of.

iPad apps are great

I’ve been exclusively using Linux for the last 15 years where the image editing/media tools aren’t always great (though I really like Inkscape and I hear good things about Krita!), so it’s really cool to have access to all these great iPad apps. And the prices seem pretty reasonable:

It doesn’t make me want a Mac (I like the Linux desktop experience!), but it’s nice to have access to a bunch of these great tools. And I think a lot of these art tools work better on an iPad than on a computer anyway since you can just draw on the screen :)

2019-11-18T10:47:41+00:00


Some research on shipping print zines

I’ve been doing some preliminary research on shipping printed zines, since Your Linux Toolbox is out now and a bunch more people have been asking about print copies of my other zines. I thought I’d write down what I’ve learned so far because it turns out shipping is pretty complicated!

My original question I was trying to answer was “can I ship a single zine anywhere in the world for less than $6 or so?“, so let’s start there.

Surprisingly the best single resource I found was this very extensive PCMag article on e-commerce fulfillment services.

why not use letter mail?

The most obvious way to send a zine inexpensively in the mail is with letter mail – letters smaller than 10” x 6” and under 60g or so can be sent by letter mail. My zines definitely fit that criteria, even when printed on nice paper. This would be really good because international package postage is EXPENSIVE, but sending a letter to Belgium only costs $2.13 according to USPS’s website.

The issue with this is the small print on that USPS page:

Value of contents can not exceed $0.00

So it seems like you’re not actually allowed to send things worth money via letter mail. Probably that’s related to customs or something. Or maybe letter mail is subsidized by the government? Not sure why.

Option 0: Ship zines myself

I’ve done this before and it was actually really fun to do once but I think this is pretty unlikely to be a good idea because:

a. cost: I live in Canada, almost everyone I sell zines to is outside of Canada, and b. availability: I’d like for people to be able to get shipments when I’m out of town / on vacation

Option 1: Amazon

One obvious answer to how to sell book-like things is “sell them on Amazon!”. Amazon actually has at least 3 different programs that you can use to sell books online (Amazon Advantage, Fulfilled By Amazon, Kindle Direct Publishing), and since Amazon is such a big thing I looked into all of them a little bit.

In general the forums on https://sellercentral.amazon.com seem to be a good way to understand how the various Amazon options work for people.

I wrote a lot about Amazon here but overall it doesn’t seem that great of an option – it’s really complicated, selling on Amazon’s website isn’t very appealing, and I think there would be a lot of additional fees.

Kindle Direct Publishing

Kindle Direct Publishing is a service where Amazon will take care of everything from printing to shipping. (It has “Kindle” in the same but they actually do printing as well). Brian Kernighan’s new Unix book is an example of a book published with KDP.

KDP won’t work for this project because they don’t support saddle stitching (stapling the zine), so I didn’t look into it too much. Here’s a link to their paper and cover options though.

Amazon Advantage

Amazon Advantage doesn’t do printing – you ship them books, and then they take care of shipping them to people. This seems great on its surface (“amazon just takes care of it!“).

advantages:

disadvantages:

The biggest issue for me here seems to be “you have to ship them books every week or two”, which seems like it could get very expensive if Amazon Advantage keeps asking you to ship them small quantities of zines.

Fulfilled By Amazon

Fulfilled By Amazon seems like the Amazon option that involves the least Amazon magic. Basically I’d ship books to their warehouses and then they ship the things from those warehouses.

how it works:

Option 2: Blackbox

Blackbox is a shipping company by the Cards Against Humanity folks. This is their Pricing PDF. I’m not 100% sure if I can work with them – the first time I filled out the form on their website saying I was interested they said they weren’t accepting new customers, but I think now they may be?

Here’s the summary:

Option 3: Shipbob

Shipbob is an shipping company for smaller ecommerce companies. Here’s the pricing PDF I found.

The main difference as far as I can tell between Shipbob and Blackbox is that Shipbox lets you include up to 5 items per order for free and Blackbox charges $0.70 per additional item in an order.

Shipbob advertises a 99.8% fulfillment accuracy rate which is pretty interesting – it means they expect 2 in 1000 orders to have a problem. That seems pretty good!

Option 4: Whiplash

Similar to the other two above. Their pricing page. Shipbob and Blackbox both include everything (shipping, packaging materials, and someone packing your order) in their fee, and Whiplash seems to charge separately for shipping.

Shipping 1 thing costs the same as shipping 5 things

Zines are pretty light – I just weighed some of my zines printed on high quality paper on my kitchen scale and they’re 40g each on average. Most of these shipping services seem to charge in increments of half a pound, so shipping 5 zines (about 200g) costs about the same as shipping 1 zine (about 40g).

This makes me think it would be more reasonable to focus on shipping packages of many zines – right now the 6 pack of all my zines costs $58 for a PDF collection, and $6-$12 shipping for something around that price seems super reasonable (and I could probably even do “free” shipping, aka pay the shipping costs myself).

The other thing that I think could be work well is shipping packages of 5 or 10 of the same zine so that a group of people can each get a zine and save on shipping costs.

this seems like it could work!

I still have no plan for how to print zines, but writing all this down makes me feel pretty optimistic about being able to ship zines to people. Even though shipping individual zines doesn’t seem that practical, I think shipping packs of 5-10 zines could be really reasonable!

Speaking of print – I printed a zine with Lulu last week and just got it in the mail yesterday. I didn’t think Lulu would be able to print it in the way I wanted, and they didn’t, so I’m really happy to know that and be able to move on to trying other non-print-on-demand printers.

2019-10-29T10:51:14+00:00


SQLite is really easy to compile

In the last week I’ve been working on another SQL website (https://sql-steps.wizardzines.com/, a list of SQL examples). I’m running all the queries on that site with sqlite, and I wanted to use window functions in one of the examples (this one).

But I’m using the version of sqlite from Ubuntu 18.04, and that version is too old and doesn’t support window functions. So I needed to upgrade sqlite!

This turned to out be surprisingly annoying (as usual), but in a pretty interesting way! I was reminded of some things about how executables and shared libraries work and it had a very satisfying conclusion. So I wanted to write it up here.

(spoiler: the summary is that https://www.sqlite.org/howtocompile.html explains how to compile SQLite and it takes like 5 seconds to do and it’s 20x easier than my usual experiences compiling software from source)

attempt 1: download a SQLite binary from their website

The SQLite download page has a link to a Linux binary for the SQLite command line tool. I downloaded it, it worked on my laptop, and I thought I was done.

But then I tried to run it on a build server I was using (Netlify), and I got this extremely strange error message: “File not found”. I straced it, and sure enough execve was returning the error code ENOENT, which means “File not found”. This was kind of maddening because the file was DEFINITELY there and it had the correct permissions and everything.

I googled this problem (by searching “execve enoent”), found this stack overflow answer, which pointed out that to run a binary, you don’t just need the binary to exist! You also need its loader to exist. (the path to the loader is inside the binary)

To see the path for the loader you can use ldd, like this:

$ ldd sqlite3
	linux-gate.so.1 (0xf7f9d000)
	libdl.so.2 => /lib/i386-linux-gnu/libdl.so.2 (0xf7f70000)
	libm.so.6 => /lib/i386-linux-gnu/libm.so.6 (0xf7e6e000)
	libz.so.1 => /lib/i386-linux-gnu/libz.so.1 (0xf7e4f000)
	libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf7c73000)
	/lib/ld-linux.so.2

So /lib/ld-linux.so.2 is the loader,and that file doesn’t exist on the build server, probably because that Xenial installation didn’t have support for 32-bit binaries (?), and I needed to try something different.

attempt 2: install the Debian sqlite3 package

Okay, I thought, maybe I can install the sqlite package from debian testing. Trying to install a package from a different Debian version that I’m not using is literally never a good idea, but for some reason I decided to try it anyway.

Doing this completely unsurprisingly broke the sqlite installation on my computer (which also broke git), but I managed to recover from that with a bunch of sudo dpkg --purge --force-all libsqlite3-0 and make everything that depended on sqlite work again.

attempt 3: extract the Debian sqlite3 package

I also briefly tried to just extract the sqlite3 binary from the Debian sqlite package and run it. Unsurprisingly, this also didn’t work, but in a more understandable way: I had an older version of libreadline (.so.7) and it wanted .so.8.

$ ./usr/bin/sqlite3
./usr/bin/sqlite3: error while loading shared libraries: libreadline.so.8: cannot open shared object file: No such file or directory

attempt 4: compile it from source

The whole reason I spent all this time trying to download sqlite binaries is that I assumed it would be annoying or time consuming to compile sqlite from source. But obviously downloading random sqlite binaries was not working for me at all, so I finally decided to try to compile it myself.

Here are the directions: How to compile SQLite. And they’re the EASIEST THING IN THE UNIVERSE. Often compiling things feels like this:

Compiling SQLite works like this:

All the code is in one file (sqlite.c), and there are no weird dependencies! It’s amazing.

For my specific use case I didn’t actually need threading support or readline support or anything, so I used the instructions on the compile page to create a very simple binary that only used libc and no other shared libraries.

$ ldd sqlite3
	linux-vdso.so.1 (0x00007ffe8e7e9000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fbea4988000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fbea4d79000)

this is nice because it makes it easy to experiment with sqlite

I think it’s cool that SQLite’s build process is so simple because in the past I’ve had fun editing sqlite’s source code to understand how its btree implementation works.

This isn’t really super surprising given what I know about SQLite (it’s made to work really well in restricted / embedded contexts, so it makes sense that it would be possible to compile it in a really simple/minimal way). But it is super nice!

2019-10-28T11:19:19+00:00


Your Linux Toolbox: a box set of my free zines

About a year and a half ago, No Starch Press got in touch with me about publishing a print box set of my zines. I have two kinds of zines right now:

This set is basically a really lovely box set of all of the free zines, plus Bite Size Linux :). Here’s what’s in the box:

I’m really happy to get these zines into print, and that I can still give away all of the zines in the box away for free on my website – I asked them to write it into my publishing contract that I could still give them away, and they did :)

what it looks like

Here are the front covers of the zines in the box. We got colour covers illustrated for all of them, done by Vladimir Kašiković.

We had the idea to make the back covers a rainbow and I’m delighted about it:

There’s this fun “this toolbox belongs to:” detail on the bottom:

where to get it

It’s in a bunch of physical bookstores, and online! Here are a bunch of links to places you could get it:

North America:

Europe:

Asia/Pacific:

More:

why I’m doing this: to learn about print!

I don’t necessarily expect to make a lot of money from this box set (I get 10% or less of each sale, vs 97% for sales of my other zines online) but that’s not my priority with this project – I did it because I love the free zines I wrote, I wanted to make a really nice print version of them, and I wanted to learn about how print works and how traditional publishing works! I’ve already learned a lot about how publishing works and it’s been super interesting.

People have been very excited about this print project so far which has been really nice to see! Next I want to make it possible for people to order print copies of my newer zines, and I’m trying to figure out how to do that now. (if you have a print company that you’ve really loved using, let me know!)

I’m super happy about the print quality and if you get the box set I really hope you like it!

2019-10-21T16:15:19+00:00


SQL queries don't start with SELECT

Okay, obviously many SQL queries do start with SELECT (and actually this post is only about SELECT queries, not INSERTs or anything).

But! Yesterday I was working on an explanation of window functions, and I found myself googling “can you filter based on the result of a window function”. As in – can you filter the result of a window function in a WHERE or HAVING or something?

Eventually I concluded “window functions must run after WHERE and GROUP BY happen, so you can’t do it”. But this led me to a bigger question – what order do SQL queries actually run in?.

This was something that I felt like I knew intuitively (“I’ve written at least 10,000 SQL queries, some of them were really complicated! I must know this!“) but I struggled to actually articulate what the order was.

SQL queries happen in this order

I looked up the order, and here it is! (SELECT isn’t the first thing, it’s like the 5th thing!) (here it is in a tweet).

(I really want to find a more accurate way of phrasing this than “sql queries happen/run in this order” but I haven’t figured it out yet)

In a non-image format, the order is:

questions this diagram helps you answer

This diagram is about the semantics of SQL queries – it lets you reason through what a given query will return and answers questions like:

Database engines don’t actually literally run queries in this order because they implement a bunch of optimizations to make queries run faster – we’ll get to that a little later in the post.

So:

confounding factor: column aliases

Someone on Twitter pointed out that many SQL implementations let you use the syntax:

SELECT CONCAT(first_name, ' ', last_name) AS full_name, count(*)
FROM table
GROUP BY full_name

This query makes it look like GROUP BY happens after SELECT even though GROUP BY is first, because the GROUP BY references an alias from the SELECT. But it’s not actually necessary for the GROUP BY to run after the SELECT for this to work – the database engine can just rewrite the query as

SELECT CONCAT(first_name, ' ', last_name) AS full_name, count(*)
FROM table
GROUP BY CONCAT(first_name, ' ', last_name)

and run the GROUP BY first.

Your database engine also definitely does a bunch of checks to make sure that what you put in SELECT and GROUP BY makes sense together before it even starts to run the query, so it has to look at the query as a whole anyway before it starts to come up with an execution plan.

queries aren’t actually run in this order (optimizations!)

Database engines in practice don’t actually run queries by joining, and then filtering, and then grouping, because they implement a bunch of optimizations reorder things to make the query run faster as long as reordering things won’t change the results of the query.

One simple example of a reason why need to run queries in a different order to make them fast is that in this query:

SELECT * FROM
owners LEFT JOIN cats ON owners.id = cats.owner
WHERE cats.name = 'mr darcy'

it would be silly to do the whole left join and match up all the rows in the 2 tables if you just need to look up the 3 cats named ‘mr darcy’ – it’s way faster to do some filtering first for cats named ‘mr darcy’. And in this case filtering first doesn’t change the results of the query!

There are lots of other optimizations that database engines implement in practice that might make them run queries in a different order but there’s no room for that and honestly it’s not something I’m an expert on.

LINQ starts queries with FROM

LINQ (a querying syntax in C# and VB.NET) uses the order FROM ... WHERE ... SELECT. Here’s an example of a LINQ query:

var teenAgerStudent = from s in studentList
                      where s.Age > 12 && s.Age < 20
                      select s;

pandas (my favourite data wrangling tool) also basically works like this, though you don’t need to use this exact order – I’ll often write pandas code like this:

df = thing1.join(thing2)      # like a JOIN
df = df[df.created_at > 1000] # like a WHERE
df = df.groupby('something', num_yes = ('yes', 'sum')) # like a GROUP BY
df = df[df.num_yes > 2]       # like a HAVING, filtering on the result of a GROUP BY
df = df[['num_yes', 'something1', 'something']] # pick the columns I want to display, like a SELECT
df.sort_values('sometthing', ascending=True)[:30] # ORDER BY and LIMIT
df[:30]

This isn’t because pandas is imposing any specific rule on how you have to write your code, though. It’s just that it often makes sense to write code in the order JOIN / WHERE / GROUP BY / HAVING. (I’ll often put a WHERE first to improve performance though, and I think most database engines will also do a WHERE first in practice)

dplyr in R also lets you use a different syntax for querying SQL databases like Postgres, MySQL and SQLite, which is also in a more logical order.

I was really surprised that I didn’t know this

I’m writing a blog post about this because when I found out the order I was SO SURPRISED that I’d never seen it written down that way before – it explains basically everything that I knew intuitively about why some queries are allowed and others aren’t. So I wanted to write it down in the hopes that it will help other people also understand how to write SQL queries.

2019-10-03T10:24:53+00:00


Zine revenue for 2019

I occasionally get questions like “Can you share what you’ve learned about running a business?” The most surprising thing I’ve learned is that it’s possible to make money by teaching people computer things on the internet, so I want to make that a little more concrete by sharing the revenue from the zine business so far in 2019. Here’s a graph of revenue by month (the last month is September 2019):

This adds up to $87,858 USD for 2019 so far, which (depending on what I release in the rest of this year) is on track to be similar to revenue for 2018 ($101,558).

Until quite recently I’d been writing zines in my spare time, and now I’m taking a year to focus on it.

how $30,000 for September breaks down

The most obvious thing in that monthly revenue graph above is that 2 months (September and March) have way more revenue than all the others. This is because I released new zines (Bite Size Networking and HTTP: Learn your browser’s language) in those months.

Here’s how the $30,000 for September breaks down:

This September was the month with the most sales ever, which is mostly because of individual humans who find the zines useful (thank you!!).

expenses

The main expenses are paying illustrators and an accountant, a mailing list, and various books I buy to learn how to do things better. They probably come out to about 10% of revenue or so, and then there are taxes after that.

giving away free copies has been great

With the HTTP zine, like many of my previous zines, I’ve been giving away one free copy for every copy that people buy, so that people can get it even if $12 is hard for them to afford. (if you can’t afford $12, here’s the link, there are about 70 available as I’m writing this). I’m pretty happy with this setup – we’ve given away 1358 copies so far. (I think of this as kind of a “sales” statistic too)

I think I want to automate the system to give away free copies a bit more soon (like by automatically updating the number of free zines available using the Gumroad API instead of periodically doing it manually).

hopefully this is a useful data point!

Writing about money on the internet is weird, so this will probably be the first and last zine revenue post, but I’m writing it down in the hopes that it’s a useful data point for others. I thought for a long time that you could only really make money from writing on the internet with ads or sponsorships, but it’s not true!

The goal of this isn’t to say “you should run a business” or anything, just that this is a thing that’s possible in the world and that many developers do really value good educational materials and are happy to pay for them (if you’re one of those people, thank you!)

2019-10-01T12:53:58+00:00


Notes on building SQL exercises

In the last couple of weeks I’ve been working on some interactive SQL exercises to help people get better at writing SQL queries. This is a pretty new thing for me so I thought I’d write a few notes about my process so far!

why SQL is exciting: distributed SQL engines

To me the reason why SQL is exciting is that a lot of companies are storing their data in distributed SQL databases (Google BigQuery, Amazon Redshift, Spark SQL, Presto, etc) that let you run a complicated query across a billion rows pretty quickly! They’re fast partly because they’re designed to run your query across possibly tens or hundreds of computers.

At my last job I wrote thousands of SQL queries to do data analysis while I was working on the machine learning team, mostly ad hoc queries to answer questions I had about our data. I learned a lot of fun tricks to make them faster / easier to write and I’ve never really talked about it!

So I think SQL is a really nice way to go from “I have this sort of complicated question about billions of rows of data” to “ok, that’s the answer, great, I can move on”.

why write exercises: knowledge != skills

This is the first time I’m really trying in earnest to write exercises to teach something, instead of just explanations of the thing. The reason I’m doing this is that I read Design for how people learn by Julie Dirksen and she makes the point that knowledge is different from skills.

She defines a “skill” as “something you have to practice”. And SQL is definitely something that you have to practice if you want to learn it! So I thought – SQL is a relatively simple skill (as programming/programming-adjacent skills go!), maybe I can make something interactive and relatively simple to help people improve their SQL skills!

It’s also, well, a challenge, and I like trying things I haven’t tried before.

how I’m doing it: start with a challenge

I started out doing these SQL exercises in kind of the obvious way: start out with easy exercises, and then make them harder and harder over time to introduce new concepts. But when I watched people trying it out, I noticed a problem – a lot of people already know some SQL, and sometimes they would go through all the exercises without learning anything at all! That’s no fun!

So I came up with a different structure for each section of the SQL exercises:

  1. Start with a “challenge” that tests the skill the section is trying to teach.
  2. If the challenge is too hard, move on to a bunch of easier exercises that teach you the skills you need to solve the challenge.

Since showing is easier than explaining: here’s a draft of a page teaching GROUP BY. Here’s a screenshot of what the initial “challenge” for basic group by looks like:

I think that challenge in particular isn’t very good yet (I have a lot of work to do!) but that’s the idea.

how I’m getting feedback: anonymously track responses

Early on I also realized that I needed to get feedback about which challenges people were finding hard / easy. Every time someone runs a query, I track

I’ve already learned a lot from this, for example:

So basically (in addition to making more exercises) I think I need to spend more time cataloguing where/how people are getting stuck in practice and helping make sure fewer people get stuck.

the tech stack

To build this, I’m using:

I also bought the Refactoring UI book to try to improve my web design skills a tiny bit. I think it’s helped a little so far.

Vue components let me really easily add new challenges/exercises to a page like this:

<Puzzle
   id="count-the-owners"
   title='Count the number of different cat owners'
   description="
   You can use <code>COUNT(DISTINCT column)</code> to count distinct values of a column. (you can also do <code>SUM(DISTINCT column)</code> or <code>AVG(DISTINCT column)</code> but I'm not sure why that would be useful.
   "
   answer= "
   SELECT count(distinct(owner)) AS num_owners
   from cats
   "
   v-bind:table_names='["cats"]'
   >
</Puzzle>

the goal: make something that’s worth $100 or so

What I’m working towards is making exercises & challenges that would help someone with beginner/intermediate SQL skills improve their SQL fluency enough that it’d easily be worth $100 to them. We’ll see if I can get there! I don’t know whether I’ll price it at $100, but that’s my goal for how useful it should be.

The person I have in mind is sort of (as usual) myself 6 years ago, when I’d heard of SQL and could write a basic query but if you gave me a table of VERY INTERESTING DATA I couldn’t really effectively use SQL to answer the questions I had about it.

2019-09-30T12:35:51+00:00


Taking a year to explain computer things

I’ve been working on explaining computer things I’m learning on this blog for 6 years. I wrote one of my first posts, what does a shell even do? on Sept 30, 2013. Since then, I’ve written 11 zines, 370,000 words on this blog, and given 20 or so talks. So it seems like I like explaining things a lot.

tl;dr: I’m going to work on explaining computer things for a year

Here’s the exciting news: I left my job a month ago and my plan is to spend the next year working on explaining computer things!

As for why I’m doing this – I was talking through some reasons with my friend Mat last night and he said “well, sometimes there are things you just feel compelled to do”. I think that’s all there is to it :)

what does “explain computer things” mean?

I’m planning to:

  1. write some more zines (maybe I can write 10 zines in a year? we’ll see! I want to tackle both general-interest and slightly more niche topics, we’ll see what happens).
  2. work on some more interactive ways to learn things. I learn things best by trying things out and breaking them, so I want to see if I can facilitate that a little bit for other people. I started a project around this in May which has been on the backburner for a bit but which I’m excited about. Hopefully I’ll release it soon and then you can try it out and tell me what you think!

I say “a year” because I think I have at least a year’s worth of ideas and I can’t predict how I’ll feel after doing this for a year.

how: run a business

I started a corporation almost exactly a year ago, and I’m planning to keep running my explaining-things efforts as a business. This business has been making more than I made in my first programming job (that is, definitely enough money to live on!), which has been really surprising and great (thank you!).

some parameters of the business:

It’s been pretty interesting to learn more about running a small business and so far I like it more than I thought I would. (except for taxes, which I like exactly as much as I thought I would)

that’s all!

I’m excited to keep making explanations of computer things and to have more time to do it. This blog might change a bit away from “here’s what I’m learning at work these days” and towards “here are attempts at explaining things that I mostly already know”. It’ll be different! We’ll see how it goes!

2019-09-13T11:05:15+00:00


New zine: HTTP: Learn your browser's language!

Hello! I’ve released a new zine! It’s called “HTTP: Learn your browsers language!”

You can get it for $12 at https://gum.co/http-zine. If you buy it, you’ll get a PDF that you can either read on your computer or print out.

Here’s the cover and table of contents:

why http?

I got the idea for this zine from talking to Marco Rogers – he mentioned that he thought that new web developers / mobile developers would really benefit from understanding the fundamentals of HTTP better, I thought “OOH I LOVE TALKING ABOUT HTTP”, wrote a few pages about HTTP, saw they were helping people, and decided to write a whole zine about HTTP.

HTTP is important to understand because it runs the entire web – if you understand how HTTP requests and responses work, then it makes it WAY EASIER to debug why your web application isn’t working properly. Caching, cookies, and a lot of web security are implemented using HTTP headers, so if you don’t understand HTTP headers those things seem kind of like impenetrable magic. But actually the HTTP protocol is fundamentally pretty simple – there are a lot of complicated details but the basics are pretty easy to understand.

So the goal of this zine is to teach you the basics so you can easily look up and understand the details when you need them.

what it looks like printed out

All of my zines are best printed out (though you get a PDF you can read on your computer too!), so here are a couple of pictures of what it looks like when printed. I always ask my illustrator to make both a black and white version and a colour version of the cover so that it looks great when printed on a black and white printer.

(if you click on that “same origin policy” image, you can make it bigger)

The zine comes with 4 print PDFs in addition to a PDF you can just read on your computer/phone:

zines for your team

You can also buy this zine for your team members at work to help them learn HTTP!

I’ve been trying to get the pricing right for this for a while – I used to do it based on size of company, but that didn’t seem quite right because sometimes people would want to buy the zine for a small team at a big company. So I’ve switched to pricing based on the number of copies you want to distribute at your company.

Here’s the link: zines for your team!.

the tweets

When I started writing zines, I would just sit down, write down the things I thought were important, and be done with it.

In the last year and a half or so I’ve taken a different approach – instead of writing everything and then releasing it, instead I write a page at a time, post the page to Twitter, and then improve it and decide what page to write next based on the questions/comments I get on Twitter. If someone replies to the tweet and asks a question that shows that what I wrote is unclear, I can improve it! (I love getting replies on twitter asking clarifiying questions!).

Here are all the initial drafts of the pages I wrote and posted on twitter, in chronological order. Some of the pages didn’t make it into the zine at all, and I needed to do a lot of editing at the end to figure out the right order and make them all work coherently together in a zine instead of being a bunch of independent tweets.

Writing zines one tweet at a time has been really fun. I think it improves the quality a lot, because I get a ton of feedback along the way that I can use to make the zine better. There are also some experimental 45 second tiny videos in that list, which are definitely not part of the zine, but which were fun to make and which I might expand on in the future.

examplecat.com

One tiny easter egg in the zine: I have a lot of examples of HTTP requests, and I wasn’t sure for a long time what domain I should use for the examples. I used example.com a bunch, and google.com and twitter.com sometimes, but none of those felt quite right.

A couple of days before publishing the zine I finally had an epiphany – my example on the cover was requesting a picture of a cat, so I registered https://examplecat.com which just has a single picture of a cat. It also has an ASCII cat if you’re browsing in your terminal.

$ curl https://examplecat.com/cat.txt  -i
HTTP/2 200 
accept-ranges: bytes
cache-control: public, max-age=0, must-revalidate
content-length: 33
content-type: text/plain; charset=UTF-8
date: Thu, 12 Sep 2019 16:48:16 GMT
etag: "ac5affa59f554a1440043537ae973790-ssl"
strict-transport-security: max-age=31536000
age: 5
server: Netlify
x-nf-request-id: c5060abc-0399-4b44-94bf-c481e22c2b50-1772748

\    /\
 )  ( ')
(  /  )
 \(__)|

more zines at wizardzines.com

If you’re interested in the idea of programming zines and haven’t seen my zines before, I have a bunch more at https://wizardzines.com. There are 6 free zines there:

next zine: not sure yet!

Some things I’m considering for the next zine:

2019-09-12T12:13:27+00:00


How to put an HTML page on the internet

One thing I love about the internet is that it’s SO EASY to put static HTML websites on the internet. Someone asked me today how to do it, so I thought I’d write down how really quickly!

just an HTML page

All of my sites are just static HTML and CSS. My web design skills are relatively minimal (https://wizardzines.com is the most complicated site I’ve developed on my own), so keeping all my internet sites relatively simple means that I have some hope of being able to make changes / fix things without spending a billion hours on it.

So we’re going to take as minimal of an approach as possible in this blog post – just one HTML page.

the HTML page

The website we’re going to put on the internet is just one file, called index.html. You can find it at https://github.com/jvns/website-example, which is a Github repository with exactly one file in it.

The HTML file has some CSS in it to make it look a little less boring, which is partly copied from https://example.com.

how to put the HTML page on the internet

Here are the steps:

  1. sign up for a Neocities account
  2. copy the index.html into the index.html in your neocities site
  3. done

The index.html page above is on the internet at julia-example-website.neocities.com, if you view source you’ll see that it’s the same HTML as in the github repo.

I think this is probably the simplest way to put an HTML page on the internet (and it’s a throwback to Geocities, which is how I made my first website in 2003) :). I also like that Neocities (like glitch, which I also love) is about experimentation and learning and having fun..

other options

This is definitely not the only easy way – Github pages and Gitlab pages and Netlify will all automatically publish a site when you push to a Git repository, and they’re all very easy to use (just connect them to your github repository and you’re done). I personally use the Git repository approach because not having things in Git makes me nervous – I like to know what changes to my website I’m actually pushing. But I think if you just want to put an HTML site on the internet for the first time and play around with HTML/CSS, Neocities is a really nice way to do it.

If you want to actually use your website for a Real Thing and not just to play around you probably want to buy a domain and link it to your website so that you can change hosting providers in the future, but that is a bit less simple.

this is a good possible jumping off point for learning HTML

If you are a person who is comfortable editing files in a Git repository but wants to practice HTML/CSS, I think this is a fun way to put a website on the internet and play around! I really like the simplicity of it – there’s literally just one file, so there’s no fancy extra magic to get in the way of understanding what’s going on.

There are also a bunch of ways to complicate/extend this, like this blog is actually generated with Hugo which generates a bunch of HTML files which then go on the internet, but it’s always nice to start with the basics.

2019-09-06T15:56:37+00:00


How to write zines with simple tools

People often ask me what tools I use to write my zines (the answer is here). Answering this question as written has always felt slightly off to me, though, and I couldn’t figure out why for a long time.

I finally realized last week that instead of “what tools do you use to write zines?” some people may have actually wanted to know “how can I do this myself?”! And “buy a $500 iPad” is not a terribly useful answer to that question – it’s not how I got started, iPads are kind of a weird fancy way to write zines, and most people don’t have them.

So this blog post is about more traditional (and easier to get started with) ways to write zines.

We’re going to start out by talking about the mechanics of how to write the zine, and then talk about how to assemble it into a booklet.

Way 1: Write it on paper

This is how I made my first zine (spying on your programs with strace) which you can see here: https://jvns.ca/strace-zine-unfolded.pdf.

Here’s an example of a page I drew on paper this morning pretty quickly. It looks kind of bad because I scanned it with my phone, but if you use a real scanner (like I did with the strace PDF above), the scanned version comes out better.

Way 2: Use a Google doc

The next option is to use a Google doc (or whatever other word processor you prefer). Here’s the Google doc I wrote for the below image, and here’s what it looks like:

They key thing about this Google doc approach is to apply some “less is more”. It’s intended to be printed as part of a booklet on half a sheet of letter paper, which means everything needs to be twice as big for it to look good.

Way 3: Use an iPad

This is what I do (use the Notability app on iPad). I’m not going to talk about this method much because this post is about using more readily available tools.

Way 4: Use a single sheet of paper

This is a subset of “Write it on paper” – the Wikibooks page on zine making has a great guide that shows how to write out a tiny zine on 1 piece of paper and then fold it up to make a little booklet. Here are the pictures of the steps from the Wikibooks page:

Sumana Harihareswara’s Playing with python zine is a nice example of a zine that’s intended to be folded up in that way.

Way 5: Adobe Illustrator

I’ve never used Adobe Illustrator so I’m not going to pretend that I know anything about it or put together an example using it, but I hear it’s a way people do book layout.

booklets: the photocopier method

So you’ve written a bunch of pages and want to assemble them into a booklet. One way to do this (and what I did for my first zine about strace!) is the photocopier method. There’s a great guide by Julia Gfrörer in this tweet, which I’m going to reproduce here:




That explanation is excellent and I don’t have anything to add. I did it that way and it worked great.

If you want to buy a print copy of that how-to-make-zines zine from Thruban Press, you can get it here on Etsy.

booklets: the computer method

If you’ve made your zine in Google Docs or in another computery way, you probably want a more computery way of assembling the pages into a booklet.

what I use: pdflatex

I do this using the pdfpages LaTeX extension. This sounds complicated but it’s not really, you don’t need to learn latex or anything. You just need to have pdflatex on your system, which is a sudo apt install texlive-base away on Ubuntu. The steps are:

  1. Get a PDF with the pages from your zine (pages need to be a multiple of 4)
  2. Get the latex file from this gist
  3. Replace /home/bork/http-zine.pdf with the path to your PDF and 1-28 with 1-however many pages are in your zine.
  4. run pdflatex formatted-zine.tex
  5. Tweak the parameters until it looks the way you want. The documentation for the pdfpages package is here

I like using this relatively complicated method because there are always small tweaks I want to make like “oh, the right margin is too big, crop it a little bit” and the pdfpages package has tons of options that let me make those tweaks.

other methods

  1. On Linux you can use the pdfjam bash script, which is just a wrapper around the pdfpages latex package. This is what I used to do but today I find it simpler to use the pdfpages latex package directly.
  2. There’s a program called Booklet Creator for Mac and Windows that @mrfb uses. It looks pretty simple to use.
  3. If you convert your PDF to a ps file (with pdf2ps for instance), psnup can do this. I tried cat file.ps | psbook | psnup -2 > booklet.ps and it worked, though the resulting PDFs are a little slow to load in my PDF viewer for some reason.
  4. there are probably a ton more ways to do this, if you know more let me know

making zines is easy and low tech

That’s all! I mostly wanted to explain that zines are an easy low tech thing to do and if you think making them sounds fun, you definitely 100% do not need to use any fancy expensive tools to do it, you can literally use some sheets of paper, a Sharpie, a pen, and spend $3 at your local print shop to use the photocopier.

resources

summary of the resources I linked to:

2019-09-01T09:02:43+00:00


git exercises: navigate a repository

I think the curl exercises the other day went well, so today I woke up and wanted to try writing some Git exercises. Git is a big thing to learn, probably too big to learn in a few hours, so my first idea for how to break it down was by starting by navigating a repository.

I was originally going to use a toy test repository, but then I thought – why not a real repository? That’s way more fun! So we’re going to navigate the repository for the Ruby programming language. You don’t need to know any C to do this exercise, it’s just about getting comfortable with looking at how files in a repository change over time.

clone the repository

To get started, clone the repository:

git clone https://github.com/ruby/ruby

The big different thing about this repository (as compared to most of the repositories you’ll work with in real life) is that it doesn’t have branches, but it DOES have lots of tags, which are similar to branches in that they’re both just pointers to a commit. So we’ll do exercises with tags instead of branches. The way you change tags and branches are very different, but the way you look at tags and branches is exactly the same.

a git SHA always refers to the same code

The most important thing to keep in mind while doing these exercises is that a git SHA like 9e3d9a2a009d2a0281802a84e1c5cc1c887edc71 always refers to the same code, as explained in this page. This page is from a zine I wrote with Katie Sylor-Miller called Oh shit, git!. (She also has a great site called https://ohshitgit.com/ that inspired the zine).

We’ll be using git SHAs really heavily in the exercises to get you used to working with them and to help understand how they correspond to tags and branches.

git subcommands we’ll be using

All of these exercises only use 5 git subcommands:

git checkout
git log (--oneline, --author, and -S will be useful)
git diff (--stat will be useful)
git show
git status

exercises

  1. Check out matz’s commit of Ruby from 1998. The commit ID is 3db12e8b236ac8f88db8eb4690d10e4a3b8dbcd4. Find out how many lines of code Ruby was at that time.
  2. Check out the current master branch
  3. Look at the history for the file hash.c. What was the last commit ID that changed that file?
  4. Get a diff of how hash.c has changed in the last 20ish years: compare that file on the master branch to the file at commit 3db12e8b236ac8f88db8eb4690d10e4a3b8dbcd4.
  5. Find a recent commit that changed hash.c and look at the diff for that commit
  6. This repository has a bunch of tags for every Ruby release. Get a list of all the tags.
  7. Find out how many files changed between tag v1_8_6_187 and tag v1_8_6_188
  8. Find a commit (any commit) from 2015 and check it out, look at the files very briefly, then go back to the master branch.
  9. Find out what commit the tag v1_8_6_187 corresponds to.
  10. List the directory .git/refs/tags. Run cat .git/refs/tags/v1_8_6_187 to see the contents of one of those files.
  11. Find out what commit ID HEAD corresponds to right now.
  12. Find out how many commits have been made to the test/ directory
  13. Get a diff of lib/telnet.rb between the commits 65a5162550f58047974793cdc8067a970b2435c0 and 9e3d9a2a009d2a0281802a84e1c5cc1c887edc71. How many lines of that file were changed?
  14. How many commits were made between Ruby 2.5.1 and 2.5.2 (tags v2_5_1 and v2_5_3)
  15. How many commits were authored by matz (Ruby’s creator)?
  16. What’s the most recent commit that included the word tkutil?
  17. Check out the commit e51dca2596db9567bd4d698b18b4d300575d3881 and create a new branch that points at that commit.
  18. Run git reflog to see all the navigating of the repository you’ve done so far

Question #1: Check out matz's commit of Ruby from 1998. The commit ID is `3db12e8b236ac8f88db8eb4690d10e4a3b8dbcd4`. Find out how many lines of code Ruby was at that time.

Solution #1:

git checkout 3db12e8b236ac8f88db8eb4690d10e4a3b8dbcd4
find . -name '*.c' | xargs wc -l

Question #2: Check out the current master branch

Solution #2:

git checkout master

Question #3: Look at the history for the file `hash.c`. What was the last commit ID that changed that file?

Solution #3:

git log hash.c
# look at the first line to get the commit ID. 
# I got 3df37259d81d9fc71f8b4f0b8d45dc9d0af81ab4.

Question #4: Get a diff of how `hash.c` has changed in the last 20ish years: compare that file on the master branch to the file at commit `3db12e8b236ac8f88db8eb4690d10e4a3b8dbcd4`.

Solution #4:

git diff 3db12e8b236ac8f88db8eb4690d10e4a3b8dbcd4 hash.c

Question #5: Find a recent commit that changed `hash.c` and look at the diff for that commit

Solution #5:

git log hash.c
# look at the first line to get the commit ID. 
# I got 3df37259d81d9fc71f8b4f0b8d45dc9d0af81ab4.
git show 3df37259d81d9fc71f8b4f0b8d45dc9d0af81ab4

Question #6: This repository has a bunch of **tags** for every Ruby release. Get a list of all the tags.

Solution #6:

git tags

Question #7: Find out how many files changed between tag `v1_8_6_187` and tag `v1_8_6_188`

Solution #7:

git diff v1_8_6_187 v1_8_6_188 --stat
# 5 files!

Question #8: Find a commit (any commit) from 2015 and check it out, look at the files very briefly, then go back to the master branch.

Solution #8:

git log | grep -C 2 ' 2015 ' | head
git checkout bd5d443a56ee4bcb59a0a08776c07dea3ee60121
ls
git checkout master

Question #9: Find out what commit the tag `v1_8_6_187` corresponds to.

Solution #9:

git show v1_8_6_187

Question #10: List the directory `.git/refs/tags`. Run `cat .git/refs/tags/v1_8_6_187` to see the contents of one of those files.

Solution #10:

$ cat .git/refs/tags/v1_8_6_187
928e6916b25aee5b2b379999a3fa8816d40db714

Question #11: Find out what commit ID `HEAD` corresponds to right now.

Solution #11:

git show HEAD

Question #12: Find out how many commits have been made to the `test/` directory

Solution #12:

git log --oneline test/ | wc

Question #13: Get a diff of `lib/telnet.rb` between the commits `f2a91397fd7f9ca5bb3d296ec6df2de6f9cfc7cb` and `e44c9b11475d0be2f63286c1332a48da1b4d8626 `. How many lines of that file were changed?

Solution #13:

git diff f2a91397fd7f9..e44c9b11475d0 lib/tempfile.rb

Question #14: How many commits were made between Ruby 2.5.1 and 2.5.2 (tags `v2_5_1` and `v2_5_3`)

Solution #14:

git log v2_5_1..v2_5_3 --oneline | wc

Question #15: How many commits were authored by `matz` (Ruby's creator)?

Solution #15:

git log --oneline --author matz | wc -l

Question #16: What's the most recent commit that included the word `tkutil`?

Solution #16:

git log -S tkutil
# result is 6c5f5233db596c2c7708d5807d9a925a3a0ee73a

Question #17: Check out the commit `e51dca2596db9567bd4d698b18b4d300575d3881` and create a new branch that points at that commit.

Solution #17:

git checkout e51dca2596db9567bd4d698b18b4d300575d3881
git branch my-branch

Question #18: Run `git reflog` to see all the navigating of the repository you've done so far

Solution #18:

git reflog

2019-08-30T09:25:15+00:00


curl exercises

Recently I’ve been interested in how people learn things. I was reading Kathy Sierra’s great book Badass: Making Users Awesome. It talks about the idea of deliberate practice.

The idea is that you find a small micro-skill that can be learned in maybe 3 sessions of 45 minutes, and focus on learning that micro-skill. So, as an exercise, I was trying to think of a computer skill that I thought could be learned in 3 45-minute sessions.

I thought that making HTTP requests with curl might be a skill like that, so here are some curl exercises as an experiment!

what’s curl?

curl is a command line tool for making HTTP requests. I like it because it’s an easy way to test that servers or APIs are doing what I think, but it’s a little confusing at first!

Here’s a drawing explaining curl’s most important command line arguments (which is page 6 of my Bite Size Networking zine). You can click to make it bigger.

fluency is valuable

With any command line tool, I think having fluency is really helpful. It’s really nice to be able to just type in the thing you need. For example recently I was testing out the Gumroad API and I was able to just type in:

curl https://api.gumroad.com/v2/sales \
                         -d "access_token=<SECRET>" \
                         -X GET  -d "before=2016-09-03"

and get things working from the command line.

21 curl exercises

These exercises are about understanding how to make different kinds of HTTP requests with curl. They’re a little repetitive on purpose. They exercise basically everything I do with curl.

To keep it simple, we’re going to make a lot of our requests to the same website: https://httpbin.org. httpbin is a service that accepts HTTP requests and then tells you what request you made.

  1. Request https://httpbin.org
  2. Request https://httpbin.org/anything. httpbin.org/anything will look at the request you made, parse it, and echo back to you what you requested. curl’s default is to make a GET request.
  3. Make a POST request to https://httpbin.org/anything
  4. Make a GET request to https://httpbin.org/anything, but this time add some query parameters (set value=panda).
  5. Request google’s robots.txt file (www.google.com/robots.txt)
  6. Make a GET request to https://httpbin.org/anything and set the header User-Agent: elephant.
  7. Make a DELETE request to https://httpbin.org/anything
  8. Request https://httpbin.org/anything and also get the response headers
  9. Make a POST request to https://httpbin.org/anything with the JSON body {"value": "panda"}
  10. Make the same POST request as the previous exercise, but set the Content-Type header to application/json (because POST requests need to have a content type that matches their body). Look at the json field in the response to see the difference from the previous one.
  11. Make a GET request to https://httpbin.org/anything and set the header Accept-Encoding: gzip (what happens? why?)
  12. Put a bunch of a JSON in a file and then make a POST request to https://httpbin.org/anything with the JSON in that file as the body
  13. Make a request to https://httpbin.org/image and set the header ‘Accept: image/png’. Save the output to a PNG file and open the file in an image viewer. Try the same thing with with different Accept: headers.
  14. Make a PUT request to https://httpbin.org/anything
  15. Request https://httpbin.org/image/jpeg, save it to a file, and open that file in your image editor.
  16. Request https://www.twitter.com. You’ll get an empty response. Get curl to show you the response headers too, and try to figure out why the response was empty.
  17. Make any request to https://httpbin.org/anything and just set some nonsense headers (like panda: elephant)
  18. Request https://httpbin.org/status/404 and https://httpbin.org/status/200. Request them again and get curl to show the response headers.
  19. Request https://httpbin.org/anything and set a username and password (with -u username:password)
  20. Download the Twitter homepage (https://twitter.com) in Spanish by setting the Accept-Language: es-ES header.
  21. Make a request to the Stripe API with curl. (see https://stripe.com/docs/development for how, they give you a test API key). Try making exactly the same request to https://httpbin.org/anything.

2019-08-27T09:29:38+00:00


Get your work recognized: write a brag document

There’s this idea that, if you do great work at your job, people will (or should!) automatically recognize that work and reward you for it with promotions / increased pay. In practice, it’s often more complicated than that – some kinds of important work are more visible/memorable than others. It’s frustrating to have done something really important and later realize that you didn’t get rewarded for it just because the people making the decision didn’t understand or remember what you did. So I want to talk about a tactic that I and lots of people I work with have used!

This blog post isn’t just about being promoted or getting raises though. The ideas here have actually been more useful to me to help me reflect on themes in my work, what’s important to me, what I’m learning, and what I’d like to be doing differently. But they’ve definitely helped with promotions!

You can also skip to the brag document template at the end.

you don’t remember everything you did

One thing I’m always struck by when it comes to performance review time is a feeling of “wait, what did I do in the last 6 months?“. This is a kind of demoralizing feeling and it’s usually not based in reality, more in “I forgot what cool stuff I actually did”.

I invariably end up having to spend a bunch of time looking through my pull requests, tickets, launch emails, design documents, and more. I always end up finding small (and sometimes not-so-small) things that I completely forgot I did, like:

your manager doesn’t remember everything you did

And if you don’t remember everything important you did, your manager (no matter how great they are!) probably doesn’t either. And they need to explain to other people why you should be promoted or given an evaluation like “exceeds expectations” (“X’s work is so awesome!!!!” doesn’t fly).

So if your manager is going to effectively advocate for you, they need help.

here’s the tactic: write a document listing your accomplishments

The tactic is pretty simple! Instead of trying to remember everything you did with your brain, maintain a “brag document” that lists everything so you can refer to it when you get to performance review season! This is a pretty common tactic – when I started doing this I mentioned it to more experienced people and they were like “oh yeah, I’ve been doing that for a long time, it really helps”.

Where I work we call this a “brag document” but I’ve heard other names for the same concept like “hype document” or “list of stuff I did” :).

There’s a basic template for a brag document at the end of this post.

share your brag document with your manager

When I first wrote a brag document I was kind of nervous about sharing it with my manager. It felt weird to be like “hey, uh, look at all the awesome stuff I did this year, I wrote a long document listing everything”. But my manager was really thankful for it – I think his perspective was “this makes my job way easier, now I can look at the document when writing your perf review instead of trying to remember what happened”.

Giving them a document that explains your accomplishments will really help your manager advocate for you in discussions about your performance and come to any meetings they need to have prepared.

Brag documents also really help with manager transitions – if you get a new manager 3 months before an important performance review that you want to do well on, giving them a brag document outlining your most important work & its impact will help them understand what you’ve been doing even though they may not have been aware of any of your work before.

share it with your peer reviewers

Similarly, if your company does peer feedback as part of the promotion/perf process – share your brag document with your peer reviewers!! Every time someone shares their doc with me I find it SO HELPFUL with writing their review for much the same reasons it’s helpful to share it with your manager – it reminds me of all the amazing things they did, and when they list their goals in their brag document it also helps me see what areas they might be most interested in feedback on.

On some teams at work it’s a team norm to share a brag document with peer reviewers to make it easier for them.

explain the big picture

In addition to just listing accomplishments, in your brag document you can write the narrative explaining the big picture of your work. Have you been really focused on security? On building your product skills & having really good relationships with your users? On building a strong culture of code review on the team?

In my brag document, I like to do this by making a section for areas that I’ve been focused on (like “security”) and listing all the work I’ve done in that area there. This is especially good if you’re working on something fuzzy like “building a stronger culture of code review” where all the individual actions you do towards that might be relatively small and there isn’t a big shiny ship.

use your brag document to notice patterns

In the past I’ve found the brag document useful not just to hype my accomplishments, but also to reflect on the work I’ve done. Some questions it’s helped me with:

you can write it all at once or update it every 2 weeks

Many people have told me that it works best for them if they take a few minutes to update their brag document every 2 weeks ago. For me it actually works better to do a single marathon session every 6 months or every year where I look through everything I did and reflect on it all at once. Try out different approaches and see what works for you!

don’t forget to include the fuzzy work

A lot of us work on fuzzy projects that can feel hard to quantify, like:

A lot of people will leave this kind of work out because they don’t know how to explain why it’s important. But I think this kind of work is especially important to put into your brag document because it’s the most likely to fall under the radar! One way to approach this is to, for each goal:

  1. explain your goal for the work (why do you think it’s important to refactor X piece of code?)
  2. list some things you’ve done towards that goal
  3. list any effects you’ve seen of the work, even if they’re a little indirect

If you tell your coworkers this kind of work is important to you and tell them what you’ve been doing, maybe they can also give you ideas about how to do it more effectively or make the effects of that work more obvious!

encourage each other to celebrate accomplishments

One nice side effect of having a shared idea that it’s normal/good to maintain a brag document at work is that I sometimes see people encouraging each other to record & celebrate their accomplishments (“hey, you should put that in your brag doc, that was really good!”). It can be hard to see the value of your work sometimes, especially when you’re working on something hard, and an outside perspective from a friend or colleague can really help you see why what you’re doing is important.

Brag documents are good when you use them on your own to advocate for yourself, but I think they’re better as a collaborative effort to recognize where people are excelling.

Next, I want to talk about a couple of structures that we’ve used to help people recognize their accomplishments.

the brag workshop: help people list their accomplishments

The way this “brag document” practice started in the first place is that my coworker Karla and I wanted to help other women in engineering advocate for themselves more in the performance review process. The idea is that some people undersell their accomplishments more than they should, so we wanted to encourage those people to “brag” a little bit and write down what they did that was important.

We did this by running a “brag workshop” just before performance review season. The format of the workshop is like this:

Part 1: write the document: 1-2 hours. Everybody sits down with their laptop, starts looking through their pull requests, tickets they resolved, design docs, etc, and puts together a list of important things they did in the last 6 months.

Part 2: pair up and make the impact of your work clearer: 1 hour. The goal of this part is to pair up, review each other’s documents, and identify places where people haven’t bragged “enough” – maybe they worked on an extremely critical project to the company but didn’t highlight how important it was, maybe they improved test performance but didn’t say that they made the tests 3 times faster and that it improved everyone’s developer experience. It’s easy to accidentally write “I shipped $feature” and miss the follow up (“… which caused $thing to happen”). Another person reading through your document can help you catch the places where you need to clarify the impact.

biweekly brag document writing session

Another approach to helping people remember their accomplishments: my friend Dave gets some friends together every couple of weeks or so for everyone to update their brag documents. It’s a nice way for people to talk about work that they’re happy about & celebrate it a little bit, and updating your brag document as you go can be easier than trying to remember everything you did all at once at the end of the year.

These don’t have to be people in the same company or even in the same city – that group meets over video chat and has people from many different companies doing this together from Portland, Toronto, New York, and Montreal.

In general, especially if you’re someone who really cares about your work, I think it’s really positive to share your goals & accomplishments (and the things that haven’t gone so well too!) with your friends and coworkers. It makes it feel less like you’re working alone and more like everyone is supporting each other in helping them accomplish what they want.

thanks

Thanks to Karla Burnett who I worked with on spreading this idea at work, to Dave Vasilevsky for running brag doc writing sessions, to Will Larson who encouraged me to start one of these in the first place, to my manager Jay Shirley for always being encouraging & showing me that this is a useful way to work with a manager, and to Allie, Dan, Laura, Julian, Kamal, Stanley, and Vaibhav for reading a draft of this.

I’d also recommend the blog post Hype Yourself! You’re Worth It! by Aashni Shah which talks about a similar approach.

Appendix: brag document template

Here’s a template for a brag document! Usually I make one brag document per year. (“Julia’s 2017 brag document”). I think it’s okay to make it quite long / comprehensive – 5-10 pages or more for a year of work doesn’t seem like too much to me, especially if you’re including some graphs/charts / screenshots to show the effects of what you did.

One thing I want to emphasize, for people who don’t like to brag, is – you don’t have to try to make your work sound better than it is. Just make it sound exactly as good as it is! For example “was the primary contributor to X new feature that’s now used by 60% of our customers and has gotten Y positive feedback”.

Goals for this year:

Goals for next year

Projects

For each one, go through:

Remember: don’t forget to explain what the results of you work actually were! It’s often important to go back a few months later and fill in what actually happened after you launched the project.

Collaboration & mentorship

Examples of things in this category:

Design & documentation

List design docs & documentation that you worked on

Company building

This is a category we have at work – it basically means “things you did to help the company overall, not just your project / team”. Some things that go in here:

What you learned

My friend Julian suggested this section and I think it’s a great idea – try listing important things you learned or skills you’ve acquired recently! Some examples of skills you might be learning or improving:

It’s really easy to lose track of what skills you’re learning, and usually when I reflect on this I realize I learned a lot more than I thought and also notice things that I’m not learning that I wish I was.

Outside of work

It’s also often useful to track accomplishments outside of work, like:

I think this can be a nice way to highlight how you’re thinking about your career outside of strictly what you’re doing at work.

This can also include other non-career-related things you’re proud of, if that feels good to you! Some people like to keep a combined personal + work brag document.

General prompts

If you’re feeling stuck for things to mention, try:

2019-06-28T18:46:02+00:00


What does debugging a program look like?

I was debugging with a friend who’s a relatively new programmer yesterday, and showed them a few debugging tips. Then I was thinking about how to teach debugging this morning, and mentioned on Twitter that I’d never seen a really good guide to debugging your code. (there are a ton of really great replies by Anne Ogborn to that tweet if you are interested in debugging tips)

As usual, I got a lot of helpful answers and now I have a few ideas about how to teach debugging skills / describe the process of debugging.

a couple of debugging resources

I was hoping for more links to debugging books/guides, but here are the 2 recommendations I got:

“Debugging” by David Agans: Several people recommended the book Debugging, which looks like a nice and fairly short book that explains a debugging strategy. I haven’t read it yet (though I ordered it to see if I should be recommending it) and the rules laid out in the book (“understand the system”, “make it fail”, “quit thinking and look”, “divide and conquer”, “change one thing at a time”, “keep an audit trail”, “check the plug”, “get a fresh view”, and “if you didn’t fix it, it ain’t fixed”) seem extremely resaonable :). He also has a charming debugging poster.

“How to debug” by John Regehr: How to Debug is a very good blog post based on Regehr’s experience teaching a university embedded systems course. Lots of good advice. He also has a blog post reviewing 4 books about debugging, including Agans’ book.

reproduce your bug (but how do you do that?)

The rest of this post is going to be an attempt to aggregate different ideas about debugging people tweeted at me.

Somewhat obviously, everybody agrees that being able to consistently reproduce a bug is important if you want to figure out what’s going on. I have an intuitive sense for how to do this but I’m not sure how to explain how to go from “I saw this bug twice” to “I can consistently reproduce this bug on demand on my laptop”, and I wonder whether the techniques you use to do this depend on the domain (backend web dev, frontend, mobile, games, C++ programs, embedded etc).

reproduce your bug quickly

Everybody also agrees that it’s extremely useful be able to reproduce the bug quickly (if it takes you 3 minutes to check if every change helped, iterating is VERY SLOW).

A few suggested approaches:

accept that it’s probably your code’s fault

Sometimes I see a problem and I’m like “oh, library X has a bug”, “oh, it’s DNS”, “oh, SOME OTHER THING THAT IS NOT MY CODE is broken”. And sometimes it’s not my code! But in general between an established library and my code that I wrote last month, usually it’s my code that I wrote last month that’s the problem :).

start doing experiments

@act_gardner gave a nice, short explanation of what you have to do after you reproduce your bug

I try to encourage people to first fully understand the bug - What’s happening? What do you expect to happen? When does it happen? When does it not happen? Then apply their mental model of the system to guess at what could be breaking and come up with experiments.

Experiments could be changing or removing code, making API calls from a REPL, trying new inputs, poking at memory values with a debugger or print statements.

I think the loop here may be:

change one thing at a time

Everybody definitely agrees that it is important to change one thing a time when doing an experiment to verify an assumption.

check your assumptions

A lot of debugging is realizing that something you were sure was true (“wait this request is going to the new server, right, not the old one???“) is actually… not true. I made an attempt to list some common incorrect assumptions. Here are some examples:

weird methods to get information

There are a lot of normal ways to do experiments to check your assumptions / guesses about what the code is doing (print out variable values, use a debugger, etc). Sometimes, though, you’re in a more difficult environment where you can’t print things out and don’t have access to a debugger (or it’s inconvenient to do those things, maybe because there are too many events). Some ways to cope:

The point here is that information is the most important thing and you need to do whatever’s necessary to get information.

write your code so it’s easier to debug

Another point a few people brought up is that you can improve your program to make it easier to debug. tef has a nice post about this: Write code that’s easy to delete, and easy to debug too. here. I thought this was very true:

Debuggable code isn’t necessarily clean, and code that’s littered with checks or error handling rarely makes for pleasant reading.

I think one interpretation of “easy to debug” is “every single time there’s an error, the program reports to you exactly what happened in an easy to understand way”. Whenever my program has a problem and says sometihng “error: failure to connect to SOME_IP port 443: connection timeout” I’m like THANK YOU THAT IS THE KIND OF THING I WANTED TO KNOW and I can check if I need to fix a firewall thing or if I got the wrong IP for some reason or what.

One simple example of this recently: I was making a request to a server I wrote and the reponse I got was “upstream connect error or disconnect/reset before headers”. This is an nginx error which basically in this case boiled down to “your program crashed before it sent anything in response to the request”. Figuring out the cause of the crash was pretty easy, but having better error handling (returning an error instead of crashing) would have saved me a little time because instead of having to go check the cause of the crash, I could have just read the error message and figured out what was going on right away.

error messages are better than silently failing

To get closer to the dream of “every single time there’s an error, the program reports to you exactly what happened in an easy to understand way” you also need to be disciplined about immediately returning an error message instead of silently writing incorrect data / passing a nonsense value to another function which will do WHO KNOWS WHAT with it and cause you a gigantic headache. This means adding code like this:

if UNEXPECTED_THING:
    raise "oh no THING happened"

This isn’t easy to get right (it’s not always obvious where you should be raising errors!“) but it really helps a lot.

failure: print out a stack of errors, not just one error.

Related to returning helpful errors that make it easy to debug: Rust has a really incredible error handling library called failure which basicaly lets you return a chain of errors instead of just one error, so you can print out a stack of errors like:

"error starting server process" caused by
"error initializing logging backend" caused by
"connection failure: timeout connecting to 1.2.3.4 port 1234".

This is SO MUCH MORE useful than just connection failure: timeout connecting to 1.2.3.4 port 1234 by itself because it tells you the significance of 1.2.3.4 (it’s something to do with the logging backend!). And I think it’s also more useful than connection failure: timeout connecting to 1.2.3.4 port 1234 with a stack trace, because it summarizes at a high level the parts that went wrong instead of making you read all the lines in the stack trace (some of which might not be relevant!).

tools like this in other languages:

If you know how to do this in other languages I’d be interested to hear!

understand what the error messages mean

One sub debugging skill that I take for granted a lot of the time is understanding what error messages mean! I came across this nice graphic explaining common Python errors and what they mean, which breaks down things like NameError, IOError, etc.

I think a reason interpreting error messages is hard is that understanding a new error message might mean learning a new concept – NameError can mean “Your code uses a variable outside the scope where it’s defined”, but to really understand that you need to understand what variable scope is! I ran into this a lot when learning Rust – the Rust compiler would be like “you have a weird lifetime error” and I’d like be “ugh ok Rust I get it I will go actually learn about how lifetimes work now!“.

And a lot of the time error messages are caused by a problem very different from the text of the message, like how “upstream connect error or disconnect/reset before headers” might mean “julia, your server crashed!“. The skill of understanding what error messages mean is often not transferable when you switch to a new area (if I started writing a lot of React or something tomorrow, I would probably have no idea what any of the error messages meant!). So this definitely isn’t just an issue for beginner programmers.

that’s all for now!

I feel like the big thing I’m missing when talking about debugging skills is a stronger understanding of where people get stuck with debugging – it’s easy to say “well, you need to reproduce the problem, then make a more minimal reproduction, then start coming up with guesses and verifying them, and improve your mental model of the system, and then figure it out, then fix the problem and hopefully write a test to make it not come back”, but – where are people actually getting stuck in practice? What are the hardest parts? I have some sense of what the hardest parts usually are for me but I’m still not sure what the hardest parts usually are for someone newer to debugging their code.

2019-06-23T18:48:35+00:00


Page created: Fri, Dec 13, 2019 - 09:05 AM GMT