Cataclysmic Mutation

Machine Learning and Whatever Else

Icelandish

First off, apologies to the excellent Yessica over at the real Icelandish blog for cribbing her title. It’ll only happen just this once, but it was too appropriate to pass up.

This was an apartment listing when I was looking last fall, as rendered by Google Translate:

Climbing Mt. Esja

Summer has finally announced its imminent arrival in Iceland, and I went this weekend with some friends to Mt. Esja, a popular hiking destination just outside Reykjavík. It’s about 1000 meters high at the summit, and the walk/climb to the top was pretty brutal for my woefully out-of-shape body. With some brief rest stops, getting to the top wasn’t so bad. However, the trip left my legs a bit jellied, and that in combination with my poor choice of footwear made the descent very taxing.

I think once I buy a set of proper hiking boots, I may make a regular occurrence out of this hike.

Debunking the “Supermoon” With Clojure

So some jackass realized that the moon doesn’t orbit in a perfect circle and decided that he alone in the annals of history was in possession of the information that Earthquakes and other doom might ensue, and since the Moon was closer than usual (more on that in a bit), something *bad* was about to happen. No big deal. There are about 7,000,000,000 people on planet Earth, and a good percentage of them are, even now, right as you’re reading this, drooling uncontrollably from their slack jaws. However, this happened to make what now passes for news just before a massive earthquake hit off the coast of Japan, and so now the idiots who have taught themselves to operate a television are convinced that they’re correct. Oh, dear.

There are so many problems here, it’s hard to know where to start. Possibly that the moon isn’t actually especially close to Earth right now, and couldn’t cause this earthquake even if it was. Anyway, Phil Plait has a nice takedown up right now, and he knows a lot more about planetary physics than I do anyway, so just go read what he has to say. I’m writing this for another reason. One of my Facebook friends wondered what the actual plot of earthquake frequency and magnitude versus lunar distance actually would look like. That sounded like a fun little exercise in programming.

There are basically two questions that need to be answered. First, when have there been major earthquakes? I went to Wikipedia and grabbed a list of the 24 largest earthquakes in known history (today’s Japanese quake comes in number six). This gives me the magnitudes, but also the dates the quakes occurred. The range of dates covered by these 24 mega-quakes spans from a 1575 Chilean quake right up to today’s Japanese quake. The second thing we need is an ability to calculate the distance between the Earth and moon on any given day in history. That’s a tougher problem, but a bit of Google-fu turned up the ELP2000-82B model of lunar ephemerides. That’s what we need here. However, it’s a very complex model, and I didn’t want to spend a week debugging scientific code, but fortunately, I found an implementation on Github.

A bit more reading of code and comments showed that the model needs Julian dates. A bit more searching to find the exact formula for this conversion, and a quick implementation of said formula in Clojure yields the following.

1
2
3
4
5
6
7
(defn gregorian-to-julian
  [year month day hour minute second]
  (let [a (/ (- 14 month) 12),
        y (+ year 4800 (- a)),
        m (+ month (* 12 a) (- 3)),
        jdn (+ day (/ (+ (* 153 m) 2) 5) (* 365 y) (/ y 4) (- (/ y 100)) (/ y 400) (- 32045))]
    (+ jdn (/ (- hour 12) 24) (/ minute 1440) (/ second 86400))))

Because this was intended to be quick and dirty, I just manually set up a simple data structure for my earthquake data, and rather than spend lots of time integrating my Clojure code with the C library that implemented the lunar model, I simply used the C code to compute the distances, and then copied them directly into Clojure as well.

1
2
3
4
5
6
7
8
9
(def *quakes*
     (list [[1960 5 22] 9.5] [[2004 12 26] 9.3] [[1964 3 27] 9.2]
           [[1952 11 4] 9.0] [[1868 8 13] 9.0] [[2011 3 11] 8.9]
           [[1700 1 26] 8.9] [[2010 2 27] 8.8] [[1906 1 31] 8.8]
           [[1833 11 25] 8.8] [[1965 2 4] 8.7] [[1755 11 1] 8.7]
           [[1730 7 8] 8.7] [[2005 3 28] 8.6] [[1957 3 9] 8.6]
           [[1950 8 15] 8.6] [[2007 9 12] 8.5] [[1963 10 13] 8.5]
           [[1938 2 1] 8.5] [[1923 2 3] 8.5] [[1922 11 11] 8.5]
           [[1751 5 24] 8.5] [[1687 10 20] 8.5] [[1575 12 16] 8.5]))

To get the Julian dates for each quake, it’s sufficient to just run a quick and dirty map over the quake data as follows.

1
2
3
4
(defn build-julian-dates
  [quake-list]
  (let [gregdates (map first quake-list)]
    (map #((comp int gregorian-to-julian) (nth % 0) (nth % 1) (nth % 2) 11 0 0) gregdates)))

Now we feed that data into the C code from Github to calculate the distance the moon was from the earth at noon on the day of the quake (I made that approximation rather than look up the precise time each quake occurred). After those distances have been computed, I just copied them back into a Clojure data structure.

1
2
3
4
5
(def *distances* (list 401457.7391 406456.7146 397480.5329 391134.7747
                       367611.4645 384653.3206 391978.4295 357983.2221 404404.0269
                       388230.3723 395956.3832 360256.3522 401928.1778 375390.8061
                       368042.0999 371760.8433 403062.0740 399405.1708 395555.1038
                       364759.5125 373865.8605 361623.8578 366941.6539 401339.3053))

With that, we have enough data to plot earthquake strength versus lunar distance for the 24 largest quakes in recorded history. Clojure has a really nice library available called Incanter that makes charting and statistical analysis very easy.

1
2
3
4
5
6
7
8
9
(defn make-plot
  []
  (let [plot (scatter-plot (map second *quakes*) *distances*
                           :title "Quake Strength vs. Lunar Distance"
                           :y-label "Distance to Moon in km"
                           :x-label "Magnitude")]
    (set-y-range plot 356380 406710) ;; min/max lunar distance between 1575 and 2011
    (view plot)
    (save plot "plot.png")))

The resulting plot is shown below. As you can see, there’s no relationship at all between the strength of the strongest quakes and the current distance from the Earth to the moon. Note that the Y-axis is scaled to the minimum and maximum distances the moon has been in about 450 years, so the quakes really are distributed all over the place.

One last thing. Although the graph is pretty obvious, we really shouldn’t try and use intuition as a substitute for actual analysis, so let’s go ahead and fit a linear model to the data. Again, using Incanter,

1
(:r-square (linear-model *distances* (map second *quakes*)))

we get an value of 0.1193, or basically no relationship at all.

Building Your Own Keyboard Layout for Fun and Profit

It’s commonly known, if you’re the kind of person who commonly knows this sort of thing, that the normal QWERTY keyboard layout is not particularly well suited to typing English prose. The usual story is that it was designed to intentionally slow typists down so that the physical hammer mechanism used by early typewriters would have time to clear between key punches so as to not jam. Probably, that’s a bit suspect. But certainly the design was affected by the implementation on these typewriters to some degree. You can switch to the Dvorak layout, and there’s Colemak, and a few lesser known ones as well, but what is the optimal layout? It turns out that mathematicians and computer scientists have actually spent a bit of time thinking about this problem, and it’s a fun little exercise to look at how we might go about answering it.

I’ve seen this question appear two or three times on Reddit now, and I’ve always written a lengthy explanation. My Ph.D. work consisted of a large segment the general optimization problem that we’ll use to model keyboard layout, and I actually gave a talk on using it for that purpose in something of an “Ig nobel” approach. Anyway, it finally occurred to me that I have a convenient place where I can publish this sort of thing.

First, what is the actual problem we’re trying to solve? Basically, we want to be able to type as quickly as possible. Anyone who types with greater than “hunt-and-peck” speeds knows that certain words are harder to type than others. Personally, I notice this whenever I visit “news.bbc.co.uk”. Let’s look at that URL to see why. The first bit isn’t so bad really. “N-E-W-S” is typed with my right index finger reaching a bit, the “E-W” rolls off the left hand quite easily as the two characters are adjacent on my QWERTY keyboard layout so you can sort of drum your fingers across the two letters quite quickly. But then I have to hit the “S” with the same finger that just typed the “W”, but down a row. Moving my fingers up and down rows takes much longer than simply pressing a different finger on the home row. So there’s a little delay while I get my ring finger in position after typing the “W”. The “.bbc.” isn’t so bad. The “B” is one of the harder keys to reach, and I generally find the bottom row to be harder to type on than the top row (both of which are harder than the home row obviously), but the repeated keypress on the “B” is as easy as possible, and the “C” uses a different finger, and the “.” is on the other hand, both characteristics helping to allow faster typing of those keys. But the end of the URL, “.co.uk” has a lot of right hand action, and a lot of jumping up and down rows. Because it will be useful later, let’s think about the difficulty of pressing two physical keys on the keyboard in succession as the distance between those keys.

Note a critical insight here. This distance we’ve defined is between keys or buttons on the keyboard. It has nothing to do with letters. We can (and will) think about removing the binding of letters and symbols to keys so that we can rebind them in more optimal ways. To do this, we need this distance measure between the physical keys where we can put them.

We need a bit more information before we can start though. Namely, what is it that we hope to type quickly. After all, you may think that QWERTY is an awful layout, but if you want to type the sentence, “asdf asdf asdf asdf asdf,” it’s a pretty good system. Generally, we’re probably interested in typing ordinary English prose, but it’s not clear exactly what that means. We need to formalize that description somewhat. The key is to understand that typing quickly is simply being able to type successive pairs of characters needed quickly and easily. If “T-H” and “H-E” are both very easy to type successively, then the word “the” will be easy to type. This means we need a convenient source of what we’ll call 2-gram frequencies. The idea is that, given the type of text you wish to write, can you estimate the frequency of each pair of characters appearing in succession? We could sit down and attempt to reason our way through it, e.g., “t-h appears a lot, e-r does too…” However, a more reliable solution is to simply analyze a bunch of text that seems to be a good representative sample of the type of thing you’ll be typing, and compute the frequencies directly from that.

Let’s take a little detour into classical optimization problems. Introduced in 1957 by Koopmans and Beckman, the Quadratic Assignment Problem (QAP) asks the following question. Can you find the best way to put buildings in available physical locations such that overall amount of traffic between the buildings is minimized? Imagine you’re tasked with building a new campus for a university with 25,000 students, faculty, and staff. The university planning committee hands you a list of all the academic buildings they’ll have (physics, engineering, mathematics, nursing, psychology, literature, etc.) and a table of numbers that tell you how many people walk from one building to another during the course of an average week, for all pairs of buildings on campus. They’ve noticed anecdotally that there are a lot of students who complain that they’re always late for class because they have to walk all the way across campus to get from their calculus class to their physics classes. They want you to try and put related buildings like that closer together when you design their new campus.

Looking at the problem, you realize that what you really want to do is minimize the total amount of miles walked by all the people on campus. If 20 people walk one mile each, and two people walk five miles each, the total walked is just miles. That is, you multiply the distance between the two locations by the amount of traffic flowing between the buildings you assign to those two locations. Sum this over all the location/building assignments, and you have the total miles walked. Minimizing this is the QAP.

Because we’re computer scientists, let’s make this easy and number everything. We have buildings and locations to put them in. Let’s number the buildings from 0 to . If we agree to always write them in the same order, we can represent a possible assignment as a valid permutation of those numbers. For example,

2 7 1 4 0 6 5 3

denotes that building 2 goes in location 0, building 7 in location 1, building 1 in location 2, and so on.

We can now formalize the problem. Take all those distances between locations and stuff them in a matrix such that is the distance between location and location . Similarly, we define a cost or “flow” matrix such that is the amount of traffic flowing between building and building . Our goal is to find the permutation such that the following objective function

is minimized. Expressed in pseudocode, this function is as shown below.

int objective(P)
    f = 0
    for i=0 to N-1
        for j=0 to N-1
            f  = D[i,j] * C[P[i],P[j]]
        end for
    end for
    return f

It should hopefully be starting to fall into place. The problem of finding an optimal keyboard layout for a given set of 2-gram frequencies is just an instance of the QAP. Instead of some form of physical distance between locations, we have a generalized distance metric that intends to conform to the idea that harder to type pairs of keys are “farther apart” than easier to type pairs. And our flow or cost matrix comes directly from the 2-gram frequencies calculated from a large sample corpus of representative text we might wish to type.

To actually solve the underlying QAP instance is challenging. Unlike, say the traveling salesrep problem (TSP), QAP is very difficult to solve well even in practice. Whereas TSP instances of tens of thousands of cities can be routinely solved within a few percent of optimality by simple heuristics, QAP instances much larger than about 35 or so remain very difficult to handle in practice. Smallish instances can be directly solved via integer programming techniques, often based on some form of branch-and-bound algorithm. However, these methods can be quite slow, and fail completely for large instances. Depending on precisely how you formulate the problem, keyboard layout is likely small enough to be handled directly. However, as my expertise is on heuristics, that’s where I will focus my attention.

Going back to Reddit, another common theme is a fascination with genetic algorithms. It’s a fascination that I understand completely. Unfortunately, GAs rather, well…how to put this delicately…suck at QAP. Actually, they’re not so fantastic on a lot of problems, but QAP in particular is kryptonite to whatever super powers your GA might ever have. That said, they are fun to explore, and looking at how we might approach QAP with a GA is instructive on several levels, so I want to talk a bit about how you’d build a GA for this problem.

Without going into all the details on GAs in general, the real issue here is crossover. Recall that every valid potential solution is a permutation, and all the simple crossover operators you might be familiar with from a GA tutorial or two share the problem that they will almost never actually produce permutations if we use them here. Assume the following two permutations:

p1 = 0 1 2 3 4 5 6 7
p2 = 4 5 7 3 1 0 6 2

Let’s imagine doing a typical one-point crossover operation with crossover point equal to three. This gives us the following situation, with the parts of the parents that will not be swapped replaced by xs.

p1' = 0 1 2 x x x x
p2' = 4 5 7 x x x x

If I just swap those two pieces, I get nonsensical results.

c1 = 0 1 2 3 1 0 6 2
c2 = 4 5 7 3 4 5 6 7

In this example, offspring c1 has building 0 in two different places (and the same for buildings 1 and 2) and it doesn’t put buildings 4, 5, or 7 anywhere.

Obviously, this is not good. The reason I get bad results is because the subset of values I chose from each parent contains values not contained in the other, which means that those values must be in the other part of the parents, and will be repeated when I swap the pieces out. For example, if I swap them, the child 2 is going to get “0 1 2” as its first three numbers, but because those weren’t in the first three numbers of parent 2, they must be somewhere else, and since child 2 gets the rest of the values from parent two, you repeat them, which is bad. What if instead, I choose indexes 0, 1, 4, and 5. Now I have

p1' = 0 1 x x 4 5 x x
p2' = 4 5 x x 1 0 x x

If I swap those values to create two partial offspring, I get

c1 = 4 5 x x 1 0 x x
c2 = 0 1 x x 4 5 x x

Completing the crossover operation by copying the x bits down from the two parents yields

c1 = 4 5 2 3 1 0 6 7
c2 = 0 1 7 3 4 5 6 2

which works nicely.

Fortunately, it turns out that there is an algorithm that guarantees exactly this situation. The method is called Cycle crossover (CX), and it’s one of a number of methods that can be used on permutations. More precisely, what CX does is define the algorithm by which you can repeatedly generate these subsets of indices in such a way as to allow you go do a crossover operation without mucking up your permutations. Slightly more formally, what CX tries to do is find subsets of positions such that a “cycle” is formed, where a cycle is defined as a closure on the contained values. OK, that doesn’t help at all. That’s why I provided an example (and code).

Note that the code below actually does a bit more than I let on in the example. In general, finding one cycle isn’t enough. Cycles can be quite short compared to the length of the string, so finding one cycle would not tend to mix the parents very well. Instead, what we do is repeat the process of finding and crossing cycles until we’ve found them all. By the same token, if we crossed every cycle over, we’d end up just swapping the parents whole. Instead, every other cycle is swapped; the rest are passed down unmodified.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
void cycle_crossover(const chromosome& p1,
                     const chromosome& p2,
                     chromosome& c1,
                     chromosome& c2)
{
    mtrandom mt;

    // find a cycle starting from a random point
    int n = (int)p1.length();
    int cycle_len = 0;
    vector<int> perm = mt.permutation(n);
    int index;

    bool mix = true;            // mix cycles from different parents
    while(cycle_len < n) {
        // pick the first point in the cycle
        for(index=0; index<n; index++) {
            if(perm[index] != -1) {
                break;
            }
        }
        index = perm[index];

        // store the points in the cycle
        int start = p1[index];
        vector<int> cycle;
        do {
            cycle.push_back(index);
            perm[index] = -1;   // this point has been included in a cycle

            // set index equal to position of p2[index] in p1
            for(int i=0; i<(int)p2.length(); i++) {
                if(p1[i] == p2[index]) {
                    index = i;
                    break;
                }
            }
            cycle_len++;
        } while(p1[index] != start);

        // now that we have a cycle, update the offspring
        for(unsigned int i=0; i<cycle.size(); i++) {
            if(mix) {
                c1[cycle[i]] = p2[cycle[i]];
                c2[cycle[i]] = p1[cycle[i]];
            } else {
                c1[cycle[i]] = p1[cycle[i]];
                c2[cycle[i]] = p2[cycle[i]];
            }
        }
        mix = !mix;
    }
}

An interesting question is why one would choose CX over one of the other permutation crossover operators. The answer lies in understanding what properties of a candidate solution are important, and should thus be heritable. Let’s back off of QAP for a moment and think about the traveling salesrep problem (TSP), another permutation problem. In TSP, our goal is to find an ordering of cities such that a tour of each city in the given order is shortest (assuming a return to the first city as well). Again, assuming some numbering of cities, we might have the following two candidate solutions:

p1 = 6 4 3 7 0 2 1 5
p2 = 7 0 2 1 5 6 4 3

Note that these two solutions have no cities in the same location. And yet a bit of thought should convince you that they by necessity have the same cost, as they visit each city in exactly the same order, only starting and ending in a different city. They are the same tour. Generalizing a bit, the important part of a TSP solution is not where exactly the numbers appear in the permutation. It’s what numbers they appear next to in the permutation. So ideally, if we have two good parents in TSP, our crossover operator should try to produce offspring in which the adjacency of cities is similar to that exhibited in the parents.

Compare this to QAP. In QAP, the physical location of a building is important. The building will be connected to every other building with some flow, and the physical distance between locations is very important. Adjacency doesn’t matter at all, as location 0 and location 1 may be on opposite sides of a campus, so the fact that a solution puts one building “next to” another one purely in terms of adjacent location numbers is not a meaningful piece of information. Crossover should be designed to pass down relevant information, and for QAP, it’s absolute position in the chromosome. CX does exactly that.

With a suitable crossover operator, basic familiarity with GAs should allow you to implement one. Mutation can be done simply by swapping two elements at random within the chromosome. Selection and replacement occur as usual.

The one piece we’ve yet to address is how to build a distance matrix for optimizing your keyboard layout. I alluded briefly earlier to the notion that keys that are hard to type consecutively should be further apart according to this distance measure, but there are a lot of ways to interpret that. In addition, elements of that decision will need to take into account the specific model of keyboard you’re trying to optimize. I advocate the following approach: sit down with your keyboard and a pen and paper and start listing little rules and associated penalties. As a really quick example, you might have something like this.

  1. Default distance between two keys is 0
  2. +1 if keys are on the same hand
  3. +1 for each key in the pair that is not on the home row
  4. +10 if the keys use the same finger, but skip one row (e.g., ‘c’ and ‘e’ on a QWERTY keyboard)
  5. +12 if the keys use the same finger, but skip two rows (e.g., ‘c’ and ‘3’ on a QWERTY keyboard
  6. +9 for hard to reach keys (e.g., the ‘=’ key on most American keyboards)

The important thing here is that your rules match your expected keyboard and that the resulting typing model be roughly consistent and representative of realistic typing preferences. It’s not easy to come up with good rules. My advice would be to write a program that lets you easily tweak the rules and generate the resulting distance matrix. That way, you can experiment much more easily, as it’s unlikely you’ll be happy with your model the first few tries. The nice thing about this approach is that you can use it for very non-standard purposes. The most recent Redditor to ask about this was attempting to design a layout that could be used with one finger. That’s a perfect opportunity to use something like this, because you can easily play with suitable models of what it would be like to type on a hypothetical keyboard, and optimize your key assignments accordingly.

Finally, there’s one other really important thing you’d want to do, and it’s a real bummer as it takes us out of the nice pure world of mathematics where the basic abstract QAP formulation is all we need. This last issue is that while your computer will be capable of finding “good” layouts as measured by that overall distance times flow metric, real humans have several other constraints for which violations will make them cranky. People won’t like it if the ‘(‘ character is where the comma key normally goes and the ‘)’ is a shifted ‘2’. Look at your keyboard: you have several issues like this. The square brackets, parenthesis, and curly braces should have a coherent organization (next to their matching symbol, same level of shiftedness, etc.).

In the vernacular of optimization problems, these are called constraints, and constrained optimization is a whole category of study by itself. If you’re doing heuristic search like GAs or tabu search, you have two choices really. You can penalize the fitness of solutions that violate these constraints and hope that the algorithm will find a way to find better objective function values by satisfying them, or you can directly attempt to detect and fix violations. You can also treat the degree of violation as an extra objective to be minimized and do multiobjective optimization. My particular preference is usually to repair violations if I can and penalize when I can’t.

I’ve posted some C++ source code for generating the distance and flow matrices. Note a couple of things. My keyboard model is pretty basic. And by “basic”, I mean “shitty”. Nonetheless, it might give you a base to start from. Also, I did this a few years ago as a graduate student as part of my department’s sort of Ig Nobel Research Day where we all did something goofy with our serious research. As my research was mostly on multiobjective optimization, I generated two sets of flow matrices: one based on a corpus of my dissertation’s LaTeX source code, and the other based on a fairly large C++ code base I used for me research at the time. With multiobjective optimization, you get a set of solutions when your done that represent different trade-offs between the objectives. So I had one keyboard that was really good for typing my dissertation, one keyboard that was really good for typing my C++ code, and a bunch of keyboards in between. However, I punted completely on the constraints mentioned above, so my keyboards were all very screwy.

So anyway, that’s a pretty decent overview of how you can build your own customized keyboard layout using a bit of ingenuity to build a good keyboard model and some standard algorithms to attack the resulting optimization problem. If you try this yourself and come up with something interesting, drop me a line. I’d love to see what you came up with.

Arch Linux Configuration

The following was written in 2009 as a set of notes for things I had to figure out in order to switch to Arch Linux. The Linux server it was running on didn’t make the trip with me to Iceland, but I wanted to have a place where this bit of documentation intended mostly for myself could still live, so I’m posting it here now. Much of it may no longer be relevant.

So, Gentoo has “teh suck.” Not its fault really. It’s just suffered from a lack of good leadership.

OK, that was a bit harsh. Part of the problem is that I don’t use Gentoo the way I think you’re supposed to – always obsessively keeping a bleeding edge distribution. I really like the Gentoo organization system, and I’d still be using it except for how it’s, you know, horribly broken. Specifically, if you don’t update for a while, you’re screwed. You end up with package versions that aren’t even in portage anymore, and the system basically refuses to build certain things. And good luck if you need a glibc update.

Anyway, as a result of this, my Gentoo box has been somewhat hosed for a while now. I needed to build a new GCC, and it wouldn’t unless I committed to spending my life tracking down one problem after another, so I gave up. This weekend, I finally bit the bullet, wiped it, and installed Arch. So far, it seems pretty good, but there were a number of weird little issues that I wanted to document.

I should vehemently note here that all of the following is from my recollection after the fact. I wasn’t taking good notes during the process, so it’s likely that I’ve left something out – probably several things.

Before I get too far, let me mention a couple of quick notes. Pacman, the Arch package manager is actually quite good. To install a package, I’ve been doing the following

$ pacman -Ss boost
extra/boost 1.37.0-1		Boost provides free peer-reviewed portable C++ source libraries.
$ pacman -S boost

which does a search (“-Ss”) of the supported repositories and reports back with what it found. I then install the package with a bare “-S”. There are other options, and the man pages are helpful, but that’s the gist of things.

“You did what with Xorg?” or “It turns out input devices are useful in X.”

For reasons that completely escape me, Arch takes the concept of “modular Xorg” to whole new heights. If you’re like me, you would search for xorg packages, see a bunch of stuff scroll by, and pick out “xorg-server” as the one you need. And indeed, installing this package produced what seemed at first glance to be a fairly normal X11 installation. Trying to start X was a beautiful failure, as although everything started correctly, I had no keyboard and no mouse. As I foolishly hadn’t gotten ssh going yet, I got to hard cycle the machine.

Turns out, the good folks of Arch decided to do two lovely things: fail to install mouse_drv.so and kbd_drv.so, and configure X to allow an empty set of input devices. Thus, instead of getting an error telling you you didn’t have a keyboard or mouse, you got a functioning X display that couldn’t be killed.

Short answer to the problem, scream “fuckmonkey” at the top of your lungs and install xf86-input-mouse and xf86-input-keyboard.

$ pacman -S xf86-input-mouse xf86-input-keyboard xf86-input-evdev

Of course, if you prefer to communicate with your computer via telepathy or by careful direct application of electric current to the board, feel free to ignore this step.

There’s actually quite a bit more to this bullet point, as the Arch method of installing Xorg is somewhere just shy of “one package per file”. There’s more than I want to go into here, but just be prepared to do a lot of searching the repositories for various bits of X you thought for sure would have been installed along with the core X11 system.

On the bright side, once I helpfully clarified that I wanted to be able to interact with X, it installed and worked well enough. I installed the nvidia drivers

$ pacman -S nvidia

and Xorg -configure worked great. I did add a line to the ServerLayout section containing

Option "AllowEmptyInput" "false"

to make it complain if it didn’t find any devices, but otherwise, I left the file mostly unmodified. I also added a LoadModule “evdev” line so that it would work with modern peripherals.

Look at me, I’m on the Internet

I do a bunch of stuff from Apache. I serve out different pages based on the hostname coming in (NameVirtualHosts in Apache vernacular), I run ajaxterm (protected by SSL for obvious reasons), and I run a dynamic site for some friends written in Python using Django. So while none of this is particularly Arch specific, it seemed like a good time to document the process of deploying all this stuff on Arch.

First, you need to install apache and all the other crap I need.

$ pacman -S apache python mod_python django ajaxterm openssl

I also had to add a line to /etc/httpd/conf/httpd.conf to load the mod_python module.

We’re going to need a certificate for this and a couple of other steps, so we can go ahead and create a key and certificate via the normal mechanism.

$ openssl req -new -x509 -newkey rsa:1024 -days 3650 -keyout server.key -out server.crt
[answer lots of questions]
$ openssl rsa -in server.key -out server.key
$ cp server.key /etc/ssl/private/
$ chown nobody.nobody /etc/ssl/private/server.key
$ chmod 0400 nobody.nobody /etc/ssl/private/server.key
$ cp server.crt /etc/ssl/certs/

In Arch, the default home of your web presence is /srv/http/. I created directories for each of my vhosts under that directory and configured the NameVirtualHost environments for each one.

NameVirtualHost *:80
<VirtualHost *:80>
ServerAdmin root@localhost
DocumentRoot "/srv/http/docs/site1.com"
ServerName www.site1.com
ServerAlias site1.com
ErrorLog "/var/log/httpd/site1.com-error_log"
CustomLog "/var/log/httpd/site1.com-access_log" common
</VirtualHost>
<VirtualHost *:80>
DocumentRoot "/srv/http/docs/site2.com"
ServerName www.site2.com
ServerAlias site2.com
ErrorLog "/var/log/httpd/site2.com-error_log"
CustomLog "/var/log/httpd/site2.com-access_log" common
<Location />
SetHandler python-program
PythonPath "[r'/srv/http/docs/site2.com/production/'] + sys.path"
SetEnv DJANGO_SETTINGS_MODULE myapp.settings
PythonHandler django.core.handlers.modpython
PythonDebug On
PythonInterpreter myapp
</Location>
<Location "/static/">
SetHandler None
</Location>
</VirtualHost>
<VirtualHost *:80>
ServerAdmin root@localhost
DocumentRoot "/srv/http/docs/site3.com"
ServerName www.site3.com
ServerAlias site3.com
ErrorLog "/var/log/httpd/site3.com-error_log"
CustomLog "/var/log/httpd/site3.com-access_log" common
</VirtualHost>

There’s a lot of stuff going on here. Basically, in my example configuration, I am running three sites: site1.com, site2.com, and site3.com. The site1.com domain is a simple set of static web pages hosted at /srv/http/docs/site1.com/. The second block sets up site2.com to point to the Django application mentioned earlier. The first five lines in this VirtualHost declaration are very similar to the previous stanza. The new

block redirects all requests to mod_python, configured to run Django loading “myapp”. Also note the final <Location “/static/”> block. Within this directory, requests will be handled by Apache rather than being passed through to mod_python. This allows the web application to refer to images, css files, and other static data without going through unnecessary hoops. Finally, the last block defines site3.com to point to another document root. With this configuration, Apache can serve out different content for each of the three domain names, and the data is published by simply writing into the specified subdirectory of /srv/http.

In order to allow SSL encrypted connections, a similar set of blocks was placed in /etc/http/conf/httpd-ssl, shown below.

NameVirtualHost *:443
SSLEngine on
SSLCipherSuite ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP:+eNULL
SSLCertificateFile "/etc/ssl/certs/server.crt"
SSLCertificateKeyFile "/etc/ssl/private/server.key"
<FilesMatch ".(cgi|shtml|phtml|php)$">
SSLOptions +StdEnvVars
</FilesMatch>
<Directory "/srv/http/cgi-bin">
SSLOptions +StdEnvVars
</Directory>
<VirtualHost *:443>
ServerAdmin root@localhost
DocumentRoot "/srv/http/docs/site1.com"
ServerName www.site1.com:443
ServerAlias site1.com:443
ErrorLog "/var/log/httpd/site1.com-error_log"
CustomLog "/var/log/httpd/site1.com-access_log" common
# proxy requests for /ajaxterm/
<IfModule proxy_module>
ProxyRequests Off
<Proxy *>
Order deny,allow
Allow from all
</Proxy>
ProxyPass /ajaxterm/ http://localhost:8022/
ProxyPassReverse /ajaxterm/ http://localhost:8022/
</IfModule>
</VirtualHost>

This gives site1.com available over SSL in addition to the already configured options above. I also enable a proxy so that if I ask for https://www.site1.com/ajaxterm/, I’m redirected to the ajaxterm instance running on port 8022 of localhost. For security purposes, it’s a good idea to expose the Ajaxterm proxy only behind SSL, but if desired, the proxy block could be copied into the site1.com configuration and used without encryption. Note that you can’t really use NameVirtualHost with more than one SSL host, as the SSL connection has to be set up prior to initiating a GET or POST request. Thus the server wouldn’t know which domain the client was hitting and might send out the wrong certificate.

Setting up a mail server

This one isn’t really Arch specific, but I’m sure I’m not the only one who runs a similar setup, and I thought it would be useful to document the steps for getting my setup going with Arch. Basically, my mail server at home is a stock Courier-IMAP server using SSL and Postfix configured to require authentication and forward the mail to my ISP’s mail server. Additionally, my ISP (AT&T) blocks inbound port 25, so I have Postfix listening on port 8025 as well.

First things first, we need to install a bunch of packages.

$ pacman -S postfix courier-imap cyrus-sasl libsasl openssl pam procmail

Arch puts the Courier configuration files under /etc/courier-imap, and you should be able to modify the imapd-ssl and pop3d-ssl files as you ordinarily would for your setup. For my purposes, it was sufficient to set the following lines in the imapd-ssl file.

TLS_CERTFILE=/etc/courier-imap/imapd.pem
MAILDIRPATH=maildir/

Note that the imapd.pem file will have to be created. During the Apache configuration above, we created the server key and certificate files in /etc/ssl/certs/ and /etc/ssl/private/. To make them work for our purposes here, do the following.

$ cat /etc/ssl/private/server.key /etc/ssl/certs/server.crt >> /etc/courier-imap/imapd.pem

Additionally, edit /etc/courier-imap/imapd.cnf to contain the correct information about your certificate.

For Postfix, we need to set up sasl authentication and forwarding to the ISP. I assume you’re familiar with basic configuration, but here are the values I edited in /etc/postfix/main.cf. You’ll want to edit at least some of these as well.

myhostname = your.fqdn.com
mydomain = fqdn.com
myorigin = $mydomain
proxy_interfaces = your.public.ip.addr
mydestination = $myhostname, localhost.$mydomain, localhost, $mydomain
# sasl authentication stuff here
smtpd_sasl_auth_enable = yes
smtpd_sasl_security_options = noanonymous
smtpd_sasl_local_domain = $myhostname
smtpd_recipient_restrictions =
permit_sasl_authenticated,
permit_mynetworks,
check_relay_domains
relayhost = [mailserver.yourisp.net]
home_mailbox = maildir/

In addition, to sidestep the port 25 problem with AT&T, we need to set up a separate service for our SMTP server to run on. Because I want to allow normal” clients to easily connect whenever possible, I also keep port 25 open as well. The first step is to add a separate entry for our new service. I have the following lines in /etc/services.

smtp-hi     8025/tcp
smtp-hi     8025/udp

Then, in /etc/postfix/master.cf, I add a corresponding line to tell Postfix to use that service as well as the default port 25.

smtp-hi   inet  n       -       n       -       -       smtpd

You’ll need to make sure that a few other services are running at boot time in order for all this to work correctly. In particular, make sure that the DAEMONS line in /etc/rc.conf contains at least the following services:

  • fam
  • authdaemond
  • saslauthd
  • courier-imap

Geotagging in C (Part 2: NMEA)

When last we left off, we had written most of the code necessary for reading and writing the metadata to and from Nikon raw images in NEF format. Recall, the ultimate goal is to be able to match the timestamp from a photo to an entry from our GPS log and write the corresponding location data back into the EXIF data for the photo. In this installment, we’ll look at how to parse the needed information from the files output by the data logger.

If you recall, the GPS unit I bought (the AMOD AGL3080), recorded its log files in a format called NMEA. NMEA takes its name from the National Marine Electronics Association, and it is, in a word, bizarre. Note that NMEA in actuality is not simply a data format, but defines an entire communications protocol. However, since the simple data logger we’re interested in isn’t really communicating with anything, all we really need to know is how to parse a file of data that corresponds to the NMEA protocol.

.NMEA data is at heart nothing more than CSV formatted data, but in NMEA, there are different types of “sentences”. Each line of data begins with a token that defines the sentence type, and the sentence type tells you what the remaining fields in the CSV formatted line represent. A reasonably complete listing of NMEA sentences and their meaning can be found here. A not so quick glance through that information shows that we need at least the NMEA sentence type “$GPGGA”. The GPGGA format is reproduced below.

$GPGGA,hhmmss.ss,llll.ll,a,yyyyy.yy,a,x,xx,x.x,x.x,M,x.x,M,x.x,xxxx*hh
1    = UTC of Position
2    = Latitude
3    = N or S
4    = Longitude
5    = E or W
6    = GPS quality indicator (0=invalid; 1=GPS fix; 2=Diff. GPS fix)
7    = Number of satellites in use [not those in view]
8    = Horizontal dilution of position
9    = Antenna altitude above/below mean sea level (geoid)
10   = Meters  (Antenna height unit)
11   = Geoidal separation (Diff. between WGS-84 earth ellipsoid and
       mean sea level.  -=geoid is below WGS-84 ellipsoid)
12   = Meters  (Units of geoidal separation)
13   = Age in seconds since last update from diff. reference station
14   = Diff. reference station ID#
15   = Checksum

At first glance, it would appear that this is all the information we need. It contains a timestamp, the latitude, longitude, and some information that tells us how reliable the GPS information was for this particular fix. However, there is a significant problem, which will be evident when we see an example of a real GPGGA sentence.

$GPGGA,111929.943,6400.0478,N,02058.8074,W,1,06,1.6,44.6,M,60.4,M,,0000*7B

The record show above tells us that at 11:19:29.943 UTC, the unit was at 6400.0478 north latitude, and 02058.8074 west longitude. Ignoring for a moment that the latitude and longitude seem to make no sense (we’ll come back to that in a moment), the bigger problem is that there is no date information. We know the time, but not the date. If we had taken photos on more than one day, there would be no way to figure out which GPGGA record we should match with each photo.

Looking back at the NMEA spec, it looks like the “$GPRMC” sentence contains the date information. Note that it also contains the latitude, longitude, and time information again, so if we’re content with that, we could simply dispense with the GPGGA sentences altogether. However, GPGGA sentences contain the secondary GPS fix information, as well as the altitude, and I wanted to be able to record altitude if I knew it. Thus, the strategy seems to be this: for each picture, we’ll find both the GPGGA and GPRMS records nearest to the time the photo was taken and combine their data to generate the EXIF data to write into the photo. Here is where I went ahead and cheated. The GPS unit I bought always seems to record GPGGA and GPRMS records every second, so I simply read each GPGGA sentence, left the date blank, and then filled it in based on the next GPRMS sentence read from the file. I can’t guarantee this is a bug-free strategy in general, but it does appear to work for me.

Now let’s return to the bizarrely formatted latitude and longitude for a moment. If you think back to your high school geometry class, there are several ways to represent portions of a circle. With latitude and longitude, the conventional units are degrees measured from specified zero points (the equator being zero degrees latitude, and zero degrees longitude being set through Greenwich). Again, there are a couple of ways to specify coordinates: we can specify a triplet of degrees, minutes, and seconds, or we can specify decimal degrees. For example, two latitudes 46º30’15” is the same point as 46.504167º. We get this by realizing that there are 60 minutes in a degree and 60 seconds in a minute, so 46º30’15” becomes 46 (30/60) (15/3600)=46.504167º.

So which format does NMEA choose? Both…neither…well, it’s complicated. NMEA, or at least as implemented by my device, chooses to store coordinates as degrees and decimal minutes. Oh, and the decimal point is in the wrong place. This is presumably what one gets when one lets fishermen design a data storage and transfer format. In any case, a coordinate like 6400.0478 N, 02058.8074 W from our example above actually represents 64º00.0478’ north latitude, 20º58.8074’ west longitude. Once we know this, it’s easy enough to convert to the degrees-minutes-seconds format needed for the EXIF convention. Note that EXIF also does not allow the character mnemonics like ‘N’ and ‘W’, so we’ll need to detect south and east coordinates and express them as negative numbers.

Let’s get some of this down in code. First, we need a structure to keep track of the notion of a timestamped location. Recall, we’ll fill this from both a GPGGA and a GPRMS sentence from the NMEA data log.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
typedef struct {
    time_t when;
    char status; /* A (active), V (void) */
    double latitude;
    char lat_ref;
    double longitude;
    char lon_ref;
    double speed; /* in knots */
    double heading; /* in degrees */
    double altitude; /* in meters */
    double geoid_ht; /* in meters */
    int num_sat;
    int quality;
} location_t;

We had some choices to make here. Do we store the directional mnemonics or convert the degrees as we read the file? Do we store the information that doesn’t seem to be immediately useful? What format should we store the time in? Some of these questions are somewhat arbitrary. The choices I made weren’t strongly considered; they just seemed reasonable to me. I store the time and date information in a standard time_t value, primarily because we can easily compare values allowing an efficient binary search of our array of locations read from the file. I decided to store some information from the NMEA log that didn’t seem necessary (heading, number of satellites, etc.). This was for no real reason other than interest – it was data that I thought I might want to do something with.

As NMEA files are CSV files, we need to be able to parse those files. Again, as with TIFF files and NMEA files, there are libraries available to handle this for us, but the whole point of this project was to write some fun low-level code, so I chose to implement my own simple CSV file parser.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
/*
 * parse a line of comma separated text into a given
 * array of tokens
 */
void parse_line(char* line, char* sep, char** toks)
{
    char*  curr;
    char*  p;
    int i = 0;

    i=0;
    while(line != NULL && i < NUM_TOKENS) {
        curr = strsep(&line, sep);

        /* NMEA files from my device are DOS formatted; kill the rn stuff */
        if((p = strchr(curr, 'r')) != NULL)
            *p = '';
        else if((p = strchr(curr, 'n')) != NULL)
            *p = '';

        /* note we're reusing the space in the line here */
        toks[i++] = curr;
    }
}

With that code, we can read an entire NMEA file as follows.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
/*
 * parse a file of NMEA sentences into an array of location_t records
 */
void parse_nmea_file(FILE* fp, location_t** rows, int* num_recs, int max_size)
{
    char** toks;
    char*  line;

    /* init some memory to use to parse the nmea file */
    toks = (char**)malloc(NUM_TOKENS * sizeof(char*));

    /* parse each GPRMC record */
    *num_recs = 0;
    line = (char*)malloc(82 * sizeof(char));
    while((line = fgets(line, 82, fp)) != NULL) {
        parse_line(line, ",", toks);
        if(strncmp(toks[0], "$GPRMC", MAX_TOKEN_LEN) == 0) {
            init_rmc_rec(&(*rows)[(*num_recs)  ], toks);
        } else if(strncmp(toks[0], "$GPGGA", MAX_TOKEN_LEN) == 0) {
            /* if first record is a GPGGA record, ignore it */
            if(*num_recs > 0)
                process_gga_rec(&(*rows)[(*num_recs)-1], toks);
        } else {
            /* skip other sentence types */
            continue;
        }

        /* if we've run out of space, realloc twice the space and keep going */
        if(*num_recs >= max_size) {
            max_size = max_size * 2;
            *rows = (location_t*)realloc(*rows, max_size * sizeof(location_t));
            if(!*rows) {
                fprintf(stderr, "error enlarging nmea sentences arrayn");
                exit(EXIT_FAILURE);
            }
        }
    }
    free(toks);
    free(line);
}

The strategy here is quite simple. We allocate an initial array to hold the location records and begin reading lines. When we fill up the array, we realloc some more memory and continue reading. This is a classic strategy for implementing growable arrays. As each line is read, we check the first token in the sentence. If it’s a GPRMC sentence, we initialize a new location entry. We parse the time and date and pull out the GPS information that is present. The rest we set to zeros. As soon as we read a GPGGA record, we pass the same partially populated location record in and fill in the remainder of the GPS fix information. All other sentence types are ignored. Note that the latitude and longitude are stored in the NMEA native degrees/decimal minutes format.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
/* 
 * builds up a location_t record based on a parsed GPRMC sentence
 */
void init_rmc_rec(location_t* rec, char** toks)
{
    struct tm fix_time;
    char tmp[3];

    /* time */
    strncpy(tmp, toks[1], 2);
    fix_time.tm_hour = atoi(tmp);
    strncpy(tmp, toks[1] 2, 2);
    fix_time.tm_min = atoi(tmp);
    strncpy(tmp, toks[1] 4, 2);
    fix_time.tm_sec = atoi(tmp);

    /* date */
    strncpy(tmp, toks[9], 2);
    fix_time.tm_mday = atoi(tmp);
    strncpy(tmp, toks[9] 2, 2);
    fix_time.tm_mon = atoi(tmp) - 1;
    strncpy(tmp, toks[9] 4, 2);
    fix_time.tm_year = 2000   atoi(tmp) - 1900;

    /* set the offset from UTC to 0, as GPS reports times in UTC anyway */
    fix_time.tm_gmtoff = 0;

    rec->when = timegm(&fix_time);
    rec->status = toks[2][0];
    rec->latitude = atof(toks[3]);
    rec->lat_ref = toks[4][0];
    rec->longitude = atof(toks[5]);
    rec->lon_ref = toks[6][0];
    rec->speed = atof(toks[7]);
    rec->heading = atof(toks[8]);

    /* these fields aren't part of GPRMS sentence; we'll add them later */
    rec->altitude = 0;
    rec->geoid_ht = 0;
    rec->quality = 0;
    rec->num_sat = 0;
}

/*
 * updates a given location_t record with information from a parsed GPGGA sentence
 */
void process_gga_rec(location_t* rec, char** toks)
{
    /* if we've already seen a more accurate GPGGA record and used it,
       then bail.  this is because there may be multiple GPGGA records
       per GPRMC record, and we want to use the first one succeeding a
       GPRMC record, as it should have the closest timestamp. */
    if(rec->altitude > 1e-3)
        return;

    /* make sure we have at least some kind of fix */
    if(atoi(toks[6]) == 0)
        return;

    /* update the previous location_t record with the fix and altitude data
       from the currently parsed GPGGA sentence tokens */
    rec->quality = atoi(toks[6]);
    rec->num_sat = atoi(toks[7]);
    rec->altitude = atof(toks[9]);
    rec->geoid_ht = atof(toks[11]);
}

Once this process is complete and the file has been completely read, what we’re left with is an array of location_t structures, ordered by time. Note that if we wanted to be completely general, we should go ahead and sort the array, but if we only read one GPS log file during the run, it will be in order by time already. Since the file is ordered, we can implement a simple binary search based on time.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
/*
 * custom binary search that looks for the record in the array whose timestamp
 * is nearest to the given timestamp.  if no records are stamped within epsilon
 * seconds, returns NULL.
 */
location_t* find_location_at(location_t* rows, unsigned int nrows, time_t ts, int epsilon)
{
    int low = 0;
    int mid;
    int high = nrows - 1;

    while(low <= high)
    {
        mid = (low + high) / 2;
        if(rows[mid].when < ts)
            low = mid + 1;
        else if(rows[mid].when > ts)
            high = mid - 1;
        else
            return &rows[mid];
    }

    /* low has passed high, so check which is closer to desired time */
    if(fabs(rows[high].when - ts) <= fabs(rows[low].when - ts) &&
       fabs(rows[high].when - ts) <= epsilon)
        return &rows[high];
    else if(fabs(rows[low].when - ts) < fabs(rows[high].when - ts) &&
       fabs(rows[low].when - ts) <= epsilon)
        return &rows[low];

    /* no record found within required time limit */
    return NULL;
}

Note that there’s really no excuse, even for a project where the point was to have fun writing low-level code, to avoid the built-in bsearch function from the C library. However, we want slightly different behavior from a standard binary search. In particular, it’s possible that when you’re out walking around taking photos, your GPS fix may drop out for a few minutes. Tall buildings can cause problems in cities, or maybe you simply forgot to turn it on for a few minutes when you started your photo walk. The nature of geotagging is such that we do not demand perfection. If the choice is between not tagging a photo or tagging it incorrectly but within a few hundred feet, most people would choose the approximately correct location. In any case, what we want is the ability to specify a tolerance – the largest time difference that we’re willing to accept as matching a given photo. A standard binary search will find an element if it exists, but has no way to say, “find the closest element unless the distance between it and the probe is greater than some value”. The custom binary search shown above allows just that.

So we can now read an NMEA log, store an array of locations with timestamps, and search this array for the best available location given a probe timestamp. From the previous post, we have the ability to parse information from an NEF file. What’s left is the specific code to get the capture time from a set of images, to wire up the search to find the best location, and to write the location back into the photo. We’ll pick up there in the next installment.

Geotagging in C (Part 1: NEF Files)

I actually did the work that this post will discuss a while ago, but I never documented it anywhere, and it seemed like a good exercise to describe in a bit more detail. What I’m talking about is how to add GPS information to photos taken with a Nikon digital camera, particularly if you shoot in raw mode. There are a few cameras or attachments available that can do this automatically, but the more common scenario is that you have a separate device that tracks your location as you walk around taking photos, and then you run a special program that uses the timestamps on your photos to match up the appropriate locations and embed them into your files. This post (and probably the next couple too) are describing how to write such software yourself.

Before we even get started, I should be clear that if you’re a photographer looking for a tool to geotag your images, you probably should be looking elsewhere. There are lots of good programs available with nice graphical interfaces that will let you do this pretty easily, and they’ll likely be more reliable as well. I’m on a Mac, and HoudahGEO seems to be a really good one that does all you need and more. For $30, it will almost certainly be a better use of your time than trying to figure out what I’m talking about here.

However, if you’re a hackery sort of chap and would like to see an example of how you’d write the core logic of this sort of application yourself, then grab your trusty C compiler and follow along after the fold.

OK, first of all, if you’re interested in actually putting any of this stuff to use, there are a couple of things you’re going to need. First, a Nikon camera capable of shooting RAW files in Nikon’s NEF format. Every Nikon camera shoots in a RAW file format that is potentially different from any other. You can register for Nikon’s SDK and get all the technical specs for your camera, but in general, the differences are pretty minor, and I suspect that the code I’ll be showing will work without modification on just about any Nikon DSLR. However, it’s only been tested on my D90, so your mileage may vary.

You’re also going to need something that can record a GPS log that you can match up with later. I purchased the AMOD AGL3080 logger from Amazon. For my purposes, it was important that the logger simply show up as a USB storage device when I plugged it into a computer, as I didn’t want to have to deal with some crappy third party driver in order to make it work on my Mac or Linux machines. The AMOD fit the bill, and I can recommend it pretty highly – it’s worked fine for me in the year or so I’ve had it.

GPS logs can come in several varieties. Probably the most common format is GPX, which is natively supported by a number of tools. GPX is an XML schema for GPS information, with all the good and bad that that entails. Unfortunately (or fortunately if you think XML is a train wreck), the AMOD unit does not generate GPX files. Instead, it produces logs in a format called NMEA. NMEA is basically CSV formatted data, which is nice, but has a number of quirks that make parsing it a little cumbersome. I’ll get to more of that later, probably in the next posting.

And finally, of course, you’re going to need a C compiler. You shouldn’t need much else, as the code should be quite portable. As this was mostly done for fun, I’m didn’t use any libraries outside of the basic C standard library. The code has been tested on several versions of GCC on 64-bit Snow Leopard and both 32 and 64-bit Linux, but I make no guarantees of any sort. There are bugs, and I run into one every now and then, but I do use the program myself, and it does (mostly) work. However, do not under any circumstances run the program on the only copy of an image you have. My usual work flow is to import my images into Aperture, run the tool on the NEF files, tell Aperture to update it’s metadata from the files, and only after I’ve verified nothing bad happened do I format the memory card in my camera.

NEF File structure

When you tell your Nikon DSLR to capture a file in raw mode, what you get is an NEF file. NEF files are basically TIFF files with certain characteristics predefined. This means that the core of our photo manipulation code is going to be TIFF code, and you can probably find a library or 30 out there that will make this process pretty painless. We aren’t going to take that route. Instead, we’re going to write our own TIFF handling code from scratch. It won’t be a complete library, but it will have enough functionality to do what we need, which is to pull the timestamp from the image and write back GPS coordinate information. The TIFF specification (PDF) is enormously useful in understanding what the files look like under the hood.

The overall structure of a TIFF file, and therefore of an NEF file as well, is pretty simple. You have a short header sequence that identifies the type of file followed by a set of what are called Image File Directories (IFDs) which contain a wealth of information about the image, finally followed by the actual image data. In the case of NEF files, this image data is extremely complex. If we were to write our own tool to interpret that data, we’d need to perform the demosaicing step that software like Aperture or Adobe Camera Raw can do. Fortunately, we don’t need to touch the image data at all – everything we need is stored in the IFDs. So what is an IFD and what does it look like?

An IFD is simply a list of entries, each of which has a particular structure. Typically, the entries are grouped together under an IFD based on their content. So one IFD might contain information about the exposure settings the image was shot with while another contains GPS settings. The metadata that we’re interested in that is stored in these IFDs is called EXIF data. EXIF defines a set of standard information related to digital photos, and among that information (technically in an extension to EXIF) are a set of features regarding GPS information. The particular IFDs and what they contain are described in Nikon’s file format documentation. All IFDs look the same: they are 12 bytes long and laid out as follows.

Bytes Field
0-1 Tag ID
2-3 Field Type
4-7 Number of Values
8-11 Value/Offset


There are a few things we need to say about this structure. First, if it wasn’t already obvious, TIFF files are written in binary, so we’re going to be reading and writing things in terms of bytes. The Tag ID field is two bytes, and the interpretation of these bytes is defined by the TIFF and/or NEF file specifications. NEF generally honors the underlying TIFF spec, but there are some extensions present as well. For our purposes though, the tag IDs will be standard TIFF IDs and they tell you what field you’re looking at. The field type consists of another two bytes, and again, the TIFF spec defines the interpretation of those bytes. Basically though, the field type just defines the data type for the particular bit of information (integer, real, string, etc.). We can define some constants to map those bytes onto human readable names as follows. `

1
2
3
4
5
6
7
8
9
10
11
12
13
/* TIFF data types */
#define BYTE 1
#define ASCII 2
#define SHORT 3
#define LONG 4
#define RATIONAL 5
#define SBYTE 6
#define UNDEFINED 7
#define SSHORT 8
#define SLONG 9
#define SRATIONAL 10
#define FLOAT 11
#define DOUBLE 12

Each of these types can be mapped onto a machine type, with the exception of the RATIONAL and SRATIONAL (signed rational) types. For those, we need to define a helper structure. `

1
2
3
4
typedef struct {
    unsigned int32 numerator;
    unsigned int32 denominator;
} rational_t;

We’re also going to need, at several points, to know the sizes of the various TIFF data types. Because each type gets a unique integer ID arranged sequentially, it’s a simple matter to create a global array indexing from each type to the size in bytes for that type. Note that no type has ID 0 in the TIFF spec, so we insert a leading dummy value to make the others line up with the ID given in each #define above. (Note that the sizes are defined by the TIFF specification.) `

1
2
/* size in bytes of each of the TIFF data types */
unsigned int type_bytes[13] = {0, 1, 1, 2, 4, 8, 1, 1, 2, 4, 8, 4, 8};

The third entry, the number of values, is fairly self explanatory. It simply defines how many values will be present for that field. A 20-byte ASCII string might be represented with a data type of “byte” and number of values equal to 20, for instance. The final field, the “value/offset” field is a bit more complicated. The basic idea is that this field contains the “value” for the IFD entry, so if the Tag ID is 0×0101 (ImageLength), then the value/offset field will contain the number of rows of pixels in the image. However, as we only have four bytes available, it will often be the case that the data needed to describe the given tag can’t be stored directly in the value/offset field. In this case, the value/offset field will contain a 32-bit offset into the file where the data for this tag is stored.

With this in mind, the overall structure of each IFD can now be described. Each IFD contains a 2-byte header denoting the number of directory entries (fields) in the IFD. Let’s call that number n. The header is then followed by n 12-byte IFD entries, and finally by a four byte offset to the next IFD in the file, or 0 if there are no remaining IFDs. We can describe this structure in C as follows. `

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
typedef struct {
    unsigned int16 tag;
    unsigned int16 type;
    unsigned int32 count;
    union {
        unsigned byte* byte_values;
        byte* sbyte_values;
        unsigned int16* uint16_values;
        int16* int16_values;
        unsigned int32* uint32_values;
        int32* int32_values;
        float32* float32_values;
        float64* float64_values;
        rational_t* rational_values;
    };
} direntry_t;

typedef struct {
    unsigned int16 count;
    direntry_t* dirs;
    unsigned int32 next_offset;
} ifd_t;

We have one remaining bit of information we need before we can start reading TIFF files, and that is the header information. Each TIFF file begins with a 2-byte sequence, either 0×4949 or 0x4D4D. The former tells us that the file stores the remaining information in little-endian format, the latter denotes big-endian. Bytes 2-3 of any TIFF file contain the “magic number” for TIFF, decimal 42. Bytes 4-7 then contain the offset to the beginning of the first IFD in the file. With this, we can begin to parse information from TIFF files. (Note the global variable byte_order, which will be needed to correctly parse multibyte values from the file.) `

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
/* TIFF header constants */
#define TIFF_BIG_ENDIAN 0x4d4d
#define TIFF_LITTLE_ENDIAN 0x4949
#define TIFF_MAGIC 42

/*
 * check the magic bytes at the beginning of the file to make sure it's
 * really a tiff file, and set the byte ordering in use in the file
 */
int valid_tiff_file(FILE* f)
{
    /* assume that file pointer is at offset 0 */
    unsigned int16 magic_number = 0;

    fread(&byte_order, 1, 2, f);
    magic_number = read_uint16(f);

    if(byte_order != TIFF_LITTLE_ENDIAN && byte_order != TIFF_BIG_ENDIAN) {
        fprintf(stderr, "invalid byte ordering %xn", byte_order);
        return 0;
    }
    if(magic_number != TIFF_MAGIC) {
        fprintf(stderr, "magic number %d (%x) not valid for tiff filen",
                magic_number, magic_number);
        return 0;
    }
    return 1;
}

With all this setup in place, the code to populate an IFD is a bit long, but very straightforward. `

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
//*
 * load an ifd_t from a given tiff file
 * reads all the direntry_t blocks and sets up the next_offset pointer
 * for the block
 */
void ifd_load(FILE* f, ifd_t* ifd)
{
    /* must be called when the file pointer is at the beginning of the IFD */
    int i, j;
    memset(ifd, 0, sizeof(ifd_t));

    /* get the number of directory entries */
    ifd->count = read_uint16(f);
    ifd->dirs = (direntry_t*)malloc(ifd->count * sizeof(direntry_t));

    /* read each directory entry */
    for(i=0; i<ifd->count; ++i) {
        ifd->dirs[i].tag = read_uint16(f);
        ifd->dirs[i].type = read_uint16(f);
        ifd->dirs[i].count = read_uint32(f);
        switch(ifd->dirs[i].type) {
        case BYTE:
        case ASCII:
        case UNDEFINED:
            ifd->dirs[i].byte_values = (unsigned byte*)malloc(ifd->dirs[i].count * sizeof(byte));
            break;
        case SBYTE:
            ifd->dirs[i].sbyte_values = (byte*)malloc(ifd->dirs[i].count * sizeof(byte));
            break;
        case SHORT:
            ifd->dirs[i].uint16_values = (unsigned int16*)malloc(ifd->dirs[i].count * sizeof(int16));
            break;
        case SSHORT:
            ifd->dirs[i].int16_values = (int16*)malloc(ifd->dirs[i].count * sizeof(int16));
            break;
        case LONG:
            ifd->dirs[i].uint32_values = (unsigned int32*)malloc(ifd->dirs[i].count * sizeof(int32));
            break;
        case SLONG:
            ifd->dirs[i].int32_values = (int32*)malloc(ifd->dirs[i].count * sizeof(int32));
            break;
        case FLOAT:
            ifd->dirs[i].float32_values = (float32*)malloc(ifd->dirs[i].count * sizeof(float32));
            break;
        case DOUBLE:
            ifd->dirs[i].float64_values = (float64*)malloc(ifd->dirs[i].count * sizeof(float64));
            break;
        case RATIONAL:
            ifd->dirs[i].rational_values = (rational_t*)malloc(ifd->dirs[i].count * sizeof(rational_t));
            break;
        }

        if(ifd->dirs[i].count * type_bytes[ifd->dirs[i].type] <= 4) {
            /* value fits entirely in the 4 byte value offset field */
            unsigned int bytes_read = 0;
            for(j=0; j<ifd->dirs[i].count; ++j)
            {
                bytes_read+=type_bytes[ifd->dirs[i].type];
                switch(ifd->dirs[i].type) {
                case BYTE:
                case ASCII:
                case UNDEFINED:
                    ifd->dirs[i].byte_values[j] = read_byte(f);
                    break;
                case SBYTE:
                    ifd->dirs[i].sbyte_values[j] = read_byte(f);
                    break;
                case SHORT:
                    ifd->dirs[i].uint16_values[j] = read_uint16(f);
                    break;
                case SSHORT:
                    ifd->dirs[i].int16_values[j] = read_int16(f);
                    break;
                case LONG:
                    ifd->dirs[i].uint32_values[j] = read_uint32(f);
                    break;
                case SLONG:
                    ifd->dirs[i].int32_values[j] = read_int32(f);
                    break;
                case FLOAT:
                    ifd->dirs[i].float32_values[j] = read_float32(f);
                    break;
                default:
                    fprintf(stderr, "can't fit specified type '%d' into value offset fieldn",
                            ifd->dirs[i].type);
                    break;
                }
            }
            for(; bytes_read<4; ++bytes_read) {
                read_byte(f);
            }
        } else {
            int p;

            /* read the offset where the data is stored */
            unsigned int32 dataloc = read_uint32(f);

            /* save the current offset */
            unsigned int32 cpos = ftell(f);

            /* jump to the data */
            fseek(f, dataloc, SEEK_SET);

            /* read it */
            for(p=0; p<ifd->dirs[i].count; ++p) {
                switch(ifd->dirs[i].type) {
                case BYTE:
                case ASCII:
                case UNDEFINED:
                    ifd->dirs[i].byte_values[p] = read_byte(f);
                    break;
                case SBYTE:
                    ifd->dirs[i].sbyte_values[p] = read_byte(f);
                    break;
                case SHORT:
                    ifd->dirs[i].uint16_values[p] = read_uint16(f);
                    break;
                case SSHORT:
                    ifd->dirs[i].int16_values[p] = read_int16(f);
                    break;
                case LONG:
                    ifd->dirs[i].uint32_values[p] = read_uint32(f);
                    break;
                case SLONG:
                    ifd->dirs[i].int32_values[p] = read_int32(f);
                    break;
                case FLOAT:
                    ifd->dirs[i].float32_values[p] = read_float32(f);
                    break;
                case DOUBLE:
                    ifd->dirs[i].float64_values[p] = read_float64(f);
                    break;
                case RATIONAL:
                    ifd->dirs[i].rational_values[p].numerator = read_uint32(f);
                    ifd->dirs[i].rational_values[p].denominator = read_uint32(f);
                    break;
                }
            }

            /* and jump back */
            fseek(f, cpos, SEEK_SET);
        }
    }

    /* read the next ifd offset */
    ifd->next_offset = read_uint32(f);
}

We have a few utility functions to read the various data types, taking byte order into consideration. Those functions haven’t been shown here, but they’re easy enough to derive as needed, or you can find them in the source code available for download. But otherwise, the code is quite simple. It reads the number of IFD entries, mallocs the appropriate amount of space, and then for each entry, reads the tag, type, and value count, and mallocs the memory needed to store the values. It then checks to see if the number of values needed of the given type can fit into the value/offset field. If so, it reads them directly. If not, it reads the value/offset field as an offset, saves the current location, seeks to the offset, reads the correct number of values, and then seeks back to the saved offset to put it into position to read the next entry. At the very end of the function, we finally read the offset of the next IFD so that we can continue parsing the file if needed.

Writing IFDs is very similar to reading them. There are a few small differences, we need to guarantee that we write complete fields, so there’s code to zero pad if necessary. Also, we have to calculate the correct offset when we see that the values needed can’t fit into the value/offset field. However, the overall structure of the code is very similar to that for reading.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
/*
 * write a new ifd_t block into the given tiff file
 */
void ifd_write(FILE* f, ifd_t* ifd)
{
    /* ifd is the GPSInfoIFD structure; file pointer of f must be positioned
     * to the location of the gps_info pointer
     */
    unsigned int16 i, j;
    unsigned int32 ifd_block_size;
    unsigned int32 ifd_value_offset;
    unsigned int value_bytes_written = 0;

    /* total block is a two byte header determining count of directory entries,
     * 12*n bytes for all n directories, and a 4 byte pointer to the next block */
    ifd_block_size = 2 + 12*ifd->count + 4;
    ifd_value_offset = ftell(f)+ifd_block_size;

    /* write the number of directory entries in the gps info section */
    write_uint16(f, ifd->count);

    /* write each directory */
    for(i=0; i<ifd->count; ++i) {
        write_uint16(f, ifd->dirs[i].tag);
        write_uint16(f, ifd->dirs[i].type);
        write_uint32(f, ifd->dirs[i].count);

        if(ifd->dirs[i].count * type_bytes[ifd->dirs[i].type] <= 4) {
            /* can write the data directly into the value offset field */
            unsigned int bytes_written = 0;
            for(j=0; j<ifd->dirs[i].count; ++j) {
                bytes_written+=type_bytes[ifd->dirs[i].type];
                switch(ifd->dirs[i].type) {
                case BYTE:
                case ASCII:
                case UNDEFINED:
                    write_byte(f, ifd->dirs[i].byte_values[j]);
                    break;
                case SBYTE:
                    write_sbyte(f, ifd->dirs[i].sbyte_values[j]);
                    break;
                case SHORT:
                    write_uint16(f, ifd->dirs[i].uint16_values[j]);
                    break;
                case SSHORT:
                    write_int16(f, ifd->dirs[i].int16_values[j]);
                    break;
                case LONG:
                    write_uint32(f, ifd->dirs[i].uint32_values[j]);
                    break;
                case SLONG:
                    write_int32(f, ifd->dirs[i].int32_values[j]);
                    break;
                case FLOAT:
                    write_float32(f, ifd->dirs[i].float32_values[j]);
                    break;
                default:
                    fprintf(stderr, "attempt to write impossible type '%d' into "
                            "value offset fieldn", ifd->dirs[i].type);
                    break;
                }
            }
            /* make sure we have written an even number of bytes */
            if(bytes_written % 2 == 1) {
                unsigned byte c = 0;
                write_byte(f, c);
                ++bytes_written;
            }
            /* make sure we fill out the value offset field completely */
            for(; bytes_written<4; ++bytes_written) {
                unsigned byte c = 0;
                write_byte(f, c);
            }
        } else {
            /* must write a pointer to where the data will be written */
            unsigned int32 cpos;
            unsigned int32 pos = ifd_value_offset + value_bytes_written;
            write_uint32(f, pos);
            cpos = ftell(f);

            /* now go to that location and write the data */
            fseek(f, pos, SEEK_SET);
            for(j=0; j<ifd->dirs[i].count; ++j) {
                value_bytes_written+=type_bytes[ifd->dirs[i].type];
                switch(ifd->dirs[i].type) {
                case BYTE:
                case ASCII:
                case UNDEFINED:
                    write_byte(f, ifd->dirs[i].byte_values[j]);
                    break;
                case SBYTE:
                    write_sbyte(f, ifd->dirs[i].sbyte_values[j]);
                    break;
                case SHORT:
                    write_uint16(f, ifd->dirs[i].uint16_values[j]);
                    break;
                case SSHORT:
                    write_int16(f, ifd->dirs[i].int16_values[j]);
                    break;
                case LONG:
                    write_uint32(f, ifd->dirs[i].uint32_values[j]);
                    break;
                case SLONG:
                    write_int32(f, ifd->dirs[i].int32_values[j]);
                    break;
                case FLOAT:
                    write_float32(f, ifd->dirs[i].float32_values[j]);
                    break;
                case DOUBLE:
                    write_float64(f, ifd->dirs[i].float64_values[j]);
                    break;
                case RATIONAL:
                    write_uint32(f, ifd->dirs[i].rational_values[j].numerator);
                    write_uint32(f, ifd->dirs[i].rational_values[j].denominator);
                    break;
                default:
                    fprintf(stderr, "attempt to write impossible type '%d' into "
                            "value offset fieldn", ifd->dirs[i].type);
                    break;
                }
            }
            /* make sure we have written an even number of bytes */
            if(value_bytes_written % 2 == 1) {
                unsigned byte c = 0;
                write_byte(f, c);
                ++value_bytes_written;
            }

            /* now jump back to where we were in the file */
            fseek(f, cpos, SEEK_SET);
        }
    }

    /* and finally, write the offset to the next ifd */
    write_uint32(f, ifd->next_offset);
}

A quick note on the structure of the C code shown. It’s possible that we could make the code a bit shorter by basically eliminating all the type checking and just stuffing bytes into our structures. I prefer to go ahead and try to get the type information put down as soon as possible, and maintain it through as much of the code as possible. So the read/write functions tend to be quite long, but I think it’s worth it to have a bit more self-documenting code.

This covers most of what we need to know about NEF files in order to complete our task. As we dig further in, we’ll spend a bit more time dealing with the peculiarities of the data (date/time formats, etc.) but the basic concept of reading, manipulating, and writing IFD entries forms the core of the RAW file processing we need. In the next post, I’ll talk about the NMEA data format for GPS log files, and we’ll take a look at how we can implement the matching of images to locations from a log.

New Year’s Eve in Reykjavík

Reykjavík is famous around the world for it’s New Year’s Eve celebrations. Take a group of 200,000 people, isolate them nearly completely from the rest of the world, hide the sun for four months, and eliminate even the most basic of regulation on fireworks, and apparently this is what you get.

Anyway, I was supposed to return from my Christmas holiday on the 28th, but the blizzard in the east coast of the US cancelled my flights, so I didn’t get back until the morning of the 31st. I was too tired and jet lagged to go out to the bars, but I did make sure I got out to see the fireworks as 2010 came to a close.

Unfortunately, it was really windy, so many of my attempts to take photos turned out a bit blurry due to the wind shaking the tripod. The handful that turned out the best are posted to a Flickr set. I also took some video both with my SLR on a tripod as well as my point and shoot handheld. You can find better video of the fireworks elsewhere, but I thought I’d post mine anyway.

Happy New Year to everyone!

Multiple GMail Accounts in Gnus

I have both my home and work email accounts through Gmail (work being hosted through our own domain). I wanted to configure Gnus to access both imap accounts with the following requirements:

  • Visually differentiate between mailboxes in the different accounts (Gnus wants to call all your Gmail account inboxes “INBOX”)
  • Have outgoing mail routed through the correct account, with an appropriate signature, from address, other headers, etc.
  • Have expiry work the way I want – basically, in “normal” mailboxes (not special Gmail folders), I want to be able to expire messages and have them moved immediately into the Gmail “All Mail” folder for archival.
  • Get the desired behavior for each mailbox (INBOX always visible, show read messages, etc.)

There are a number of hurdles you have to climb over to get this working acceptably, and I never found one source that had a complete working solution that met my needs, so I decided to publish my setup in the hopes that it might prove useful for someone. In short, here are the problems we need to solve:

  • All your google accounts will have the same server name, breaking authinfo logins
  • Get Gnus to understand how to set behaviors per-mailbox
  • Make outgoing mail route through the correct account’s SMTP server
  • Figure out a way to identify mailboxes by name in the Group buffer

Getting authinfo to work with multiple Gmail accounts

Ordinarily, you’d configure your select methods in Gnus something like the following:

‘Account setup in .gnus’
1
2
3
4
5
6
7
8
9
10
11
(setq gnus-secondary-select-methods
      '((nnimap "home"
                (nnimap-address "imap.gmail.com")
                (nnimap-server-port 993)
                (nnimap-stream ssl)
                (nnimap-authinfo-file "~/.authinfo"))
        (nnimap "work"
                (nnimap-address "imap.gmail.com")
                (nnimap-server-port 993)
                (nnimap-stream ssl)
                (nnimap-authinfo-file "~/.authinfo"))))

You’d then put your credentials into your .authinfo file like so:

machine imap.gmail.com login mylogin password mypassword port imaps
machine imap.gmail.com login mylogin2 password mypassword2 port imaps

The problem is that in our case, both accounts have a server name of “imap.gmail.com”, and thus authinfo can’t figure out which credentials to send to which account. If you’re using a new enough Gnus version, this problem can be solved by replacing “imap.gmail.com” in .authinfo with the account name “home” or “work” from .gnus. Alternately, you can define aliases for imap.gmail.com in your localhosts file and configure each account to access a different alias, allowing you to have different server names in .authinfo. I went with the simpler version.

machine home login mylogin password mypassword port imaps
machine work login mylogin2 password mypassword2 port imaps

Setting up Multiple SMTP Servers

Gnus by default has no mechanism for switching SMTP servers on outgoing messages. There are a few options, but the simplest is probably to install msmtp. This is a simple little program that accepts a command line parameter that tells it which server to use, and then connects to that server to send your mail. Once you’ve installed and configured that (my configuration is shown below), we’ll still need to customize Gnus to pass the right flags to msmtp depending on where we send the message from.

defaults
tls on
auto_from on
logfile ~/.msmtp.log

account home
host smtp.gmail.com
tls on
tls_certcheck off
auth on
from myusername@gmail.com
user myusername@gmail.com
password myhomepassword
port 587

account work
host smtp.gmail.com
tls on
tls_certcheck off
auth on
from me@myworkdomain.com
user me@myworkdomain.com
password myworkpassword
port 587

Setting behaviors per Mailbox

Most of what I need can be accomplished through Gnus’ notion of group parameters. You can theoretically customize parameters through the Customize buffer, but I abhor that awful abomination and would prefer to do it through Lisp. Fortunately, you can set most group parameters via the gnus-parameters variable. This variable is a list of alists, each of which contains a regular expression matching a group name followed by a set of parameters for that group. Below is my gnus-parameters.

‘Account customization in .gnus’
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
(setq gnus-parameters
      '(("nnimap work:INBOX"
         (display . all)
         (posting-style
          (name "Deon Garrett")
          (address "me@myworkaddress.com")
          (organization "My Employer")
          (signature-file "~/.signature-work"))
         (expiry-target . delete)
        ("nnimap work:[Gmail]/.*"
         (display . all)
         (posting-style
          (name "Deon Garrett")
          (address "me@myworkaddress.com")
          (organization "My Employer")
          (signature-file "~/.signature-work"))
         (expiry-wait . never))
        ("nnimap home:(INBOX|lists..*)"
         (display . all)
         (posting-style
          (name "Deon Garrett")
          (address "me@myhomeaddress.com")
          (signature-file "~/.signature-home"))
         (expiry-target . delete)
        ("nnimap home:[Gmail]/.*"
         (display . all)
         (posting-style
          (name "Deon Garrett")
          (address "me@myhomeaddress.com")
          (signature-file "~/.signature-home"))
         (expiry-wait . never))))

There’s a lot going on here, but basically, I have two accounts “home” and “work”, and for each account, I have one set of parameters for normal mailboxes, and another set for those “special” gmail mailboxes that I want to treat slightly differently (mainly with respect to expiry). Note the regular expressions for the group names: “nnimap XXXX:YYYY” where XXXX is the account name (“home” or “work”) and YYYY must match only the mailboxes I want. So the first block matches only the INBOX on my work account. It sets the headers I want on outgoing mail, tells Gnus to always show all messages when I enter the group, sends the correct switch to the msmtp program to select my work account (Edit: msmtp supports automatically choosing an account based on the From header. See my updated example .msmtprc file),and sets up expiry to immediate move expired messages to the “[GMail]/All Mail” folder. *Correction: I’ve since changed the expiry to delete the message instead of moving it to All Mail, as Gmail always keeps a copy in All Mail anyway. I have left the “expiry . never” for the Gmail groups to prevent deletion of messages from inside the Google special groups. **The second block is still the work account, but now matches only mailboxes named like “[GMail]/”, that is, all the special Gmail boxes. The only different setting here is that I tell Gnus to never expire a message from a special folder. The remaining two blocks of settings configure my home account in a similar fashion. There are a few global settings we need to set as well. We need to tell Gnus to use msmtp as our sendmail replacement. I also want all my subscribed groups to be always visible, and the “visible” group parameter won’t work from gnus-parameters, so per the documentation, we need to set that in an alternate fashion. You should also set up a default set of outgoing headers so that if you send mail from outside any group, you’ll still have some useful default. Below is the elisp to set up msmtp and make the groups visible. I’ll leave it to you to set most of those other variables, as they’re standard Gnus settings that you probably already know.

‘SMTP setup in .gnus’
1
2
3
    (setq message-send-mail-function 'message-send-mail-with-sendmail)
    (setq sendmail-program "/usr/local/bin/msmtp")
    (setq gnus-permanently-visible-groups ".*")

Fixing the Naming Problem

There’s one last really annoying issue – if you open gnus with this configuration and subscribe to INBOX from both accounts, you’ll see the groups buffer with the following:

0: INBOX
0: INBOX

Clearly we’d like to have a visual way of differentiating the two accounts. I tried setting the “comment” field in the group parameters and modifying gnus-group-line-format, but it had no effect. I believe the gnus-parameters variable isn’t consulted until you enter a group, which must happen after the group buffer is displayed, so that makes some sense. After experimenting with several complete failures, I hit upon the idea to use Gnus’ “topics” to give me at least a reasonable solution. Briefly, the idea of topics is that you can organize your group buffer by topic rather than by group, giving you a hierarchical display of, for example, all your “comp.lang.*” newsgroups under a tree instead of just in a flat list. From the group buffer, press “t”. This puts you into the topic minor mode. You can consult the info page for Topic Parameters for all the details, but the short version is that you can create new topics with “T n”, rename topics with “T r”, move groups from one topic to another with “T m”, etc. I created topics named “Home”, “Work”, and “News” (for a couple of NNTP groups), and moved each group or mailbox into the appropriate topic. Put the following into your .gnus to activate topic mode each time you start Gnus to persist your changes.

‘turning on topic-mode’
1
    (add-hook 'gnus-group-mode-hook 'gnus-topic-mode)

The result is shown in the screenshot below. It’s not perfect, but it’s certainly at least reasonable.

Emacs, Icelandic, and Input Methods

So Icelandic has bizarro-world characters in its alphabet. The Mac has an Icelandic keyboard map built in, but it’s slightly inconvenient to use. A lot of characters (á, ö, etc.) can be typed without switching layouts by typing Alt-key sequences, but typing characters like ð and þ require switching keymaps, and that process is cumbersome. You can put the switcher in the menu bar, but because I don’t want to give up my nice US layout the vast majority of the time, I end up switching layouts just to type one or two characters, and then switching back immediately. Since this requires the mouse, it’s slow.

The other problem with this is that the Alt key method doesn’t work at all in Emacs, which interprets those as meta key bindings. The solution to this problem turns out to be Emacs’ great support for input methods.

You type ‘C-‘ to switch input methods. The first time you do it, it prompts you to select one. I choose ‘icelandic-keyboard’. This accomplishes the same thing as setting Mac OS X’s keymap, but only affects emacs. The really great part is that once you’ve set the input method, pressing ‘C-‘ again toggles it. So I can really quickly toggle back and forth between Icelandic (to get ð, þ, and really easy access to the accents) and my normal US keymap that has much easier access to semicolons and the like.