I actually did the work that this post will discuss a while ago, but I never documented it anywhere, and it seemed like a good exercise to describe it in a bit more detail. What I’m talking about is how to add GPS information to photos taken with a Nikon digital camera, particularly if you shoot in raw mode. There are a few cameras or attachments available that can do this automatically, but the more common scenario is that you have a separate device that tracks your location as you walk around taking photos, and then you run a special program that uses the timestamps on your photos to match up the appropriate locations and embed them into your files. This post (and probably the next couple too) describes how to write such software yourself.
Before we even get started, I should be clear that if you’re a photographer looking for a tool to geotag your images, you probably should be looking elsewhere. There are lots of good programs available with nice graphical interfaces that will let you do this pretty easily, and they’ll likely be more reliable as well. I’m on a Mac, and HoudahGEO seems to be a really good one that does all you need and more. For $30, it will almost certainly be a better use of your time than trying to figure out what I’m talking about here.
However, if you’re a hackery sort of chap and would like to see an example of how you’d write the core logic of this sort of application yourself, then grab your trusty C compiler and follow along after the fold.
OK, first of all, if you’re interested in actually putting any of this stuff to use, there are a couple of things you’re going to need. First, a Nikon camera capable of shooting RAW files in Nikon’s NEF format. Every Nikon camera shoots in a RAW file format that is potentially different from any other. You can register for Nikon’s SDK and get all the technical specs for your camera, but in general, the differences are pretty minor, and I suspect that the code I’ll be showing will work without modification on just about any Nikon DSLR. However, it’s only been tested on my D90, so your mileage may vary.
You’re also going to need something that can record a GPS log that you can match up with later. I purchased the AMOD AGL3080 logger from Amazon. For my purposes, it was important that the logger simply show up as a USB storage device when I plugged it into a computer, as I didn’t want to have to deal with some crappy third party driver in order to make it work on my Mac or Linux machines. The AMOD fit the bill, and I can recommend it pretty highly – it’s worked fine for me in the year or so I’ve had it.
GPS logs can come in several varieties. Probably the most common format is GPX, which is natively supported by a number of tools. GPX is an XML schema for GPS information, with all the good and bad that that entails. Unfortunately (or fortunately if you think XML is a train wreck), the AMOD unit does not generate GPX files. Instead, it produces logs in a format called NMEA. NMEA is basically CSV formatted data, which is nice, but has a number of quirks that make parsing it a little cumbersome. I’ll get to more of that later, probably in the next posting.
And finally, of course, you’re going to need a C compiler. You shouldn’t need much else, as the code should be quite portable. As this was mostly done for fun, I didn’t use any libraries outside of the basic C standard library. The code has been tested on several versions of GCC on 64-bit Snow Leopard and both 32 and 64-bit Linux, but I make no guarantees of any sort. There are bugs, and I run into one every now and then, but I do use the program myself, and it does (mostly) work. However, do not under any circumstances run the program on the only copy of an image you have. My usual workflow is to import my images into Aperture, run the tool on the NEF files, tell Aperture to update its metadata from the files, and only after I’ve verified nothing bad happened do I format the memory card in my camera.
NEF File structure
When you tell your Nikon DSLR to capture a file in raw mode, what you get is an NEF file. NEF files are basically TIFF files with certain characteristics predefined. This means that the core of our photo manipulation code is going to be TIFF code, and you can probably find a library or 30 out there that will make this process pretty painless. We aren’t going to take that route. Instead, we’re going to write our own TIFF handling code from scratch. It won’t be a complete library, but it will have enough functionality to do what we need, which is to pull the timestamp from the image and write back GPS coordinate information. The TIFF specification (PDF) is enormously useful in understanding what the files look like under the hood.
The overall structure of a TIFF file, and therefore of an NEF file as well, is pretty simple. You have a short header sequence that identifies the type of file followed by a set of what are called Image File Directories (IFDs) which contain a wealth of information about the image, finally followed by the actual image data. In the case of NEF files, this image data is extremely complex. If we were to write our own tool to interpret that data, we’d need to perform the demosaicing step that software like Aperture or Adobe Camera Raw can do. Fortunately, we don’t need to touch the image data at all – everything we need is stored in the IFDs. So what is an IFD and what does it look like?
An IFD is simply a list of entries, each of which has a particular structure. Typically, the entries are grouped together under an IFD based on their content. So one IFD might contain information about the exposure settings the image was shot with while another contains GPS settings. The metadata that we’re interested in that is stored in these IFDs is called EXIF data. EXIF defines a set of standard information related to digital photos, and among that information (technically in an extension to EXIF) is a set of fields for GPS information. The particular IFDs and what they contain are described in Nikon’s file format documentation. All IFD entries look the same: each is 12 bytes long and laid out as follows.
|0-1||Tag ID|
|2-3||Field Type|
|4-7||Number of Values|
|8-11||Value/Offset|
There are a few things we need to say about this structure. First, if it wasn’t already obvious, TIFF files are written in binary, so we’re going to be reading and writing things in terms of bytes. The Tag ID field is two bytes, and the interpretation of these bytes is defined by the TIFF and/or NEF file specifications. NEF generally honors the underlying TIFF spec, but there are some extensions present as well. For our purposes though, the tag IDs will be standard TIFF IDs and they tell you what field you’re looking at. The field type consists of another two bytes, and again, the TIFF spec defines the interpretation of those bytes. Basically though, the field type just defines the data type for the particular bit of information (integer, real, string, etc.). We can define some constants to map those bytes onto human readable names as follows.
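A minimal sketch of such constants is given here. The numeric values are the twelve field types listed in the TIFF 6.0 specification; the `TIFF_*` names are chosen for illustration and may not match the names used in the downloadable source.

```c
/* Field type codes as listed in the TIFF 6.0 specification.
 * The TIFF_* names are illustrative placeholders. */
#define TIFF_BYTE       1   /* 8-bit unsigned integer               */
#define TIFF_ASCII      2   /* 8-bit byte holding a 7-bit ASCII code */
#define TIFF_SHORT      3   /* 16-bit unsigned integer              */
#define TIFF_LONG       4   /* 32-bit unsigned integer              */
#define TIFF_RATIONAL   5   /* two LONGs: numerator, denominator    */
#define TIFF_SBYTE      6   /* 8-bit signed integer                 */
#define TIFF_UNDEFINED  7   /* 8-bit byte with no fixed meaning     */
#define TIFF_SSHORT     8   /* 16-bit signed integer                */
#define TIFF_SLONG      9   /* 32-bit signed integer                */
#define TIFF_SRATIONAL  10  /* two SLONGs: numerator, denominator   */
#define TIFF_FLOAT      11  /* 4-byte IEEE floating point           */
#define TIFF_DOUBLE     12  /* 8-byte IEEE floating point           */
```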
Each of these types can be mapped onto a machine type, with the exception of the RATIONAL and SRATIONAL (signed rational) types. For those, we need to define a helper structure.
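One way such a helper might look is sketched below. The TIFF specification describes a rational as two 32-bit integers, numerator first; the type names and the `rational_to_double` convenience function are assumptions added here for illustration.

```c
#include <stdint.h>

/* Unsigned rational: two 32-bit unsigned integers (hypothetical name). */
typedef struct {
    uint32_t numerator;
    uint32_t denominator;
} tiff_rational;

/* Signed rational: two 32-bit signed integers (hypothetical name). */
typedef struct {
    int32_t numerator;
    int32_t denominator;
} tiff_srational;

/* Convert an unsigned rational to a double.
 * Returns 0.0 when the denominator is zero rather than dividing by zero. */
double rational_to_double(tiff_rational r)
{
    if (r.denominator == 0)
        return 0.0;
    return (double)r.numerator / (double)r.denominator;
}
```

GPS coordinates in EXIF are stored as rationals, so a conversion helper of this kind tends to be handy later on.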
We’re also going to need, at several points, to know the sizes of the various TIFF data types. Because each type gets a unique integer ID arranged sequentially, it’s a simple matter to create a global array mapping each type ID to the size in bytes of that type. Note that no type has ID 0 in the TIFF spec, so we insert a leading dummy value to make the others line up with the ID given in each #define above. (Note that the sizes are defined by the TIFF specification.)
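A sketch of that lookup table follows. The sizes are those given in the TIFF 6.0 specification; the array name `tiff_type_sizes` is a placeholder.

```c
#include <stddef.h>

/* Size in bytes of each TIFF field type, indexed by type ID (1..12).
 * Index 0 is a dummy entry because the spec defines no type 0. */
const size_t tiff_type_sizes[] = {
    0,  /*  0: unused     */
    1,  /*  1: BYTE       */
    1,  /*  2: ASCII      */
    2,  /*  3: SHORT      */
    4,  /*  4: LONG       */
    8,  /*  5: RATIONAL   */
    1,  /*  6: SBYTE      */
    1,  /*  7: UNDEFINED  */
    2,  /*  8: SSHORT     */
    4,  /*  9: SLONG      */
    8,  /* 10: SRATIONAL  */
    4,  /* 11: FLOAT      */
    8   /* 12: DOUBLE     */
};
```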
The third entry, the number of values, is fairly self explanatory. It simply defines how many values will be present for that field. A 20-byte ASCII string might be represented with a data type of “byte” and number of values equal to 20, for instance. The final field, the “value/offset” field is a bit more complicated. The basic idea is that this field contains the “value” for the IFD entry, so if the Tag ID is 0x0101 (ImageLength), then the value/offset field will contain the number of rows of pixels in the image. However, as we only have four bytes available, it will often be the case that the data needed to describe the given tag can’t be stored directly in the value/offset field. In this case, the value/offset field will contain a 32-bit offset into the file where the data for this tag is stored.
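The inline-versus-offset decision comes down to one multiplication and a comparison. The sketch below assumes a hypothetical helper named `values_fit_inline`, with the size table repeated so that the snippet stands alone.

```c
#include <stdint.h>

/* Bytes per TIFF field type, indexed by type ID; index 0 is unused. */
static const uint32_t type_sizes[13] = {0, 1, 1, 2, 4, 8, 1, 1, 2, 4, 8, 4, 8};

/* Returns 1 if `count` values of `type` fit in the 4-byte value/offset
 * field, 0 if the field must hold a file offset instead, and -1 if the
 * type ID is outside the range defined by the spec. */
int values_fit_inline(uint16_t type, uint32_t count)
{
    uint64_t total;

    if (type < 1 || type > 12)
        return -1;
    /* 64-bit arithmetic so that a huge count cannot wrap around. */
    total = (uint64_t)type_sizes[type] * count;
    return total <= 4 ? 1 : 0;
}
```

So a single SHORT (2 bytes) is stored inline, while a 20-character string or a single RATIONAL (8 bytes) has to live elsewhere in the file.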
With this in mind, the overall structure of each IFD can now be described. Each IFD contains a 2-byte header denoting the number of directory entries (fields) in the IFD. Let’s call that number n. The header is then followed by n 12-byte IFD entries, and finally by a four byte offset to the next IFD in the file, or 0 if there are no remaining IFDs. We can describe this structure in C as follows.
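A simplified sketch of these structures is shown below. All names are placeholders, and the entry keeps only the raw four bytes of the value/offset field, whereas a fuller version would likely also carry typed pointers to the decoded values.

```c
#include <stdint.h>

/* One 12-byte directory entry as it appears on disk (hypothetical names). */
typedef struct {
    uint16_t tag;           /* bytes 0-1: tag ID                    */
    uint16_t type;          /* bytes 2-3: field type (1..12)        */
    uint32_t count;         /* bytes 4-7: number of values          */
    uint32_t value_offset;  /* bytes 8-11: inline value or offset   */
} ifd_entry;

/* An in-memory IFD: entry count, the entries, and the link to the next IFD. */
typedef struct {
    uint16_t   num_entries;  /* the 2-byte header, n                 */
    ifd_entry *entries;      /* array of n entries                   */
    uint32_t   next_offset;  /* offset of the next IFD, 0 if none    */
} ifd;

/* Number of bytes an IFD with n entries occupies in the file:
 * 2-byte count + n * 12-byte entries + 4-byte next-IFD offset. */
uint32_t ifd_disk_size(uint16_t n)
{
    return 2u + 12u * n + 4u;
}
```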
We have one remaining bit of information we need before we can start reading TIFF files, and that is the header information. Each TIFF file begins with a 2-byte sequence, either 0x4949 or 0x4D4D. The former tells us that the file stores the remaining information in little-endian format, the latter denotes big-endian. Bytes 2-3 of any TIFF file contain the “magic number” for TIFF, decimal 42. Bytes 4-7 then contain the offset to the beginning of the first IFD in the file. With this, we can begin to parse information from TIFF files. (Note the global variable byte_order, which will be needed to correctly parse multibyte values from the file.)
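The header check can be sketched as follows. To keep the example small it parses an 8-byte buffer that has already been read from the file; apart from `byte_order`, which the text mentions, the function and constant names are assumptions.

```c
#include <stdint.h>

enum { ORDER_LITTLE = 0x4949, ORDER_BIG = 0x4D4D };

int byte_order;  /* set by parse_tiff_header, used by the decoders below */

/* Decode a 16-bit value from two bytes using the current byte order. */
uint16_t get_u16(const unsigned char *p)
{
    return byte_order == ORDER_LITTLE
        ? (uint16_t)(p[0] | (p[1] << 8))
        : (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode a 32-bit value from four bytes using the current byte order. */
uint32_t get_u32(const unsigned char *p)
{
    if (byte_order == ORDER_LITTLE)
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
               ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Parse the 8-byte TIFF header. On success, sets byte_order, stores the
 * offset of the first IFD in *first_ifd, and returns 0. Returns -1 if the
 * byte-order mark or the magic number 42 is wrong. */
int parse_tiff_header(const unsigned char hdr[8], uint32_t *first_ifd)
{
    if (hdr[0] == 0x49 && hdr[1] == 0x49)
        byte_order = ORDER_LITTLE;
    else if (hdr[0] == 0x4D && hdr[1] == 0x4D)
        byte_order = ORDER_BIG;
    else
        return -1;

    if (get_u16(hdr + 2) != 42)
        return -1;

    *first_ifd = get_u32(hdr + 4);
    return 0;
}
```

Because both bytes of the byte-order mark are identical, the mark itself can be read before the byte order is known.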
With all this setup in place, the code to populate an IFD is a bit long, but very straightforward.
We have a few utility functions to read the various data types, taking byte order into consideration. Those functions haven’t been shown here, but they’re easy enough to derive as needed, or you can find them in the source code available for download. But otherwise, the code is quite simple. It reads the number of IFD entries, mallocs the appropriate amount of space, and then for each entry, reads the tag, type, and value count, and mallocs the memory needed to store the values. It then checks to see if the number of values needed of the given type can fit into the value/offset field. If so, it reads them directly. If not, it reads the value/offset field as an offset, saves the current location, seeks to the offset, reads the correct number of values, and then seeks back to the saved offset to put it into position to read the next entry. At the very end of the function, we finally read the offset of the next IFD so that we can continue parsing the file if needed.
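The procedure just described can be condensed into the sketch below. It is deliberately simpler than a fully typed reader: each entry's values are kept as raw bytes in file byte order, and the names (`entry`, `rd16`, `rd32`, `read_ifd`, `little_endian`) are placeholders rather than the identifiers in the downloadable source.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

static const uint32_t type_size[13] = {0, 1, 1, 2, 4, 8, 1, 1, 2, 4, 8, 4, 8};

typedef struct {
    uint16_t tag, type;
    uint32_t count;
    unsigned char *data;  /* count * type_size[type] raw bytes */
} entry;

int little_endian = 1;    /* stand-in for the global byte order flag */

/* Read a 16-bit value honoring the byte order; returns 0 on success. */
static int rd16(FILE *f, uint16_t *out)
{
    unsigned char b[2];
    if (fread(b, 1, 2, f) != 2) return -1;
    *out = little_endian ? (uint16_t)(b[0] | (b[1] << 8))
                         : (uint16_t)((b[0] << 8) | b[1]);
    return 0;
}

/* Read a 32-bit value honoring the byte order; returns 0 on success. */
static int rd32(FILE *f, uint32_t *out)
{
    unsigned char b[4];
    if (fread(b, 1, 4, f) != 4) return -1;
    *out = little_endian
        ? ((uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24))
        : (((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3]);
    return 0;
}

/* Read the IFD at the current position of f. Returns a malloc'd array of
 * *n entries (caller frees each data pointer and the array) and stores the
 * next IFD offset in *next. Returns NULL on any error. */
entry *read_ifd(FILE *f, uint16_t *n, uint32_t *next)
{
    entry *e;
    uint16_t i;

    if (rd16(f, n) != 0) return NULL;
    e = calloc(*n ? *n : 1, sizeof *e);
    if (!e) return NULL;

    for (i = 0; i < *n; i++) {
        uint32_t bytes;
        if (rd16(f, &e[i].tag) || rd16(f, &e[i].type) || rd32(f, &e[i].count))
            goto fail;
        /* Reject unknown types and counts large enough to overflow. */
        if (e[i].type < 1 || e[i].type > 12 || e[i].count > 0x10000000u)
            goto fail;
        bytes = type_size[e[i].type] * e[i].count;
        e[i].data = malloc(bytes ? bytes : 1);
        if (!e[i].data) goto fail;

        if (bytes <= 4) {
            /* Values are stored directly in the value/offset field. */
            unsigned char raw[4];
            if (fread(raw, 1, 4, f) != 4) goto fail;
            memcpy(e[i].data, raw, bytes);
        } else {
            /* Field holds an offset: save position, seek, read, seek back. */
            uint32_t off;
            long here;
            if (rd32(f, &off)) goto fail;
            here = ftell(f);
            if (here < 0 || fseek(f, (long)off, SEEK_SET) ||
                fread(e[i].data, 1, bytes, f) != bytes ||
                fseek(f, here, SEEK_SET))
                goto fail;
        }
    }
    if (rd32(f, next)) goto fail;
    return e;

fail:
    for (i = 0; i < *n; i++) free(e[i].data);
    free(e);
    return NULL;
}
```

Keeping the values as raw bytes shortens the sketch considerably; decoding them into typed values is where most of the length of a full reader comes from.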
Writing IFDs is very similar to reading them, with a few small differences. We need to guarantee that we write complete fields, so there’s code to zero pad if necessary. Also, we have to calculate the correct offset when we see that the values needed can’t fit into the value/offset field. However, the overall structure of the code is very similar to that for reading.
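The two differences can be illustrated with a small sketch that builds the four bytes of the value/offset field before they are written. The function name `pack_value_field` and its parameters are hypothetical, and the caller is assumed to have already decided where out-of-line data will be placed.

```c
#include <stdint.h>
#include <string.h>

/* Fill the 4-byte value/offset field for an entry about to be written.
 * If the data fits (nbytes <= 4) it is copied in and the remainder is
 * zero padded; the function then returns 1. Otherwise data_offset is
 * encoded in the requested byte order and the function returns 0, which
 * tells the caller to write the data itself at data_offset. */
int pack_value_field(unsigned char field[4], const unsigned char *data,
                     uint32_t nbytes, uint32_t data_offset, int little_endian)
{
    memset(field, 0, 4);  /* guarantees a complete, zero-padded field */

    if (nbytes <= 4) {
        memcpy(field, data, nbytes);
        return 1;
    }

    if (little_endian) {
        field[0] = (unsigned char)(data_offset & 0xFF);
        field[1] = (unsigned char)((data_offset >> 8) & 0xFF);
        field[2] = (unsigned char)((data_offset >> 16) & 0xFF);
        field[3] = (unsigned char)((data_offset >> 24) & 0xFF);
    } else {
        field[0] = (unsigned char)((data_offset >> 24) & 0xFF);
        field[1] = (unsigned char)((data_offset >> 16) & 0xFF);
        field[2] = (unsigned char)((data_offset >> 8) & 0xFF);
        field[3] = (unsigned char)(data_offset & 0xFF);
    }
    return 0;
}
```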
A quick note on the structure of the C code shown. It’s possible that we could make the code a bit shorter by basically eliminating all the type checking and just stuffing bytes into our structures. I prefer to go ahead and try to get the type information put down as soon as possible, and maintain it through as much of the code as possible. So the read/write functions tend to be quite long, but I think it’s worth it to have a bit more self-documenting code.
This covers most of what we need to know about NEF files in order to complete our task. As we dig further in, we’ll spend a bit more time dealing with the peculiarities of the data (date/time formats, etc.) but the basic concept of reading, manipulating, and writing IFD entries forms the core of the RAW file processing we need. In the next post, I’ll talk about the NMEA data format for GPS log files, and we’ll take a look at how we can implement the matching of images to locations from a log.