Cataclysmic Mutation

Machine Learning and Whatever Else

Geotagging in C (Part 1: NEF Files)

I actually did the work that this post will discuss a while ago, but I never documented it anywhere, and it seemed like a good exercise to describe in a bit more detail. What I’m talking about is how to add GPS information to photos taken with a Nikon digital camera, particularly if you shoot in raw mode. There are a few cameras or attachments available that can do this automatically, but the more common scenario is that you have a separate device that tracks your location as you walk around taking photos, and then you run a special program that uses the timestamps on your photos to match up the appropriate locations and embed them into your files. This post (and probably the next couple too) describes how to write such software yourself.

Before we even get started, I should be clear that if you’re a photographer looking for a tool to geotag your images, you probably should be looking elsewhere. There are lots of good programs available with nice graphical interfaces that will let you do this pretty easily, and they’ll likely be more reliable as well. I’m on a Mac, and HoudahGEO seems to be a really good one that does all you need and more. For $30, it will almost certainly be a better use of your time than trying to figure out what I’m talking about here.

However, if you’re a hackery sort of chap and would like to see an example of how you’d write the core logic of this sort of application yourself, then grab your trusty C compiler and follow along after the fold.

OK, first of all, if you’re interested in actually putting any of this stuff to use, there are a couple of things you’re going to need. First, a Nikon camera capable of shooting RAW files in Nikon’s NEF format. Every Nikon camera shoots in a RAW file format that is potentially different from any other. You can register for Nikon’s SDK and get all the technical specs for your camera, but in general, the differences are pretty minor, and I suspect that the code I’ll be showing will work without modification on just about any Nikon DSLR. However, it’s only been tested on my D90, so your mileage may vary.

You’re also going to need something that can record a GPS log that you can match up with later. I purchased the AMOD AGL3080 logger from Amazon. For my purposes, it was important that the logger simply show up as a USB storage device when I plugged it into a computer, as I didn’t want to have to deal with some crappy third party driver in order to make it work on my Mac or Linux machines. The AMOD fit the bill, and I can recommend it pretty highly – it’s worked fine for me in the year or so I’ve had it.

GPS logs can come in several varieties. Probably the most common format is GPX, which is natively supported by a number of tools. GPX is an XML schema for GPS information, with all the good and bad that that entails. Unfortunately (or fortunately if you think XML is a train wreck), the AMOD unit does not generate GPX files. Instead, it produces logs in a format called NMEA. NMEA is basically CSV formatted data, which is nice, but has a number of quirks that make parsing it a little cumbersome. I’ll get to more of that later, probably in the next posting.

And finally, of course, you’re going to need a C compiler. You shouldn’t need much else, as the code should be quite portable. As this was mostly done for fun, I didn’t use any libraries outside of the basic C standard library. The code has been tested on several versions of GCC on 64-bit Snow Leopard and both 32 and 64-bit Linux, but I make no guarantees of any sort. There are bugs, and I run into one every now and then, but I do use the program myself, and it does (mostly) work. However, do not under any circumstances run the program on the only copy of an image you have. My usual workflow is to import my images into Aperture, run the tool on the NEF files, tell Aperture to update its metadata from the files, and only after I’ve verified nothing bad happened do I format the memory card in my camera.

NEF File structure

When you tell your Nikon DSLR to capture a file in raw mode, what you get is an NEF file. NEF files are basically TIFF files with certain characteristics predefined. This means that the core of our photo manipulation code is going to be TIFF code, and you can probably find a library or 30 out there that will make this process pretty painless. We aren’t going to take that route. Instead, we’re going to write our own TIFF handling code from scratch. It won’t be a complete library, but it will have enough functionality to do what we need, which is to pull the timestamp from the image and write back GPS coordinate information. The TIFF specification (PDF) is enormously useful in understanding what the files look like under the hood.

The overall structure of a TIFF file, and therefore of an NEF file as well, is pretty simple. You have a short header sequence that identifies the type of file followed by a set of what are called Image File Directories (IFDs) which contain a wealth of information about the image, finally followed by the actual image data. In the case of NEF files, this image data is extremely complex. If we were to write our own tool to interpret that data, we’d need to perform the demosaicing step that software like Aperture or Adobe Camera Raw can do. Fortunately, we don’t need to touch the image data at all – everything we need is stored in the IFDs. So what is an IFD and what does it look like?

An IFD is simply a list of entries, each of which has a particular structure. Typically, the entries are grouped together under an IFD based on their content. So one IFD might contain information about the exposure settings the image was shot with while another contains GPS settings. The metadata that we’re interested in that is stored in these IFDs is called EXIF data. EXIF defines a set of standard information related to digital photos, and among that information (technically in an extension to EXIF) are a set of features regarding GPS information. The particular IFDs and what they contain are described in Nikon’s file format documentation. All IFD entries look the same: each one is 12 bytes long and laid out as follows.

Bytes Field
0-1 Tag ID
2-3 Field Type
4-7 Number of Values
8-11 Value/Offset


There are a few things we need to say about this structure. First, if it wasn’t already obvious, TIFF files are written in binary, so we’re going to be reading and writing things in terms of bytes. The Tag ID field is two bytes, and the interpretation of these bytes is defined by the TIFF and/or NEF file specifications. NEF generally honors the underlying TIFF spec, but there are some extensions present as well. For our purposes though, the tag IDs will be standard TIFF IDs and they tell you what field you’re looking at. The field type consists of another two bytes, and again, the TIFF spec defines the interpretation of those bytes. Basically though, the field type just defines the data type for the particular bit of information (integer, real, string, etc.). We can define some constants to map those bytes onto human readable names as follows.

/* TIFF data types */
#define BYTE 1
#define ASCII 2
#define SHORT 3
#define LONG 4
#define RATIONAL 5
#define SBYTE 6
#define UNDEFINED 7
#define SSHORT 8
#define SLONG 9
#define SRATIONAL 10
#define FLOAT 11
#define DOUBLE 12

Each of these types can be mapped onto a machine type, with the exception of the RATIONAL and SRATIONAL (signed rational) types. For those, we need to define a helper structure.

typedef struct {
    unsigned int32 numerator;
    unsigned int32 denominator;
} rational_t;
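
To make the rational type a little more concrete, here is a minimal sketch of how such a pair might be turned into an ordinary floating point number. The names `urational_t`, `rational_to_double`, and `dms_to_degrees` are hypothetical and not part of the original program, the struct is respelled with `<stdint.h>` types only so the example compiles on its own, and treating a zero denominator as 0.0 is an arbitrary choice for the sketch.

```c
#include <stdint.h>

/* Same layout as rational_t above, spelled with <stdint.h> types so that
 * this sketch is self-contained (urational_t is a hypothetical name). */
typedef struct {
    uint32_t numerator;
    uint32_t denominator;
} urational_t;

/* Convert a rational to a double. A zero denominator yields 0.0 instead of
 * dividing by zero; that policy is an assumption made for this sketch. */
double rational_to_double(urational_t r)
{
    if (r.denominator == 0)
        return 0.0;
    return (double)r.numerator / (double)r.denominator;
}

/* EXIF stores a GPS coordinate as three rationals: degrees, minutes and
 * seconds. Combine them into decimal degrees. */
double dms_to_degrees(const urational_t dms[3])
{
    return rational_to_double(dms[0])
         + rational_to_double(dms[1]) / 60.0
         + rational_to_double(dms[2]) / 3600.0;
}
```

For example, the three rationals 40/1, 30/1 and 0/1 would come out as 40.5 degrees.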

We’re also going to need, at several points, to know the sizes of the various TIFF data types. Because each type gets a unique integer ID arranged sequentially, it’s a simple matter to create a global array indexing from each type to the size in bytes for that type. Note that no type has ID 0 in the TIFF spec, so we insert a leading dummy value to make the others line up with the ID given in each #define above. (The sizes themselves are defined by the TIFF specification.)

/* size in bytes of each of the TIFF data types */
unsigned int type_bytes[13] = {0, 1, 1, 2, 4, 8, 1, 1, 2, 4, 8, 4, 8};

The third field, the number of values, is fairly self-explanatory. It simply defines how many values will be present for that field. A 20-byte ASCII string might be represented with a data type of “byte” and number of values equal to 20, for instance. The final field, the “value/offset” field, is a bit more complicated. The basic idea is that this field contains the “value” for the IFD entry, so if the Tag ID is 0x0101 (ImageLength), then the value/offset field will contain the number of rows of pixels in the image. However, as we only have four bytes available, it will often be the case that the data needed to describe the given tag can’t be stored directly in the value/offset field. In this case, the value/offset field will contain a 32-bit offset into the file where the data for this tag is stored.
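
The decision between “value” and “offset” comes down to a single multiplication, which can be sketched as below. The function name `value_is_inline` and the table name `tiff_type_bytes` are made up for this illustration. Unlike a bare table lookup, the sketch also rejects type IDs outside 1 to 12 and does the multiplication in 64 bits, both of which are extra precautions I am assuming would be wanted when reading files of unknown origin.

```c
#include <stdint.h>

/* Size in bytes of each TIFF type, indexed by type ID (index 0 is unused). */
static const unsigned int tiff_type_bytes[13] =
    {0, 1, 1, 2, 4, 8, 1, 1, 2, 4, 8, 4, 8};

/* Returns 1 if `count` values of `type` fit inside the 4-byte value/offset
 * field, 0 if they must be stored elsewhere in the file, and -1 if the type
 * ID is not one of the twelve known types. */
int value_is_inline(uint16_t type, uint32_t count)
{
    if (type < 1 || type > 12)
        return -1;
    /* 64-bit product so that a huge count cannot wrap around to a small one */
    return (uint64_t)count * tiff_type_bytes[type] <= 4 ? 1 : 0;
}
```

Two SHORT values (4 bytes) are stored inline, for instance, while a single RATIONAL (8 bytes) always lives at an offset.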

With this in mind, the overall structure of each IFD can now be described. Each IFD contains a 2-byte header denoting the number of directory entries (fields) in the IFD. Let’s call that number n. The header is then followed by n 12-byte IFD entries, and finally by a four byte offset to the next IFD in the file, or 0 if there are no remaining IFDs. We can describe this structure in C as follows.

typedef struct {
    unsigned int16 tag;
    unsigned int16 type;
    unsigned int32 count;
    union {
        unsigned byte* byte_values;
        byte* sbyte_values;
        unsigned int16* uint16_values;
        int16* int16_values;
        unsigned int32* uint32_values;
        int32* int32_values;
        float32* float32_values;
        float64* float64_values;
        rational_t* rational_values;
    };
} direntry_t;

typedef struct {
    unsigned int16 count;
    direntry_t* dirs;
    unsigned int32 next_offset;
} ifd_t;
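
The declarations above use type names such as `byte`, `int16`, `int32`, `float32` and `float64`, which are not standard C and whose definitions are not shown in this post. Purely as an assumption about how they could be supplied, the sketch below defines them as preprocessor macros; a plain `typedef` would not work here, because C does not allow `unsigned` to be combined with a typedef name as in `unsigned int16`. The compile-time checks simply refuse to build on a platform where these guesses about sizes are wrong.

```c
#include <limits.h>

/* ASSUMPTION: these macro definitions are a guess, not the original source.
 * Macros are used because "unsigned int16" must expand to "unsigned short". */
#define byte    char
#define int16   short
#define int32   int
#define float32 float
#define float64 double

/* Stop the build if the platform does not match the sizes TIFF expects. */
_Static_assert(CHAR_BIT == 8, "byte must be 8 bits");
_Static_assert(sizeof(unsigned int16) == 2, "int16 must be 2 bytes");
_Static_assert(sizeof(unsigned int32) == 4, "int32 must be 4 bytes");
_Static_assert(sizeof(float32) == 4, "float32 must be 4 bytes");
_Static_assert(sizeof(float64) == 8, "float64 must be 8 bytes");
```

One caveat with this approach is that a bare `byte` expands to plain `char`, whose signedness is up to the compiler, so writing `signed byte` would be the safer spelling wherever a signed value is really intended. `_Static_assert` needs a C11 compiler.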

We have one remaining bit of information we need before we can start reading TIFF files, and that is the header information. Each TIFF file begins with a 2-byte sequence, either 0x4949 or 0x4D4D. The former tells us that the file stores the remaining information in little-endian format, the latter denotes big-endian. Bytes 2-3 of any TIFF file contain the “magic number” for TIFF, decimal 42. Bytes 4-7 then contain the offset to the beginning of the first IFD in the file. With this, we can begin to parse information from TIFF files. (Note the global variable byte_order, which will be needed to correctly parse multibyte values from the file.)

/* TIFF header constants */
#define TIFF_BIG_ENDIAN 0x4d4d
#define TIFF_LITTLE_ENDIAN 0x4949
#define TIFF_MAGIC 42

/*
 * check the magic bytes at the beginning of the file to make sure it's
 * really a tiff file, and set the byte ordering in use in the file
 */
int valid_tiff_file(FILE* f)
{
    /* assume that file pointer is at offset 0 */
    unsigned int16 magic_number = 0;

    fread(&byte_order, 1, 2, f);
    magic_number = read_uint16(f);

    if(byte_order != TIFF_LITTLE_ENDIAN && byte_order != TIFF_BIG_ENDIAN) {
        fprintf(stderr, "invalid byte ordering %x\n", byte_order);
        return 0;
    }
    if(magic_number != TIFF_MAGIC) {
        fprintf(stderr, "magic number %d (%x) not valid for tiff file\n",
                magic_number, magic_number);
        return 0;
    }
    return 1;
}
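
For comparison, the same header check can be sketched against an in-memory buffer, which also picks up the first IFD offset from bytes 4-7. This is an alternative illustration rather than the code the tool uses: `tiff_header_t`, `get16`, `get32` and `parse_tiff_header` are hypothetical names. It relies on the fact that 0x4949 is the ASCII text "II" and 0x4D4D is "MM", so the marker reads the same regardless of byte order.

```c
#include <stdint.h>

typedef struct {
    int      big_endian;  /* 1 for "MM", 0 for "II" */
    uint32_t first_ifd;   /* offset of the first IFD from the start of file */
} tiff_header_t;

/* Assemble integers byte by byte so the result does not depend on the byte
 * order of the machine running the code. */
static uint16_t get16(const unsigned char *p, int big_endian)
{
    return big_endian ? (uint16_t)((p[0] << 8) | p[1])
                      : (uint16_t)((p[1] << 8) | p[0]);
}

static uint32_t get32(const unsigned char *p, int big_endian)
{
    return big_endian
        ? ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
          ((uint32_t)p[2] << 8)  |  (uint32_t)p[3]
        : ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
          ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Returns 1 and fills *out if buf holds a valid 8-byte TIFF header,
 * otherwise returns 0. */
int parse_tiff_header(const unsigned char buf[8], tiff_header_t *out)
{
    if (buf[0] == 'I' && buf[1] == 'I')
        out->big_endian = 0;
    else if (buf[0] == 'M' && buf[1] == 'M')
        out->big_endian = 1;
    else
        return 0;

    if (get16(buf + 2, out->big_endian) != 42)
        return 0;

    out->first_ifd = get32(buf + 4, out->big_endian);
    return 1;
}
```

Note that the magic number has to be decoded with the byte order just discovered, which is why the order marker is examined first.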

With all this setup in place, the code to populate an IFD is a bit long, but very straightforward.

/*
 * load an ifd_t from a given tiff file
 * reads all the direntry_t blocks and sets up the next_offset pointer
 * for the block
 */
void ifd_load(FILE* f, ifd_t* ifd)
{
    /* must be called when the file pointer is at the beginning of the IFD */
    int i, j;
    memset(ifd, 0, sizeof(ifd_t));

    /* get the number of directory entries */
    ifd->count = read_uint16(f);
    ifd->dirs = (direntry_t*)malloc(ifd->count * sizeof(direntry_t));

    /* read each directory entry */
    for(i=0; i<ifd->count; ++i) {
        ifd->dirs[i].tag = read_uint16(f);
        ifd->dirs[i].type = read_uint16(f);
        ifd->dirs[i].count = read_uint32(f);
        switch(ifd->dirs[i].type) {
        case BYTE:
        case ASCII:
        case UNDEFINED:
            ifd->dirs[i].byte_values = (unsigned byte*)malloc(ifd->dirs[i].count * sizeof(byte));
            break;
        case SBYTE:
            ifd->dirs[i].sbyte_values = (byte*)malloc(ifd->dirs[i].count * sizeof(byte));
            break;
        case SHORT:
            ifd->dirs[i].uint16_values = (unsigned int16*)malloc(ifd->dirs[i].count * sizeof(int16));
            break;
        case SSHORT:
            ifd->dirs[i].int16_values = (int16*)malloc(ifd->dirs[i].count * sizeof(int16));
            break;
        case LONG:
            ifd->dirs[i].uint32_values = (unsigned int32*)malloc(ifd->dirs[i].count * sizeof(int32));
            break;
        case SLONG:
            ifd->dirs[i].int32_values = (int32*)malloc(ifd->dirs[i].count * sizeof(int32));
            break;
        case FLOAT:
            ifd->dirs[i].float32_values = (float32*)malloc(ifd->dirs[i].count * sizeof(float32));
            break;
        case DOUBLE:
            ifd->dirs[i].float64_values = (float64*)malloc(ifd->dirs[i].count * sizeof(float64));
            break;
        case RATIONAL:
            ifd->dirs[i].rational_values = (rational_t*)malloc(ifd->dirs[i].count * sizeof(rational_t));
            break;
        }

        if(ifd->dirs[i].count * type_bytes[ifd->dirs[i].type] <= 4) {
            /* value fits entirely in the 4 byte value offset field */
            unsigned int bytes_read = 0;
            for(j=0; j<ifd->dirs[i].count; ++j)
            {
                bytes_read+=type_bytes[ifd->dirs[i].type];
                switch(ifd->dirs[i].type) {
                case BYTE:
                case ASCII:
                case UNDEFINED:
                    ifd->dirs[i].byte_values[j] = read_byte(f);
                    break;
                case SBYTE:
                    ifd->dirs[i].sbyte_values[j] = read_byte(f);
                    break;
                case SHORT:
                    ifd->dirs[i].uint16_values[j] = read_uint16(f);
                    break;
                case SSHORT:
                    ifd->dirs[i].int16_values[j] = read_int16(f);
                    break;
                case LONG:
                    ifd->dirs[i].uint32_values[j] = read_uint32(f);
                    break;
                case SLONG:
                    ifd->dirs[i].int32_values[j] = read_int32(f);
                    break;
                case FLOAT:
                    ifd->dirs[i].float32_values[j] = read_float32(f);
                    break;
                default:
                    fprintf(stderr, "can't fit specified type '%d' into value offset field\n",
                            ifd->dirs[i].type);
                    break;
                }
            }
            for(; bytes_read<4; ++bytes_read) {
                read_byte(f);
            }
        } else {
            int p;

            /* read the offset where the data is stored */
            unsigned int32 dataloc = read_uint32(f);

            /* save the current offset */
            unsigned int32 cpos = ftell(f);

            /* jump to the data */
            fseek(f, dataloc, SEEK_SET);

            /* read it */
            for(p=0; p<ifd->dirs[i].count; ++p) {
                switch(ifd->dirs[i].type) {
                case BYTE:
                case ASCII:
                case UNDEFINED:
                    ifd->dirs[i].byte_values[p] = read_byte(f);
                    break;
                case SBYTE:
                    ifd->dirs[i].sbyte_values[p] = read_byte(f);
                    break;
                case SHORT:
                    ifd->dirs[i].uint16_values[p] = read_uint16(f);
                    break;
                case SSHORT:
                    ifd->dirs[i].int16_values[p] = read_int16(f);
                    break;
                case LONG:
                    ifd->dirs[i].uint32_values[p] = read_uint32(f);
                    break;
                case SLONG:
                    ifd->dirs[i].int32_values[p] = read_int32(f);
                    break;
                case FLOAT:
                    ifd->dirs[i].float32_values[p] = read_float32(f);
                    break;
                case DOUBLE:
                    ifd->dirs[i].float64_values[p] = read_float64(f);
                    break;
                case RATIONAL:
                    ifd->dirs[i].rational_values[p].numerator = read_uint32(f);
                    ifd->dirs[i].rational_values[p].denominator = read_uint32(f);
                    break;
                }
            }

            /* and jump back */
            fseek(f, cpos, SEEK_SET);
        }
    }

    /* read the next ifd offset */
    ifd->next_offset = read_uint32(f);
}

We have a few utility functions to read the various data types, taking byte order into consideration. Those functions haven’t been shown here, but they’re easy enough to derive as needed, or you can find them in the source code available for download. But otherwise, the code is quite simple. It reads the number of IFD entries, mallocs the appropriate amount of space, and then for each entry, reads the tag, type, and value count, and mallocs the memory needed to store the values. It then checks to see if the number of values needed of the given type can fit into the value/offset field. If so, it reads them directly. If not, it reads the value/offset field as an offset, saves the current location, seeks to the offset, reads the correct number of values, and then seeks back to the saved offset to put it into position to read the next entry. At the very end of the function, we finally read the offset of the next IFD so that we can continue parsing the file if needed.
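
As a rough idea of what those utility functions could look like, here is a sketch of two of them. The names `read_uint16`, `read_uint32` and `byte_order` match the ones used above, but the bodies are my reconstruction for illustration and may differ from the downloadable source; returning 0 on a short read is an assumption made to keep the example small.

```c
#include <stdio.h>
#include <stdint.h>

#define TIFF_BIG_ENDIAN    0x4d4d
#define TIFF_LITTLE_ENDIAN 0x4949

/* Set by the header check; stands in for the global mentioned earlier. */
uint16_t byte_order = TIFF_LITTLE_ENDIAN;

/* Read two bytes and combine them according to byte_order.
 * Returns 0 on a short read; a real tool would report the error. */
uint16_t read_uint16(FILE *f)
{
    unsigned char b[2];
    if (fread(b, 1, 2, f) != 2)
        return 0;
    if (byte_order == TIFF_BIG_ENDIAN)
        return (uint16_t)((b[0] << 8) | b[1]);
    return (uint16_t)((b[1] << 8) | b[0]);
}

/* Same idea for a four-byte unsigned value. */
uint32_t read_uint32(FILE *f)
{
    unsigned char b[4];
    if (fread(b, 1, 4, f) != 4)
        return 0;
    if (byte_order == TIFF_BIG_ENDIAN)
        return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
               ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
}
```

Building the value from individual bytes means the same code works whether the machine running it is little-endian or big-endian; only the byte order of the file matters.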

Writing IFDs is very similar to reading them, with a few small differences. We need to guarantee that we write complete fields, so there’s code to zero pad if necessary. We also have to calculate the correct offset when we see that the values needed can’t fit into the value/offset field. However, the overall structure of the code is very similar to that for reading.

/*
 * write a new ifd_t block into the given tiff file
 */
void ifd_write(FILE* f, ifd_t* ifd)
{
    /* ifd is the GPSInfoIFD structure; file pointer of f must be positioned
     * to the location of the gps_info pointer
     */
    unsigned int16 i, j;
    unsigned int32 ifd_block_size;
    unsigned int32 ifd_value_offset;
    unsigned int value_bytes_written = 0;

    /* total block is a two byte header determining count of directory entries,
     * 12*n bytes for all n directories, and a 4 byte pointer to the next block */
    ifd_block_size = 2 + 12*ifd->count + 4;
    ifd_value_offset = ftell(f)+ifd_block_size;

    /* write the number of directory entries in the gps info section */
    write_uint16(f, ifd->count);

    /* write each directory */
    for(i=0; i<ifd->count; ++i) {
        write_uint16(f, ifd->dirs[i].tag);
        write_uint16(f, ifd->dirs[i].type);
        write_uint32(f, ifd->dirs[i].count);

        if(ifd->dirs[i].count * type_bytes[ifd->dirs[i].type] <= 4) {
            /* can write the data directly into the value offset field */
            unsigned int bytes_written = 0;
            for(j=0; j<ifd->dirs[i].count; ++j) {
                bytes_written+=type_bytes[ifd->dirs[i].type];
                switch(ifd->dirs[i].type) {
                case BYTE:
                case ASCII:
                case UNDEFINED:
                    write_byte(f, ifd->dirs[i].byte_values[j]);
                    break;
                case SBYTE:
                    write_sbyte(f, ifd->dirs[i].sbyte_values[j]);
                    break;
                case SHORT:
                    write_uint16(f, ifd->dirs[i].uint16_values[j]);
                    break;
                case SSHORT:
                    write_int16(f, ifd->dirs[i].int16_values[j]);
                    break;
                case LONG:
                    write_uint32(f, ifd->dirs[i].uint32_values[j]);
                    break;
                case SLONG:
                    write_int32(f, ifd->dirs[i].int32_values[j]);
                    break;
                case FLOAT:
                    write_float32(f, ifd->dirs[i].float32_values[j]);
                    break;
                default:
                    fprintf(stderr, "attempt to write impossible type '%d' into "
                            "value offset field\n", ifd->dirs[i].type);
                    break;
                }
            }
            /* make sure we have written an even number of bytes */
            if(bytes_written % 2 == 1) {
                unsigned byte c = 0;
                write_byte(f, c);
                ++bytes_written;
            }
            /* make sure we fill out the value offset field completely */
            for(; bytes_written<4; ++bytes_written) {
                unsigned byte c = 0;
                write_byte(f, c);
            }
        } else {
            /* must write a pointer to where the data will be written */
            unsigned int32 cpos;
            unsigned int32 pos = ifd_value_offset + value_bytes_written;
            write_uint32(f, pos);
            cpos = ftell(f);

            /* now go to that location and write the data */
            fseek(f, pos, SEEK_SET);
            for(j=0; j<ifd->dirs[i].count; ++j) {
                value_bytes_written+=type_bytes[ifd->dirs[i].type];
                switch(ifd->dirs[i].type) {
                case BYTE:
                case ASCII:
                case UNDEFINED:
                    write_byte(f, ifd->dirs[i].byte_values[j]);
                    break;
                case SBYTE:
                    write_sbyte(f, ifd->dirs[i].sbyte_values[j]);
                    break;
                case SHORT:
                    write_uint16(f, ifd->dirs[i].uint16_values[j]);
                    break;
                case SSHORT:
                    write_int16(f, ifd->dirs[i].int16_values[j]);
                    break;
                case LONG:
                    write_uint32(f, ifd->dirs[i].uint32_values[j]);
                    break;
                case SLONG:
                    write_int32(f, ifd->dirs[i].int32_values[j]);
                    break;
                case FLOAT:
                    write_float32(f, ifd->dirs[i].float32_values[j]);
                    break;
                case DOUBLE:
                    write_float64(f, ifd->dirs[i].float64_values[j]);
                    break;
                case RATIONAL:
                    write_uint32(f, ifd->dirs[i].rational_values[j].numerator);
                    write_uint32(f, ifd->dirs[i].rational_values[j].denominator);
                    break;
                default:
                    fprintf(stderr, "attempt to write impossible type '%d' into "
                            "value offset field\n", ifd->dirs[i].type);
                    break;
                }
            }
            /* make sure we have written an even number of bytes */
            if(value_bytes_written % 2 == 1) {
                unsigned byte c = 0;
                write_byte(f, c);
                ++value_bytes_written;
            }

            /* now jump back to where we were in the file */
            fseek(f, cpos, SEEK_SET);
        }
    }

    /* and finally, write the offset to the next ifd */
    write_uint32(f, ifd->next_offset);
}
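
The offset arithmetic in the function above is easier to see in isolation. The sketch below computes, without touching a file, where each entry’s out-of-line data would land if the layout rules used above are followed: values start right after the IFD block and are packed one after another, each padded to an even length. The names `entry_shape_t`, `layout_type_bytes` and `plan_ifd_layout` are hypothetical, and type IDs are assumed to be in the range 1 to 12.

```c
#include <stdint.h>
#include <stddef.h>

/* Size in bytes of each TIFF type, indexed by type ID (index 0 is unused). */
static const unsigned int layout_type_bytes[13] =
    {0, 1, 1, 2, 4, 8, 1, 1, 2, 4, 8, 4, 8};

/* Only the fields that influence the layout (hypothetical helper type). */
typedef struct {
    uint16_t type;
    uint32_t count;
} entry_shape_t;

/*
 * Given the file offset at which an IFD starts, compute the offset each
 * entry's out-of-line data would receive. offsets[i] is set to 0 for entries
 * whose data fits in the value/offset field. Returns the total number of
 * bytes taken by the IFD block plus all of its out-of-line data.
 */
uint32_t plan_ifd_layout(uint32_t ifd_start, const entry_shape_t *entries,
                         size_t n, uint32_t *offsets)
{
    /* 2-byte entry count, n 12-byte entries, 4-byte next-IFD offset */
    uint32_t block_size = 2 + 12 * (uint32_t)n + 4;
    uint32_t data_bytes = 0;
    size_t i;

    for (i = 0; i < n; ++i) {
        uint32_t size = entries[i].count * layout_type_bytes[entries[i].type];
        if (size <= 4) {
            offsets[i] = 0;               /* stored inline */
            continue;
        }
        offsets[i] = ifd_start + block_size + data_bytes;
        data_bytes += size + (size & 1);  /* pad odd sizes to an even length */
    }
    return block_size + data_bytes;
}
```

With an IFD starting at offset 100 and three entries (one SHORT, three RATIONALs, and an 11-byte ASCII string), the block itself takes 42 bytes, the rationals land at 142, the string at 166, and the whole thing occupies 78 bytes.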

A quick note on the structure of the C code shown. It’s possible that we could make the code a bit shorter by basically eliminating all the type checking and just stuffing bytes into our structures. I prefer to go ahead and try to get the type information put down as soon as possible, and maintain it through as much of the code as possible. So the read/write functions tend to be quite long, but I think it’s worth it to have a bit more self-documenting code.

This covers most of what we need to know about NEF files in order to complete our task. As we dig further in, we’ll spend a bit more time dealing with the peculiarities of the data (date/time formats, etc.) but the basic concept of reading, manipulating, and writing IFD entries forms the core of the RAW file processing we need. In the next post, I’ll talk about the NMEA data format for GPS log files, and we’ll take a look at how we can implement the matching of images to locations from a log.