C metaprogramming: type-generic lists and queues using sys/queue.h
2022-11-27 00:00:00 +0000 UTCWriting C preprocessor macros allow metaprogramming in C. Macros can enable you to define new control structures and treat types as values. Macros allow this using simple text substitution.
However this also leads to complexity. Because the preprocessor just subsitutes text without defining semantics on the text, programs that use macro metaprogramming heavily may compile (or not) for mysterious reasons. Heavy use of macros to define new control structures can obfuscate what would otherwise be simple C code.
In this post I will look at the sys/queue.h library present on many Unix-family OSes as an example of C macro-driven metaprogramming. sys/queue.h defines macros that create and operate on several linked list data structures. I will use TAILQ for my examples, a doubly-linked tail queue.
I am also not a professional systems developer – so take anything I say about C with a grain of salt. I’m writing for my understanding. Hopefully that helps others as well!
Defining a TAILQ: metaprogramming on types and identifier names
sys/queue.h provides two macros to use when implementing a tail queue: TAILQ_HEAD and TAILQ_ENTRY. A simple example use is:
#include <sys/queue.h>
struct item {
int value;
TAILQ_ENTRY(item) items;
};
TAILQ_HEAD(head_struct, item);
We can see here already that both macros accept a type as an argument. We will see in the expanded macro code that this type value must be the name of a struct. But what is head_struct? We haven’t used this name before yet. Let’s look at some (prettified) macro-expanded code to see 1.
struct item {
int value;
struct {
struct item *tqe_next;
struct item **tqe_prev;
} items;
};
struct head_struct {
struct item *tqh_first;
struct item **tqh_last;
};
See what happened with head_struct? The first value passed to TAILQ_HEAD is just the name for a structure type representing the head of our tailqueue. The value is just directly substituted. Similarly the name item was substituted as part of the struct item type for the tqe_next, tqe_prev, tqh_first, and tqh_last pointers.
Let’s look at the definitions for these macros. On my Debian box, the sys/queue.h header is stored at /usr/include/x86_64-linux-gnu/sys/queue.h.
#define _TAILQ_HEAD(name, type, qual) \
struct name { \
qual type *tqh_first; /* first element */ \
qual type *qual *tqh_last; /* addr of last next element */ \
}
#define TAILQ_HEAD(name, type) _TAILQ_HEAD(name, struct type,)
#define _TAILQ_ENTRY(type, qual) \
struct { \
qual type *tqe_next; /* next element */ \
qual type *qual *tqe_prev; /* address of previous next element */\
}
#define TAILQ_ENTRY(type) _TAILQ_ENTRY(struct type,)
Having first seen the code generated by the preprocessor, we can see clearly how the macros defined here work. Using simple text subtitution, higher order logic operating on types and identifier names are defined.
TAILQ_FOREACH: macro-defined control structures
C preprocessor macros also allow the definition of type-generic control structures. An example is TAILQ_FOREACH. Here is a code snippet:
struct head_struct* headp = TAILQ_HEAD_INIT(headp);
/* insert some items, etc ... */
struct item *iter;
TAILQ_FOREACH(iter, headp, items)
printf("item value: %d\n", iter->value);
This macro provides syntax similar to the iterator or foreach concept in other languages, like this Python snippet:
for item in items:
print(f"item value: {item.value}")
C does not have this kind of construct natively. What is being generated? Here’s a snippet:
for ((iter) = ((headp)->tqh_first);
(iter);
(iter) = ((iter)->items.tqe_next))
printf("item value: %d\n", iter->value);
This macro takes advantage of the humble for loop. Note that a single line statement can be paired with the TAILQ_FOREACH macro just like with the native for. A pair of brackets can also be used to allow multi-line statements to be ran against each element of the list.
This particular genre of metaprogramming helps to avoid a key type of error: the off-by-one. While a from-scratch linked list traversal should be in your repertoire, we’re all guilty of programming at less than 100% focus. Standard, easy to use macros for these operations removes the need to squint at pointer operations each and every time you write traversal code.
Conclusion
The sys/queue.h library includes many more macros for interacting with a TAILQ, including head & tail insertions, access method macros, and more. Each macro performs its assigned action correctly, maintaining the state of the pointers that make up the linked list. The use of standardized traversal functions greatly reduces the probability of off-by-one errors and can make reading code using these data structures easier.
However using metaprogramming and macros like these does have some risks. For one, rolling your own might make your codebase more confusing instead of less. If the abstractions you implement through metaprogramming are nonstandard or idiosyncratic someone looking back at the code could be confused.
Additionally implementation details are hidden from the consumer. In many cases this is good – the API defined by the macros is enough and using that API ensures the structure retains its integrity. But sometimes you may need to peek under the hood and digging through manpages and library code can be daunting.
This type of macro-based metaprogramming is all over large C codebases. Some examples for further reading are:
- sys/queue.h
- link to the glibc version of
sys/queue.h
- link to the glibc version of
- the BSD sys/tree.h
- Linux kernel linked list API
Footnotes
To run just the preprocessor, you can use the
-Eflag ongcc. On my machine I use this script to generate the intermediate file, clean it up, and start styling it for easier reading:#/bin/sh gcc -E -o file1.i file1.c sed -i '/^#/d' file1.i astyle --style=java file1.i