Constant, qualified, and less attemptable


Let’s say you’re writing a tokenizer in the C programming language, and your basic function is going to look like this (modelled on standard C library functions like strsep and strtok_r):

/* Return the first token in the string referenced by *stringp, and
   update *stringp to point to the first character after the token */
struct token tokenize(char **stringp);

All goes well until one day you try to tokenize a string via a const char * pointer:

const char *data;
/* ... */
tokenize(&data);

and you get a warning like this:

foo.c:123: warning: passing argument 1 of 'tokenize' from incompatible pointer type

“Ah yes,” you say to yourself, “a simple consequence of ISO/IEC 9899:TC3.” First, §6.2.7(1):

Two types have compatible type if their types are the same.

together with §6.7.3(9):

For two qualified types to be compatible, both shall have the identically qualified version of a compatible type

and §6.7.5.1(2):

For two pointer types to be compatible, both shall be identically qualified and both shall be pointers to compatible types.

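Here char * and const char * point to differently qualified types, so they are not compatible, and neither are char ** and const char **. So let’s change the type of tokenize so that you can pass a const char ** to it: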

struct token tokenize(const char **stringp);

But what about all your old code that passes char **? Surely that’s not going to work any more. Or is it? You dimly remember that there’s a special get-out clause that allows a compiler to automatically add a qualifier to the type a pointer points to. §6.3.2.3(2):

For any qualifier q, a pointer to a non-q-qualified type may be converted to a pointer to the q-qualified version of the type

Now const is certainly a qualifier q, so surely char ** may be converted to const char **? After all, you depend on char * being converted to const char * most of the time you call standard C functions like printf and strlen. So let’s try it.

struct token tokenize(const char **stringp);
/* ... */
char *data;
/* ... */
tokenize(&data);

But no, you get the same warning as before.

foo.c:123: warning: passing argument 1 of 'tokenize' from incompatible pointer type

“I see,” you say to yourself, “the type const char ** is not a pointer to a const-qualified type, since it points to const char * which is just an ordinary type, not a const-qualified type. A const-qualified type would be something like const char * const. Elementary.”

Elementary, maybe, but inconvenient, certainly. Why doesn’t the C standard extend its get-out clause so that it works for pointers to pointers to qualified types? The comp.lang.c FAQ points out that if the C standard mandated conversion from char ** to const char ** then the following would compile without a diagnostic:

const char c = 'x';
char *p;
const char **q = &p;
*q = &c;
*p = 'y';

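The assignment *q = &c makes p point at c, so *p = 'y' writes to a const object through a non-const lvalue. That invokes undefined behaviour because of §6.7.3(5):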

If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.

So what are you going to do about it? Here are some workarounds:

  1. Define two functions, tokenize and tokenize_const. Nightmare! Every time you make a change to one you have to make a change to the other.

  2. Use a macro to define two functions. At least you can maintain them now, but most compilers do a poor job of generating debugging information for macro-generated code. Let’s hope you don’t have to debug them.

  3. In C++, you can use a template (really, this is just a convenient way to express solution 1, generally with better debugger support than solution 2):

    template <class T> struct token tokenize(T stringp);
    
  4. Change the interface to tokenize so that it takes a plain old string pointer and returns the number of characters to advance the pointer by:

    int tokenize(const char *string, struct token *result);
    /* ... */
    char *data;
    struct token result;
    /* ... */
    data += tokenize(data, &result);
    

    (This is like the printf family of functions, except that the latter inconveniently overload their return value to return errors as well.)

  5. Change the interface to tokenize so that it takes a plain old string pointer and modifies an integer offset into the string:

    struct token tokenize(const char *string, size_t *index);
    /* ... */
    char *data;
    size_t index = 0;
    /* ... */
    tokenize(data, &index);
    
  6. Just cast the pointer. After all, you know that your tokenize function never invokes undefined behaviour, don’t you?