Printf Format Highlighting

Most of my work is done in either C or C++, so I frequently use the printf family of functions to write a wide variety of stuff to the display. Naturally, I'd like the format specifiers to stand out from the rest of the characters in the strings, so for the longest time I used this piece of elisp code to achieve that:

(defface font-lock-format-specifier-face
  '((t (:foreground "OrangeRed1")))
  "Font-lock face used to highlight printf format specifiers."
  :group 'font-lock-faces)

(defun my-cc-mode-common-hook ()
  "Setup common utilities for all C-like modes."
  (font-lock-add-keywords
   nil
   '(("[^%]\\(%\\([[:digit:]]+\\$\\)?[-+' #0*]*\\([[:digit:]]*\\|\\*\\|\\*[[:digit:]]+\\$\\)\\(\\.\\([[:digit:]]*\\|\\*\\|\\*[[:digit:]]+\\$\\)\\)?\\([hlLjzt]\\|ll\\|hh\\)?\\([aAbdiuoxXDOUfFeEgGcCsSpn]\\|\\[\\^?.[^]]*\\]\\)\\)"
      1 'font-lock-format-specifier-face prepend))))

(add-hook 'c-mode-common-hook #'my-cc-mode-common-hook)

(shamelessly taken and modified from EmacsWiki)

While this regular expression highlights the printf specifiers just fine, it has some problems, and that's apart from the unreadability of the regexp itself.

As an example, take this silly C code:

#include <stdio.h>

int main()
{
    printf("%%");   // Should not be highlighted.
    printf("%d", 0);
    printf("%s%d%s", "hel", 1, "o");
    printf("%s\n", "testing...");
    printf("This should be fine: %-20d, %+20lld\n", 10, 10000ll);
    printf("More tests % '2.2f\n", 1.5);
    printf("Star tests... %'*.2f\n", 10, 10001.5);
    printf("testing...%d\n", 10);
    printf("POSIX Extension: %1$u 0x%1$08x\n", 0xDEADBEEF);
    printf("POSIX star tests: %2$*1$d\n", 10, 0x20);

    // Handle scanf specific stuff.
    char buf[2000];
    scanf("test... %[d]\n", buf); // Scanf bracket style.

    // Don't highlight inside comments: 10 % d.

    // This shouldn't be highlighted either:
    int d = 2;
    return 10 % d;
}

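The above elisp snippet will erroneously highlight 10 % d in both the comment and the actual code. The regexp is applied to the whole buffer rather than just to strings, and since a space is a valid printf flag, % d looks like a perfectly good specifier. I found this distracting, but I never put in the effort to fix it since it worked well enough in most cases.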
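
To see why a regexp that knows nothing about strings trips over 10 % d, here is a minimal sketch. It uses a cut-down stand-in for the regexp above (only the flags, an optional width and three conversions), and the helper name my-first-specifier is hypothetical, not part of my configuration:

```elisp
;; Hypothetical helper for illustration only.
(defun my-first-specifier (line)
  "Return the first substring of LINE that looks like a format specifier, or nil."
  (let ((case-fold-search nil)
        ;; Simplified stand-in for the full regexp: a non-% character,
        ;; then %, any flags, an optional width and one of d, s or f.
        (simplified "[^%]\\(%[-+' #0*]*[[:digit:]]*[dsf]\\)"))
    (when (string-match simplified line)
      (match-string 1 line))))

(my-first-specifier "printf(\"%d\", 0);")  ; => "%d"
(my-first-specifier "return 10 % d;")      ; => "% d"
(my-first-specifier "printf(\"%%\");")     ; => nil
```

The second call matches because the space is consumed as a flag, and nothing in the regexp can tell that the text is ordinary code rather than a string.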

Recently though, I started to clean up my cc-mode configuration and in the midst of that I stumbled upon a question on Emacs stack-overflow regarding highlighting SQL keywords only inside strings. I wondered, could I apply this to the printf specifiers?

Just a short while later I arrived at this:

(defface font-lock-format-specifier-face
  '((t . (:inherit font-lock-regexp-grouping-backslash
          :foreground "OrangeRed1")))
  "Font-lock face used to highlight printf format specifiers."
  :group 'font-lock-faces)

(defvar printf-fmt-regexp
  (concat "\\(%"
          "\\([[:digit:]]+\\$\\)?"   ; POSIX argument position extension.
          "[-+' #0*]*"               ; Flags.
          "\\(?:[[:digit:]]*\\|\\*\\|\\*[[:digit:]]+\\$\\)"   ; Field width.
          "\\(?:\\.\\(?:[[:digit:]]*\\|\\*\\|\\*[[:digit:]]+\\$\\)\\)?"   ; Precision.
          "\\(?:[hlLjzt]\\|ll\\|hh\\)?"   ; Length modifier.
          "\\(?:[aAbdiuoxXDOUfFeEgGcCsSpn]\\|\\[\\^?.[^]]*\\]\\)\\)")   ; Conversion, or scanf bracket set.
  "Regular expression to capture all possible `printf' formats in C/C++.")

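(defun printf-fmt-matcher (end)
  "Search for `printf' format specifiers within strings up to END."
  (let ((pos)
        (case-fold-search nil))   ; Conversion characters are case-sensitive.
    ;; Keep searching while a match is found outside a string; element 3 of
    ;; the `syntax-ppss' state is non-nil only inside a string.
    (while (and (setq pos (re-search-forward printf-fmt-regexp end t))
                (null (nth 3 (syntax-ppss pos)))))
    pos))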

(defun my-cc-mode-common-hook ()
  "Setup common utilities for all C-like modes."
  (font-lock-add-keywords
   nil
   '((printf-fmt-matcher (0 'font-lock-format-specifier-face prepend)))))

(add-hook 'c-mode-common-hook #'my-cc-mode-common-hook)
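
The string test that printf-fmt-matcher relies on can be tried in isolation. This is a minimal sketch, assuming cc-mode is available; my-end-in-string-p is a hypothetical helper that is not part of the configuration above:

```elisp
(require 'cc-mode)

;; Hypothetical helper for illustration only.
(defun my-end-in-string-p (code needle)
  "Return t if the end of the first NEEDLE in CODE, read as C, lies inside a string."
  (with-temp-buffer
    (c-mode)                      ; Use the C syntax table for parsing.
    (insert code)
    (goto-char (point-min))
    (search-forward needle)       ; Leaves point just after NEEDLE.
    ;; Element 3 of the parser state is non-nil only inside a string.
    (and (nth 3 (syntax-ppss)) t)))

(my-end-in-string-p "printf(\"%d\", 0);" "%d")  ; => t
(my-end-in-string-p "return 10 % d;" "% d")     ; => nil
(my-end-in-string-p "// 10 % d.\n" "% d")       ; => nil
```

Comments are reported in element 4 of the parser state rather than element 3, which is why the last call returns nil as well.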

Not only is the code a lot cleaner and more readable now, but it actually works perfectly! 'Specifiers' outside of strings are no longer erroneously highlighted! In fact, this seems useful enough to create a proper minor mode for… To be continued!