One-liner to troubleshoot LaTeX references

The Endeavour 2024-05-28

In LaTeX, sections are labeled with commands like \label{foo} and referenced like \ref{foo}. Referring to sections by labels rather than hard-coded numbers allows references to automatically update when sections are inserted, deleted, or rearranged.

For every reference there ought to be a label. A label without a corresponding reference is fine, though it might be a mistake. If you have a reference with no corresponding label, and one label without a reference, there’s a good chance the reference is a typo variation on the unreferenced label.

We’ll build up a one-liner for comparing labels and references. We’ll use grep to find patterns that look like labels by searching for label{ followed by any string of characters up to but not including a closing brace. We don’t want the label{ part, just what follows it, so we’ll use look-behind syntax to exclude it from the match.

Here’s our regular expression:

    (?<=label{)[^}]+

We’re using Perl-style look-behind syntax, so we’ll need to give grep the -P option. Also, we only want the match itself, not matching lines, so we’ll also use the -o option. This will print all the labels:

    grep -oP '(?<=label{)[^}]+' foo.tex
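To try this out, here is a minimal sketch that writes a small test file and extracts its labels. The file name sample.tex and its contents are made up for illustration, and the sketch assumes GNU grep built with PCRE support, since not every grep has the -P option.

```shell
# Create a tiny sample file; the name and contents are hypothetical.
tmp=$(mktemp -d)
cat > "$tmp/sample.tex" <<'EOF'
\section{Introduction}\label{sec:intro}
See Section~\ref{sec:methods}.
\section{Methods}\label{sec:methods}
EOF

# -o prints only the matched text; -P enables the Perl-style look-behind.
labels=$(grep -oP '(?<=label{)[^}]+' "$tmp/sample.tex")
printf '%s\n' "$labels"

rm -r "$tmp"
```

This should print sec:intro and sec:methods, one per line. The \ref on the second line is skipped because it is not preceded by label{.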

The regex for finding references is the same, with label replaced by ref.
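One thing worth noting is that the look-behind only checks for the characters ref{, so it will also pick up commands whose names end in ref, such as \eqref and \pageref. That is probably what you want here. The sketch below illustrates this with a made-up file; sample.tex and its contents are assumptions, as is the availability of GNU grep with -P.

```shell
# Hypothetical sample file mixing \ref and \eqref.
tmp=$(mktemp -d)
cat > "$tmp/sample.tex" <<'EOF'
\section{Results}\label{sec:results}
Equation~\eqref{eq:main} is discussed in Section~\ref{sec:results}.
EOF

# Matches whatever follows "ref{", so \eqref{...} is caught as well as \ref{...}.
refs=$(grep -oP '(?<=ref{)[^}]+' "$tmp/sample.tex")
printf '%s\n' "$refs"

rm -r "$tmp"
```

Both eq:main and sec:results are reported, in the order they appear on the line.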

To compare the list of labels and the list of references, we’ll use the comm command. For more on comm, see Set theory at the command line.

We could save the labels to a file, save the references to a file, and run comm on the two files. But we’re more interested in the differences between the two lists than in the lists themselves, so we’ll pass both as streams to comm using the <(...) syntax. Finally, comm assumes its inputs are sorted, so we pipe the output of both grep commands to sort.
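For comparison, the two-file approach might look like the following sketch. The file names sample.tex, labels.txt, and refs.txt are hypothetical, and the sample deliberately contains a reference, sec:method, that is a typo for the label sec:methods.

```shell
# Hypothetical sample file with one misspelled reference.
tmp=$(mktemp -d)
cat > "$tmp/sample.tex" <<'EOF'
\section{Introduction}\label{sec:intro}
\section{Methods}\label{sec:methods}
As noted in Section~\ref{sec:intro} and Section~\ref{sec:method}.
EOF

# comm requires sorted input, so sort each list before saving it.
grep -oP '(?<=label{)[^}]+' "$tmp/sample.tex" | sort > "$tmp/labels.txt"
grep -oP '(?<=ref{)[^}]+'   "$tmp/sample.tex" | sort > "$tmp/refs.txt"

# Column 1: only in labels. Column 2: only in refs. Column 3: in both.
result=$(comm "$tmp/labels.txt" "$tmp/refs.txt")
printf '%s\n' "$result"

rm -r "$tmp"
```

In the output, sec:methods appears unindented (a label with no reference), sec:method is indented by one tab (a reference with no label), and sec:intro is indented by two tabs (both). Seeing sec:method and sec:methods side by side makes the typo easy to spot.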

Here’s our one-liner:

    comm <(grep -oP '(?<=label{)[^}]+' foo.tex | sort) \
         <(grep -oP '(?<=ref{)[^}]+' foo.tex | sort)

This will produce three columns of output: labels which are not references, references which are not labels, and labels that are also references.

If you just want to see references that don’t refer to a label, give comm the option -13. This suppresses the first and third columns of output, leaving only the second column, references that are not labels.

You can also add a -u option (u for unique) to the calls to sort to suppress multiple instances of the same label or same reference.
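Putting the last two options together, here is a sketch that lists each dangling reference once. It assumes bash (for the <(...) syntax) and GNU grep with -P; sample.tex and its contents are made up, with the misspelled reference sec:method used twice.

```shell
# Hypothetical sample file: \ref{sec:method} is a typo and appears twice.
tmp=$(mktemp -d)
cat > "$tmp/sample.tex" <<'EOF'
\section{Introduction}\label{sec:intro}
\section{Methods}\label{sec:methods}
See Section~\ref{sec:method}, then Section~\ref{sec:intro}.
Section~\ref{sec:method} is cited twice.
EOF

# -13 hides columns 1 and 3, leaving references with no matching label.
# sort -u collapses the repeated reference into a single line.
dangling=$(comm -13 <(grep -oP '(?<=label{)[^}]+' "$tmp/sample.tex" | sort -u) \
                    <(grep -oP '(?<=ref{)[^}]+'   "$tmp/sample.tex" | sort -u))
printf '%s\n' "$dangling"

rm -r "$tmp"
```

This prints the single line sec:method. Without -u on the second sort, the same reference would be listed twice.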

The post One-liner to troubleshoot LaTeX references first appeared on John D. Cook.