Authorship in Binaries
The Laboratorium 2017-01-19
Personality always contains something unique. It expresses its singularity even in handwriting, and a very modest grade of art has in it something irreducible, which is one man’s alone. That something he may copyright unless there is a restriction in the words of the act.
Bleistein v. Donaldson Lithographing Co., 188 U.S. 239, 250 (1903) (Holmes, J.)
Previous work shows that coding style is quite prevalent in source code. We were surprised to find out that coding style is preserved to a great degree even in compiled source code. We can de-anonymize programmers from compiled source code with great accuracy and furthermore we can de-anonymize programmers from source code compiled with optimization.
Aylin Caliskan-Islam, Fabian Yamaguchi, Edwin Dauber, Richard Harang, Konrad Rieck, Rachel Greenstadt, and Arvind Narayanan, When Coding Style Survives Compilation: De-anonymizing Programmers from Executable Binaries, at 13.
These new results have important implications for privacy and security. But they also put a new and slightly surprising spin on debates over software copyright. If programmers have distinctive styles that are recognizable across programs and are present even in optimized executable binaries stripped of symbol information, then the argument that software lacks expressive content is a little weaker. Traces of authorial personality survive, like bacteria, even in unlikely environments.