What do people mean when they say "transpiler"?
composition.al 2017-08-26
A compiler is a computer program that takes a program as input, transforms it in some way, and returns another program. Most of the time, we think of a compiler as something that translates code from a higher-level language (such as Rust, Racket, or JavaScript) to a lower-level language (such as LLVM IR or x86-64 assembly), but this doesn’t necessarily have to be the case.
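To make that definition concrete, here is a minimal Python sketch in which the same tiny input program is compiled to two different targets: instructions for a made-up stack machine (a lower-level target) and a JavaScript-style expression string (a target at roughly the same level as the input). Everything here, including the `Num`, `Add`, and `Mul` node types and the `PUSH`/`ADD`/`MUL` instruction names, is hypothetical and invented for illustration; it is not how any real compiler is structured.

```python
from dataclasses import dataclass
from typing import List, Union


# A toy input language: arithmetic expressions as a small syntax tree.
@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


@dataclass
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Add, Mul]


def compile_to_stack(expr: Expr) -> List[str]:
    """Compile to instructions for a hypothetical stack machine (lower level)."""
    if isinstance(expr, Num):
        return [f"PUSH {expr.value}"]
    op = "ADD" if isinstance(expr, Add) else "MUL"
    # Operands are emitted first, then the operator that consumes them.
    return compile_to_stack(expr.left) + compile_to_stack(expr.right) + [op]


def compile_to_js(expr: Expr) -> str:
    """Compile to a JavaScript-style expression string (similar level)."""
    if isinstance(expr, Num):
        return str(expr.value)
    op = "+" if isinstance(expr, Add) else "*"
    return f"({compile_to_js(expr.left)} {op} {compile_to_js(expr.right)})"


# The input program: 1 + 2 * 3
program = Add(Num(1), Mul(Num(2), Num(3)))

print(compile_to_stack(program))  # ['PUSH 1', 'PUSH 2', 'PUSH 3', 'MUL', 'ADD']
print(compile_to_js(program))     # (1 + (2 * 3))
```

Both functions fit the definition of a compiler given above: each takes a program and returns another program. They differ only in how far the output sits from the input in level of abstraction.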
Around 2012 or so, it started to become fashionable to use the word “transpiler” for certain kinds of compilers. When people say “transpiler”, what kind of compiler do they mean?
According to Wikipedia, a transpiler is a compiler that “translates between programming languages that operate at approximately the same level of abstraction”. JavaScript is a popular target language for these kinds of compilers, because every mainstream web browser can now execute JavaScript efficiently (and is getting better at it all the time, thanks to significant investment from browser vendors and fierce competition among them). Lots of people want their code to run in every mainstream web browser, and not all of those people want to write JavaScript. Hence we have compilers that transform code from various other high-level input languages to JavaScript.[1]
Here’s a longer excerpt from the Wikipedia “source-to-source compiler” page (to which the “transpiler” page redirects):
A source-to-source compiler, transcompiler or transpiler is a type of compiler that takes the source code of a program written in one programming language as its input and produces the equivalent source code in another programming language. A source-to-source compiler translates between programming languages that operate at approximately the same level of abstraction, while a traditional compiler translates from a higher level programming language to a lower level programming language.
Interestingly, this definition says that “transpiler” and “source-to-source compiler” are synonyms: it says that they both mean a compiler with input and target languages that are at approximately the same level of abstraction. But a “source-to-source compiler” sounds like it ought to be something more specific than that. It sounds like a compiler for which the input and target languages not only operate at similar levels of abstraction, but also both operate at high levels of abstraction (because a language that we’d call a “source” language is presumably high-level). This narrower reading coincides with “translates between programming languages that operate at approximately the same level of abstraction” only if we assume that all compilers have a high-level input language.
There are other ways of defining “transpiler” that overlap with the Wikipedia definition, but don’t coincide with it. Some people seem to use “transpiler” to mean “compiler with a high-level target language”, without saying anything in particular about the level of abstraction of the input language. For instance, Emscripten, which compiles LLVM bitcode (a relatively low-level language) to JavaScript (a relatively high-level language), is often called a transpiler. In fact, Wikipedia cites Emscripten as an example of a transpiler, even though that example contradicts the “approximately the same level of abstraction” part of its own definition.
In other contexts, “transpiler” might imply “compiler with relatively human-readable output”, which is yet another overlapping-but-different definition. This definition would rule out Emscripten, because although Emscripten’s output is JavaScript, readability of the output isn’t a priority. TypeScript and CoffeeScript, on the other hand, are high-level, JavaScript-like languages that get compiled to JavaScript by compilers that try to more or less preserve readability, and Wikipedia calls them transpiled languages, too.
So, we’ve got several overlapping-but-different definitions for “transpiler”:
- compiler that translates between languages that operate at similar levels of abstraction (the Wikipedia definition)
- compiler with high-level languages as its input and target languages (which is what I think “source-to-source compiler” sounds like)
- compiler that targets a high-level language (examples include Emscripten and the TypeScript and CoffeeScript compilers)
- compiler that produces readable output in a high-level target language (examples include the TypeScript and CoffeeScript compilers, but not Emscripten)
In a recent conversation with a friend, I learned about yet another connotation that “transpiler” might have. Compiling from a high-level input language all the way to assembly removes certain opportunities for language interoperability; if you want to link together components that are written in different input languages, then it can be easier to do that if the input languages’ compilers both target the same intermediate language. (Microsoft’s Common Intermediate Language is an example of an intermediate language that compilers for various input languages target, and that was designed with language interoperability in mind.) It’s useful to have a word for compilers that make a point of making interoperability easy in this way, and for my friend, the word “transpiler” serves that purpose. So we have yet another overlapping-but-not-quite-coinciding definition:
- compiler that targets a relatively high-level language for the purpose of facilitating interoperability between different input languages
I don’t think many people use “transpiler” to mean this, but at least one language implementer that I know does.
By now, I hope it’s clear that “transpiler” means different things to different people, and that the definitions are fuzzy and have weird edge cases. Furthermore, even if two people agree that “transpiler” means, say, “compiler that targets a high-level language” or “compiler with input and target languages at similar levels of abstraction”, there’s still room for misunderstanding: “high-level” and “low-level” are relative, and whether or not any given two languages operate at “similar levels of abstraction” isn’t necessarily a cut-and-dried matter. There isn’t a total ordering on languages by level of abstraction. Language constructs that operate at different levels of abstraction can coexist in the same language, making it simultaneously “higher-level” and “lower-level” than another language that has only language constructs that are somewhere in between.
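One rough way to picture the lack of a total ordering is the Python sketch below. It assumes, purely for illustration, that each language construct can be placed on an invented numeric scale and that a language is summarized by the lowest and highest levels its constructs reach. The scale, the numbers, and the language names (`WideLang`, `MiddleLang`, `HighLang`) are all made up; no real language is being measured here.

```python
from typing import NamedTuple


class Language(NamedTuple):
    name: str
    lowest: int   # level of its lowest-level construct (invented scale)
    highest: int  # level of its highest-level construct (invented scale)


def compare(a: Language, b: Language) -> str:
    """Compare two languages by the span of abstraction levels they cover."""
    if (a.lowest, a.highest) == (b.lowest, b.highest):
        return "same"
    if a.lowest >= b.lowest and a.highest >= b.highest:
        return "higher"
    if a.lowest <= b.lowest and a.highest <= b.highest:
        return "lower"
    # One language reaches both above and below the other: no ordering applies.
    return "incomparable"


# Hypothetical languages with made-up spans.
wide = Language("WideLang", lowest=1, highest=9)      # very low- and very high-level constructs
middle = Language("MiddleLang", lowest=4, highest=6)  # only constructs somewhere in between
high = Language("HighLang", lowest=7, highest=9)      # only high-level constructs

print(compare(high, middle))  # higher
print(compare(wide, middle))  # incomparable
```

Under this toy model, `WideLang` is simultaneously “higher-level” and “lower-level” than `MiddleLang`, so asking whether a compiler between them preserves the level of abstraction has no single answer.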
When I wrote the first, unpublished draft of this post a few years ago, it was called “Stop saying ‘transpiler’”. I decided not to call it that, because I’m trying to lay off the prescriptivism (and I have a policy against writing blog posts with “stop doing X” titles now, anyway); I understand that people are using “transpiler” because, for them, it describes something that it’s convenient to have a word for. I do think, though, that before saying “transpiler”, it’s useful to think about what exactly one means by it, and whether the people one is communicating with are going to agree with that meaning. For me, any convenience that might be gained by saying “transpiler” is rarely worth the potential for confusion.
[1] For these reasons, JavaScript has been called “the assembly language of the web”. However, it’s worth mentioning that many of the people who pioneered the use of JavaScript in this way have been working on WebAssembly, an emerging standard that defines, among other things, a binary format that compilers can target and that will hopefully enjoy widespread browser support. This is not to say, of course, that JavaScript is going away any time soon.