Comparing Scripting Language Speed
By building a programming language interpreter
Ever wonder how fast your favorite programming language is? In this article we’ll build a small programming language interpreter in several popular languages, including Javascript, Python, Ruby, Lua, and C++.
Since many of these languages are themselves interpreted, we’ll also come away with a better understanding of what these languages do, or don’t do, that lets them run programs quickly.
What is an interpreter?
A programming language interpreter is essentially a simple program with the following outline:
1. Find what command we're supposed to do next
2. Do that thing
3. Set ourselves up to run the next thing
4. Are we done? If not, go back to 1.
And that’s really all there is to it. We’ll get a deeper understanding by implementing a tiny scripting language next.
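The outline above maps directly onto a fetch/dispatch loop. Here is a minimal sketch in Python, using a made-up two-command instruction set purely for illustration:

```python
def run(program):
    acc = 0                      # the machine's entire state: one accumulator
    pc = 0                       # position in the program
    while pc < len(program):     # 4. are we done?
        cmd = program[pc]        # 1. find the next command
        if cmd == 'inc':         # 2. do that thing
            acc += 1
        elif cmd == 'double':
            acc *= 2
        pc += 1                  # 3. set ourselves up to run the next thing
    return acc

print(run(['inc', 'inc', 'double']))  # 4
```

Every interpreter in this article is a more elaborate version of exactly this loop.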
A simple machine: The Turing Machine
We’ll look at a scripting language that very closely mirrors the Turing Machine, originally conceived by Alan Turing. In case you are unfamiliar, the Turing Machine is the idea that provides the foundation for all universal computing. That is, any machine that can implement a Turing Machine can implement algorithms for anything we know how to compute.
The original Turing Machine was a theoretical device used for thinking about computation. The version below is a concrete variant of it that can be implemented simply.
The language involves manipulating values along a tape. The tape is a collection of cells, into which an 8-bit number can be written. The machine has a current position along this tape, a position into its program, and can read/write values at the current position. While the theoretical Turing Machine has an infinite tape, we can run nearly every program of interest with 30,000 cells on our tape.
This language has only 8 commands:
1. + : Increase the value at our current position
2. - : Decrease the value at our current position
3. < : Move left by 1
4. > : Move right by 1
5. . : Print the character based on the ASCII value of the tape at this position
6. , : Read one character and write it at our current position (not implemented in our interpreter)
7. [ : If the value at our current position is 0, jump to the matching ] in the program.
8. ] : If the value at our current position is not 0, jump to the matching [ in the program.
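To see how the loop commands work, here is the common idiom `++++++++[>++++++++<-]>.` expressed directly in Python, with each line's comment showing the commands it corresponds to:

```python
cells = [0, 0]
cells[0] = 8                 # ++++++++   eight '+' commands
while cells[0] != 0:         # [ ... ]    loop while the current cell is non-zero
    cells[1] += 8            # >++++++++  move right, add 8
    cells[0] -= 1            # <-         move back, decrement the counter
print(chr(cells[1]))         # >.         prints '@' (ASCII 8 * 8 = 64)
```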
Let’s take a look at a basic implementation in Python, which will hopefully resolve any confusion.
```python
import sys

def run(code):
    cells = [0] * 30000
    cell_ptr = 0
    code_ptr = 0
    stack = []
    jumps = {}
    # determine jump destinations: match each '[' with its ']'
    for i in range(len(code)):
        if code[i] == '[':
            stack.append(i)
        elif code[i] == ']':
            assert len(stack) > 0
            start = stack.pop()
            jumps[start] = i
            jumps[i] = start
    if len(stack) != 0:
        print("unmatched jumps")
        sys.exit(1)
    while code_ptr < len(code):
        instruction = code[code_ptr]
        if instruction == '>':
            cell_ptr += 1
            if cell_ptr >= len(cells):
                print("tape oob")
                sys.exit(1)
        elif instruction == '<':
            cell_ptr -= 1
            if cell_ptr < 0:
                print("tape oob")
                sys.exit(1)
        elif instruction == '+':
            cells[cell_ptr] = (cells[cell_ptr] + 1) % 256
        elif instruction == '-':
            cells[cell_ptr] = (cells[cell_ptr] - 1) % 256
        elif instruction == '.':
            sys.stdout.write(chr(cells[cell_ptr]))
            sys.stdout.flush()
        elif instruction == ',':
            print(', not implemented')
        elif instruction == '[':
            if cells[cell_ptr] == 0:
                code_ptr = jumps[code_ptr]
        elif instruction == ']':
            if cells[cell_ptr] != 0:
                code_ptr = jumps[code_ptr]
        code_ptr += 1

hello = '++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.'

if __name__ == "__main__":
    run(hello)
```
Our interpreter needs to keep track of the program, its current position in the program, the tape (aka memory), and its position along that tape. It keeps running commands, aka instructions, until it runs off the end of the program, at which point it is done.
Running this program should output the familiar string:
Hello World!
Why build an interpreter?
Some people might argue that most code written these days isn’t a programming language interpreter, making this a silly task. We disagree, for several reasons:
- Many programs actually do resemble an interpreter. Parsing (e.g. JSON) is very much reading a character and handling it according to a set of rules. And outside of parsing, much business logic involves lots of jumping around to do some computation.
- Even if most programs written aren’t interpreters, most programming languages used today are interpreted (at least partially). So building one gives a better sense of what your tools are doing.
- Especially for a small language, building an interpreter is a fun task that leads to better understanding. Computing should be fun.
Measuring Performance
Now let’s measure the performance of our tiny interpreter. The hello world program executes too quickly to be a useful benchmark: since it spends so little time executing (a few ms on this machine), runtime would be dominated by startup time and writing to the console.
Fortunately, a number of large programs have been written for this language. We’ll be using a program that calculates the Mandelbrot fractal. The computation essentially reruns a mathematical formula at each x/y point, and the “color” is based on the number of iterations required to diverge. For our purposes, it is sufficient to know that the program takes a long time to run, and that the output is easily verified by comparing the output string.
More details on Python versions etc. can be found in the Environment section below.
```shell
$ time python3 bf.py
real    29m9.627s
user    29m8.802s
sys     0m0.110s
```
In the table below, we report the ‘real’ time, which includes both your process’s execution time (user) and time spent in the operating system (sys) on syscalls such as writing to the console. As planned, system time is a negligible fraction of the total, since most of this program is “computing” rather than input/output.
Programming Language Runtime Comparisons / Benchmarks
As they say, all benchmarks (models) are wrong, but some are useful. These numbers are not meant to say one language is always faster than another, but rather that, for this benchmark of building a naive interpreter in each language, we can examine some aspects of scripting language performance.
| Language | Environment | Real Time | Multiple of g++ default (lower is better) |
|---|---|---|---|
| C++ | g++, default | 2m8.342s | 1.00 |
| C++ | g++, -Ofast | 0m38.974s | 0.30 |
| C++ | clang, default | 2m16.451s | 1.06 |
| C++ | clang, -Ofast | 0m33.361s | 0.26 |
| Python | Python 3 | 29m9.627s | 13.63 |
| Python | Python 2 | 67m8.927s | 31.39 |
| Lua | Lua 5.2.4 | 35m3.688s | 16.39 |
| Lua | LuaJIT | 6m44.680s | 3.15 |
| Javascript | Node (V8, JIT) | 1m2.869s | 0.49 |
| Ruby | Ruby 3.2.3 | 41m14.653s | 19.28 |
| Ruby | Ruby 4.0, YJIT | 27m42.814s | 12.96 |
| PHP | PHP 8.3 | 6m37.342s | 3.10 |
For a simple baseline, all are compared against the default output of g++, so that the scripting languages aren’t measured against the optimizing -Ofast mode. It also gives us a number that approximates interpreter overhead versus executing native code.
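The “multiple of g++” comparison is simply each real time divided by the g++ baseline, after converting to seconds. For example, for Python 3:

```python
baseline = 2 * 60 + 8.342            # g++ default:  2m8.342s -> 128.342s
python3 = 29 * 60 + 9.627            # Python 3:    29m9.627s -> 1749.627s
print(round(python3 / baseline, 2))  # 13.63
```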
Missing your favorite language? See an issue? The code is on GitHub; feel free to send a Pull Request.
Why so slow? (Interpreter Overhead)
The main issue with interpreted languages is that each step of the scripting language requires many steps in the interpreter. For each symbol (e.g. +), we execute about 5 lines of Python. And since Python itself is interpreted, each line of Python corresponds to several lines of C code, and even more instructions for your physical machine / CPU.
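We can make this concrete with CPython’s dis module, which shows the bytecode instructions CPython itself interprets for a single line of our interpreter (the exact count varies by Python version):

```python
import dis

# one '+' in our language becomes one line of Python...
def plus(cells, cell_ptr):
    cells[cell_ptr] = (cells[cell_ptr] + 1) % 256

# ...which CPython executes as many separate bytecode instructions,
# each dispatched through CPython's own interpreter loop
ops = list(dis.get_instructions(plus))
print(len(ops))  # typically a dozen or so
```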
Why so fast? (JIT optimizations)
Given the overhead of running an interpreter, you might be surprised to see some languages come much closer to, or even surpass, our basic compiled code.
This is thanks to Just-in-Time compilation, aka JIT. The specifics of JIT vary by implementation, but the essentials are the same. With a JIT, the interpreter looks for frequently executed code and translates it to native machine code. When running a chunk of code (often a basic block), the interpreter checks whether it already has a native version ready.
If so, it can jump to the native code. This means that while executing that block, there is zero interpreter overhead. It jumps back to the interpreter loop after the chunk executes.
If not, it runs the chunk in the interpreter, updating the records that might push that chunk to be compiled in the future.
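The bookkeeping involved can be sketched in a few lines of Python. This illustrates only the hot-code caching logic: we stand in for “native code” with a precompiled Python closure rather than real machine code, and the threshold value is made up:

```python
HOT_THRESHOLD = 10
counts = {}     # how often each chunk of code has been run
compiled = {}   # chunk -> "native" version (a Python closure here)

def compile_chunk(chunk):
    # a real JIT emits machine code; we just precompute a function
    return eval('lambda x: ' + chunk)

def run_chunk(chunk, x):
    if chunk in compiled:                 # fast path: "native" version ready
        return compiled[chunk](x)
    counts[chunk] = counts.get(chunk, 0) + 1
    if counts[chunk] >= HOT_THRESHOLD:    # chunk is hot: compile for next time
        compiled[chunk] = compile_chunk(chunk)
    return eval(chunk)                    # slow path: interpret it

for _ in range(20):
    result = run_chunk('x * 2 + 1', 5)
print(result, 'x * 2 + 1' in compiled)   # 11 True
```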
One surprising fact about JITs is that they can even surpass native code, since additional information is available at runtime that isn’t necessarily evident at compile time. This explains how Javascript via V8 actually beat our unoptimized C++ implementation (but not the heavily optimized version).
We are planning a small JIT implementation in the future. If you find this interesting, make sure to subscribe to the newsletter so you don’t miss it.
Environment information
If you want to know more about the programming environments used in this article, here are the details:
Host / Ubuntu
```shell
$ uname -a
6.14.0-37-generic #37~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC
$ head /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 142
model name      : Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
stepping        : 12
microcode       : 0x100
cpu MHz         : 799.998
cache size      : 8192 KB
```
For all runtimes, the version used is the one readily available via apt, representing the most likely deployed version. A few were compiled from source for additional variations; these are mentioned below.
Python3
```shell
$ python3 --version
Python 3.12.3
```
Python2
This was built from source, as Python 2 is not available in apt.
```shell
$ cpython/build/python --version
Python 2.7.18
```
Lua
```shell
$ lua -v
Lua 5.2.4 Copyright (C) 1994-2015 Lua.org, PUC-Rio
```
LuaJIT
```shell
$ luajit -v
LuaJIT 2.1.1703358377 -- Copyright (C) 2005-2023 Mike Pall. https://luajit.org/
```
Javascript / NodeJS / V8
```shell
$ node -v
v22.20.0
```
Ruby
From apt. This build lacks support for the recent YJIT compiler:
```shell
$ ruby -v
ruby 3.2.3 (2024-01-18 revision 52bb2ac0a6) [x86_64-linux-gnu]
$ ruby --yjit bf.rb
ruby: warning: Ruby was built without YJIT support. You may need to install rustc to build Ruby with YJIT.
```
Ruby with YJIT
This version of the code was rewritten somewhat. A quick read suggests that YJIT operates on methods/functions, so we redesigned the interpreter so that a single step() function could be optimized by the JIT. The ~35% improvement matches numbers we’ve seen online for the effectiveness of YJIT.
```shell
$ external/ruby/ruby -v
ruby 4.0.1 (2026-01-13 revision e04267a14b) +PRISM [x86_64-linux]
```
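Since the actual change was made in the Ruby version, here is a Python sketch of the shape of the restructure: the dispatch body moves into a single small step() function that a method-based JIT can target. Most commands are elided for brevity, and the class layout here is illustrative rather than the real code:

```python
class State:
    def __init__(self, code):
        self.code = code
        self.code_ptr = 0
        self.cells = [0] * 30000
        self.cell_ptr = 0

def step(state):
    # the entire dispatch body lives in one small, extremely hot
    # function: exactly the unit a method-based JIT compiles
    instruction = state.code[state.code_ptr]
    if instruction == '+':
        state.cells[state.cell_ptr] = (state.cells[state.cell_ptr] + 1) % 256
    elif instruction == '>':
        state.cell_ptr += 1
    # ... remaining commands elided ...
    state.code_ptr += 1

def run(code):
    state = State(code)
    while state.code_ptr < len(state.code):
        step(state)
    return state

print(run('+++>+').cells[:2])  # [3, 1]
```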
PHP
```shell
$ php -v
PHP 8.3.6 (cli) (built: Jan  7 2026 08:40:32) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.3.6, Copyright (c) Zend Technologies
    with Zend OPcache v8.3.6, Copyright (c), by Zend Technologies
```
Stay tuned
In our next article we’ll build a very basic JIT. Make sure to subscribe to our newsletter so you don’t miss it.