Joe Armstrong: How Erlang Brought Fault-Tolerant Concurrency to the World

In the annals of programming language history, few creations have been as quietly revolutionary as Erlang. While languages like Java and JavaScript dominated mainstream developer consciousness, Erlang was silently keeping the world’s telecommunications infrastructure running with a level of reliability that bordered on the miraculous. Behind this language stood Joe Armstrong — a British-born, Swedish-based computer scientist whose deep thinking about concurrency, fault tolerance, and distributed systems produced a programming paradigm that would prove decades ahead of its time. Armstrong did not merely create a language; he codified an entire philosophy of building systems that never stop, systems that heal themselves, systems that treat failure not as a catastrophe but as an expected part of operation. His work at Ericsson in the 1980s produced ideas that now power everything from WhatsApp’s messaging backbone to financial trading platforms, and his influence continues to shape how we think about building reliable software at scale.

Early Life and Path to Technology

Joe Armstrong was born on December 27, 1950, in Bournemouth, England. Growing up in post-war Britain, he developed an early fascination with mathematics and the emerging world of computing. His intellectual curiosity led him to pursue a degree in physics at University College London, where he first encountered programming. The experience was transformative — Armstrong found in code a medium that combined the rigor of mathematics with the creative satisfaction of building tangible things.

After completing his studies in physics, Armstrong moved deeper into computer science. He earned a Master’s degree in Control and Measurement from the Royal Institute of Technology (KTH) in Stockholm, Sweden — a move that would anchor him in Scandinavia for the rest of his career. Sweden’s thriving telecommunications industry, led by Ericsson, provided the perfect ecosystem for Armstrong’s emerging interests in concurrent and distributed computing.

Armstrong’s early career took him through several research positions where he explored artificial intelligence and programming language theory. He spent time at the Swedish Space Corporation and the Swedish Institute of Computer Science (SICS), gaining experience in real-time systems and understanding the immense challenges of building software that needed to operate continuously without failure. These experiences planted the seeds for what would become his life’s work. He studied how existing languages handled concurrency and was dissatisfied with what he found — most treated parallel execution as an afterthought, bolted onto fundamentally sequential designs. The ideas of John McCarthy’s Lisp and functional programming made a deep impression on him, as did the theoretical work on communicating sequential processes by Tony Hoare and the actor model pioneered by Carl Hewitt.

In 1986, Armstrong joined the Ericsson Computer Science Laboratory, a research division tasked with solving one of the most formidable challenges in software engineering: how to build telephone switching systems that simply could not fail. This was not an academic exercise. Telephone networks were critical infrastructure, and downtime meant real consequences — emergency calls that could not connect, businesses losing communication, entire regions going silent. The existing approaches to building this software were brittle and inadequate, and Ericsson knew they needed something fundamentally different.

The Breakthrough: Creating Erlang

When Armstrong began his work at the Ericsson lab, the telecom industry was wrestling with a problem that seemed almost paradoxical. Telephone switches needed to handle enormous numbers of calls concurrently. They needed to run continuously, ideally forever, without downtime for maintenance or upgrades. They needed to recover gracefully from hardware failures, software bugs, and unexpected conditions. And they needed to do all of this while being modified and updated in place, because taking a switch offline to deploy new code was simply not acceptable.

Armstrong approached this challenge not by trying to patch existing languages but by reimagining what a programming language could be from the ground up. Drawing on his knowledge of functional programming, Prolog (which heavily influenced Erlang’s early syntax), and the actor model of computation, he began developing what would become Erlang in 1986. The name itself carried dual significance: it honored Agner Krarup Erlang, the Danish mathematician who founded the field of traffic engineering and queuing theory, while also serving as a contraction of “Ericsson Language.”

The Technical Innovation

Erlang’s technical innovations were radical for their time and remain distinctive today. At its core, Erlang treats processes (lightweight, isolated units of computation) as the fundamental building block of all programs. Unlike threads in languages such as C++ or Java, Erlang processes share no memory. They communicate exclusively through message passing, sending and receiving data asynchronously. This design rules out data races and corruption caused by shared mutable state. Higher-level problems, such as two processes each waiting for a message from the other or logic that depends on message arrival order, are still possible, but they are easier to reason about when every interaction is an explicit message.

An Erlang process is extraordinarily lightweight. While an operating system thread typically reserves hundreds of kilobytes to megabytes for its stack, a newly spawned Erlang process occupies only a few hundred machine words, which is a few kilobytes on a 64-bit system. This means a single machine can comfortably run hundreds of thousands or even millions of concurrent processes. Each process has its own heap, which is garbage-collected independently, so collection pauses affect only individual processes rather than halting the entire system, a critical property for soft real-time applications.
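As a rough illustration of how cheap processes are, the sketch below spawns a batch of idle processes and reports how many are alive, and separately measures the footprint of a single process. The module name many_procs and its functions are hypothetical, made up for this example; spawn, is_process_alive, and erlang:process_info are standard built-in functions.

```erlang
-module(many_procs).
-export([run/1, footprint/0]).

%% Spawn N idle processes, count how many are alive, then stop them all.
run(N) ->
    Pids = [spawn(fun idle/0) || _ <- lists:seq(1, N)],
    Alive = length([P || P <- Pids, is_process_alive(P)]),
    [P ! stop || P <- Pids],
    Alive.

%% Memory used by one freshly spawned idle process, in bytes.
footprint() ->
    Pid = spawn(fun idle/0),
    {memory, Bytes} = erlang:process_info(Pid, memory),
    Pid ! stop,
    Bytes.

%% Each process simply blocks until it is told to stop.
idle() ->
    receive
        stop -> ok
    end.
```

Calling many_procs:run(100000) from the shell should return promptly on ordinary hardware. The number of simultaneous processes is capped by an emulator limit that can be raised with the +P flag, so going into the millions may require starting erl with a higher value.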

Here is a simple example that demonstrates Erlang’s process spawning and message passing:

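-module(greeter).
-export([start/0, loop/0]).

%% Spawn a greeter process, send it a message, and wait for the reply.
start() ->
    Pid = spawn(?MODULE, loop, []),
    %% Include our own pid so the greeter knows where to reply.
    Pid ! {self(), "Hello from the main process!"},
    receive
        %% Pid is already bound, so only a reply from the greeter matches.
        {Pid, Reply} ->
            io:format("Received reply: ~s~n", [Reply])
    after 5000 ->
        %% Give up after five seconds instead of blocking forever.
        io:format("Timed out waiting for reply~n")
    end.

%% Receive loop: answer each message, then recurse to wait for the next.
loop() ->
    receive
        {From, Message} ->
            io:format("Greeter got: ~s~n", [Message]),
            From ! {self(), "Hello back! I am process " ++
                pid_to_list(self())},
            loop()
    end.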

This code illustrates several key Erlang concepts: spawning a new process with spawn, sending messages with the ! operator, pattern matching on received messages, and the recursive loop pattern that keeps a process alive. The after clause demonstrates built-in timeout handling — a feature born directly from telecom requirements where waiting forever was never acceptable.

But the most revolutionary aspect of Erlang was its approach to failure. Armstrong encoded a philosophy he called “let it crash” directly into the language’s DNA. Rather than writing defensive code to handle every possible error condition — an approach that leads to bloated, complex, and ultimately still fragile software — Erlang encourages developers to write code for the “happy path” and let supervisor processes handle failures. When a process crashes, its supervisor can restart it, log the error, escalate the problem, or take any other appropriate action. This supervision tree pattern means that failures are contained, isolated, and automatically managed.
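The restart mechanism described above can be sketched with nothing more than links and trap_exit, the primitives that OTP’s supervisor behaviour builds on. This is a hand-rolled illustration, not how production code is usually written: the module mini_sup, the worker_pid/1 helper, and the divide message are all hypothetical names, and the sketch restarts forever with no restart limit.

```erlang
-module(mini_sup).
-export([start/0, worker_pid/1]).

%% Start a supervisor process that keeps exactly one worker alive.
start() ->
    spawn(fun() ->
        %% Turn exit signals from linked processes into ordinary messages.
        process_flag(trap_exit, true),
        supervise(spawn_link(fun worker/0))
    end).

%% Ask the supervisor which worker is currently running.
worker_pid(Sup) ->
    Sup ! {get_worker, self()},
    receive
        {worker, Pid} -> Pid
    after 1000 ->
        timeout
    end.

supervise(Worker) ->
    receive
        {'EXIT', Worker, Reason} ->
            io:format("worker ~p exited (~p), restarting~n", [Worker, Reason]),
            supervise(spawn_link(fun worker/0));
        {get_worker, From} ->
            From ! {worker, Worker},
            supervise(Worker)
    end.

%% Happy-path worker: no defensive checks. Dividing by zero raises
%% badarith, which kills the process and notifies the supervisor.
worker() ->
    receive
        {divide, From, A, B} ->
            From ! {result, A div B},
            worker()
    end.
```

Sending {divide, self(), 1, 0} to the current worker crashes it; a moment later worker_pid/1 returns a different pid, and the fresh worker answers normally. In real systems the OTP supervisor behaviour adds restart strategies and restart-intensity limits on top of this idea.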

Why It Mattered

The impact of Armstrong’s work at Ericsson was extraordinary. Erlang was used to build the AXD301 ATM switch, a telecommunications system widely reported to have achieved “nine nines” of availability, or 99.9999999% uptime. To put this in perspective, that corresponds to roughly 31.5 milliseconds of downtime per year. The figure is based on a limited set of field measurements and has been debated, but even on a conservative reading it showed a level of reliability that few software platforms of the era could match.
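The downtime budget implied by an availability figure is simple arithmetic: multiply the length of a year by the unavailable fraction. A minimal sketch, assuming a 365-day year (the module and function names are hypothetical):

```erlang
-module(downtime).
-export([ms_per_year/1]).

%% Allowed downtime in milliseconds per 365-day year for an
%% availability of N nines (for example, 5 means 99.999%).
ms_per_year(Nines) when is_integer(Nines), Nines > 0 ->
    MsPerYear = 365 * 24 * 60 * 60 * 1000,
    MsPerYear * math:pow(10, -Nines).
```

For five nines this gives a budget of a little over five minutes per year; for nine nines it shrinks to a few tens of milliseconds.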

The AXD301 contained over a million lines of Erlang code and handled telephone traffic for major carriers around the world. It demonstrated that Erlang was not merely an academic curiosity but a production-grade tool capable of handling the most demanding real-world requirements. The system supported hot code swapping — the ability to upgrade running software without stopping the system — a feature that Armstrong considered essential and that most other languages still struggle to provide.

What made Erlang matter beyond Ericsson was the growing realization that the problems of telecommunications (massive concurrency, distributed operation, fault tolerance, and continuous availability) were becoming the problems of the entire software industry. As the internet scaled and applications moved from single machines to distributed clusters, the challenges Armstrong had tackled in the 1980s became universal. The web era transformed Erlang from a niche telecom language into a prescient solution for the distributed computing challenges that would define the 21st century.

The Open-Source Era and OTP

In 1998, Ericsson made a fateful decision: it released Erlang as open source. The story behind this decision is tinged with corporate irony. Earlier that year, Ericsson had banned the use of Erlang for new products, citing a preference for non-proprietary languages. Armstrong and his colleagues, frustrated by the ban, convinced the company to release Erlang under an open-source license. Armstrong himself left Ericsson for a period, working in startups and pursuing his PhD, before eventually returning.

The open-sourcing of Erlang came bundled with OTP (Open Telecom Platform), a collection of libraries, design principles, and patterns that codified years of hard-won experience in building fault-tolerant systems. OTP provided standardized implementations of supervisors, generic servers, state machines, event handlers, and application structures. It transformed Erlang from a language into a complete framework for building industrial-strength systems.

OTP’s design patterns deserve special attention because they represent a different level of abstraction from what most programmers encounter. Rather than providing libraries for specific tasks, OTP provides behavioral templates — standardized ways of structuring concurrent processes and their interactions. A gen_server (generic server), for instance, encapsulates the common pattern of a process that maintains state and responds to synchronous and asynchronous requests:

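-module(counter_server).
-behaviour(gen_server).

%% Public API
-export([start_link/0, increment/0, get_count/0]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2]).

%% Start the server and register it locally under the module name.
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Asynchronous request: returns immediately without waiting for the server.
increment() ->
    gen_server:cast(?MODULE, increment).

%% Synchronous request: blocks until the server replies with the count.
get_count() ->
    gen_server:call(?MODULE, get_count).

%% Callback: the initial state is a counter set to 0.
init([]) ->
    {ok, 0}.

%% Callback: handle the asynchronous increment.
handle_cast(increment, Count) ->
    {noreply, Count + 1}.

%% Callback: reply with the current count and leave the state unchanged.
handle_call(get_count, _From, Count) ->
    {reply, Count, Count}.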

This pattern separates the generic server behavior (process lifecycle, message handling, error recovery) from the specific business logic (counting). The developer only implements the callbacks, while OTP handles all the complex machinery of process management, monitoring, and supervision. This approach to software architecture — separating concerns at the process level — influenced how an entire generation of distributed systems engineers thought about building reliable software.

The open-source release attracted a community of developers who recognized Erlang’s unique strengths. Companies outside the telecom industry began adopting it for systems where reliability and concurrency were paramount. This community would eventually grow to produce one of Erlang’s most significant offspring: the Elixir programming language, started by José Valim in 2011, which brought modern syntax and tooling to the Erlang virtual machine (BEAM) while preserving all of its runtime characteristics.

Influence on Modern Systems

The systems built on Erlang and its runtime read like a list of the most demanding distributed applications in the world. WhatsApp, which Facebook acquired in 2014 for $19 billion, built its messaging backend on Erlang. At the time of the acquisition, WhatsApp reportedly supported 450 million users with only a few dozen engineers, a ratio that would be hard to achieve with most other technology stacks. Much of that efficiency came from Erlang’s lightweight processes and the BEAM virtual machine’s ability to handle millions of concurrent connections.

RabbitMQ, one of the world’s most widely deployed message brokers, is written in Erlang. Its ability to handle massive throughput while maintaining reliability and supporting complex routing patterns stems directly from the language’s concurrency model. CouchDB, the distributed database that pioneered many concepts later adopted by the NoSQL movement, chose Erlang for its core implementation precisely because of the language’s native support for distributed, fault-tolerant operation. Riak, another influential distributed database, similarly leveraged Erlang’s strengths.

Armstrong’s ideas about concurrency have also shaped thinking beyond the Erlang ecosystem. Go’s goroutines and channels descend from Tony Hoare’s communicating sequential processes rather than from Erlang, but they pursue the same goal of cheap concurrency built on message passing instead of shared memory. Rust supports message passing through channels and enforces safe concurrency at compile time. Actor libraries now exist for most mainstream languages, including Python and JavaScript. The Akka framework for the JVM brought actor-model concurrency to the Java and Scala ecosystems and cites Erlang as an inspiration.

Philosophy and Engineering Approach

Armstrong was not merely a language implementer; he was a deep and original thinker about the nature of software and computation. His 2003 PhD thesis, “Making Reliable Distributed Systems in the Presence of Software Errors,” remains one of the most cited and influential documents in the field. In it, he laid out a comprehensive philosophy for building systems that work correctly even when individual components fail — a philosophy that has become the foundation of modern distributed systems design.

Key Principles

Let it crash. This was Armstrong’s most famous and most misunderstood principle. It did not mean writing careless code or ignoring errors. Rather, it meant accepting that errors will occur and designing systems where errors in one component cannot bring down the whole. Instead of writing increasingly complex defensive code, Armstrong advocated for simple, clear process logic combined with robust supervision hierarchies. A process that encounters an unexpected condition should crash cleanly rather than limp along in a corrupted state. Its supervisor then restarts it from a known good state. This approach can produce systems that are more reliable than those written with exhaustive error handling, because the recovery path is exercised regularly rather than only in rare emergencies.

Isolation is everything. Armstrong argued passionately that true concurrency required true isolation. Shared mutable state, he believed, was the root cause of the vast majority of concurrency bugs. He often compared Erlang processes to people in a room communicating by passing notes — each person has their own brain (memory), and the only way to share information is through explicit messages. This metaphor captured both the simplicity and the power of the actor model. The thinking of pioneers like Edsger Dijkstra, who championed structured approaches to managing computational complexity, clearly resonated with Armstrong’s insistence on clean, isolated process boundaries.

The world is concurrent. Armstrong frequently pointed out that the real world operates concurrently — millions of events happen simultaneously, independently, and asynchronously. He argued that sequential programming was actually the unnatural abstraction, and that languages designed around concurrency more faithfully modeled reality. This perspective aligned with the vision of Alan Kay, who similarly saw computation as networks of independent entities communicating through messages rather than as sequences of instructions manipulating shared memory.

Distribution is the natural state of computing. Long before “cloud computing” entered the mainstream vocabulary, Armstrong designed Erlang for a world where computation was spread across multiple machines. Erlang processes communicate the same way whether they are on the same machine or on different machines across a network — a property called location transparency. This design decision, which seemed extravagant in the 1980s, proved visionary as the industry moved toward distributed architectures, microservices, and cloud deployments.
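Location transparency shows up directly in the send syntax: a message addressed to {Name, Node} is written the same way whether Node is the local node or a remote one. The sketch below is a hypothetical echo service (the module echo_node and the registered name echo are made up for illustration) that can be pinged on any node where it has been started.

```erlang
-module(echo_node).
-export([start/0, ping/1]).

%% Start the echo process and register it under the name `echo`.
start() ->
    Pid = spawn(fun loop/0),
    register(echo, Pid),
    Pid.

%% Ping the process registered as `echo` on Node. The send expression
%% is identical for the local node and for a connected remote node.
ping(Node) ->
    {echo, Node} ! {ping, self()},
    receive
        {pong, From} -> {ok, From}
    after 2000 ->
        timeout
    end.

%% Reply to each ping with the name of the node we are running on.
loop() ->
    receive
        {ping, From} ->
            From ! {pong, node()},
            loop()
    end.
```

On a single node, echo_node:ping(node()) reaches the local process. With two connected nodes, the same call with a remote node name (for instance a hypothetical 'b@otherhost') would reach the echo process started there, with no change to the code.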

Hot code loading as a fundamental requirement. Armstrong believed that the ability to upgrade running systems without stopping them was not a luxury but a necessity. Erlang’s support for hot code swapping allowed engineers to deploy new versions of modules into a running system while maintaining all existing connections and state. This capability, born from the telecom requirement of zero-downtime operation, has become increasingly relevant as modern systems are expected to provide continuous service.
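At the language level, hot code loading rests on one rule: a fully qualified call (Module:Function) always enters the newest loaded version of a module, while a local call stays in the version that is currently running. A minimal sketch of a process that can be moved onto new code while keeping its state follows; the module name hot_loop and its messages are hypothetical.

```erlang
-module(hot_loop).
-export([start/0, loop/1]).

start() ->
    spawn(?MODULE, loop, [0]).

%% Count holds the process state and survives an upgrade.
loop(Count) ->
    receive
        {bump, From} ->
            From ! {count, Count + 1},
            loop(Count + 1);        % local call: stays in this version
        upgrade ->
            ?MODULE:loop(Count);    % qualified call: enters newest version
        stop ->
            ok
    end.
```

After a new version of the module has been compiled and loaded (for example with c(hot_loop) in the shell), sending upgrade makes the process continue in the new code with its counter intact. OTP release handling wraps this mechanism in a more structured upgrade procedure.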

Legacy and Modern Relevance

Joe Armstrong passed away on April 20, 2019, at the age of 68. The outpouring of grief and tribute from the global programming community testified to the depth of his impact. Developers, language designers, and distributed systems engineers from around the world shared stories of how Armstrong’s work had shaped their thinking and their careers.

Armstrong’s legacy lives on through multiple channels. The Erlang programming language continues to be actively developed and maintained by the OTP team at Ericsson and the broader open-source community. The BEAM virtual machine — the runtime that executes Erlang code — has become a platform in its own right, hosting not just Erlang but also Elixir, Gleam, LFE (Lisp Flavoured Erlang), and other languages that benefit from its battle-tested concurrency and fault-tolerance capabilities.

Elixir, in particular, has brought Armstrong’s ideas to a new generation of developers. Created by José Valim, Elixir combines the power of the BEAM with a modern, Ruby-inspired syntax and excellent tooling. The Phoenix web framework, built on Elixir, has demonstrated that Erlang-style concurrency can power real-time web applications with remarkable efficiency; in one widely cited benchmark, the Phoenix team held about two million simultaneous WebSocket connections on a single server. This represents the fulfillment of Armstrong’s vision in a domain he could not have originally anticipated.

The intellectual lineage from Armstrong’s work extends to the fundamental architecture of modern distributed systems. The supervision tree pattern he championed has been adopted and adapted in frameworks across numerous languages. The “let it crash” idea has a close parallel in Kubernetes, which manages containerized applications by detecting failures and automatically restarting or rescheduling components, much as an Erlang supervisor restarts a failed process. Netflix’s Chaos Monkey, which randomly terminates production instances to test system resilience, reflects the same conviction that systems must be designed to handle failure as a routine event rather than an exceptional one.

Armstrong was also a gifted communicator and educator. His book “Programming Erlang: Software for a Concurrent World” introduced thousands of developers to concurrent programming concepts. His talks, characterized by warmth, humor, and penetrating insight, are still watched and shared years after they were recorded. He had a remarkable ability to explain complex ideas through simple metaphors — comparing processes to people, message passing to phone calls, and fault tolerance to the immune system. Much like Alan Turing laid the theoretical foundations of computation itself, Armstrong established practical foundations for how concurrent systems could be built reliably.

In an era of ever-increasing system complexity, where applications are expected to handle billions of requests, scale across global infrastructure, and maintain continuous availability, Armstrong’s contributions appear more relevant with each passing year. The problems he solved at Ericsson in the 1980s — building systems that never stop, that handle failures gracefully, that scale effortlessly — are now the central challenges of the entire software industry. His answer to those challenges, embodied in Erlang and its ecosystem, continues to provide both practical tools and intellectual inspiration for engineers building the systems of tomorrow.

Key Facts

  • Born December 27, 1950, in Bournemouth, England; lived and worked primarily in Sweden
  • Studied physics at University College London and earned a Master’s from KTH in Stockholm
  • Created Erlang in 1986 at the Ericsson Computer Science Laboratory
  • The name “Erlang” honors mathematician Agner Krarup Erlang and abbreviates “Ericsson Language”
  • Erlang was open-sourced in 1998 along with the OTP (Open Telecom Platform) framework
  • The Ericsson AXD301 switch, built with Erlang, reportedly achieved 99.9999999% availability (nine nines)
  • Completed his PhD thesis “Making Reliable Distributed Systems in the Presence of Software Errors” in 2003
  • Erlang and its BEAM virtual machine power WhatsApp, RabbitMQ, CouchDB, and Riak
  • Authored “Programming Erlang: Software for a Concurrent World,” a foundational text on concurrent programming
  • Passed away on April 20, 2019, at the age of 68, leaving behind a transformative legacy in distributed systems

Frequently Asked Questions

What makes Erlang different from other programming languages?

Erlang was designed from the ground up for concurrency, fault tolerance, and distributed computing — properties that most other languages treat as add-on features. Its lightweight process model allows millions of concurrent processes on a single machine, each completely isolated with its own memory and garbage collector. Processes communicate exclusively through asynchronous message passing, eliminating the shared-state concurrency bugs that plague languages like C++ and Java. Combined with OTP’s supervision trees and hot code swapping, Erlang provides a uniquely integrated platform for building systems that must run continuously and handle failures gracefully. While languages like Fortran were designed for numerical computation and Perl for text processing, Erlang was purpose-built for the distinct challenge of reliable concurrent systems.

Why is the “let it crash” philosophy considered good engineering?

The “let it crash” philosophy may sound reckless, but it is actually a deeply pragmatic approach to building reliable systems. Traditional error handling attempts to anticipate and recover from every possible failure within the failing component itself, leading to complex, hard-to-test code that often handles errors incorrectly. Armstrong’s approach separates error detection from error recovery: a process that encounters an unexpected condition crashes immediately and cleanly, and a separate supervisor process — which is simpler and therefore more likely to be correct — handles the recovery by restarting the failed process from a known good state. This produces systems where the recovery mechanism is always exercised (because crashes happen regularly in any sufficiently large system), making it well-tested and reliable. The result is paradoxically more robust software with simpler, more readable code.

Is Erlang still relevant in modern software development?

Erlang is arguably more relevant today than at any point in its history. The challenges it was designed to address — massive concurrency, distributed computing, fault tolerance, and zero-downtime operation — have become the defining challenges of modern software. WhatsApp’s ability to serve billions of messages daily with a small engineering team demonstrates Erlang’s continued practical value. The BEAM virtual machine now hosts multiple languages, with Elixir bringing Erlang’s runtime strengths to a broader audience through modern syntax and tooling. The Phoenix framework has proven that BEAM-based systems can power real-time web applications with extraordinary efficiency. Furthermore, Armstrong’s design principles have influenced the broader software ecosystem: Kubernetes, microservice architectures, and actor-model frameworks across many languages all reflect ideas that Armstrong pioneered. As systems grow more distributed and the demand for reliability increases, Erlang’s core philosophy continues to guide the industry.