Tech Pioneers

Andrew Viterbi: Creator of the Viterbi Algorithm and Co-Founder of Qualcomm

Andrew Viterbi: Creator of the Viterbi Algorithm and Co-Founder of Qualcomm

In the spring of 1967, a young Italian-born engineer at the Jet Propulsion Laboratory in Pasadena sat down to solve a problem that had been nagging at communications theorists for over a decade: how to efficiently decode convolutional codes transmitted across the vast, noisy emptiness of space. The solution Andrew Viterbi produced — a deceptively elegant dynamic programming algorithm — would become one of the most widely deployed algorithms in the history of telecommunications. Every time you make a phone call on a cellular network, stream a video over Wi-Fi, or receive data from a deep-space probe billions of miles away, the Viterbi algorithm is almost certainly working behind the scenes. And yet, Viterbi’s impact did not stop at theory. Two decades later, he co-founded Qualcomm, the company that would bet everything on CDMA technology and fundamentally reshape the global wireless industry. This is the story of how one mind bridged pure mathematical insight and massive commercial ambition, leaving an indelible mark on the modern connected world.

Early Life and the Journey to America

Andrea Viterbi was born on March 9, 1935, in Bergamo, Italy, to a Jewish-Italian family. The political landscape of fascist Italy was growing increasingly hostile. In 1938, Mussolini’s racial laws stripped Italian Jews of civil rights, and the Viterbi family made the agonizing decision to flee. They arrived in the United States in 1939, settling in Boston, where young Andrea — now Andrew — would grow up as a first-generation American immigrant. The experience of displacement and reinvention shaped his character: a quiet determination, an outsider’s hunger to prove himself, and a deep appreciation for the opportunities that America offered to those willing to work.

Viterbi excelled academically, earning his bachelor’s and master’s degrees in electrical engineering from MIT in 1957, where he studied under some of the finest minds in information theory and communications. He then moved west to the California Institute of Technology, where he completed his Ph.D. in 1962 under the supervision of Solomon Golomb. His dissertation work on sequential decoding planted the seeds of what would become his defining contribution. At Caltech and later at the Jet Propulsion Laboratory, Viterbi was immersed in the practical challenges of deep-space communication — sending reliable signals across millions of miles where every photon of energy counted.

The Birth of the Viterbi Algorithm

The core problem was one that Claude Shannon had framed in his landmark 1948 paper: it is theoretically possible to communicate reliably over a noisy channel, but Shannon’s proof was existential — it told you that good codes existed, not how to build or decode them efficiently. By the 1960s, convolutional codes had emerged as a promising family of error-correcting codes. The trouble was decoding them. The optimal approach — maximum likelihood sequence estimation — required examining every possible transmitted sequence and selecting the one most likely to have produced the received signal. For any practical code, the number of candidate sequences grew exponentially with message length, making brute-force decoding computationally impossible.

Viterbi’s insight, published in 1967, was that the structure of convolutional codes could be exploited using dynamic programming. Because a convolutional encoder operates as a finite-state machine, the decoding problem could be represented as finding the shortest (or most likely) path through a trellis — a graph that unfolds the encoder’s state transitions over time. At each time step, the algorithm only needs to retain the single best path leading into each state, discarding all others. This pruning is exact, not approximate: no information is lost. The result is an algorithm whose complexity grows linearly with message length, rather than exponentially.

How the Viterbi Algorithm Works

The Viterbi algorithm operates on a trellis diagram representing all possible state transitions of a hidden process. At each time step, it computes the cumulative cost (or probability) of reaching every state via every possible predecessor, retains only the best path into each state, and traces back through the stored decisions to recover the optimal sequence. The algorithm is guaranteed to find the globally optimal solution — a maximum likelihood path — without exhaustive search. This same structure makes it the backbone of decoding in hidden Markov models (HMMs), which are used in speech recognition, bioinformatics, and natural language processing.

Below is a Python implementation of the Viterbi algorithm for a simple Hidden Markov Model, the kind of system used in early speech recognition engines and still foundational in computational biology:

import numpy as np

def viterbi(observations, states, start_prob, trans_prob, emit_prob):
    """
    Viterbi algorithm for Hidden Markov Models.
    
    Args:
        observations: list of observed symbols (indices)
        states: list of hidden state indices
        start_prob: dict {state: initial probability}
        trans_prob: dict {state: {next_state: probability}}
        emit_prob: dict {state: {observation: probability}}
    
    Returns:
        best_path: list of most likely hidden states
        best_prob: probability of the best path
    """
    T = len(observations)
    N = len(states)
    
    # Initialization: probability of starting in each state
    # and emitting the first observation
    viterbi_table = np.zeros((N, T))
    backpointer = np.zeros((N, T), dtype=int)
    
    for s in states:
        viterbi_table[s][0] = start_prob[s] * emit_prob[s][observations[0]]
        backpointer[s][0] = -1  # no predecessor at t=0
    
    # Recursion: for each time step, find the best predecessor
    # for each current state — this is the key DP step
    for t in range(1, T):
        for curr_state in states:
            max_prob = 0.0
            best_prev = 0
            for prev_state in states:
                prob = (viterbi_table[prev_state][t - 1]
                        * trans_prob[prev_state][curr_state]
                        * emit_prob[curr_state][observations[t]])
                if prob > max_prob:
                    max_prob = prob
                    best_prev = prev_state
            viterbi_table[curr_state][t] = max_prob
            backpointer[curr_state][t] = best_prev
    
    # Termination: find the state with highest probability at final step
    best_last_state = np.argmax(viterbi_table[:, T - 1])
    best_prob = viterbi_table[best_last_state][T - 1]
    
    # Traceback: follow backpointers to reconstruct optimal path
    best_path = [0] * T
    best_path[T - 1] = best_last_state
    for t in range(T - 2, -1, -1):
        best_path[t] = backpointer[best_path[t + 1]][t + 1]
    
    return best_path, best_prob


# Example: Weather HMM (hidden states: Sunny, Rainy)
# Observations: Walk=0, Shop=1, Clean=2
states = [0, 1]  # 0=Sunny, 1=Rainy
start_prob = {0: 0.6, 1: 0.4}
trans_prob = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}
emit_prob = {0: {0: 0.6, 1: 0.3, 2: 0.1},
             1: {0: 0.1, 1: 0.4, 2: 0.5}}

observations = [0, 1, 2, 1, 0]  # Walk, Shop, Clean, Shop, Walk
path, prob = viterbi(observations, states, start_prob, trans_prob, emit_prob)

state_names = {0: "Sunny", 1: "Rainy"}
print("Observations: Walk, Shop, Clean, Shop, Walk")
print("Most likely weather:", [state_names[s] for s in path])
print(f"Path probability: {prob:.6f}")

The elegance of this algorithm lies in its optimal substructure: the best path to any state at time t must pass through the best path to some state at time t−1. This property — shared with other dynamic programming algorithms studied by pioneers like Edsger Dijkstra — allows the algorithm to achieve global optimality through local decisions. The Viterbi algorithm runs in O(N²T) time, where N is the number of states and T is the sequence length, making it practical for real-time applications even on modest hardware.

From Deep Space to Every Pocket: The Algorithm’s Reach

The Viterbi algorithm’s first major deployment was in NASA’s deep-space communications systems. The Voyager missions of 1977, which sent probes past Jupiter, Saturn, Uranus, and Neptune, relied on Viterbi decoding to extract intelligible data from signals so faint that they carried less power than a refrigerator light bulb — after traveling billions of miles. The algorithm allowed NASA to use weaker, cheaper transmitters while maintaining data integrity, saving weight and power on spacecraft where every gram and watt mattered.

But space was just the beginning. The GSM standard for second-generation cellular networks, adopted across Europe and then the world in the early 1990s, embedded Viterbi decoding as a core component of its channel equalization and error-correction pipeline. Every GSM handset ever manufactured contained a hardware Viterbi decoder. The algorithm also became indispensable in dial-up modems, satellite television (DVB-S), digital audio broadcasting, and the 802.11 Wi-Fi standard. By the late 1990s, it was estimated that more Viterbi decoders were running simultaneously on Earth than any other single algorithm.

In parallel, the algorithm found a second life in entirely different domains. In the 1970s and 1980s, researchers in speech recognition — following the work of pioneers like John McCarthy in artificial intelligence — adopted Hidden Markov Models as the dominant framework for acoustic modeling. The Viterbi algorithm became the standard method for finding the most likely sequence of phonemes or words given an acoustic signal. IBM’s pioneering speech recognition systems, and later the systems at AT&T, Dragon, and Nuance, all relied on Viterbi decoding at their core.

Founding Qualcomm and the CDMA Revolution

By the early 1980s, Andrew Viterbi was a tenured professor at UCLA and a respected figure in information theory. He had co-founded Linkabit in 1968, one of the first companies to commercialize satellite communication technology, which was eventually acquired by M/A-COM. But Viterbi’s most consequential entrepreneurial chapter was about to begin.

In 1985, Viterbi and Irwin Jacobs — a fellow MIT-trained engineer and Linkabit co-founder — launched Qualcomm (Quality Communications) in San Diego with a handful of employees and a radical idea. The prevailing wisdom in cellular communications was that TDMA (Time Division Multiple Access) or FDMA (Frequency Division Multiple Access) were the only viable approaches for dividing the radio spectrum among users. Viterbi and Jacobs believed otherwise. They championed CDMA (Code Division Multiple Access), a spread-spectrum technique rooted in military communications that allowed all users to transmit simultaneously over the entire frequency band, distinguished only by unique pseudo-random codes.

The telecommunications establishment was deeply skeptical. Major carriers and equipment manufacturers argued that CDMA could not work in practice — that the “near-far problem” (where a strong nearby signal drowns out a weak distant one) was unsolvable, and that the approach would collapse under real-world conditions. Viterbi’s deep understanding of information theory told him otherwise. He knew that with proper power control and sophisticated signal processing, CDMA could offer dramatically higher capacity than TDMA — a prediction rooted in the same Shannon-theoretic principles that had guided his algorithmic work.

CDMA Spread Spectrum: The Technical Foundation

CDMA works by assigning each user a unique spreading code — a pseudo-random sequence that “spreads” the user’s narrowband signal across a wide frequency band. At the receiver, the same code is used to “despread” the desired signal, while all other users’ signals appear as low-level noise. The processing gain — the ratio of the spread bandwidth to the original signal bandwidth — determines how many simultaneous users the system can support.

The following code demonstrates the fundamental principle of CDMA spreading and despreading, showing how multiple users can share the same channel:

import numpy as np

def generate_walsh_codes(n):
    """
    Generate Walsh-Hadamard orthogonal spreading codes.
    These codes are used in IS-95/CDMA2000 to separate users
    on the forward (base-station to mobile) link.
    """
    if n == 1:
        return np.array([[1]])
    smaller = generate_walsh_codes(n // 2)
    top = np.hstack([smaller, smaller])
    bottom = np.hstack([smaller, -smaller])
    return np.vstack([top, bottom])


def cdma_simulate(num_users=4, data_bits=8, noise_level=0.3):
    """
    Simulate CDMA spread-spectrum communication.
    Each user's data is spread with a unique orthogonal code,
    all signals are summed (shared channel), and each receiver
    despreads using its own code to recover the original data.
    """
    chip_length = num_users  # spreading factor = number of users
    codes = generate_walsh_codes(chip_length)
    
    print(f"=== CDMA Simulation: {num_users} users, "
          f"{data_bits} bits each ===\n")
    
    # Each user generates random binary data (+1 / -1)
    user_data = np.random.choice([-1, 1],
                                  size=(num_users, data_bits))
    
    # Spreading: each bit is multiplied by the user's code
    # This expands each bit into chip_length "chips"
    composite_signal = np.zeros((data_bits, chip_length))
    
    for user_id in range(num_users):
        code = codes[user_id]
        for bit_idx in range(data_bits):
            spread_signal = user_data[user_id][bit_idx] * code
            composite_signal[bit_idx] += spread_signal
    
    # Add channel noise (AWGN)
    noise = np.random.normal(0, noise_level,
                              composite_signal.shape)
    received_signal = composite_signal + noise
    
    # Despreading: correlate received signal with each user's code
    # The key insight — other users' codes are orthogonal,
    # so they average to zero after correlation
    print("User | Sent Data         | Recovered Data      | Match")
    print("-" * 62)
    
    for user_id in range(num_users):
        code = codes[user_id]
        recovered = np.zeros(data_bits)
        for bit_idx in range(data_bits):
            # Correlation: dot product with spreading code
            correlation = np.dot(received_signal[bit_idx], code)
            # Decision: sign of correlation gives the bit
            recovered[bit_idx] = 1 if correlation > 0 else -1
        
        sent_str = ''.join(['1' if b == 1 else '0'
                            for b in user_data[user_id]])
        recv_str = ''.join(['1' if b == 1 else '0'
                            for b in recovered])
        match = "PASS" if np.array_equal(
            user_data[user_id], recovered) else "FAIL"
        print(f"  {user_id}  | {sent_str}          | "
              f"{recv_str}          | {match}")
    
    # Calculate effective SNR after despreading
    processing_gain_db = 10 * np.log10(chip_length)
    print(f"\nProcessing gain: {processing_gain_db:.1f} dB "
          f"(spreading factor: {chip_length})")
    print(f"Channel noise level: {noise_level}")
    print(f"All {num_users} users share the SAME frequency band "
          f"simultaneously")


cdma_simulate(num_users=4, data_bits=8, noise_level=0.3)

This simulation illustrates the fundamental magic of CDMA: multiple users occupy the same frequency band at the same time, yet each receiver can extract its own signal cleanly. The orthogonality of the Walsh codes ensures that the interference from other users cancels out during despreading. In practice, Qualcomm’s IS-95 standard (and later CDMA2000) added sophisticated power control, rake receivers for multipath combining, and soft handoff between cells — all innovations that Viterbi and his team at Qualcomm developed to make CDMA work in the messy real world of mobile communications.

The Battle for the Wireless Standard

The 1990s saw one of the most consequential technology standards battles in history. On one side stood the European-backed GSM/TDMA alliance, supported by massive incumbents like Ericsson, Nokia, and Siemens. On the other stood Qualcomm, a relatively small San Diego startup, championing CDMA. The stakes were enormous: whichever technology won would define the architecture of the global cellular industry for decades.

Viterbi played a crucial role not just as a technologist but as an evangelist and strategist. He traveled the world, presenting to skeptical carriers and regulators, using his academic credibility and deep theoretical knowledge to argue that CDMA’s capacity advantages were not just theoretical but achievable. The breakthrough came when Korean carriers, seeking to leapfrog established players, adopted CDMA for their national cellular network. The success of CDMA in Korea — and later its adoption by major US carriers like Verizon and Sprint — vindicated Qualcomm’s bet.

The ultimate vindication came with 3G. When the International Telecommunication Union defined the standards for third-generation mobile networks, both major 3G standards — WCDMA (used in UMTS, the European 3G standard) and CDMA2000 — were based on CDMA technology. Qualcomm’s patents on CDMA became some of the most valuable intellectual property in the history of technology. The principles of spread spectrum that Viterbi championed continued to influence 4G LTE and even aspects of 5G NR design, much as the networking principles championed by Vint Cerf continued to underpin every generation of internet infrastructure.

Academic Legacy and Information Theory

Throughout his entrepreneurial career, Viterbi never abandoned his roots in theoretical research. His 1979 textbook Principles of Digital Communication and Coding, co-authored with Jim Omura, became a standard reference in graduate programs worldwide. He continued to publish influential papers on topics ranging from error exponents to the capacity of CDMA systems.

Viterbi’s work sits at a fascinating intersection in the history of computer science. His algorithm is a specific instance of dynamic programming — the same paradigm that Donald Knuth catalogued and analyzed in his monumental works on algorithms. The Viterbi algorithm’s trellis structure is closely related to Dijkstra’s shortest-path algorithm, and its application to HMMs connects it to the broader tradition of probabilistic reasoning in AI that traces back to the foundational work of Alan Turing on machine intelligence.

In recognition of his contributions, Viterbi received virtually every major award in engineering and technology. The IEEE Alexander Graham Bell Medal (1990), the National Medal of Science (2007), the Marconi Prize, the Benjamin Franklin Medal, and membership in the National Academy of Engineering, the National Academy of Sciences, and the American Academy of Arts and Sciences all recognized different facets of his work. In 2004, the USC School of Engineering was renamed the Andrew and Erna Viterbi School of Engineering following a $52 million donation — one of the largest gifts to an engineering school in history.

The Broader Impact: Where the Algorithm Lives Today

The Viterbi algorithm’s influence extends far beyond its original domain of channel coding. In computational biology, it is used to identify gene structures, predict protein folding states, and align DNA sequences. In natural language processing, it powers part-of-speech taggers and named entity recognizers. In robotics, it helps with path planning and localization. The algorithm’s generality — applicable to any problem that can be modeled as finding an optimal path through a state-space trellis — has made it one of the most versatile tools in the computational toolkit.

Modern teams building sophisticated software systems, whether in telecommunications, AI, or data processing, often benefit from project management platforms like Taskee to coordinate the complex engineering work that brings algorithms like Viterbi’s from theory to production. The journey from mathematical proof to deployed system requires careful coordination across hardware, software, and testing teams — exactly the kind of multi-disciplinary collaboration that characterized Qualcomm’s development of CDMA.

For organizations looking to build digital products that leverage advanced algorithms and signal processing, working with experienced digital agencies like Toimi can bridge the gap between theoretical capability and practical implementation. The lesson of Viterbi’s career is that the greatest impact comes when deep technical knowledge meets disciplined engineering execution.

Personal Philosophy and Leadership Style

Colleagues describe Viterbi as soft-spoken, methodical, and deeply principled. Unlike many tech entrepreneurs who lead through charisma or force of personality, Viterbi led through the clarity and rigor of his thinking. He was known for his ability to reduce complex problems to their essential structure — a skill that served him equally well in mathematics and in business strategy.

Viterbi has been a generous philanthropist, supporting education, research, and cultural institutions. His donations to USC, to the Technion in Israel, and to various charitable organizations reflect a belief that knowledge and opportunity should be shared broadly. His own life story — a refugee child who became one of the most influential engineers of the twentieth century — embodies the transformative power of education and intellectual freedom.

Lessons from Viterbi’s Career

Andrew Viterbi’s career offers several enduring lessons for technologists, entrepreneurs, and anyone working at the frontier of knowledge:

Deep theory enables practical breakthroughs. The Viterbi algorithm did not emerge from tinkering or incremental improvement. It came from a deep understanding of information theory, probability, and the mathematical structure of coding systems. Viterbi’s ability to move fluidly between abstract theory and concrete implementation set him apart from both pure theorists and pure practitioners.

Patience and conviction matter. Qualcomm spent years fighting entrenched skepticism about CDMA. Many companies would have pivoted or folded. Viterbi and Jacobs held firm because their conviction was rooted not in wishful thinking but in rigorous analysis — they understood the theoretical limits and knew that CDMA could approach them.

Algorithms are infrastructure. Like the foundational work of John Backus on Fortran or the networking protocols that became the invisible fabric of the internet, the Viterbi algorithm became infrastructure — so deeply embedded in technological systems that most people who benefit from it daily have never heard of it. The most impactful technologies often become invisible.

Immigration enriches innovation. Viterbi’s story is inseparable from the story of immigration to America. A child fleeing persecution became the architect of technologies that connected billions of people. His life is a powerful reminder that openness to talent from around the world is not just a moral good but a strategic advantage.

Frequently Asked Questions

What is the Viterbi algorithm used for?

The Viterbi algorithm is used to find the most likely sequence of hidden states in systems modeled as Hidden Markov Models or finite-state machines. Its applications span telecommunications (decoding error-correcting codes in cellular networks, Wi-Fi, satellite links, and deep-space communication), speech recognition (finding the most likely sequence of words or phonemes), computational biology (gene prediction and DNA sequence alignment), natural language processing (part-of-speech tagging), and robotics (path planning and localization). It is one of the most widely deployed algorithms in computing history.

How does the Viterbi algorithm differ from other dynamic programming algorithms?

While the Viterbi algorithm shares the dynamic programming principle of optimal substructure with algorithms like Dijkstra’s shortest path or the Bellman-Ford algorithm, it is specifically designed for sequential decision problems on trellis (lattice) structures. Unlike general shortest-path algorithms that operate on arbitrary graphs, the Viterbi algorithm exploits the regular, time-ordered structure of a trellis — where states at each time step connect only to states at the next time step. This regularity makes it exceptionally efficient for problems involving sequences, such as decoding transmitted signals or analyzing temporal data. Its time complexity is O(N²T), where N is the number of states and T is the sequence length.

What is CDMA and why was it controversial?

CDMA (Code Division Multiple Access) is a wireless communication technique where all users transmit simultaneously over the entire available frequency band, each using a unique pseudo-random spreading code. It was controversial in the late 1980s and early 1990s because the established telecommunications industry believed that only TDMA (Time Division Multiple Access) and FDMA (Frequency Division Multiple Access) — which divide spectrum by time slots or frequency bands — were viable for cellular networks. Critics argued that the near-far problem and multi-user interference would make CDMA impractical. Qualcomm, led by Andrew Viterbi and Irwin Jacobs, proved these critics wrong by developing sophisticated power control and signal processing techniques that made CDMA not only viable but superior in capacity to competing approaches.

What awards did Andrew Viterbi receive?

Andrew Viterbi received numerous prestigious awards throughout his career, including the IEEE Alexander Graham Bell Medal (1990), the National Medal of Science (2007), the Marconi Prize, the Benjamin Franklin Medal in Electrical Engineering, and the IEEE Golden Jubilee Award for Technological Innovation. He was elected to the National Academy of Engineering, the National Academy of Sciences, and the American Academy of Arts and Sciences. In 2004, the University of Southern California renamed its engineering school the Andrew and Erna Viterbi School of Engineering in recognition of his contributions to the field and his philanthropic generosity.

How did the Viterbi algorithm influence modern AI and machine learning?

The Viterbi algorithm was instrumental in the development of modern AI, particularly through its role in Hidden Markov Model-based systems. For decades, HMMs decoded using the Viterbi algorithm were the dominant approach in automatic speech recognition — the technology behind systems from IBM, AT&T, Dragon NaturallySpeaking, and early smartphone voice assistants. In natural language processing, Viterbi decoding powered part-of-speech taggers and named entity recognizers that were state-of-the-art through the 2000s. While deep learning has since replaced HMMs in many applications, the Viterbi algorithm’s principles of efficient sequential inference continue to influence modern sequence models, including CTC (Connectionist Temporal Classification) decoding in neural speech recognition and beam search in neural machine translation. The algorithm remains a foundational concept taught in every machine learning curriculum.