Brendan Gregg: The Performance Engineering Pioneer Who Taught the World to See Inside Running Systems

In the mid-2000s, a young performance engineer working with Sun Microsystems' Solaris was staring at a problem that had tormented system administrators for decades: a production server was slow, and nobody could figure out why. The usual suspects — CPU load, memory pressure, disk I/O — all looked normal. The application logs revealed nothing. The network metrics were clean. Somewhere between the kernel and user space, thousands of microseconds were vanishing into a black hole that no existing tool could illuminate.

Brendan Gregg did not accept that the answer was unknowable. Instead, he wrote a one-liner using a revolutionary new technology called DTrace that peered directly into the running kernel, tracing function calls in real time without rebooting or restarting anything. Within minutes he found the culprit — a lock contention issue buried three layers deep in the storage stack that no log file would ever surface. That moment captured everything Gregg would spend the next two decades doing: building tools and methodologies that make the invisible visible, turning operating systems from opaque black boxes into transparent machines whose every instruction can be observed, measured, and understood.

Early Career and the Sun Microsystems Years

Brendan Gregg grew up in Australia, where he developed an early fascination with computers and operating systems. He studied at the University of Newcastle in New South Wales, earning a degree in applied science. Even as a student, he was drawn to the question that would define his career: when a computer system performs poorly, how do you find out exactly why? The answer, he discovered early, was almost never in the application layer. The real bottlenecks lived in the operating system kernel, in the scheduler, in the memory allocator, in the filesystem and network stack — layers that most developers treated as invisible foundations they never needed to examine.

After working in system administration roles in Australia, Gregg joined Sun Microsystems, the company that had created Solaris, one of the most sophisticated Unix operating systems ever built. At Sun, he became deeply involved with DTrace — a dynamic tracing framework that Bryan Cantrill, Mike Shapiro, and Adam Leventhal had built into Solaris 10. DTrace allowed engineers to insert instrumentation probes into a running kernel and user-space applications without any modification, recompilation, or restart. It was like attaching a high-speed camera to the internals of a running engine while the car was driving at highway speed.

Gregg did not create DTrace, but he arguably did more than anyone else to demonstrate its power. He wrote hundreds of DTrace scripts that could answer specific performance questions — how long does each disk I/O take? Which functions are consuming the most CPU? Where is the application spending time waiting for locks? He published these scripts as the open-source DTraceToolkit, a collection that became the standard reference for Solaris performance analysis. His work showed that DTrace was not merely a debugging curiosity but a fundamental shift in how operating systems could be understood.

The DTrace Toolkit and Flame Graphs

Making Tracing Accessible

One of Gregg’s most important contributions was making performance tracing accessible to engineers who were not kernel developers. DTrace’s scripting language (the D language) was powerful but had a learning curve. Gregg bridged the gap by creating ready-to-use tools with clear names: iosnoop for disk I/O tracing, execsnoop for tracking new process execution, opensnoop for monitoring file opens, and dozens more. Each tool answered a specific operational question and could be run immediately on a production system without risk. This practical approach — giving engineers tools they could use in the first five minutes of a performance investigation — became a hallmark of Gregg’s philosophy.

His DTraceToolkit was not just a collection of scripts; it was an embodiment of a methodology. Gregg organized performance analysis into systematic checklists. Instead of randomly poking at metrics, an engineer could follow a structured process: check CPU utilization, then CPU saturation, then memory capacity, then storage latency, then network throughput — each step with a specific tool and a specific interpretation. This methodical approach became the foundation of what Gregg later formalized as the USE Method (Utilization, Saturation, Errors), one of the most widely adopted performance analysis methodologies in the industry.

The Invention of Flame Graphs

In 2011, Gregg created what may be his single most recognized contribution: the flame graph. He was analyzing a CPU performance issue and had collected stack trace samples — snapshots of the call stack taken at regular intervals. The problem was that the output was a wall of text: thousands of stack traces, each showing a chain of function calls. Gregg needed a way to see the patterns — which code paths were consuming the most CPU — at a glance. He wrote a Perl script that took stack trace data and generated an SVG visualization where each function was a rectangle, the width represented the proportion of time spent in that function, and the vertical axis showed the call chain. The result was an interactive visualization that looked like flames reaching upward — hence the name.

Flame graphs became a universal standard for performance visualization. They were adopted by Linux perf, Java profilers, Go’s pprof, Node.js, Ruby, Python, and virtually every profiling tool in existence. Netflix, Google, Facebook, and thousands of other companies use flame graphs daily to diagnose production performance issues. The visualization is so effective because it compresses an enormous amount of information — potentially millions of stack samples — into a single image that a human can immediately interpret. Wide plateaus at the top indicate functions where the CPU is spending the most time; narrow spikes indicate rare code paths. An experienced engineer can glance at a flame graph and identify the bottleneck in seconds. Here is a typical workflow for generating a flame graph on a Linux system:

# Record CPU stack traces for 30 seconds at 99 Hz
# using Linux perf (the standard profiling tool)
sudo perf record -F 99 -a -g -- sleep 30

# Convert perf data to folded stack format
sudo perf script | ./stackcollapse-perf.pl > out.folded

# Generate the interactive SVG flame graph
./flamegraph.pl out.folded > cpu_flamegraph.svg

# For off-CPU analysis (time spent waiting/blocked):
# Record scheduler events to find where threads sleep
sudo perf record -e sched:sched_switch -a -g -- sleep 10
sudo perf script | ./stackcollapse-perf.pl > offcpu.folded
./flamegraph.pl --color=io --title="Off-CPU Time" \
    offcpu.folded > offcpu_flamegraph.svg

The flame graph concept was so powerful that Gregg extended it into several variants: off-CPU flame graphs (showing where threads spend time blocked or waiting), hot/cold flame graphs (combining on-CPU and off-CPU data), memory flame graphs (showing allocation patterns), and differential flame graphs (comparing two profiles to show what changed). Each variant applied the same visual principle — proportional width for sampled time — to a different dimension of system behavior. The ability to quickly visualize both where a system is burning CPU and where it is waiting for I/O transformed how engineers at companies of all sizes approach performance analysis.
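The folded stack format those scripts pass between stages is simple: each line is a semicolon-joined call chain followed by a sample count, and a frame's width in the final SVG is its share of total samples. As an illustrative sketch in Python (not Gregg's actual Perl implementation), the width computation might look like this:

```python
from collections import defaultdict

def frame_totals(folded_lines):
    """Aggregate sample counts per (depth, function) frame.

    Each input line uses the folded format produced by
    stackcollapse-perf.pl: "main;funcA;funcB 25" means 25 samples
    were taken with that exact call chain. A frame's total across
    all stacks determines its rectangle width in the flame graph.
    """
    totals = defaultdict(int)
    grand_total = 0
    for line in folded_lines:
        stack, count = line.rsplit(" ", 1)
        count = int(count)
        grand_total += count
        for depth, func in enumerate(stack.split(";")):
            totals[(depth, func)] += count
    # Express each frame's width as a fraction of all samples
    widths = {frame: n / grand_total for frame, n in totals.items()}
    return widths, grand_total

# Toy profile: three folded stacks, 100 samples total
profile = [
    "main;parse;read_file 30",
    "main;parse;tokenize 10",
    "main;render 60",
]
widths, total = frame_totals(profile)
# main spans the full width; render is the widest frame above it
```

A real implementation also has to lay the rectangles out left-to-right and merge only frames that share the same ancestry, but the proportional-width principle is exactly this aggregation.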

The Netflix Era: Performance at Hyperscale

In 2014, after several years at the cloud company Joyent working on DTrace and performance analysis for SmartOS, Gregg joined Netflix as a senior performance architect. Netflix was in the late stages of one of the most ambitious infrastructure migrations in tech history — moving from its own data centers to Amazon Web Services (AWS). The challenges were immense: Netflix was streaming video to tens of millions of subscribers, and every millisecond of latency in the content delivery pipeline translated to buffering, degraded video quality, and subscriber churn. Gregg’s job was to ensure that Netflix’s cloud infrastructure performed as well as — or better than — the dedicated hardware it was replacing.

At Netflix, Gregg pushed the boundaries of what was possible with Linux performance observability. He worked with the BPF (Berkeley Packet Filter) and eBPF (extended BPF) technologies that were being developed in the Linux kernel. While DTrace had been revolutionary on Solaris, Linux had lacked an equivalent dynamic tracing capability for years. eBPF changed that: it provided a safe, efficient mechanism for running sandboxed programs inside the Linux kernel, enabling the same kind of deep observability that DTrace had brought to Solaris. Gregg became one of the most prolific contributors to the BPF ecosystem, writing tools, documentation, and educational materials that helped the broader community adopt this technology.

His work at Netflix was not purely tool-building. Gregg developed performance analysis methodologies that Netflix engineers used across the entire stack. He introduced the concept of “performance wins” — systematic investigations that identified and eliminated inefficiencies in production systems. These wins often yielded dramatic results: a single kernel tuning change might reduce CPU utilization across an entire fleet by several percentage points, saving Netflix millions of dollars in cloud costs annually. His analyses covered everything from TCP tuning and filesystem optimization to JVM garbage collection and container resource management.

BPF and the Modern Observability Revolution

From Packet Filter to Universal Instrumentation

The original Berkeley Packet Filter, created in 1992, was a simple virtual machine for filtering network packets efficiently in the kernel. It was the engine behind tcpdump and similar tools. Extended BPF (eBPF), which began serious development around 2014, transformed this simple packet filter into a general-purpose in-kernel virtual machine capable of tracing any kernel function, any user-space function, any system call, any network event — safely and with minimal overhead. Gregg recognized immediately that eBPF was the technology that would bring DTrace-level observability to the most widely deployed operating system in the world.

He wrote many of the tools in the BCC (BPF Compiler Collection) toolkit — a Python and C framework for writing BPF programs — and became a leading contributor to bpftrace, a high-level tracing language for Linux directly inspired by DTrace’s D language. These tools made it possible for any Linux administrator to run powerful tracing queries against production systems. Need to see every block device I/O with its latency and the issuing process? Run biosnoop. Want to measure how long TCP connections take to establish? Run tcpconnlat. Need to see which kernel functions are being called most often? Run funccount. The BCC toolkit contained over 100 tools, each solving a specific observability problem.

Here is an example of using bpftrace — the high-level BPF tracing language — to dynamically trace disk I/O latency in a running production kernel:

#!/usr/bin/env bpftrace
// Trace block I/O latency distribution by device
// This runs as a sandboxed BPF program inside the kernel

BEGIN
{
    printf("Tracing block device I/O... Hit Ctrl-C to end.\n");
}

// Instrument the block I/O request issue point
tracepoint:block:block_rq_issue
{
    @start[args->dev, args->sector] = nsecs;
}

// Instrument the block I/O completion point
tracepoint:block:block_rq_complete
/@start[args->dev, args->sector]/
{
    $latency_us = (nsecs - @start[args->dev, args->sector]) / 1000;
    @usecs = hist($latency_us);
    @avg_latency_us = avg($latency_us);
    @io_count = count();
    delete(@start[args->dev, args->sector]);
}

END
{
    // Aggregate maps cannot be read inside printf();
    // print them directly, then clear so bpftrace does
    // not auto-print them a second time on exit.
    printf("\nI/O Latency Distribution (microseconds):\n");
    print(@usecs);
    print(@avg_latency_us);
    print(@io_count);
    clear(@start);
    clear(@usecs);
    clear(@avg_latency_us);
    clear(@io_count);
}

This script attaches to kernel tracepoints for block I/O requests. When an I/O is issued, it records the timestamp. When the I/O completes, it calculates the latency and builds a histogram. The entire program runs inside the kernel with negligible overhead, and it can be attached and detached from a live production system without any disruption. This is the power that Gregg spent his career working toward: the ability to ask any question about a running system and get an immediate, precise answer.
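The stamp-on-issue, measure-on-completion pattern in that script generalizes well beyond BPF. A minimal user-space sketch of the same logic in Python — with a hypothetical event stream standing in for kernel tracepoints — looks like this:

```python
import math
from collections import defaultdict

def latency_histogram(events):
    """Replicate the bpftrace script's pattern in plain Python.

    events: sequence of (kind, io_id, t_ns) tuples, where kind is
    "issue" or "complete". Latencies fall into power-of-two
    microsecond buckets, the same shape bpftrace's hist() produces.
    """
    start = {}                  # plays the role of @start[dev, sector]
    buckets = defaultdict(int)  # plays the role of @usecs = hist(...)
    for kind, io_id, t_ns in events:
        if kind == "issue":
            start[io_id] = t_ns
        elif kind == "complete" and io_id in start:
            lat_us = (t_ns - start.pop(io_id)) // 1000
            bucket = 0 if lat_us == 0 else int(math.log2(lat_us)) + 1
            buckets[bucket] += 1
    return dict(buckets)

# Invented event stream for illustration
events = [
    ("issue", 1, 0), ("complete", 1, 120_000),  # 120 us -> [64, 128) bucket
    ("issue", 2, 0), ("complete", 2, 3_000),    # 3 us   -> [2, 4) bucket
]
hist = latency_histogram(events)
```

The point of doing this in the kernel rather than user space is that the events never leave the kernel: only the tiny histogram summary crosses the boundary, which is why the overhead stays negligible even at millions of I/Os per second.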

The Observability Stack

Gregg’s BPF work contributed to a broader revolution in system observability. Traditional monitoring — collecting metrics like CPU utilization, memory usage, and request rates at regular intervals — could tell you that something was wrong but often could not tell you why. Tracing, by contrast, could capture the exact sequence of events leading to a problem. Gregg advocated for a layered observability approach: metrics for detection (is there a problem?), tracing for diagnosis (what is causing it?), and logging for context (what happened before and after?). This framework influenced the design of modern observability platforms and the broader shift from monitoring to observability in the DevOps community.

The impact of eBPF extended far beyond performance analysis. Projects like Cilium used eBPF for high-performance container networking in Kubernetes environments. Falco used it for runtime security monitoring. Pixie used it for automatic application telemetry. The technology that Gregg helped popularize became a foundational building block of the modern cloud-native infrastructure stack.

Books and Educational Legacy

Gregg is the author of three books that have become essential references for anyone working in systems performance. His first, DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X, and FreeBSD (Prentice Hall, 2011), co-authored with Jim Mauro, documented the tracing framework he had done so much to popularize. His second, Systems Performance: Enterprise and the Cloud (Prentice Hall, 2013; second edition 2020), is widely regarded as the definitive text on operating system and application performance analysis. It covers CPU, memory, filesystem, disk, network, and cloud performance in exhaustive detail, with practical methodologies that readers can apply immediately. The book introduced the USE Method and the TSA (Thread State Analysis) method to a wide audience and became required reading at companies like Netflix, Google, and Facebook.

His third book, BPF Performance Tools (Addison-Wesley, 2019), is the comprehensive guide to using BPF for Linux observability. At over 800 pages, it covers more than 150 BPF-based tools and explains how to write custom BPF programs for any tracing need. The book made eBPF accessible to a generation of Linux engineers who had previously relied on basic tools like top, vmstat, and iostat. The second edition of Systems Performance (2020) updated the original work with extensive BPF coverage, container performance analysis, and cloud-specific optimization techniques.

Beyond books, Gregg’s blog posts, conference talks, and technical articles have educated hundreds of thousands of engineers. His blog at brendangregg.com is one of the most widely cited resources in the performance engineering community. His talks at conferences like USENIX, LISA, and various Linux events consistently draw large audiences, and his ability to explain complex kernel internals in accessible terms has made him one of the most effective technical communicators in the field.

The Intel Fellowship and Current Work

In 2022, Gregg joined Intel as an Intel Fellow — one of the company’s highest technical honors, reserved for engineers who have made exceptional contributions to the field. At Intel, he continues his work on performance analysis and observability, now with a focus on hardware-software co-optimization. His position allows him to influence how future processors are designed to support observability and performance analysis, closing the loop between the hardware that executes instructions and the software tools that make execution patterns visible.

At Intel, Gregg has focused on bridging the gap between hardware performance counters and software profiling. Modern processors contain hundreds of performance monitoring events — cache misses, branch mispredictions, instruction retirement rates, memory bandwidth utilization — that can reveal bottlenecks invisible to software-only tools. Gregg’s work integrates these hardware insights with the BPF-based software tracing he developed over the previous decade, creating a unified observability picture that spans from the CPU microarchitecture to the application layer.

The USE Method and Performance Methodologies

One of Gregg’s most influential contributions is the USE Method — a systematic approach to performance analysis that checks every resource for Utilization, Saturation, and Errors. The method provides a structured checklist: for each system resource (CPU, memory, network interfaces, storage devices, controllers, interconnects), measure its utilization (how busy is it as a percentage of capacity?), its saturation (is work queuing because the resource is fully utilized?), and its error count (are there any errors?). This simple framework prevents the most common mistake in performance analysis: jumping to conclusions based on a single metric while ignoring other resources that may be the actual bottleneck.
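A sketch of how such a checklist can be mechanized follows; the resource snapshot and the 70% utilization threshold are illustrative choices, not taken from Gregg's writing:

```python
def use_check(metrics, util_warn=0.7):
    """Walk every resource and flag Utilization, Saturation, Errors.

    metrics maps resource name -> dict with "utilization" (fraction of
    capacity, 0..1), "saturation" (queued work, e.g. run-queue length),
    and "errors" (error event count). The 0.7 threshold is illustrative.
    """
    findings = []
    for resource, m in metrics.items():
        if m["errors"] > 0:
            findings.append((resource, "errors", m["errors"]))
        if m["saturation"] > 0:
            findings.append((resource, "saturation", m["saturation"]))
        if m["utilization"] >= util_warn:
            findings.append((resource, "utilization", m["utilization"]))
    return findings

# Hypothetical one-second snapshot of a host
snapshot = {
    "cpu":    {"utilization": 0.95, "saturation": 4, "errors": 0},
    "memory": {"utilization": 0.60, "saturation": 0, "errors": 0},
    "disk0":  {"utilization": 0.30, "saturation": 0, "errors": 2},
    "eth0":   {"utilization": 0.10, "saturation": 0, "errors": 0},
}
issues = use_check(snapshot)
# cpu is saturated and near capacity; disk0 is reporting errors
```

Note the ordering inside each resource: errors first, then saturation, then utilization — a common reading of the method, since errors and queuing are unambiguous problems while high utilization alone may be healthy.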

Gregg also developed complementary methodologies, including the TSA Method (analyzing thread states — running, runnable, sleeping, waiting on I/O — to identify where application threads spend their time) and various anti-methods that describe common but ineffective approaches to performance analysis (the “streetlight” anti-method, where engineers look only where the light is good rather than where the problem actually is). He has also promoted related frameworks such as the RED Method (Rate, Errors, Duration for request-driven services), created by Tom Wilkie. These methodologies gave the industry a shared vocabulary for discussing performance problems and a systematic framework for solving them.
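Thread State Analysis reduces to an accounting exercise: attribute every interval of a thread's life to a state, then rank the totals. A toy Python sketch, with an invented timeline standing in for real scheduler data:

```python
from collections import defaultdict

def thread_state_totals(timeline):
    """Sum time per thread state from (state, duration_ms) intervals.

    In real TSA the states come from scheduler and tracing data; here
    the timeline is supplied directly for illustration.
    """
    totals = defaultdict(float)
    for state, ms in timeline:
        totals[state] += ms
    # Rank states by where the thread actually spent its time
    return sorted(totals.items(), key=lambda kv: -kv[1])

# Hypothetical 600 ms slice of one thread's life
timeline = [
    ("running", 120.0), ("runnable", 40.0),
    ("sleeping", 10.0), ("disk_wait", 300.0),
    ("running", 80.0), ("lock_wait", 50.0),
]
ranked = thread_state_totals(timeline)
# disk_wait dominates: this thread is I/O bound, not CPU bound
```

The ranked output immediately redirects the investigation: a thread dominated by disk_wait needs storage latency analysis, not CPU profiling — exactly the misdirection TSA is designed to prevent.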

The practical impact of these methodologies cannot be overstated. Before Gregg’s work, performance analysis was often ad hoc — engineers would check whatever tools they were familiar with, draw conclusions based on incomplete data, and frequently misdiagnose problems. The USE Method transformed performance analysis into a reproducible engineering discipline. It is now taught in university courses, included in certification programs, and used as a standard operating procedure at companies managing large-scale cloud infrastructure.

Philosophy and Legacy

Brendan Gregg’s career embodies a single, powerful idea: you cannot fix what you cannot see. Operating systems are among the most complex software artifacts ever built — millions of lines of code managing hardware resources, scheduling processes, handling interrupts, and serving application requests. For most of computing history, this complexity was hidden behind a handful of crude metrics. Gregg dedicated his career to making that complexity transparent, giving engineers the tools and methodologies to look inside running systems and understand exactly what is happening at every layer.

His influence extends beyond the specific tools he created. Gregg changed the culture of systems engineering by demonstrating that performance analysis is not a dark art practiced by a few wizards but a systematic discipline that any engineer can learn. His open-source tools lowered the barrier to entry. His books provided the educational foundation. His methodologies gave structure to what had been an ad hoc practice. And his flame graphs gave every engineer — regardless of specialization — a universal visual language for understanding where time is spent in a computer system.

The thread connecting DTrace scripts on Solaris, flame graphs that visualize millions of stack samples, eBPF programs that trace running Linux kernels, and hardware performance counter analysis at Intel is Gregg’s unwavering belief that observability is the foundation of engineering excellence. You cannot optimize what you cannot measure. You cannot debug what you cannot trace. You cannot build reliable systems if the systems themselves are opaque. Brendan Gregg made them transparent, and modern performance engineering exists in its current form because of his work.

Frequently Asked Questions

What is Brendan Gregg best known for?

Brendan Gregg is best known for creating flame graphs — a visualization technique for profiling data that has become a universal standard in software performance analysis. He is also recognized for his extensive work on DTrace and BPF/eBPF tooling, his performance analysis methodologies (particularly the USE Method), and his authoritative books on systems performance. His career spans Sun Microsystems, Joyent, Netflix, and Intel, where he has consistently advanced the field of operating system observability.

What are flame graphs and why are they important?

Flame graphs are interactive SVG visualizations that display profiling data — typically sampled stack traces — as layered rectangles where width represents the proportion of time spent in each function. They allow engineers to quickly identify performance bottlenecks by visually highlighting which code paths consume the most resources. Before flame graphs, analyzing stack trace data required reading through thousands of text-based samples. The visualization compresses this data into a single image that can be interpreted at a glance, making performance analysis dramatically faster and more accessible.

What is the USE Method?

The USE Method is a performance analysis methodology created by Brendan Gregg that provides a systematic checklist for identifying bottlenecks. For every system resource (CPU, memory, network, storage), you check three things: Utilization (percentage of capacity being used), Saturation (degree to which work is queuing), and Errors (count of error events). This simple framework ensures that engineers examine all resources systematically rather than focusing only on the metrics they happen to know, preventing common misdiagnoses in performance investigations.

What is eBPF and how did Brendan Gregg contribute to it?

eBPF (extended Berkeley Packet Filter) is a technology in the Linux kernel that allows sandboxed programs to run inside the kernel without modifying kernel source code or loading kernel modules. Gregg was one of the most influential figures in the eBPF ecosystem: he wrote many of the 100+ observability tools in the BCC (BPF Compiler Collection) toolkit, was a leading contributor to bpftrace (a high-level tracing language for Linux), and wrote the definitive book on BPF performance tools. His work made eBPF-based observability accessible to the broader Linux engineering community.

How did Brendan Gregg’s work at Netflix influence cloud performance engineering?

At Netflix, Gregg served as senior performance architect during the company’s migration from data centers to AWS. He developed performance analysis methodologies used across Netflix’s entire infrastructure, identified optimization opportunities that saved millions in cloud costs, and pioneered the use of BPF-based tools for cloud performance analysis. His work demonstrated that systematic performance engineering could yield dramatic cost and reliability improvements at cloud scale, establishing practices that influenced the broader industry’s approach to cloud infrastructure optimization.

What books has Brendan Gregg written?

Gregg has written three major books: Systems Performance: Enterprise and the Cloud (first edition 2013, second edition 2020), which is considered the definitive reference on operating system and application performance analysis; BPF Performance Tools (2019), a comprehensive 800+ page guide to Linux observability using BPF; and, co-authored with Jim Mauro, DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X, and FreeBSD (2011). All three are widely regarded as essential references for systems engineers and performance analysts.

What is Brendan Gregg’s role at Intel?

Gregg joined Intel in 2022 as an Intel Fellow — one of the company’s highest technical distinctions. In this role, he focuses on hardware-software co-optimization for performance analysis, bridging the gap between CPU-level performance monitoring capabilities and the software observability tools he spent his career developing. His work influences how future processors support observability features, aiming to create unified performance analysis that spans from microarchitecture to application code.