How to Optimize Go Performance with Stack Allocation for Slices

Introduction

Heap allocations are a major source of slowdown in Go programs. Each allocation requires a call to the memory allocator, and the garbage collector must later clean up the mess. But there's a simple trick to avoid many of these allocations: stack-allocate slices of constant size. Stacks are cheap—sometimes free—and they impose zero load on the garbage collector. This guide will walk you through identifying hotspots, pre-allocating slices, and verifying that your optimizations actually work on the stack.

Source: blog.golang.org

What You Need

  • Go 1.19 or later (stack allocation improvements are more aggressive in recent versions)
  • A basic understanding of slices and append
  • Familiarity with benchmarking using go test -bench
  • Ability to run escape analysis with -gcflags='-m'

Step-by-Step Guide

Step 1: Identify Hot Loops That Grow Slices

Look for loops that repeatedly append to a slice without a pre-allocated capacity. These are prime candidates for heap allocations. Example:

func process(c chan task) {
    var tasks []task
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

In this code, tasks starts as nil, so each time the backing array fills up, the runtime must allocate a new, larger array on the heap and copy the old contents over. During the first iterations the capacity grows in small steps, so you pay for a series of small allocations, each one creating garbage and stressing the GC.
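
You can make those hidden reallocations visible by counting capacity changes while appending to a nil slice. This is a minimal sketch (the exact growth pattern is a runtime implementation detail, so don't rely on specific capacities):

```go
package main

import "fmt"

// Appending to a nil slice forces the runtime to grow the backing array
// repeatedly. Each capacity change means a fresh heap array plus a copy.
func main() {
	var s []int
	oldCap := cap(s)
	reallocs := 0
	for i := 0; i < 100; i++ {
		s = append(s, i)
		if cap(s) != oldCap { // capacity changed: the runtime reallocated
			reallocs++
			oldCap = cap(s)
		}
	}
	fmt.Println("reallocations:", reallocs, "final cap:", cap(s))
}
```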

Step 2: Determine the Maximum Slice Size

If you know the maximum number of elements the slice will ever hold (at the point of allocation), you can pre-allocate the backing array once. For example, if you know that c will never send more than 100 tasks, you can allocate exactly that size. If the size is a compile-time constant, the Go compiler may place the backing array on the stack instead of the heap.

Tip: You can often derive the maximum from the problem domain: reading a fixed number of input lines, processing a known batch size, and so on. When in doubt, measure the observed maximum (or a high percentile) in your production environment; the median would undershoot the bound.
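
A domain-derived bound works best as a named constant, so the compiler sees a compile-time constant capacity at the make site. A small sketch (maxTasks and the batch size are hypothetical):

```go
package main

import "fmt"

// maxTasks is a hypothetical domain-derived bound: suppose the dispatcher
// never queues more than 100 tasks per batch, so 100 is a safe constant.
const maxTasks = 100

func main() {
	batch := make([]int, 0, maxTasks)
	for i := 0; i < 42; i++ { // a batch smaller than the bound
		batch = append(batch, i)
	}
	fmt.Println(len(batch), cap(batch)) // 42 100: length grows, capacity fixed
}
```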

Step 3: Pre-Allocate the Slice with make

Replace the initial var tasks []task with a pre-allocated slice:

func process(c chan task) {
    tasks := make([]task, 0, 100)  // capacity 100
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

Now, instead of starting from a nil slice and growing repeatedly, the slice has room for 100 elements from the start. Every append in the loop simply places the new element into the existing backing array; no further allocation is needed. The backing array is allocated exactly once, by make.

When is the allocation on the stack? If the capacity is a compile-time constant small enough for the compiler's stack-allocation limit (64 KB for implicitly sized allocations like this one in current gc releases, though the exact threshold is an implementation detail), and if the slice does not escape to the heap (e.g., it is not returned or stored in a global variable), then the backing array lives on the stack. For many hot loops, both conditions hold.
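
The contrast between the two conditions can be seen side by side. In this sketch (sum and leak are hypothetical names), the first function keeps every reference local, while the second returns its slice and forces a heap escape despite the constant capacity:

```go
package main

import "fmt"

// sum keeps its scratch slice local: constant capacity, no leaked
// references, so the compiler may place the backing array on the stack
// (reported as "does not escape" under -gcflags='-m').
func sum(n int) int {
	buf := make([]int, 0, 100)
	for i := 0; i < n; i++ {
		buf = append(buf, i)
	}
	total := 0
	for _, v := range buf {
		total += v
	}
	return total
}

// leak returns its slice, so the backing array must outlive the call
// and escapes to the heap regardless of the constant capacity.
func leak(n int) []int {
	buf := make([]int, 0, 100)
	for i := 0; i < n; i++ {
		buf = append(buf, i)
	}
	return buf
}

func main() {
	fmt.Println(sum(10), len(leak(10))) // 45 10
}
```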

Step 4: Verify Escape Analysis

To ensure your pre-allocated slice stays on the stack, run the escape analysis:

go build -gcflags='-m -m' 2>&1 | grep escape

Look for lines like:

./main.go:10:6: make([]task, 0, 100) does not escape

If you see “escapes to heap”, something prevents stack allocation. Common causes:

  • The slice is returned from the function or stored in a global variable.
  • The capacity is very large (the compiler decides stack is too small).
  • The slice is passed to a function that the compiler cannot inline.

If the slice does escape, reconsider your design—perhaps you can process the data without returning the whole slice, or copy the results elsewhere on the heap only once.
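One such redesign, sketched here under the assumption that the caller only needs to see each element once, is to hand results to a callback instead of returning the slice (collect and visit are hypothetical names):

```go
package main

import "fmt"

// collect builds its results in a local scratch slice and hands each
// element to a caller-supplied callback, so no reference to the backing
// array leaves the function and it remains eligible for the stack.
func collect(n int, visit func(int)) {
	buf := make([]int, 0, 100) // constant capacity, never returned
	for i := 0; i < n; i++ {
		buf = append(buf, i*i)
	}
	for _, v := range buf {
		visit(v)
	}
}

func main() {
	total := 0
	collect(5, func(v int) { total += v })
	fmt.Println(total) // 0+1+4+9+16 = 30
}
```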

Step 5: Benchmark the Difference

Write a simple benchmark to quantify the performance gain:

func BenchmarkProcess(b *testing.B) {
    for i := 0; i < b.N; i++ {
        // Refill and close the channel every iteration: once drained and
        // closed, process would otherwise see an empty channel.
        c := make(chan task, 100)
        for j := 0; j < 100; j++ {
            c <- task{} // zero-value task; fill in realistic fields as needed
        }
        close(c)
        process(c)
    }
}

Run with:

go test -bench=BenchmarkProcess -benchmem

Compare the allocations per operation (allocs/op) and the time per operation (ns/op). The pre-allocated version should show drastically fewer allocations (zero for the backing array when it stays on the stack) and lower latency.

Tips for Success

  • Use go test -benchmem – it shows the number of allocations per operation. Your goal is to reduce that to near zero.
  • Keep capacities constant – only constant sizes are candidates for stack allocation. If the capacity is a variable, the compiler will almost always allocate on the heap.
  • Watch out for inlining – sometimes inlining helps the compiler see that the slice does not escape. Use -gcflags='-m' to check.
  • Don’t over-allocate – stack space is limited. A huge constant capacity (e.g., 10 million elements) exceeds the compiler’s limit, so the backing array silently goes to the heap anyway. Stick to sizes small enough to qualify (typically a few tens of kilobytes).
  • Prefer slice reuse – if you cannot pre-allocate a constant size, consider reusing a slice from a pool (e.g., sync.Pool) to reduce allocation overhead.
  • Combine with other optimizations – stack allocation works best when the function is small, hot, and doesn't leak references. Profile your program to find the hottest paths first.
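
The slice-reuse tip above can be sketched with sync.Pool. This is one possible shape, not the only one; doubleAll and scratchPool are hypothetical names, and storing a *[]int (rather than a []int) avoids an extra allocation when the slice header goes back into the pool:

```go
package main

import (
	"fmt"
	"sync"
)

// scratchPool hands out reusable backing arrays for cases where no
// constant capacity is known at compile time.
var scratchPool = sync.Pool{
	New: func() any { return new([]int) },
}

// doubleAll borrows a scratch slice from the pool instead of allocating
// a fresh one on every call.
func doubleAll(vals []int) int {
	bufp := scratchPool.Get().(*[]int)
	buf := (*bufp)[:0] // reuse the old backing array, reset length
	for _, v := range vals {
		buf = append(buf, v*2)
	}
	total := 0
	for _, v := range buf {
		total += v
	}
	*bufp = buf // keep any growth for the next borrower
	scratchPool.Put(bufp)
	return total
}

func main() {
	fmt.Println(doubleAll([]int{1, 2, 3})) // 2+4+6 = 12
}
```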

By following these steps, you can turn a wasteful heap‑allocation pattern into a clean stack‑based one. The result: faster code, less pressure on the garbage collector, and happier users.
