Building a Multi-Agent System for Intelligent Ad Delivery: A Step-by-Step Guide

By

Overview

Modern advertising platforms face a structural challenge: balancing relevance, budget efficiency, and user experience across thousands of campaigns in real time. A single monolithic AI model often struggles to handle conflicting objectives and dynamic constraints. At Spotify Engineering, we encountered exactly this problem when scaling our ad system. Instead of shipping a one-size-fits-all AI feature, we rearchitected our ad decision engine around multi-agent systems—a collection of specialized AI agents that collaborate and negotiate to deliver smarter advertising outcomes. This tutorial walks you through the core concepts, prerequisites, and implementation steps for building such a system, complete with code examples and common pitfalls to avoid.

Building a Multi-Agent System for Intelligent Ad Delivery: A Step-by-Step Guide
Source: engineering.atspotify.com

Prerequisites

Knowledge

  • Familiarity with Python and basic machine learning concepts
  • Understanding of reinforcement learning (RL) fundamentals (value functions, policies)
  • Basic knowledge of message-passing or event-driven architectures

Tools & Libraries

  • Python 3.8+
  • gym (for RL environment simulation)
  • ray (for distributed agent execution)
  • redis or rabbitmq (for inter-agent communication)
  • tensorflow / pytorch (agent policy networks)

Infrastructure

  • At least one server or cloud instance (for production, a Kubernetes cluster is recommended)
  • Streaming data pipeline (e.g., Kafka) for ad events

Step-by-Step Instructions

1. Define Agent Roles and Objectives

In a multi-agent advertising system, each agent specialises in a subproblem. Typical roles include:

  • Bidding Agent – decides optimal bid price per auction
  • Budget Allocation Agent – distributes daily budget across campaigns
  • Creative Selection Agent – picks the best ad creative for a user context
  • Pacing Agent – modulates spend velocity to avoid early budget exhaustion

For each agent, define a clear reward function. For example, the Bidding Agent’s reward might be (conversion_value - bid_cost) while the Pacing Agent’s reward is a penalty for exceeding hourly spend limits.

# Example: BiddingAgent reward function
def bidding_reward(conversion, cost):
    return conversion * value_per_conversion - cost

2. Design Inter-Agent Communication

Agents cannot operate in isolation. They must share state (e.g., remaining budget, current bid landscape) and negotiate decisions. We used a blackboard architecture where agents write and read from a shared Redis store.

  • Use redis-py for simple key-value writes.
  • For complex coordination, implement a contract-net protocol: an agent broadcasts a task, others bid on it, and the best bidder executes.
import redis

r = redis.Redis()
# BiddingAgent writes its current bid price
r.set('current_bid', 0.45)

# PacingAgent reads the bid to adjust pace
current_bid = float(r.get('current_bid'))

3. Train Agents with Multi-Agent Reinforcement Learning

We used a centralised training, decentralised execution (CTDE) framework. Each agent learns its own policy using PPO (Proximal Policy Optimization). The environment is a custom ad auction simulator.

import gym
from ray.rllib.agents.ppo import PPOTrainer

# Register custom ad environment
gym.register('AdAuction-v0', entry_point='ad_env:AdAuctionEnv')

trainer = PPOTrainer(
    config={
        'multiagent': {
            'policies': {
                'bidder': (None, obs_space, act_space, {}),
                'pacer': (None, obs_space, act_space, {}),
            },
            'policy_mapping_fn': lambda agent_id: agent_id,
        },
        'num_workers': 4,
    },
    env='AdAuction-v0'
)

for i in range(100):
    result = trainer.train()
    print(f'Iteration {i}: reward={result["episode_reward_mean"]}')

4. Deploy Agents as Microservices

Each agent runs in its own container or separate process, communicating via gRPC or message queues. Use Ray Serve or FastAPI to expose inference endpoints.

Building a Multi-Agent System for Intelligent Ad Delivery: A Step-by-Step Guide
Source: engineering.atspotify.com
# BiddingAgent service (FastAPI)
from fastapi import FastAPI
import uvicorn

app = FastAPI()

@app.post('/bid')
def get_bid(user_context: dict, campaign_id: str):
    # load model and compute bid
    bid = model.predict(user_context)
    return {'bid_price': bid}

if __name__ == '__main__':
    uvicorn.run(app, host='0.0.0.0', port=8000)

5. Implement Monitoring and Feedback Loops

Agents must continuously learn from live data. Stream ad outcomes (impressions, clicks, conversions) into a Kafka topic, and have a feedback agent update each agent’s experience buffer. Then periodically retrain models.

  • Use Prometheus for metrics (e.g., average bid price, budget utilization).
  • Log all agent decisions with unique IDs for debugging.

Common Mistakes

Mistake 1: No Shared State Coordination

Agents that don’t share budget or bid information will overshoot or underspend. Always implement a centralised state store (Redis) and enforce read-after-write consistency.

Mistake 2: Training Agents in Isolation

If agents are trained separately with static behavior of others, they won’t learn to cooperate. Use multi-agent training with concurrent environment steps.

Mistake 3: Overly Complex Communication Protocols

Don’t build a full-fledged negotiation system on day one. Start with simple broadcast+subscribe; add complexity only after validating core behavior.

Mistake 4: Neglecting Latency and Througput

Advertising decisions must be made in milliseconds. Cache frequent inferences, use async I/O, and precompute features.

Summary

Building a multi-agent architecture for advertising solves the structural problem of competing objectives by decoupling decision-making into specialized, collaborative agents. This guide covered the essential steps: defining agent roles, designing communication, training with multi-agent RL, deploying as microservices, and monitoring. By avoiding common pitfalls (isolated training, lack of coordination, excessive complexity), you can create a system that dynamically adapts to user behavior and campaign goals—just as we did at Spotify. The result is smarter, more efficient ad delivery without sacrificing user experience.

Tags:

Related Articles

Recommended

Discover More

MegaETH's First MEGA Buyback: What It Means for the EcosystemJetStream 3.0: 5 Game-Changing Updates Every Web Developer Should KnowWeekly Cyber Threat Digest: April 27th – Major Breaches, AI Exploits, and Critical PatchesHow to Claim Your Right to See Who Viewed Your LinkedIn Profile (Under GDPR)From Pixels to Frames: Mastering Diffusion Models for Video Generation