
I Built an AI Agent That Runs My Google Ads — Here's What I Learned
By Skylar Martinez
Everyone's talking about AI replacing marketers. I decided to actually build it.
For the past few months, I've been developing ScaleSearch — a multi-agent system that analyzes PPC campaigns, detects anomalies, and generates recommendations. Not a chatbot that answers questions about Google Ads. An agent that does Google Ads work.
Here's what I've learned building AI that touches real ad spend.
The Hype vs. The Reality
The AI marketing hype goes something like this: "Just connect your ad account and let AI optimize everything!"
The reality is messier.
AI agents are genuinely powerful at:
- Pattern recognition — Catching anomalies humans miss (spend spikes, CTR drops, disappeared campaigns)
- Data synthesis — Turning 50 CSV columns into actionable insights
- Speed — Analyzing in seconds what would take a human hours
But they're terrible at:
- Context — Understanding why you're running a brand campaign at a loss
- Strategy — Knowing that Q4 budgets are different from Q1
- Client relationships — Explaining to stakeholders why you paused their favorite keyword
The unlock isn't replacing humans. It's human-in-the-loop automation — agents that do the grunt work and surface decisions for humans to make.
The Architecture That Actually Works
After several false starts, here's the pattern that stuck:
┌─────────────────────────────────────────┐
│              Director Agent             │
│   (Coordinates analysis, sets goals)    │
└────────────────┬────────────────────────┘
                 │
    ┌────────────┼────────────┐
    ▼            ▼            ▼
┌────────┐   ┌────────┐   ┌────────┐
│  Data  │   │Analysis│   │Account │
│  Team  │   │  Team  │   │  Team  │
└────────┘   └────────┘   └────────┘
Director Agent: The "demanding boss" that coordinates everything. It asks for anomaly reports, budget analyses, and competitor insights — then synthesizes them into recommendations.
Specialized Teams: Each team has focused tools. Data team pulls reports. Analysis team runs statistical tests. Account team knows campaign structures.
The key insight: Agents work better when they have narrow responsibilities. A single "do everything" agent gets confused. Specialized agents with clear handoffs actually ship work.
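The shape of that handoff can be sketched in a few lines. This is an illustrative toy, not ScaleSearch's actual code: the class names (DataAgent, AnalysisAgent, Director), the stubbed campaign rows, and the CPC threshold are all made up for the example. The point is the structure — each agent has one narrow job, and the director only coordinates and synthesizes.

```python
# Toy sketch of the director/specialist pattern. Every name and number
# here is illustrative, not from the real system.
from dataclasses import dataclass, field


@dataclass
class DataAgent:
    """Data team: pulls raw campaign rows (stubbed here)."""

    def run(self) -> list[dict]:
        return [
            {"campaign": "Brand", "cpc": 1.20},
            {"campaign": "Generic", "cpc": 4.80},
        ]


@dataclass
class AnalysisAgent:
    """Analysis team: flags rows whose CPC exceeds a simple threshold."""

    cpc_limit: float = 3.0

    def run(self, rows: list[dict]) -> list[dict]:
        return [r for r in rows if r["cpc"] > self.cpc_limit]


@dataclass
class Director:
    """Coordinates the teams, then synthesizes recommendations."""

    data: DataAgent = field(default_factory=DataAgent)
    analysis: AnalysisAgent = field(default_factory=AnalysisAgent)

    def run(self) -> list[str]:
        rows = self.data.run()             # data team pulls the report
        flagged = self.analysis.run(rows)  # analysis team runs the checks
        return [f"Review CPC on '{r['campaign']}'" for r in flagged]


recommendations = Director().run()
print(recommendations)  # ["Review CPC on 'Generic'"]
```

Notice that neither specialist knows the other exists — only the director sees the whole pipeline. That isolation is what keeps each agent's responsibility narrow.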
Three Things That Surprised Me
1. Prompts Are Product Decisions
I spent more time on system prompts than I expected. How you frame an agent's role dramatically determines its output quality.
A prompt like "Analyze this PPC data" gives generic output.
A prompt like "You are a demanding client delivery director who holds analysts accountable for actionable insights. You reject vague recommendations and push for specific, data-backed actions" — that gives you something you can actually use.
The prompt is the product spec.
2. Statistical Analysis Beats LLM Intuition
For anomaly detection, I tried having the LLM "look" at the data and spot outliers. Terrible results.
What works: Traditional statistics (z-scores, IQR) to detect anomalies, then LLM to explain them.
The LLM is great at turning "Campaign X has a z-score of 3.2 on CPC" into "Your competitor likely entered an auction you previously dominated — consider reviewing your bid strategy."
Pattern recognition + natural language explanation = useful output.
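That split — statistics detect, LLM explains — looks roughly like this. The numbers and the two-standard-deviation threshold are illustrative; in the real pipeline the flagged rows, not the raw series, would be handed to the LLM for the natural-language explanation.

```python
# A sketch of statistics-first anomaly detection: z-scores flag outliers,
# and only flagged rows would go to an LLM. Data is made up for the example.
import statistics

daily_cpc = [1.10, 1.05, 1.15, 1.08, 1.12, 1.09, 2.90]  # last day spikes

mean = statistics.mean(daily_cpc)
stdev = statistics.stdev(daily_cpc)

anomalies = [
    (day, cpc, round((cpc - mean) / stdev, 2))
    for day, cpc in enumerate(daily_cpc)
    if abs((cpc - mean) / stdev) > 2  # flag beyond 2 standard deviations
]

for day, cpc, z in anomalies:
    # In the real system, this tuple is what the LLM would be asked to explain.
    print(f"Day {day}: CPC {cpc} (z-score {z})")
```

Only the spike on the final day clears the threshold; the small day-to-day wiggles stay below it. That filtering is exactly what keeps the LLM from hallucinating outliers in normal noise.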
3. Memory Is the Hard Part
Agents without memory are just fancy calculators. They can't learn that your client hates automated recommendations, or that Q4 always has higher CPCs.
I'm using a combination of:
- Long-term memory (Pinecone) for institutional knowledge
- Conversation state (SQLite) for multi-turn interactions
- Context files (Markdown) for domain-specific rules
This is still the roughest edge. The agents work, but they don't learn well yet.
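Of the three layers, the conversation-state one is the simplest to sketch. Here's a minimal version of that layer only, using SQLite's standard library bindings — the schema, table name, and helper functions are illustrative, not ScaleSearch's actual storage, and the Pinecone and Markdown layers are out of scope.

```python
# Minimal conversation-state store in SQLite. Schema and names are
# illustrative, not the real system's.
import sqlite3

conn = sqlite3.connect(":memory:")  # the real system would use a file

conn.execute(
    """CREATE TABLE turns (
        session_id TEXT,
        role       TEXT,   -- 'user' or 'agent'
        content    TEXT,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )"""
)


def record_turn(session_id: str, role: str, content: str) -> None:
    """Append one turn of a multi-turn interaction."""
    conn.execute(
        "INSERT INTO turns (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )


def history(session_id: str) -> list[tuple[str, str]]:
    """Replay a session's turns in order, e.g. to rebuild LLM context."""
    rows = conn.execute(
        "SELECT role, content FROM turns WHERE session_id = ? ORDER BY rowid",
        (session_id,),
    )
    return rows.fetchall()


record_turn("acct-42", "user", "Why did CPC spike yesterday?")
record_turn("acct-42", "agent", "Campaign X shows a z-score of 3.2 on CPC.")
print(history("acct-42"))
```

This handles multi-turn state fine; the hard part the sketch doesn't touch is deciding which of these turns deserve promotion into long-term memory.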
What I'd Tell Past Me
- Start with the output, not the architecture. What does a useful recommendation look like? Work backward from there.
- Humans in the loop aren't a crutch — they're the feature. The goal isn't full automation. It's augmented decision-making.
- Ship ugly versions. My first ScaleSearch prototype was embarrassing. It also taught me what actually mattered.
- Read the boring stuff. Statistical process control, anomaly detection theory, decision science. The AI is just the interface to these older ideas.
Where This Is Going
I'm building toward a world where PPC managers spend 80% less time in spreadsheets and 80% more time on strategy and client relationships.
Not because AI replaces the thinking, but because it handles the tedium.
ScaleSearch isn't done. But it's running real analyses on real accounts, and the output is genuinely useful. That's the milestone that matters.
If you're building AI agents for any domain, my advice: pick a specific workflow, automate the hell out of it, and keep humans in the loop. The magic is in the mundane work, done reliably.
I'm building in public at skylarmartinez.net. Follow along for more on AI agents, PPC automation, and whatever I'm tinkering with next.