Every night at 10 PM Pacific, while our team sleeps, an AI agent clones ten application repositories, reads every file, leaves line-level review comments on the issues it finds, pushes fixes, and merges the PRs — all before anyone opens their laptop in the morning. This isn't a demo. It's been running in production for weeks, and it's caught bugs that would have shipped to users.

This is the story of how we built Alpha Agent's automated code review pipeline, what it actually catches, and why we think every engineering team — especially small ones without dedicated reviewers — should be running something like this.

The Problem: Code Review at a Small Agency

Last Rev is a digital agency. We ship web applications for clients across industries — travel, e-commerce, media. Our engineering team is lean. We don't have the luxury of a dedicated code reviewer or a QA department that catches regressions before they hit production.

The reality of small-team code review looks like this:

  • PRs sit for hours (or days) waiting for someone to context-switch into review mode
  • Reviews are often cursory — a quick skim of the diff, an "LGTM," and a merge
  • Consistency across repos drifts because nobody has the entire codebase loaded in their head
  • Accessibility, mobile responsiveness, and security hygiene get checked sporadically at best
  • Pre-existing issues accumulate because reviews only look at diffs, not the full codebase

We needed a system that reviews everything, every day, without getting tired or cutting corners.

The Architecture: Nightly Full-Codebase Audits

Alpha Agent's nightly review isn't a linter or a static analysis tool. It's an AI agent that reads and understands the code, cross-references it against our shared component library and architectural standards, then makes judgment calls about what's actually wrong versus what's just unconventional.

Here's the cycle that runs every night across all ten of our app repos[1]:

  1. Load domain knowledge — The agent reads our web development conventions, shared component specs, and a learnings file that accumulates patterns from every previous review
  2. Full codebase audit — Every HTML page, every JavaScript module, every config file. Not just the diff — the entire app
  3. Sync & create PR — The current app state is pushed to a review branch before any fixes, so review comments land on the actual code
  4. Two-pass review — First pass identifies issues; second pass re-reads the code to verify each finding and drop false positives
  5. Post review comments — Line-specific comments on the PR with REQUEST_CHANGES status, blocking merge until resolved
  6. Fix everything — The agent fixes every issue it flagged — bugs, suggestions, nits. Nothing gets deferred
  7. Resolve & merge — Each comment gets a reply explaining the fix, then the PR is approved and merged

The entire cycle takes 5-15 minutes per repo. By morning, every app has been audited, fixed, and the PR history tells the full story.
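The pipeline itself is an agent run, not a script we can publish, but its control flow reduces to a strictly sequential loop: each stage must finish before the next starts, and the stage history becomes the audit trail. The sketch below is illustrative only — every function and stage name is an assumption, not the actual implementation:

```javascript
// Hypothetical sketch of the nightly cycle. Stage names mirror the
// seven steps above; the stubbed run() bodies stand in for real work.
async function nightlyReview(repo, steps) {
  const log = [];
  for (const step of steps) {
    log.push(step.name); // record the stage for the audit history
    await step.run(repo); // each stage completes before the next begins
  }
  return log;
}

// The seven stages, in the order they run each night.
const stages = [
  'loadDomainKnowledge', 'auditCodebase', 'syncAndCreatePR',
  'twoPassReview', 'postComments', 'fixEverything', 'resolveAndMerge',
].map((name) => ({ name, run: async () => {} })); // stubbed bodies
```

Running `nightlyReview(repo, stages)` for each of the repos in sequence is the whole scheduler; the interesting work lives inside the stages.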

What the Review Actually Checks

The audit isn't a generic "does this code look okay?" pass. It's structured around specific categories that map to our real engineering standards:

Shared Component Compliance

We maintain a library of shared web components (<cc-toast>, <cc-app-nav>, <cc-auth>, <cc-empty-state>, <cc-modal>, etc.) loaded from a shared CDN. The review verifies every page uses them correctly — right attributes, right load order, no inline CSS duplicating what the theme already provides.

DEC Pattern Compliance

Each shared component has a DEC (Design-Engineering Contract) spec that documents correct usage and common mistakes. The agent reads the spec and cross-references it against every usage in the app. This is how it catches attribute mismatches and event name bugs that would be invisible to a linter.
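To make the cross-referencing idea concrete, here is a toy version of a DEC-style check. The real spec format and component APIs are internal, so the shape of `decSpec` below — and the attribute and event names in it — are assumptions:

```javascript
// Toy DEC cross-reference: compare how a component is used against
// what its spec says it supports. Spec contents here are invented.
const decSpec = {
  'cc-pill-dropdown': {
    requiredAttrs: ['options'],
    events: ['pill-change', 'dropdown-change'], // events the component fires
  },
};

// Flag usages that miss required attributes or listen for events
// the spec never documents -- invisible to a linter, obvious here.
function checkUsage(tag, usage) {
  const spec = decSpec[tag];
  const issues = [];
  for (const attr of spec.requiredAttrs) {
    if (!usage.attrs.includes(attr)) issues.push(`missing attribute: ${attr}`);
  }
  for (const evt of usage.listensFor) {
    if (!spec.events.includes(evt)) issues.push(`unknown event: ${evt}`);
  }
  return issues;
}
```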

Security & Data Hygiene

API keys in client-side meta tags, missing auth gates on pages with Supabase data, unescaped dynamic attribute values — the review checks for real security issues, not theoretical ones.

Accessibility & Mobile

Touch targets ≥44px, proper ARIA labels, color contrast, responsive layouts at 320px/768px/1024px breakpoints. Every page, every review.

Supabase Setup Validation

Every app that persists data must use our Supabase setup correctly — meta tags present, shared client loaded (not a local copy), no localStorage for data that should be in the database, RLS policies enabled, error handling on every query with user-visible feedback via <cc-toast>.

Real Bugs Caught: Case Studies from This Week

This isn't theoretical. Here are real issues Alpha Agent caught and fixed in the last 48 hours, pulled directly from merged PR review comments.

Case 1: Completely Non-Functional Sort Filter (Sales App)

Issue: cc-pill-dropdown fires dropdown-change, not pill-change. Sort filter was completely non-functional.

This is the kind of bug that's invisible in manual testing unless someone specifically tests the sort feature. The dropdown rendered fine. It looked interactive. But clicking a sort option did absolutely nothing because the event listener was bound to the wrong event name. The agent caught it by cross-referencing the component's DEC spec against the actual event listener in the code.[2]

The fix: Split into separate pill-change and dropdown-change listeners. One commit, one line, one bug that would have confused every user who tried to sort.
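The bug class is easy to reproduce with a plain EventTarget. The snippet below is a minimal illustration, not the actual component code — the event names follow the description above, and the real <cc-pill-dropdown> API isn't shown in this post:

```javascript
// Minimal reproduction: a handler bound to an event name the
// component never fires silently does nothing.
const dropdown = new EventTarget();
let sortApplied = false;

// Buggy binding: listening for 'pill-change'...
dropdown.addEventListener('pill-change', () => { sortApplied = true; });
dropdown.dispatchEvent(new Event('dropdown-change')); // ...but this fires
const brokenState = sortApplied; // still false -- sorting silently no-ops

// Fixed binding: use the event name the DEC spec documents.
dropdown.addEventListener('dropdown-change', () => { sortApplied = true; });
dropdown.dispatchEvent(new Event('dropdown-change')); // handler now runs
```

Nothing errors, nothing logs — which is exactly why only a spec-aware review catches it.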

Case 2: Silent Auth Failures Across Multiple Apps

Issue: <cc-auth> present but no Supabase meta tags. Auth gate silently fails.

This pattern showed up in three apps in the same night — Sales, Standup, and Lighthouse. Pages had the <cc-auth> component to gate access, but were missing the Supabase configuration meta tags. The auth component rendered, tried to initialize, silently failed, and let everyone through. The door was there; it just wasn't locked.[3]

The fix: Added Supabase meta tags and supabase-client.js to every affected page. Three apps, same night, same systemic issue caught by a review that reads every file.
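The check itself is simple once you know the failure mode. In the sketch below, the meta tag names (`supabase-url`, `supabase-anon-key`) are assumptions standing in for whatever the real setup requires:

```javascript
// Sketch of the config audit: which required Supabase meta tags are
// absent from a page? An auth gate without them can initialize,
// fail silently, and let everyone through.
const REQUIRED_META = ['supabase-url', 'supabase-anon-key']; // assumed names

function missingSupabaseConfig(metaNamesOnPage) {
  return REQUIRED_META.filter((name) => !metaNamesOnPage.includes(name));
}
```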

Case 3: Stale Local Data Files Instead of Supabase (Multiple Apps)

Issue: cc-prompts was using src=data/prompts.json instead of app=lighthouse. Missing Supabase meta tags and supabase-client.js.

Several apps had prompts components pointing to local JSON files instead of the Supabase-backed data source. This meant prompts couldn't be updated without a code deploy, and the pages were missing their database connection entirely. The same pattern appeared in Cringe Rizzler, Lighthouse, and Sentiment apps — all caught and fixed in the same nightly cycle.[4]

The fix: Switched to the app attribute for Supabase-backed data, added missing meta tags, removed stale local JSON files.

Case 4: Dead Code and Phantom Dependencies (Sales App)

Issue: sql-sync.js and db.js loaded but never used — cc-leads fetches JSON directly. Dead code.

The Sales app was loading two database modules that nothing in the app actually called. They were left over from a refactor. A human reviewer looking at the diff wouldn't catch this — it's pre-existing code, not a new change. But the full-codebase audit reads everything and asks: "Is this file actually used?"[2]

The fix: Removed both scripts. Deleted db.js entirely.

Case 5: Missing Toast Components

Issue: landing.html — Missing <cc-toast>. All pages need the toast component for user feedback.

A small omission with real UX consequences. Without the toast component, any error or success message from API calls would silently fail to render. The user would click a button and see… nothing. The agent checks every page against the required component list and flags any gaps.[5]

Case 6: Unescaped Dynamic Attributes (Sales App)

Suggestion: Filter/sort JSON attributes not escaped with escAttr().

Dynamic values being injected into HTML attributes without escaping — a classic XSS vector. The agent flagged it as a suggestion rather than a critical bug (the values came from controlled data, not user input), but still applied the fix because the nightly review doesn't defer anything.[2]
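The post doesn't show the team's actual escAttr() helper, but a typical implementation escapes the characters that can break out of an HTML attribute. This is one plausible sketch, not the real code:

```javascript
// One plausible escAttr(): neutralize characters that can terminate
// an attribute or open a new tag. Ampersand must be escaped first.
function escAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Unescaped, this payload closes the attribute and injects a handler;
// escaped, it stays inert inside the attribute value.
const payload = '" onmouseover="alert(1)';
const safe = `<div data-filter="${escAttr(payload)}"></div>`;
```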

The Two-Pass Review: Why False Positives Kill Trust

The most important design decision in our code review system is the two-pass approach. Here's why.

AI code reviewers have a reputation problem: they flag too many things that aren't actually issues. "Unsafe cast" when there's a type guard three lines up. "Injection risk" on a value that comes from internal config, not user input. "Missing null check" when the caller guarantees non-null. After a few rounds of false positives, developers start ignoring the reviews entirely.

Our two-pass system addresses this head-on:

Pass 1 (Analysis): Read everything, draft findings liberally. Cast a wide net. It's okay to be wrong here — this is an internal working draft.

Pass 2 (Verification): Re-read the actual code for every flagged issue. For each one, ask:

  • "Is this ACTUALLY a bug, or did I misread the code?"
  • "Is there a guard or check elsewhere that handles this?"
  • "Am I sure about the context — is this user input or internal code?"
  • "Would this actually break in practice, or is it theoretical?"

Anything the agent isn't confident about gets dropped. It's better to miss a minor issue than to flag something that isn't real. This principle is baked into the skill definition, and it's why our developers actually read and trust the review comments.
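As a toy model, the whole two-pass flow reduces to "draft liberally, then keep only what survives re-verification." The `verify` function below is a stand-in for the agent re-reading the code; everything here is illustrative:

```javascript
// Pass 1 produces draft findings; pass 2 keeps only the ones that
// survive re-verification against the actual code.
function twoPassReview(draftFindings, verify) {
  return draftFindings.filter((finding) => verify(finding) === true);
}

// Example: the "missing null check" is guarded upstream, so it's
// dropped; the wrong event name is a real bug, so it survives.
const draft = [
  { msg: 'missing null check', confirmed: false },
  { msg: 'wrong event name', confirmed: true },
];
const verified = twoPassReview(draft, (f) => f.confirmed);
```

The asymmetry is deliberate: a dropped true positive costs one missed nit, while a posted false positive costs reviewer trust.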

The Code Review Skill: On-Demand for Team PRs

The nightly review handles our app portfolio automatically. But we also have an on-demand code review skill that any team member can invoke on any PR — their own or someone else's.

The skill follows the same two-pass methodology:

  1. Fetch PR metadata and the full diff via GitHub CLI
  2. Analyze and categorize findings: What's done well, Confirmed issues, Architecture notes, Nits
  3. Post a structured general review comment
  4. Re-read every finding against the actual code
  5. Only then post line-specific comments on verified issues

The review scales with PR size. Small PRs (<200 lines) get meticulous attention to detail. Medium PRs (200-1000 lines) focus on logic changes with lighter treatment of test updates. Large PRs (1000+ lines) prioritize architecture, breaking changes, and new code over modifications.
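Those thresholds amount to a simple dispatch on diff size. The labels below are illustrative, and the boundary handling (whether exactly 1000 lines counts as medium or large) is an assumption:

```javascript
// Map changed-line count to review depth, per the thresholds above.
function reviewDepth(changedLines) {
  if (changedLines < 200) return 'meticulous';       // small PRs
  if (changedLines <= 1000) return 'logic-focused';  // medium PRs
  return 'architecture-first';                       // large PRs
}
```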

Critically, the review always starts by acknowledging what's done well. Good architecture decisions, solid test coverage, clean abstractions — these get called out explicitly. Code review shouldn't just be a list of complaints. People learn from positive feedback, and it keeps the review relationship healthy even when there are real issues to flag.

The Numbers

  • Repos reviewed nightly: 10 (accounts, cc-leads, cringe-rizzler, generations, lighthouse, sales, sentiment, standup, superstars, uptime)
  • Total PRs merged autonomously: 100+ since inception
  • Average review cycle time: 5-15 minutes per repo
  • Review categories checked: 7 (shared components, DEC compliance, DRY, code quality, page structure, Supabase setup, accessibility/mobile)
  • Fix rate: 100% (nothing is deferred)
  • False positive rate: near-zero (two-pass verification)

The Compound Effect: Learnings That Accumulate

One of the most powerful features is the shared learnings file. Every time a nightly review discovers a new pattern — good or bad — it's appended to a persistent memory file that every future review reads before starting.

This means:

  • A bug pattern found in the Sales app gets checked in all ten repos the next night
  • Anti-patterns are caught earlier as the knowledge base grows
  • New shared components get adopted faster because the review knows to recommend them
  • The review literally gets better every day

After weeks of nightly reviews, the learnings file is a living document of our team's accumulated engineering standards — written not by a human who might forget to update the wiki, but by an agent that learns from every codebase it touches.

What This Means for Your Team

If you're an engineering manager at a small-to-mid-size team, here's the honest assessment:

You don't need dedicated reviewers to have world-class code review. You need a system that:

  • Reviews the full codebase, not just diffs
  • Runs every day, not when someone has time
  • Checks against documented standards, not vibes
  • Fixes what it finds instead of creating tickets
  • Builds a knowledge base that compounds over time
  • Maintains trust by aggressively eliminating false positives

The nightly review has fundamentally changed our relationship with code quality. Issues that used to accumulate for weeks — stale imports, missing components, broken event handlers, security misconfigurations — now get caught and fixed within 24 hours of introduction.

We're not replacing human code review. Our team still reviews complex architecture decisions, discusses trade-offs, and debates approaches. But the mechanical stuff — consistency checks, component compliance, security hygiene, accessibility basics — that's Alpha Agent's job now. And it does it better than we ever did manually, because it never gets tired, never skims, and never says "LGTM" unless the code actually looks good.


Footnotes

[1] As of February 2026, the nightly review covers: ah-accounts, ah-cc-leads, ah-cringe-rizzler, ah-generations, ah-lighthouse, ah-sales, ah-sentiment, ah-standup, ah-superstars, and ah-uptime — all under the last-rev-llc GitHub org.

[2] Source: PR review comments on last-rev-llc/ah-sales#4, merged 2026-02-17.

[3] Source: PR reviews on ah-standup#4, ah-lighthouse#5, and ah-sales#4, all merged 2026-02-17/18.

[4] Source: PR reviews on ah-cringe-rizzler#10, ah-lighthouse#5, and ah-sentiment#4.

[5] Source: PR review comment on ah-cringe-rizzler#10, merged 2026-02-18.