Before you read: Garry Tan, president of Y Combinator, shipped 10,000 lines of code and 100 PRs in a single week using one of the plugins in this list. That number should recalibrate your expectations for what Claude Code + plugins can actually do. This article breaks down the 5 plugins behind that kind of output, with enough technical depth to get you from zero to productive.

I've been using Claude Code daily for months now. And here's the honest truth: base Claude Code is already very good. It handles refactors, writes tests, catches bugs I would have missed, and explains gnarly legacy code in plain English. But there's a ceiling you hit when you're working on anything beyond a small-to-medium codebase.

You start noticing it in the token burn. You notice it when Claude reviews a PR but misses a function three files away that the change clearly breaks. You notice it when the AI generates perfectly correct code but skips every testing discipline your team actually follows. Base Claude Code is a great developer but an unsupervised one. These 5 plugins are what turn it into a disciplined, context-aware engineering team.

Let me walk through each one the way I'd explain it to a colleague: what it actually does, how it works under the hood, and when it's genuinely worth the setup cost.

1. gstack: Your Entire Startup Team as Slash Commands

gstack by Garry Tan (YC President)
⭐ 10,000+ stars in 48 hours 📦 23 skills, MIT license github.com/garrytan/gstack Install: claude mcp add gstack

When Garry Tan dropped gstack on GitHub, it hit 10,000 stars in under 48 hours. That's not just hype; it's recognition that he'd solved a real problem: the solo developer who needs the quality bar of a whole team but doesn't have one.

gstack ships 23 slash-command skills that map to nine specialist roles across the entire software development lifecycle. Think of it as hiring a virtual team that lives inside your terminal:

  • /plan-ceo-review: a CEO/founder lens that challenges your plan's assumptions, asks the hard business questions, and pushes back on scope creep before a line is written
  • /plan-eng-review: an engineering manager review that locks in execution strategy, questions architectural trade-offs, and identifies risk
  • /plan-design-review: a designer's eye on your plan: visual consistency, spacing, component reuse, AI-generated slop detection
  • /review: a pre-land PR review that checks your diff against the base branch for bugs, security issues, and logic errors
  • /ship: detects the base branch, merges it, runs tests, and opens the PR, one command end to end
  • /qa and /qa-only: systematic QA in a real browser, with bug fixing included in /qa, report-only in /qa-only
  • /cso: Chief Security Officer mode, with an OWASP Top 10 audit, STRIDE threat modelling, and secrets scanning
  • /retro: a weekly engineering retrospective built from your commit history
  • /browse: opens a real headless Chromium instance and navigates your app for inspection

The technical detail that actually matters

Most browser automation tools in this space open a new browser context for every single tool call, which means logging in again, losing cookies, and re-establishing state. gstack solves this with a persistent headless Chromium process reached over localhost HTTP. Cookies, tabs, localStorage, and login sessions all survive across commands.

The performance numbers are real: a cold start costs roughly 3–5 seconds on the first tool call. After that, subsequent calls in the same session run in 100–200 milliseconds. That's the difference between a tool you use once and a tool you use twenty times a day.
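The session-reuse economics are easy to see with a toy model. This is my illustration, not gstack's code; the ~3 s cold start and ~150 ms warm-call figures are the article's rough numbers:

```python
# Toy cost model: relaunching the browser per call vs. reusing one session.

class PerCallBrowser:
    """Naive pattern: pay the cold-start cost on every tool call."""
    COLD_START = 3.0  # seconds, per the article's rough numbers

    def call(self, url: str) -> float:
        return self.COLD_START + 0.1  # launch + navigate; state is lost after

class PersistentBrowser:
    """gstack-style pattern: launch once, then reuse the warm session."""
    COLD_START = 3.0
    WARM_CALL = 0.15  # 100-200 ms once the process is up

    def __init__(self):
        self.started = False
        self.cookies = {}  # survives across calls, unlike the naive pattern

    def call(self, url: str) -> float:
        if not self.started:
            self.started = True
            return self.COLD_START + self.WARM_CALL
        return self.WARM_CALL

naive = sum(PerCallBrowser().call("/qa") for _ in range(20))
p = PersistentBrowser()
persistent = sum(p.call("/qa") for _ in range(20))
print(f"20 calls, per-call launch: {naive:.1f}s; persistent: {persistent:.1f}s")
```

At twenty calls a day the difference is roughly a minute of waiting versus six seconds, which is why the persistent process matters more than it sounds.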

# Install gstack
curl -fsSL https://gstacks.org/install.sh | bash

# Then inside any Claude Code session:
/plan-ceo-review    # rethink the problem from first principles
/plan-eng-review    # lock the execution approach
/review             # pre-land PR review
/ship               # merge + test + open PR
/qa                 # browser QA with bug fixing

How I use it day-to-day

My personal flow on any non-trivial feature: /plan-ceo-review first (usually kills 30% of the scope I thought was necessary), then /plan-eng-review to settle the architecture before touching a keyboard. After implementation, /review before the PR goes up. /cso on anything that touches auth, payments, or user data. The security officer mode has caught real issues (an exposed internal endpoint, a missing rate limit, a JWT validation gap), things that would have been embarrassing in production.

"10K lines of code per week. 100 PRs. 50 days straight. That's what gstack + Claude Code looks like when you actually commit to the workflow." (Garry Tan)

Is that number achievable for everyone? No. But even at 20% of that throughput, you're operating at a different level than you were before.

2. Compound Engineering: The Plugin That Makes AI Get Smarter About Your Codebase Over Time

Compound Engineering by EveryInc (Kieran Klaassen & Dan Shipper)
🏢 Built at Every.to 📦 Open source, MIT github.com/EveryInc/compound-engineering-plugin Spawns 50+ sub-agents

The name is deliberate. Compound interest. Every feature you build makes the next feature cheaper to build. That's the design philosophy behind this plugin, and understanding it changes how you think about AI-assisted development.

Most developers use Claude Code in a stateless way: new task, new context, explain the codebase again, get the output, done. The knowledge Claude built up about your system during that session disappears the moment you close the terminal. Compound Engineering treats that knowledge as an asset worth preserving.

The 4-step loop

The plugin implements a disciplined workflow around four phases:

  1. Plan: before writing a single line of code, sub-agents research your codebase in parallel: dependencies, framework versions, existing patterns, best practices. You get a structured plan for approval. This parallel research phase alone is what I'd call the biggest quality improvement over vanilla Claude Code.
  2. Work: Claude asks clarifying questions first, then builds the feature and writes tests against the approved plan. It doesn't improvise. It executes.
  3. Review: automated review against the plan's contract. Did the implementation match what was agreed? Are tests passing? Are there regressions?
  4. Compound: here's the novel part. Every bug encountered, every decision made, every edge case discovered gets written into subsystem knowledge files. These are Markdown files that live in your repo and get fed to future agents as context. The AI is literally building documentation about your codebase, for future AI sessions, automatically.

# After installing the plugin, your Claude Code session gains:
/compound-plan      # parallel codebase research + structured plan generation
/compound-work      # plan-driven implementation with clarification phase
/compound-review    # contract verification against the plan
/compound           # run the full 4-step loop end-to-end

# The compound step writes knowledge files like:
# .compound/subsystems/auth.md
# .compound/subsystems/payments.md
# .compound/decisions/2026-04-api-rate-limiting.md
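Conceptually, the compound step is a small append-and-reload loop over those files. Here's a hypothetical sketch: the .compound/ layout follows the article's example paths, but record_learning and load_context are names I made up, not the plugin's real API.

```python
# Hypothetical sketch of the "Compound" step: append what a session learned
# to a per-subsystem Markdown file, and reload all of them as context later.

import tempfile
from datetime import date
from pathlib import Path

def record_learning(repo: Path, subsystem: str, note: str) -> Path:
    """Append a dated bullet to the subsystem's knowledge file."""
    kfile = repo / ".compound" / "subsystems" / f"{subsystem}.md"
    kfile.parent.mkdir(parents=True, exist_ok=True)
    if not kfile.exists():
        kfile.write_text(f"# {subsystem} - accumulated knowledge\n\n")
    with kfile.open("a") as f:
        f.write(f"- [{date.today().isoformat()}] {note}\n")
    return kfile

def load_context(repo: Path) -> str:
    """Concatenate all knowledge files to seed the next agent session."""
    root = repo / ".compound" / "subsystems"
    return "\n".join(p.read_text() for p in sorted(root.glob("*.md")))

repo = Path(tempfile.mkdtemp())  # stand-in for your repo root
record_learning(repo, "auth", "JWT clock skew: token tests need 30s leeway")
print(load_context(repo))
```

The point of the sketch: the "memory" is nothing exotic, just Markdown in version control, which is exactly why future sessions (and human teammates) can both read it.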

What 50+ sub-agents actually means in practice

When you trigger the Plan phase on a complex feature, the plugin can spawn over 50 parallel sub-agents, each one focused on a specific part of the research. One agent reads your authentication module. Another checks the database schema. Another pulls the latest docs for the framework you're on. They run concurrently and their findings get merged into a single, dependency-aware plan.

Compare this to asking Claude to "plan the feature" directly: you get a single-threaded response based on whatever Claude can hold in context right now. The compound approach gets you something closer to what a staff engineer would produce after two days of codebase archaeology, in under a minute.
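The fan-out/merge pattern is easy to picture in miniature. This sketch is my illustration of the idea, not the plugin's internals; the three research functions are stand-ins for sub-agents:

```python
# Fan out independent "research" tasks, then merge findings into one plan input.

from concurrent.futures import ThreadPoolExecutor

def research_auth():      return {"auth": "sessions are JWT-based, 15 min expiry"}
def research_schema():    return {"schema": "users table uses soft deletes"}
def research_framework(): return {"framework": "FastAPI 0.110, pydantic v2"}

tasks = [research_auth, research_schema, research_framework]
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    findings = {}
    for result in pool.map(lambda t: t(), tasks):
        findings.update(result)  # merge each agent's findings into one view

plan_context = "\n".join(f"- {k}: {v}" for k, v in sorted(findings.items()))
print(plan_context)
```

Real sub-agents are LLM calls rather than functions, but the shape is the same: the tasks don't depend on each other, so wall-clock time is the slowest task, not the sum.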

"One developer doing the work of five." That's the claim from the Every team based on their internal benchmarks. I'd put it at three-to-four for a typical product codebase, but the direction is right.

The community reception on X has been strong. The tweet that stuck with me: "Compound engineering plugin for Claude Code is basically my go-to plan mode in CC now, I rarely use the regular plan mode these days." That shift from ad-hoc planning to structured, documented planning is the real value here.

3. code-review-graph: Stop Paying for Tokens You Don't Need

code-review-graph by Tirth Kanani
📉 6.8× fewer tokens on reviews 📉 49× fewer tokens on daily tasks github.com/tirth8205/code-review-graph Python 3.10+ required

I want to be direct about this one: if you're working on a codebase with more than 100 files, you are almost certainly wasting money without something like code-review-graph. The default Claude Code behaviour when asked to review a change is to read far more files than it actually needs to. That's not a criticism of the model; it's a structural problem with how context is gathered. This plugin fixes it at the source.

How it actually works: the AST graph approach

When you install code-review-graph and run the initial build, it parses your entire repository using Tree-sitter, a fast, incremental parser that produces a concrete syntax tree for every file. The plugin then converts those trees into a persistent knowledge graph where:

  • Nodes represent functions, classes, imports, and test cases
  • Edges represent call relationships, inheritance chains, import dependencies, and test coverage links

When you make a change, the plugin doesn't re-parse the whole codebase. It traces the "blast radius" of your change through the graph (every caller, every dependent, every test that could be affected) and builds the minimal set of files Claude actually needs to read to do a complete review.
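A traversal like that is simple to sketch. The toy below uses a hand-made graph, not code-review-graph's real schema: reverse dependency edges are walked breadth-first from the changed symbol.

```python
# BFS over reverse edges (callers, dependents, covering tests) from a change.

from collections import deque

# edges: node -> things that depend on it (toy data for illustration)
dependents = {
    "utils.slugify": ["posts.create_post", "tests.test_slugify"],
    "posts.create_post": ["api.post_handler", "tests.test_posts"],
    "api.post_handler": [],
    "tests.test_slugify": [],
    "tests.test_posts": [],
}

def blast_radius(changed: str) -> set[str]:
    """Collect everything transitively affected by the changed symbol."""
    seen, queue = {changed}, deque([changed])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

radius = blast_radius("utils.slugify")
print(sorted(radius))  # the minimal set of symbols a review must cover
```

In a real repo most nodes are unreachable from any given change, which is exactly where the token savings come from: Claude reads the radius, not the repository.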

# Install
pip install code-review-graph
# or with uv (recommended):
uv pip install code-review-graph

# Wire into Claude Code
code-review-graph install --platform claude-code

# Build the initial graph (one-time, ~10s for 500-file project)
code-review-graph build

# After this, the graph auto-updates on every file save and git commit.
# Claude Code gains new tools:
# - semantic_search_nodes_tool
# - query_graph_tool (callers_of, callees_of, imports_of, tests_for...)
# - get_impact_radius_tool
# - get_review_context_tool

The 49× number: what's behind it

The headline claim is a 49× reduction in tokens on daily coding tasks. That's not a made-up marketing number; Tirth published his methodology. The 49× represents the extreme case: a large monorepo where you change a utility function deep in a shared library. Without the graph, Claude scans the entire repo to understand impact. With the graph, it reads exactly the 3–7 files in the blast radius. The 6.8× figure is the more typical case for PR reviews on mid-sized projects.

The practical consequence: on a codebase where you were previously burning $0.40–$0.80 per review session, you're now at $0.05–$0.12. Across a month of active development, that's real money. And beyond cost, the quality of reviews improves because Claude is reading the right files rather than skimming hundreds of irrelevant ones.
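The arithmetic behind "real money" is easy to sanity-check. The per-review costs are the article's own figures; the usage volume (10 review sessions a day, 22 workdays) is my assumption:

```python
# Back-of-envelope monthly savings from the article's per-review cost figures.

reviews_per_day, workdays = 10, 22        # assumed usage, adjust for your team
before = (0.40 + 0.80) / 2                # midpoint of the article's old cost, USD
after  = (0.05 + 0.12) / 2                # midpoint of the with-graph cost, USD

monthly_saving = reviews_per_day * workdays * (before - after)
print(f"~${monthly_saving:.0f}/month saved at that usage level")
```

Even at a quarter of that volume the graph pays for its ten-second build many times over.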

Cross-platform support

This is one of the few plugins that works across the entire AI coding editor landscape: Claude Code, Cursor, Windsurf, Zed, Continue, OpenCode, and Antigravity are all supported via the same graph. If your team uses different tools, the graph is a shared asset. One build, multiple platforms.

# Works across editors
code-review-graph install --platform claude-code
code-review-graph install --platform cursor
code-review-graph install --platform windsurf

# The graph itself lives in .code-review-graph/ at your repo root
# Check it into version control; your team shares the same graph

4. Superpowers: Teaching Claude Code How to Actually Think About Software

Superpowers by Jesse Vincent (obra)
✅ Official Anthropic Marketplace 📦 MIT license github.com/obra/superpowers Also: superpowers-marketplace

Most Claude Code plugins extend what Claude can access. Superpowers extends how Claude thinks. That's a meaningful distinction.

Jesse Vincent built Superpowers around a simple observation: AI assistants are genuinely capable coders, but they default to the fastest path, which is usually not the best path. They skip tests when not explicitly told to write them. They jump straight to implementation before fully understanding requirements. They patch symptoms rather than finding root causes. Superpowers enforces the practices that senior engineers follow by instinct, but that AI shortcuts around.

The skills that matter most

/brainstorming is where I start any non-trivial feature. This isn't brainstorming in the post-it-notes sense. It's a Socratic interrogation of your requirements: What problem does this actually solve? What are the constraints? What are the failure modes? What are you assuming that might not be true? The skill runs you through a structured sequence of questions before a single line of code is considered. The output is a refined, clarified spec: not a plan for the current feature, but an understanding of the real problem underneath it.

/execute-plan is the implementation complement. Once you have a plan, this skill runs it through batched implementation with built-in review checkpoints. It breaks the plan into atomic units, implements each one, runs the relevant tests, checks for regressions, and only moves to the next unit when the current one is clean.

The TDD enforcement is the detail that gets the most attention, and for good reason:

# The Superpowers TDD cycle (enforced, not optional):
# 1. Write the failing test first; Claude MUST show you the red test
# 2. Implement the minimum code to make it pass, no extras
# 3. Refactor with the green tests as a safety net
# 4. Move to the next unit only when all tests pass

# Example: asking Claude to add a new API endpoint
# Without Superpowers: Claude writes the handler, maybe writes a test
# With /execute-plan: Claude writes the test, shows you the 404,
# writes the handler, shows you the green, then asks about edge cases
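Here is what that enforced cycle looks like on a toy function. This is my example of the red-green sequence, not actual plugin output:

```python
# Red-green-refactor in miniature.
# Step 1: the test exists before the implementation, and fails first.

def test_slug():
    assert slugify("Hello, World!") == "hello-world"

# Calling test_slug() at this point raises NameError: slugify doesn't
# exist yet. That's the "red" state Claude must show you.

# Step 2: the minimum implementation to go green, nothing extra.
import re

def slugify(title: str) -> str:
    """Lowercase, collapse non-alphanumerics to hyphens, trim edges."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

test_slug()  # Step 3 territory: green, so refactoring is now safe
print("green")
```

The discipline is the point: because the test predates the code, you know it can fail, so a passing run actually means something.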

The four-phase debugging methodology is equally disciplined. Most AI assistants, when given a bug, try a fix immediately. Superpowers enforces: (1) reproduce the bug deterministically, (2) hypothesise root causes, (3) gather evidence to eliminate false hypotheses, (4) only then apply a targeted fix. This maps exactly to how experienced engineers actually debug, and it dramatically reduces the "fix one bug, introduce two more" pattern that plagues AI-generated patches.
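Sketched as code, the four phases become a gate sequence where each assertion stops you from skipping ahead. This is an illustrative workflow of my own, not part of Superpowers:

```python
# The four debugging phases as gates: you cannot reach "fix" without
# passing reproduction and hypothesis-elimination first.

def debug(reproduce, hypotheses, evidence_for, apply_fix):
    # 1. Reproduce deterministically: no fix until the failure is repeatable.
    assert reproduce(), "cannot reproduce; stop here, don't guess at fixes"
    # 2 & 3. Hypothesise, then eliminate candidates with evidence.
    surviving = [h for h in hypotheses if evidence_for(h)]
    assert len(surviving) == 1, f"narrow further: {len(surviving)} candidates left"
    # 4. Only now: one targeted fix for the confirmed root cause.
    return apply_fix(surviving[0])

# Toy usage: a bug with three candidate causes, one supported by evidence.
result = debug(
    reproduce=lambda: True,
    hypotheses=["stale cache", "race condition", "off-by-one"],
    evidence_for=lambda h: h == "off-by-one",
    apply_fix=lambda root_cause: f"fixed: {root_cause}",
)
print(result)  # fixed: off-by-one
```

The assertions are the whole trick: an AI (or a human) that skips phase 1 or 3 simply can't proceed, which is what "enforced" means here.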

The marketplace

Jesse also ships a curated marketplace at github.com/obra/superpowers-marketplace: community-contributed skills built on the same framework. There are skills for specific frameworks (Rails, FastAPI, Next.js), domain-specific workflows (data pipeline validation, ML experiment tracking), and team process skills (incident response, architecture decision records). It's still early but growing fast.

# Install Superpowers
git clone https://github.com/obra/superpowers ~/.claude/plugins/superpowers

# The skills are pure Markdown; Claude reads them as context before acting
# You can inspect, fork, and modify any skill directly:
cat ~/.claude/plugins/superpowers/skills/tdd.md

"Claude Code got 100× better with Superpowers. Not because it can do more things, but because it stopped doing the wrong things confidently." (codeandbird, Medium)

That quote captures it well. The value isn't raw capability. It's discipline.

5. Codex Plugin: OpenAI's Model, Running Inside Anthropic's Tool

Codex Plugin for Claude Code by OpenAI
🤝 Cross-provider: OpenAI × Anthropic 📦 Apache 2.0 github.com/openai/codex-plugin-cc Released: March 2026

This one still surprises people when they first hear about it: OpenAI released an official plugin that runs inside Anthropic's Claude Code. The two biggest AI labs, fierce competitors on every other dimension, collaborating on a developer tool. That alone is worth paying attention to: it signals something important about where the industry is heading.

But the reason to actually install it isn't the novelty. It's the adversarial review capability.

Three commands, one clear use case for each

/codex:review runs a standard Codex code review on your current changes. Think of it as a second opinion: Claude reviewed your diff, now Codex reviews it. Different training data, different architectural biases, different blind spots. The overlap between what both models flag is high-confidence. The things only one flags are worth investigating.
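That overlap logic reduces to set arithmetic over the two models' findings (toy data, my framing of the article's point):

```python
# Two independent reviewers: agreement is high-confidence, disagreement is a lead.

claude_flags = {"sql-injection in search", "missing null check", "n+1 query"}
codex_flags  = {"sql-injection in search", "unbounded retry loop"}

high_confidence = claude_flags & codex_flags   # both models agree: fix first
investigate     = claude_flags ^ codex_flags   # only one model saw it: verify

print("fix first:", sorted(high_confidence))
print("worth a look:", sorted(investigate))
```

The value comes precisely from the models being different: if they shared blind spots, the intersection would tell you nothing you didn't already have.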

/codex:adversarial-review is the interesting one. This isn't a standard review. Codex is explicitly instructed to challenge your implementation: question the design decisions, surface the failure modes, identify the assumptions, stress-test the logic. It's designed to be uncomfortable in exactly the way a good senior engineer's review is uncomfortable. It asks "why did you do it this way?" and expects a good answer.

# Install the Codex plugin
git clone https://github.com/openai/codex-plugin-cc ~/.claude/plugins/codex-plugin-cc

# Set your OpenAI API key
export OPENAI_API_KEY=sk-...

# Available commands inside Claude Code:
/codex:review              # standard Codex review of current changes
/codex:adversarial-review  # challenging design review questions every decision
/codex:rescue              # hand off a stuck task to Codex as an autonomous sub-agent

/codex:rescue is the third command, and it's deceptively powerful. When Claude Code is stuck (running in circles on a bug, producing the same broken output repeatedly, missing something structural about the problem), you run /codex:rescue. Codex takes over as a sub-agent with fresh context. It investigates the problem independently, attempts a fix, and hands back control. This isn't giving up; it's leveraging the fact that different models have different failure modes, and what one is blind to, the other often sees clearly.

The technical architecture

Under the hood, the plugin uses MCP (Model Context Protocol) to let Claude Code call Codex CLI as a tool during its reasoning process. When you invoke a Codex command, Claude orchestrates the call, passes the relevant context (diff, file list, instructions), receives Codex's output, and integrates it into the conversation. You never leave Claude Code. The cross-provider handoff is invisible.

# What's happening under the hood:
# Claude receives: /codex:adversarial-review
# Claude calls: codex_review_tool({ mode: "adversarial", context: [diff, files] })
# Codex CLI runs: codex --review --adversarial [context]
# Output returns to Claude as a tool result
# Claude synthesises both perspectives and presents a unified review
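For a feel of what the tool-side wrapper might look like, here is a hypothetical sketch. The codex_review_tool name and dry_run parameter are mine, and the plugin's real MCP schema isn't published in this article; the --review/--adversarial flags match the invocation the trace above describes.

```python
# Hypothetical tool wrapper: build the CLI invocation and (in real use)
# hand stdout back to Claude as the tool result.

import subprocess

def codex_review_tool(mode: str, diff: str, dry_run: bool = False):
    cmd = ["codex", "--review"]
    if mode == "adversarial":
        cmd.append("--adversarial")   # flag as shown in the article's trace
    if dry_run:
        return cmd                    # inspect the command without codex installed
    out = subprocess.run(cmd, input=diff, capture_output=True, text=True)
    return out.stdout                 # becomes the MCP tool result for Claude

print(codex_review_tool("adversarial", diff="", dry_run=True))
```

The interesting design point is that Claude never shells out blindly: it passes a scoped context (the diff) and receives structured output, so the cross-provider call behaves like any other tool result.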

The licensing is worth noting: Apache 2.0. OpenAI isn't charging for this. It's a direct move to get Codex usage up by meeting developers where they already are: inside Claude Code.

Which one should you install first?

Not a rhetorical question. Here's the honest answer based on what I've seen work for different types of developers and teams:

  • Solo founder / indie developer → gstack. You need the whole team, not just one skill; gstack is the broadest force multiplier.
  • Team worried about API costs → code-review-graph. The 49× token reduction is immediate, measurable ROI, with no workflow change required.
  • Engineering team with process discipline → Superpowers. It enforces TDD and structured debugging across the whole team, not just individuals.
  • Building a product at scale → Compound Engineering. The knowledge files compound over time; the bigger your codebase, the more the ROI grows.
  • Already a heavy Claude Code user → Codex plugin. The adversarial review catches what Claude's blind spots miss, and second opinions are cheap here.

If you're building something serious and you have the time to set up properly: install code-review-graph first (it improves every other tool since everything uses fewer tokens), then gstack or Superpowers depending on whether you want breadth or depth of process discipline.

A few honest caveats

None of these plugins are magic. A few things I'd flag before you invest setup time:

  • gstack's real value is the workflow, not just the commands. If you install it but don't change how you plan and review, you'll get 20% of the value. The workflow discipline is where the 10K LOC/week number comes from.
  • Compound Engineering has a learning curve. The first time through the 4-step loop feels slow because you're not used to approving plans before coding. Stick with it for two weeks before judging.
  • code-review-graph needs maintenance on large, fast-moving codebases. If you're doing aggressive refactors daily, the graph can briefly drift out of sync. Run code-review-graph rebuild weekly on high-churn repos.
  • Superpowers is opinionated. If your team doesn't already practice TDD, this plugin will create friction before it creates value. Worth the friction, but be prepared for it.
  • The Codex plugin requires an OpenAI API key with active credits. The plugin itself is free but Codex API calls are billed by OpenAI at standard rates. Budget ~$5–10/month for moderate adversarial review usage.

Where this is all going

The plugin ecosystem for Claude Code is still very early. What we have now (these 5 tools, plus a growing community of skills on the Superpowers marketplace) is probably 10% of what will exist by the end of 2026. The pattern is becoming clear: Claude Code is the runtime, and the plugin ecosystem is where the specialisation lives.

The developers winning right now aren't the ones using the most powerful models. They're the ones who've assembled the right combination of model + workflow + plugins for their specific context. Garry Tan isn't shipping 10K lines a week because he has access to better AI than you do. He's doing it because he's assembled a disciplined system around that AI.

These 5 plugins are a good system to start with.