We’ve reviewed thousands of pull requests over the years. Some reviews caught real bugs before they hit production. Others devolved into arguments about variable names while actual problems slipped through. The difference came down to knowing what matters and what doesn’t.

One review sticks with me. A teammate opened a PR that refactored a payment processing flow. Clean code, solid tests, good naming. Three reviewers approved it. On a final pass before merge, we noticed that an error handling path swallowed a failed charge response and returned success to the caller. The user would see “payment complete,” but no money would actually move. The tests all passed because they mocked the payment gateway and only asserted on the happy path. That near-miss changed how we review code. We stopped reading PRs to confirm they looked reasonable and started reading them to find the specific ways they could fail.

The hardest bugs to catch in review are the ones where the code reads well but behaves wrong. Clean code is not correct code.

Here’s the framework we’ve built for catching real problems without becoming a bottleneck.

Three questions we always ask

Before looking at the code itself, we ask three questions. Everything else is secondary.

1. Does it actually work?

This sounds obvious, but it’s easy to get distracted by style while missing correctness issues. We check:

  • What happens with empty inputs, null values, or malformed data?
  • What happens when external services are down or slow?
  • Are there race conditions in async code? Shared state that could conflict?
  • Does the code handle partial failures and leave the system in a consistent state?

A bug in production costs orders of magnitude more than catching it in review. This is where we spend most of our time.

Consider a function like this:

async function transferFunds(fromAccount: string, toAccount: string, amount: number) {
  await debit(fromAccount, amount);
  await credit(toAccount, amount);
}

In review, the first thing we’d flag: what happens if credit fails after debit succeeds? The user loses money with no record of why. A safer version handles the partial failure:

async function transferFunds(fromAccount: string, toAccount: string, amount: number) {
  const debitResult = await debit(fromAccount, amount);
  try {
    await credit(toAccount, amount);
  } catch (err) {
    await reverseDebit(fromAccount, debitResult.transactionId);
    throw new TransferError("Credit failed, debit reversed", { cause: err });
  }
}

You won’t spot this by scanning code at a surface level. You have to think about execution order and what happens when things fail halfway through.
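The same execution-order thinking applies to the race conditions mentioned above. Here's a minimal Python sketch of a check-then-act race, with a hypothetical account dict and a sleep standing in for real I/O; none of these names come from a real codebase:

```python
import asyncio

# Hypothetical sketch of a check-then-act race in async code.
# The await in the middle yields control, so two concurrent calls
# can both pass the balance check before either one debits.
async def withdraw_racy(account: dict, amount: int) -> bool:
    if account["balance"] >= amount:
        await asyncio.sleep(0)  # stands in for a real I/O await
        account["balance"] -= amount
        return True
    return False

# Safer: hold a lock across the whole check-then-debit sequence,
# so the check and the debit are atomic with respect to other callers.
async def withdraw(account: dict, lock: asyncio.Lock, amount: int) -> bool:
    async with lock:
        if account["balance"] >= amount:
            await asyncio.sleep(0)
            account["balance"] -= amount
            return True
        return False
```

Two concurrent calls to withdraw_racy can drive the balance negative; with the lock held across the check and the debit, exactly one succeeds.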

2. Is it testable?

Code that’s hard to test usually has deeper design problems. Warning signs:

  • Dependencies hardcoded instead of injectable
  • Functions that can’t be tested in isolation
  • Side effects scattered everywhere instead of contained at boundaries
  • Tests that only exercise the happy path

If we can’t imagine how to test something, that’s a signal to talk about the design.
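As a concrete illustration of the first warning sign, here's a minimal Python sketch of a hardcoded dependency versus an injectable one; the function and its TTL semantics are made up for this example:

```python
import time

# Hard to test: the dependency on the real clock is baked in, so a test
# either sleeps for real or monkeypatches time.time globally.
def is_expired_hardcoded(created_at: float, ttl_seconds: float) -> bool:
    return time.time() - created_at > ttl_seconds

# Testable: the clock is injectable, so a test can pass a fixed "now"
# and exercise both sides of the boundary deterministically.
def is_expired(created_at: float, ttl_seconds: float, now=time.time) -> bool:
    return now() - created_at > ttl_seconds
```

Production callers never notice the extra parameter; tests pass a lambda and get deterministic results.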

We also watch for tests that assert too little. A test that calls a function and only checks that it doesn’t throw has almost no value. A test that mocks every dependency and asserts that the mocks were called in a specific order is testing implementation details, not behavior. Both are worth flagging.
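A hypothetical Python sketch of the difference, with a made-up parse_price function for illustration:

```python
# Hypothetical function under test: parses a price like "$3.50" into cents.
def parse_price(raw: str) -> int:
    return round(float(raw.strip().lstrip("$")) * 100)

# Weak: only proves the call doesn't throw. It would still pass if the
# function returned the wrong number.
def test_parse_price_weak():
    parse_price("$3.50")

# Better: asserts the actual behavior, including a less obvious input.
def test_parse_price():
    assert parse_price("$3.50") == 350
    assert parse_price(" $0.99 ") == 99
```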

3. Will the next engineer understand it?

Code is read far more often than it’s written. We ask: if someone unfamiliar with this code needs to change it in six months, will they understand what it does and why?

This isn’t about adding comments everywhere. It’s about clear names, obvious structure, and explicit intent. A well-named function with a clear signature often needs no comments. A comment that says // increment counter above counter++ adds noise. A comment that says // We retry 3 times because the upstream API has transient failures roughly 2% of the time is worth its weight in gold because it captures context the code alone can’t express.

Security review: the patterns that bite

Security issues deserve their own section because they’re easy to miss if you’re not looking, and the consequences are severe. We’ve trained ourselves to pattern-match on a few categories.

SQL injection and query construction

Any time we see a query built with string concatenation or template literals, we stop:

# This should raise an immediate flag in review
def get_user(username):
    query = f"SELECT * FROM users WHERE username = '{username}'"
    return db.execute(query)

The fix is parameterized queries, every time, no exceptions:

def get_user(username):
    query = "SELECT * FROM users WHERE username = %s"
    return db.execute(query, (username,))

This applies to ORMs too. We’ve seen developers bypass the ORM’s parameterization for “performance” and introduce injection vulnerabilities. The performance argument is almost never valid.

Cross-site scripting (XSS)

We watch for any place user input gets rendered into HTML without escaping. Modern frameworks handle this by default, which makes the exceptions more dangerous. Developers reach for dangerouslySetInnerHTML in React or | safe in Jinja and sometimes don’t realize they’ve turned off the framework’s protection:

// This should be scrutinized very carefully in review
function UserBio({ bio }) {
  return <div dangerouslySetInnerHTML={{ __html: bio }} />;
}

If the feature genuinely requires rendering user-supplied HTML (a rich text editor, say), we look for a sanitization step using something like DOMPurify:

import DOMPurify from "dompurify";

function UserBio({ bio }) {
  const sanitized = DOMPurify.sanitize(bio);
  return <div dangerouslySetInnerHTML={{ __html: sanitized }} />;
}

Authorization checks

The most common auth vulnerability we see in review isn’t a missing login check. It’s a missing ownership check. The endpoint requires authentication, but any authenticated user can access any other user’s data:

# Authenticated, but is the user authorized to see THIS resource?
@require_login
def get_document(request, document_id):
    return Document.objects.get(id=document_id)  # No ownership check

We look for this anywhere a resource is fetched by ID. The fix is filtering by the current user:

@require_login
def get_document(request, document_id):
    return Document.objects.get(id=document_id, owner=request.user)

Authorization bugs are silent. They don’t throw errors, they don’t show up in logs, and the tests pass because the test user happens to own the test data.
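One habit that helps: require authorization tests to include a user who does not own the resource. A framework-agnostic Python sketch, where the in-memory store, the Forbidden exception, and the usernames are all hypothetical:

```python
class Forbidden(Exception):
    pass

# Hypothetical in-memory store standing in for the database
DOCS = {1: {"owner": "alice", "body": "Q3 plan"}}

def get_document(doc_id: int, current_user: str) -> dict:
    doc = DOCS[doc_id]
    if doc["owner"] != current_user:
        raise Forbidden(f"{current_user} does not own document {doc_id}")
    return doc

def test_owner_can_read():
    assert get_document(1, "alice")["body"] == "Q3 plan"

def test_non_owner_is_rejected():
    # This is the assertion that catches silent authorization bugs
    try:
        get_document(1, "bob")
    except Forbidden:
        return
    raise AssertionError("non-owner was allowed to read the document")
```

The second test is the one that matters: without it, the ownership check can be deleted and the suite stays green.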

Secrets and credentials

We check that API keys, tokens, and passwords are never hardcoded in source files. We also check configuration files, test fixtures, and seed data. A pattern we frequently flag: secrets passed as URL query parameters, which end up in server logs and browser history.
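When we flag that pattern, the suggested fix is usually to move the secret into a header. A minimal Python sketch; the endpoint and the bearer-token scheme are assumptions for illustration:

```python
from urllib.parse import urlencode

# Flagged in review: the key becomes part of the URL, so it lands in
# server access logs, proxies, and browser history.
def report_url_with_key(api_key: str) -> str:
    return "https://api.example.com/report?" + urlencode({"api_key": api_key})

# Safer: keep the URL clean and carry the secret in a header instead.
def report_request(api_key: str) -> tuple[str, dict]:
    url = "https://api.example.com/report"
    headers = {"Authorization": f"Bearer {api_key}"}
    return url, headers
```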

What we’ve stopped caring about

Knowing what to skip is just as important as knowing what to check. These used to eat up our review cycles, and we’ve learned to let them go.

Formatting

If your team uses an automated formatter (Prettier, Black, gofmt), formatting is solved. We don’t discuss it in reviews. If your team doesn’t use a formatter, the first priority is adopting one and removing formatting from human review entirely.

Minor naming preferences

fetchUsers vs getUsers vs loadUsers. These debates feel important in the moment but rarely affect code quality. Unless a name is actively misleading, like a function called validate that also mutates state, we move on.

“I would have done it differently”


Different isn’t wrong. If the code works, is testable, and makes sense to the reader, the fact that we would have written it differently isn’t a valid objection. Google’s Engineering Practices guide for code review puts this well: reviewers should approve code that improves the overall health of the codebase, even if it isn’t exactly how they would have written it. That framing has saved us from a lot of unproductive debates.

Questions over statements

We’ve found that questions work better than declarations in review comments:

Instead of “This will fail if the list is empty,” try “What happens if the list is empty here?”

Instead of “You should use a Map instead,” try “Have you considered using a Map? It might simplify the lookup logic.”

Instead of “This is confusing,” try “Could you add a comment explaining why you chose this approach? I want to make sure I understand.”

Questions are less confrontational and often surface context we were missing. Sometimes the author already thought through our concern. Sometimes they haven’t, and the question helps them see it.

But this principle has nuance. There are times when a direct statement is better.

When there’s a clear bug, don’t frame it as a question. “What happens if userId is null here?” can come across as coy when the answer is obviously “it crashes.” Just say: “This will throw a NullPointerException if userId is null, which can happen when the session expires. We need a null check here.” Be direct about facts. Be curious about design choices.

When we’re suggesting an approach but genuinely don’t know if it’s better, questions are honest, not just polite. “Would it be simpler to use reduce here, or is there a reason you went with the explicit loop?” is a real question. Maybe the loop is more readable and we’d learn something from the answer.

When we’re flagging something we’re unsure about, we say so. “I’m not sure about this, but I think this regex might not handle Unicode correctly. Worth checking?” is more useful than a confident assertion we’re not sure about.

The goal of a review comment is to reach the right outcome, not to protect anyone’s ego, including our own.

Reviewing across experience levels

The same code change needs a different review approach depending on who wrote it.

Reviewing junior engineers’ code

When reviewing code from someone early in their career, we shift toward teaching. We explain why, not just what. Instead of “use const here instead of let,” we write: “Since config is never reassigned after this line, const communicates that intent to the next reader. We use let only when we plan to reassign.”

We also pick battles more carefully. If a junior engineer’s PR has a correctness issue, a security issue, and three style suggestions, we lead with correctness and security. We might mention one style point as a learning opportunity and leave the rest. Five or six “nit” comments on someone’s first few PRs can be demoralizing, and demoralized engineers don’t learn faster.

We try to point out what they did well, too. “Nice job handling the error case here, that’s easy to miss” costs five seconds and reinforces good habits.

Reviewing senior engineers’ code

Senior engineers’ code tends to have fewer surface-level issues, but the failure modes are different. We see over-abstraction: building a generic framework when a specific solution would do. Premature optimization that makes the code harder to follow. Assumptions baked in from deep context that aren’t obvious to someone new.

With experienced engineers, we focus on design-level questions. “Do we expect this pattern to be reused, or is the abstraction here just for this use case?” Senior engineers sometimes over-build because they’re anticipating future needs that may never materialize, and a well-timed question can save the team from maintaining unnecessary complexity.

We also don’t soften feedback out of deference to experience. A senior engineer’s code can still have bugs, and tiptoeing around them doesn’t serve anyone. Respect means treating the code honestly.

Reviewing AI-generated code

AI coding assistants are a regular part of many engineers’ workflows now. Code generated or heavily assisted by an LLM has specific patterns worth watching for.

Plausible but wrong logic

AI-generated code tends to look correct at a glance. The variable names are reasonable, the structure follows common patterns, and it often reads more cleanly than human-written code. That’s exactly what makes it dangerous in review. It doesn’t trigger the “something looks off” instinct that sloppy code does.

We’ve learned to slow down on code that looks too polished and trace through the actual logic rather than pattern-matching on the shape.

Missing edge cases

LLMs frequently generate code that handles the main path well but skips edge cases. We’ve seen this with off-by-one errors in pagination logic, missing null checks on optional fields, bad Unicode handling (AI-generated regex is especially suspect), time zone code that works for UTC but breaks everywhere else, and error handling that catches exceptions but does nothing useful with them.
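The pagination off-by-one is worth a concrete look, since it's one we've caught more than once. A Python sketch (the function names are ours, not from any real PR):

```python
import math

# The shape an LLM often produces: integer division silently drops
# the final partial page, so the last few items become unreachable.
def page_count_buggy(total_items: int, page_size: int) -> int:
    return total_items // page_size

# Ceiling division counts the partial page as well.
def page_count(total_items: int, page_size: int) -> int:
    return math.ceil(total_items / page_size)
```

With 10 items and a page size of 3, the buggy version reports 3 pages and the tenth item never renders; tests that use a total divisible by the page size will never notice.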

Over-engineering

AI assistants, when asked to “write production-quality code,” tend to produce overly abstract solutions. Factory patterns where a function would do. Class hierarchies for a single use case. Configuration systems for something that will only ever have one configuration.

We ask: “Is all of this necessary for the current requirements?” If not, simplify. Code that exists for hypothetical future needs is code that needs maintenance now.

Dependency choices

AI-generated code sometimes pulls in libraries that are outdated, unmaintained, or unnecessary. We check that any new dependency is actively maintained, appropriately licensed, and actually needed. Could this be done with the standard library or with something we already depend on?

AI-generated code needs more scrutiny, not less. The fact that it reads well is exactly why you need to verify it works well.

Speed is a feature

A pull request that sits for days creates merge conflicts, blocks dependent work, and frustrates the author. We treat review latency as a quality metric.

We aim for first response within a few hours during working hours. Even if a thorough review needs more time, a quick initial pass helps. And we label feedback as either a blocker or a suggestion. “This is a suggestion, not a blocker” is one of the most useful phrases in code review. It lets the author decide whether to address it now or ship and iterate.

How latency compounds

Review latency doesn’t just affect one PR. It affects the team’s entire throughput. When reviews are slow, engineers batch larger changes into fewer PRs because the per-PR wait cost is high. Larger PRs are harder to review, so reviews take even longer. Reviewers do shallower passes because the PRs are overwhelming. Quality drops. It’s a vicious cycle.

We’ve watched this happen across teams. The ones with fast review cycles tend to have smaller PRs, better review quality, and fewer production incidents. Small PRs are easier to review well, and fast reviews encourage small PRs. They reinforce each other.

There’s also a context-switching cost that’s easy to underestimate. When an author gets feedback three days after opening a PR, they’ve mentally moved on. They have to reload the context of their own decisions, re-read their code, figure out what the reviewer means. A review that arrives while the code is still fresh leads to faster, better responses.

Unblocking without complete review

Sometimes we can’t do a thorough review right away. We’re in the middle of focused work, or the PR touches code we don’t know well. A quick message is still better than silence: “I’ll review this properly this afternoon, but I skimmed it and nothing jumped out. If it’s urgent, feel free to grab another reviewer.” That one message unblocks the author’s planning even if it doesn’t unblock the merge.

The goal

Code review isn’t about proving expertise or catching every possible issue. It’s about catching the important stuff while keeping the team moving.

But it’s also about culture. The way a team reviews code shapes how that team works together. Teams that review with curiosity build trust. Teams that treat review as gatekeeping build resentment. Teams that review quickly signal they value each other’s time.

We’ve been on teams where opening a PR felt like submitting homework for grading, where the goal was to avoid criticism rather than improve the code. And we’ve been on teams where opening a PR felt like starting a conversation, where the process made both the code and the people better. The difference wasn’t the individuals. It was the norms.

Those norms get set by how we write comments, how quickly we respond, what we focus on, and what we let go. Every review is a small act of culture-building.

We focus on correctness, security, testability, and clarity. We skip the bikeshedding. We ask questions when we’re genuinely curious and state things directly when we’re sure. We adjust based on who we’re reviewing. We review quickly. And we remember that the person on the other side of the PR is our teammate.

Good code review makes the code better. Great code review makes the team better.