What Are Brittle Tests? Definition & How to Prevent Them

Deboshree Banerjee
Jan 26, 2026

We often come across the word resilience in software engineering. While the term applies to many things, one area that’s often overlooked is tests. Ideally, software tests should only fail when something actually goes wrong in the system. 

With brittle tests, however, failures occur because the tests themselves are fragile, not because the application is broken. An easy way to spot this is during refactoring, i.e., when you change the internal structure of the code without changing its external behavior. By definition, refactoring shouldn’t break anything. 

But if your tests disagree, it may be because you’re testing the wrong layer.

Keep reading to learn more about what brittle tests are, why they exist, and how your team can manage them effectively. We’ll also explore modern AI solutions that can help you respond to brittle tests.

What Are Brittle Tests?

The words brittle and fragile don’t have a single canonical definition in software engineering. They mean what they do in everyday English: something that’s easy to break.

A brittle test is one that fails when the code is still correct. In these instances, the application behaves as expected but the test suite turns red anyway. This typically happens after a refactor, a styling change, or any modification to internal implementation that doesn’t affect user-facing behavior.

Just to be clear, let’s also distinguish between “flaky” and “brittle” tests: 

  • Flaky tests fail intermittently on the same piece of code; they show inconsistent behavior and pass sometimes while failing other times. 
  • Brittle tests fail consistently after an irrelevant change. Here, failure isn’t random. It’s a predictable response to a change the test shouldn’t care about.

Why Tests Become Brittle

Tests don't start out brittle. They usually become brittle over time through a combination of well-intended shortcuts and creeping complexity. 

In this section, we’ll examine some of the most common reasons why tests become brittle.

Testing How Code Works Instead of What It Does

Say you have a function that fetches a user's orders:

function getUserOrders(userId) {
  const orders = database.query(`SELECT * FROM orders WHERE user_id = ${userId}`);
  return orders;
}

Here’s an example of a brittle test:

test('getUserOrders calls the database with correct query', () => {
  const spy = jest.spyOn(database, 'query');
  getUserOrders(123);
  expect(spy).toHaveBeenCalledWith('SELECT * FROM orders WHERE user_id = 123');
});

This is brittle because tomorrow another developer might optimize the query to SELECT id, total FROM orders WHERE user_id = 123. The function still returns the correct orders, but the test fails because the exact SQL string changed. 

This test was verifying the implementation (which SQL query runs) instead of the behavior (whether the function returns the right orders).

Now, let’s look at an example of a non-brittle test:

test('getUserOrders returns orders for the given user', () => {
  // Setup: insert a test order
  database.insert({ table: 'orders', user_id: 123, total: 50 });
  
  const orders = getUserOrders(123);
  
  expect(orders).toContainEqual(expect.objectContaining({ user_id: 123, total: 50 }));
});

This test doesn't care how the data is fetched. It only verifies that calling getUserOrders(123) returns orders belonging to user 123. Even if you refactor the SQL, switch to an object-relational mapper (ORM), or change the database entirely, the test will pass, as long as the correct data comes back.
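To make that concrete, here’s a sketch of one possible refactor. The orm.orders.findMany call is a hypothetical ORM API used purely for illustration; the behavior-focused test above would still pass against it, because it only checks what comes back, not how it was fetched:

// Hypothetical refactor: same behavior, different implementation.
// orm.orders.findMany is an illustrative ORM call, not any specific library's API.
function getUserOrders(userId) {
  return orm.orders.findMany({ where: { user_id: userId } });
}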

Hardcoded Locators That Break When the Page Structure Changes

Consider this HTML for a login form:

<body>
  <div class="header">...</div>
  <div class="content">
    <form>
      <input type="text" name="username">
      <input type="password" name="password">
      <button>Login</button>
    </form>
  </div>
</body>

In this instance, an example of a brittle test looks like this:

await page.click('/html/body/div[2]/form/button');

This XPath says: "Click the button inside the form inside the second div inside the body." Now, let’s see why this test would break.

Assume marketing adds a banner between the header and content:

<body>
  <div class="header">...</div>
  <div class="promo-banner">Summer Sale!</div>  <!-- NEW -->
  <div class="content">
    <form>...</form>
  </div>
</body>

Now the content div is div[3], not div[2]. The login button still works perfectly and users can log in, but the test can’t find the button because it’s looking inside “the second div” and the form now lives in the third.

Now, let’s look at a non-brittle test:

await page.getByRole('button', { name: 'Login' }).click();

This says: click the button labeled Login; it doesn’t care where that button sits in the DOM. Even if you add 10 divs, wrap the form in a modal, or move it to a different section of the page, the test will still pass. 

As long as there's a button that says “Login,” everything should work as expected.

Hardcoded Waits That Assume Specific Timing

Let’s now turn our attention to this brittle test:

await page.waitForTimeout(3000);

On a fast machine, the dashboard might load in 200 ms; on a slow machine, it might take 4 seconds. A fixed 3-second wait is wrong in both cases: it wastes time on the fast machine and still fails on the slow one. 

Now, imagine you have multiple tests running simultaneously. These fixed waits either slow down the entire test suite or cause tests to fail outright when waiting just a little longer would have produced the expected result.

Here’s a better way to write this test:

await page.waitForSelector('[data-testid="loaded-content"]', { state: 'visible' });

By waiting on a condition instead of a fixed duration, the test stays fast on quick machines and stable on slow ones.

Tests That Assume Specific Data Exists

Here’s another example of a brittle test:

test('user profile shows email', async () => {
  await page.goto('/users/42');  // Assumes user ID 42 exists
  expect(await page.locator('.email').textContent()).toBe('john@example.com');
});

Now imagine any of the following: someone runs a database cleanup script and deletes user 42; another test modifies user 42’s email while exercising an “update email” feature; the staging database gets refreshed from a backup that doesn’t include user 42; or a developer runs this test on a local machine with a fresh database that has no users at all. 

The application works correctly—users can create accounts and view profiles—but the test still fails because it assumes a specific user with a specific email address will always exist. These kinds of hidden dependencies make tests brittle.

Here’s a better way to write that test:

test('user profile shows email', async () => {
  // Create the data this test needs
  const user = await api.createUser({ email: 'test-user@example.com' });
  
  await page.goto(`/users/${user.id}`);
  expect(await page.locator('.email').textContent()).toBe('test-user@example.com');
  
  // Clean up
  await api.deleteUser(user.id);
});

How to Prevent Brittle Tests

Now that you have a better idea of how brittle tests appear, let’s examine some tactics you can use to avoid them.

Use Stable, Semantic Selectors

Instead of brittle XPath or deeply nested CSS selectors, use data attributes specifically intended for testing. Something like data-testid="submit-button" or data-automation="user-profile" survives design changes and communicates intent both to developers and to the test suite. 

Playwright (which, incidentally, powers the Autify Nexus platform) has excellent support for semantic selectors like getByRole(), getByText(), and getByLabel(). These selectors locate elements the way a user would, rather than by navigating the DOM structure.
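As a rough sketch, a login interaction written with those selectors might look like this. The Username and Password labels and the data-testid="submit-button" attribute are assumptions about your markup, not part of the earlier example:

// Semantic selectors: find elements the way a user would.
await page.getByLabel('Username').fill('alice');
await page.getByLabel('Password').fill('secret');
await page.getByRole('button', { name: 'Login' }).click();

// Or, if the button carries a dedicated test attribute (data-testid="submit-button"):
await page.getByTestId('submit-button').click();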

Embrace Dynamic Waits

Replace fixed delays with conditional waits. Almost all modern frameworks support conditional waits natively. Instead of using a construct like "wait 3 seconds," use something like "wait until this element is visible" or "wait until the network is idle." 
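In Playwright, for example, those conditional waits might look like the following; the loaded-content test id and the Dashboard heading are placeholders, and expect comes from @playwright/test:

// Wait until a specific element is visible instead of sleeping for a fixed time.
await page.getByTestId('loaded-content').waitFor({ state: 'visible' });

// Or wait until network activity settles before asserting.
await page.waitForLoadState('networkidle');

// Web-first assertions retry automatically until the condition holds or a timeout is hit.
await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();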

Isolate Test Data

Each test should create the data it needs and clean up after itself. Factory functions, dedicated test accounts, and transactional rollbacks all help keep tests independent. That independence makes tests reliable, and you can often parallelize isolated tests in ways you can’t with shared-state tests.
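Here’s a minimal sketch of that factory idea, reusing the hypothetical api.createUser and api.deleteUser helpers from the earlier example:

// Factory: every test gets its own unique user, so no test depends on shared state.
async function createTestUser(overrides = {}) {
  return api.createUser({
    email: `test-${Date.now()}@example.com`,  // unique per run to avoid collisions
    ...overrides,
  });
}

test('user profile shows email', async () => {
  const user = await createTestUser();
  try {
    await page.goto(`/users/${user.id}`);
    expect(await page.locator('.email').textContent()).toBe(user.email);
  } finally {
    await api.deleteUser(user.id);  // clean up even if the assertion fails
  }
});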


Test Behavior, Not Implementation

The classic formula is to test what your code does—not how it does it. Your test shouldn’t care about the underlying technology, library, or framework. It should only care about the functionality. Every test you write needs to reflect that. 

Review and Refactor Tests Like Production Code

Tests are code too, which means they accumulate technical debt and, like production code, benefit from refactoring. Too often, though, they’re treated as write-only artifacts. 

Make test quality an essential part of code review. When a test breaks often, that’s a signal to rewrite it, not just to add temporary fixes. 

What Is the Role of AI in Preventing Brittle Tests?

The traditional approach to dealing with brittle tests has been reactive. It entails fixing things after they break. AI, however, is shifting this to a more proactive stance. 

For instance, Autify's Fix with AI feature can automatically detect and suggest corrections when locators become stale or when minor UI changes break previously working tests. 

Unlike traditional self-healing approaches that slow down execution by searching for alternative element paths at runtime, this method preserves test speed by addressing issues earlier on.

Another important shift is the emergence of AI agents like Autify’s AI Tester, which execute tests based on intent and assist with keeping those tests up to date as application flows evolve, rather than relying on brittle, hard-coded scripts.

We’re still in the early days, and AI won't magically eliminate the need for thoughtful test design. Even so, the tech is getting increasingly good at handling the grunt work, freeing engineers to focus on the most important things.

Conclusion

When it boils down to it, brittle tests are a tax on development velocity. They waste time, erode trust, and make testing feel like a chore instead of a safety net. 

The fix to this involves using stable selectors, dynamic waits, and isolated data, among other things. If you layer AI-powered maintenance on top, you've got a test suite that actually helps you move faster—not one that slows you down.

Ultimately, your tests should be the first to tell you when something’s wrong. With the right approach, you can ensure that they aren’t raising false alarms. 

Ready to spend less time fighting brittle tests? See how Autify AI Tester uses an autonomous AI agent to build tests from natural language and execute them the way a real user would across web, mobile, and desktop. No scripting, zero maintenance.