Average latency often hides real production issues. This article explains why p95 and p99 latency provide a more accurate view of system reliability and real user experience in high-traffic environments.

Average latency often creates a false sense of confidence. Dashboards look green, SLAs appear healthy, and systems are assumed to be performing well. However, real production incidents almost never occur at the “average” level.
They happen at the edges.
That is why in high-traffic, mission-critical systems, percentile latency—especially p95 and p99—provides a far more accurate picture of system behavior than simple averages.
p95 is the latency under which 95% of requests complete; p99 is the threshold under which 99% complete, so the slowest 1% of requests sit above it. That small tail is where most real-world failures originate.
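A minimal sketch of what these numbers look like in practice, assuming NumPy and a synthetic latency sample (the distribution and values are made up purely for illustration):

```python
import numpy as np

# Synthetic example: most requests are fast (~50 ms), a small fraction are slow (~2 s).
rng = np.random.default_rng(seed=0)
fast = rng.normal(loc=50, scale=5, size=9_800)      # the bulk of traffic
slow = rng.normal(loc=2_000, scale=300, size=200)   # a small slow tail
latencies_ms = np.concatenate([fast, slow])

print(f"mean = {latencies_ms.mean():.0f} ms")              # stays well under 100 ms
print(f"p95  = {np.percentile(latencies_ms, 95):.0f} ms")  # still dominated by the fast requests
print(f"p99  = {np.percentile(latencies_ms, 99):.0f} ms")  # lands squarely in the slow tail
```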
Averages hide outliers. If 99 requests complete in 50 ms and one takes 5 seconds, the average is still under 100 ms and looks perfectly acceptable. But for the user behind that one request, the system feels broken.
This discrepancy explains why teams often see healthy dashboards while customers report timeouts and degraded experiences.
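The arithmetic behind that example is easy to verify; a tiny sketch in plain Python, using the same illustrative numbers:

```python
# 99 requests complete in 50 ms, one takes 5 seconds.
latencies_ms = [50] * 99 + [5_000]

average = sum(latencies_ms) / len(latencies_ms)
print(f"average = {average} ms")            # 99.5 ms -- the dashboard still looks fine
print(f"slowest = {max(latencies_ms)} ms")  # 5000 ms -- one user effectively hits a timeout
```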
In API gateways, IAM layers, payment flows, and integration platforms, these outliers tend to cascade across services, amplifying the impact.
p95 is useful for understanding general system performance. It reflects what most users experience. However, in regulated and high-availability environments, p95 alone is insufficient.
p99 captures the rare but critical events that precede larger failures. These events almost always surface first in the p99 range, which means monitoring p99 allows teams to detect problems before they escalate into outages.
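One way to act on this is to track p99 over a sliding window and flag when it crosses a latency budget. The sketch below is illustrative rather than a production alerting setup: the window size, budget, and minimum sample count are arbitrary placeholders.

```python
from collections import deque
import math

class P99Monitor:
    """Tracks recent request latencies and flags when p99 exceeds a budget."""

    def __init__(self, window_size=1_000, p99_budget_ms=800):
        self.samples = deque(maxlen=window_size)   # sliding window of recent latencies
        self.p99_budget_ms = p99_budget_ms

    def record(self, latency_ms):
        self.samples.append(latency_ms)

    def p99(self):
        if not self.samples:
            return 0.0
        ordered = sorted(self.samples)
        # Nearest-rank percentile: the value at or below which ~99% of samples fall.
        rank = max(0, math.ceil(0.99 * len(ordered)) - 1)
        return ordered[rank]

    def breached(self):
        # Require a minimal sample count so a handful of requests cannot trigger alerts.
        return len(self.samples) >= 100 and self.p99() > self.p99_budget_ms
```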
API gateways are often blamed for latency issues, but in practice they usually expose problems rather than cause them.
In the common scenarios, average latency remains stable while p99 degrades sharply: the gateway is simply the first place the slowdown becomes visible. Observing p99 at the gateway layer therefore provides early visibility before application teams notice the issue.
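At the gateway layer this often amounts to grouping latencies by route before computing percentiles. A rough sketch over parsed access-log records follows; the record shape and field names ("route", "latency_ms") are assumptions, not any particular gateway's format.

```python
from collections import defaultdict
import numpy as np

def per_route_p99(records):
    """records: iterable of dicts such as {"route": "/payments", "latency_ms": 87.0}."""
    by_route = defaultdict(list)
    for record in records:
        by_route[record["route"]].append(record["latency_ms"])
    return {route: float(np.percentile(values, 99)) for route, values in by_route.items()}

# Illustrative usage with made-up records:
records = [
    {"route": "/payments", "latency_ms": 80},
    {"route": "/payments", "latency_ms": 95},
    {"route": "/payments", "latency_ms": 4_200},  # a single slow outlier
    {"route": "/login", "latency_ms": 35},
    {"route": "/login", "latency_ms": 40},
]
print(per_route_p99(records))
```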
Mature organizations avoid defining SLAs based on averages. Instead, they rely on percentile-based targets, for example a fixed p95 and p99 latency budget per endpoint, measured over a defined window. This approach aligns performance metrics with real user experience rather than theoretical averages.
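Such a target can be evaluated directly against measured latencies. A minimal sketch with hypothetical thresholds (the 300 ms and 800 ms figures are placeholders, not recommendations):

```python
import numpy as np

# Hypothetical percentile-based targets for one endpoint, in milliseconds.
SLO_TARGETS = {"p95": 300, "p99": 800}

def check_slo(latencies_ms, targets=SLO_TARGETS):
    """Compare measured percentiles against their targets."""
    results = {}
    for name, target_ms in targets.items():
        quantile = float(name.lstrip("p"))                     # "p99" -> 99.0
        measured_ms = float(np.percentile(latencies_ms, quantile))
        results[name] = {
            "measured_ms": round(measured_ms, 1),
            "target_ms": target_ms,
            "ok": measured_ms <= target_ms,
        }
    return results
```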
p95 and p99 do not measure how fast your system is on good days. They measure how reliable it is on bad days.
In production systems with high traffic and complex dependencies, p99 is not just a metric—it is an operational instinct.
Because users never experience the average. They experience your system at its slowest moments.