March 9, 2026

Story points might be the most misunderstood concept in software. Some teams swear by them. Others think they're a scam. The debate says more about how teams use them than about the unit itself.
At a recent DeveloperWeek conference, a panel on estimation methods turned into one of the most heated sessions of the day. Half the room argued that story points are the only honest way to estimate software. The other half argued they're a layer of abstraction that exists to avoid accountability. Both sides had a point, which is exactly why the conversation matters.
The problem with story points isn't the unit. It's that most teams use them as a translation layer for hours, which defeats the entire purpose.
Story points exist because hours are a lie. Not a deliberate lie. A structural one.
When you estimate in hours, you're making a prediction about how long something will take a specific person under specific conditions. But you rarely know who will do the work at the point when the estimate matters most. A task that takes two hours for an engineer who's built the same integration before takes eight hours for one who hasn't. Hours create false precision. They make the estimate look concrete when the inputs are vapor.
Story points solve this by measuring relative complexity instead of absolute time. A feature rated at 8 points is roughly twice as complex as one rated at 4, regardless of who builds it. The unit captures effort, risk, and uncertainty in a single number, without pretending to know how long the clock will run.
The story-points-versus-hours debate isn't academic. It reflects a genuine tension in how software gets planned, priced, and delivered.
The case for story points: Relative estimation is something humans are naturally good at. You can easily tell that one task is twice as complex as another, even if you can't predict how many hours either will take. Story points remove the pressure of false precision and let teams focus on the shape of the work rather than the clock. Over time, a team's velocity, the number of points they consistently complete per sprint, becomes a reliable forecasting tool.
The case for hours: Hours are universal. Stakeholders understand them. Contracts are written in them. Budgets are built on them. And critically, you can bill for them. Even Ron Jeffries, who was part of the team that first used story points, has publicly expressed regret about the unintended consequences of the concept, noting that many teams misuse them in ways that create more confusion than clarity.
Here's where the industry went wrong: it conflated estimation with billing. Story points are better for estimation, because they capture complexity without assuming who does the work or how fast. Hours are better for billing, because they're a concrete unit that maps to payroll and contracts. The problem is that teams try to use story points for both, which leads to the inevitable, destructive question: "how many hours is one story point?"
The answer is: don't convert. The moment you define one story point as a fixed number of hours, you've eliminated the main benefit, which is that the number doesn't depend on who does the work.
How Story Points Actually Work
If you're new to story points, here are the practical mechanics.
The Fibonacci scale. Most teams estimate using a Fibonacci sequence: 1, 2, 3, 5, 8, 13, 21. The gaps between numbers grow larger as complexity increases, which reflects a real pattern: the bigger and more uncertain a task is, the less precision you can claim. If you're tempted to call something a 7, which isn't on the scale, call it an 8. The scale is designed to prevent false precision, not create it.
Reference stories. The scale only works if the team shares a common understanding of what a "3" means versus a "5." The best way to calibrate is to pick a recently completed story that everyone agrees was moderate complexity. Call it a 5. Everything else gets estimated relative to that reference. "Is this about the same? Twice as big? Half?"
T-shirt sizing. For early-stage estimation, before a team is assembled or a backlog exists, some teams use S/M/L/XL instead of numbers. This is cruder but useful for epics and high-level roadmap conversations where you need directional sizing, not sprint-level precision.
The hours-to-story-points conversion trap. Do not create a conversion table. Not "1 point = 4 hours." Not "1 point = 1 ideal day." The moment you pin story points to a time unit, you've rebuilt the same broken system you were trying to escape. Different engineers will complete the same 5-point story in different amounts of time. That's the whole point. Story points measure the work, not the worker.
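The scale and reference-story mechanics above can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the function names `snap_to_scale` and `size_relative_to_reference` are hypothetical, and two behaviors are assumptions of the sketch rather than rules from this article: always rounding up to the next scale value (a generalization of "call it 8"), and rejecting anything larger than the top of the scale.

```python
# Illustrative sketch only. Function names and the "ratio" input are assumptions.
FIBONACCI_SCALE = [1, 2, 3, 5, 8, 13, 21]


def snap_to_scale(raw_size, scale=FIBONACCI_SCALE):
    """Round a raw size up to the next value on the scale."""
    for value in scale:
        if raw_size <= value:
            return value
    # Assumption: sizes past the top of the scale are rejected here.
    raise ValueError("size exceeds the largest value on the scale")


def size_relative_to_reference(ratio, reference_points=5):
    """Size a story as a multiple of the agreed reference story.

    ratio is the team's answer to "how big is this compared to the
    reference?", e.g. 2 for "twice as big" or 0.5 for "half".
    """
    return snap_to_scale(ratio * reference_points)


print(snap_to_scale(7))                 # 8
print(size_relative_to_reference(2))    # 13 (2 x 5 = 10, next scale value is 13)
print(size_relative_to_reference(0.5))  # 3  (0.5 x 5 = 2.5, next scale value is 3)
```

Note that nothing in the sketch refers to hours or to a particular engineer. The only inputs are the scale and the comparison to the reference story.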
Traditional story pointing requires a team that has worked together long enough to calibrate. You need shared context, a reference story everyone has actually built, and enough sprints behind you to have a reliable velocity. For established teams doing sprint planning, this works.
But there's a massive gap: what about before the team exists? Before the first sprint? Before you've even hired a vendor?
This is where AI-assisted estimation changes the model. Instead of asking five engineers to play planning poker, an AI estimator takes your project description and assigns story points based on patterns across thousands of comparable projects. The calibration isn't internal to your team. It's external, drawn from historical data across many teams and many builds.
The output is still story points. But the dependency is different. You don't need a team assembled. You don't need three sprints of velocity data. You get a structured breakdown, features decomposed into tasks with story point ranges, fast enough to inform budgeting conversations before you've committed to anything.
The limitation is real: AI estimation won't capture your team's specific quirks, your legacy codebase, or the political dynamics of your organization. It's a calibrated starting point, not a sprint plan. But for pre-project budgeting, vendor evaluation, and scope validation, it fills a gap that no planning poker session can. (For a comparison of the tools available, see our overview of software estimation tools.)
If you're hearing about story points for the first time, the question in your head is probably: "Great, but what does this cost me?"
Story points measure relative complexity, not time. A feature sized at 8 points is roughly twice as complex as one sized at 4. They separate "how hard is this?" from "who's building it and how fast do they work?" That's what makes them useful for budgeting without converting to hours.
Here's how the translation works. Divide your total scope by the team's velocity (story points delivered per sprint) to get the number of sprints. Multiply sprints by the team's cost per sprint, and you have a budget range.
The math is simple. The inputs are the hard part. If you haven't started building yet, you don't have velocity data. You have a vendor's claim, which is a different thing. Velocity varies by team composition, tech stack familiarity, and project phase. So anchor on what you can know: the sprint cost. Then use story points to bound how many sprints the project takes. Ask vendors what velocity they sustained on comparable projects. Assume it could be 20% lower for a new engagement. That gives you a range, not a point estimate.
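As a worked version of that arithmetic, here is a small Python sketch. The scope, velocity, and sprint cost below are made-up placeholders, and `budget_range` is a hypothetical helper. The formula and the 20% velocity reduction come from the text above; rounding up to whole sprints is an added assumption.

```python
import math


def budget_range(total_points, claimed_velocity, cost_per_sprint, haircut=0.20):
    """Return (low, high) budget bounds for a scope sized in story points.

    low uses the vendor's claimed velocity as-is.
    high assumes velocity turns out `haircut` (20% by default) lower.
    Sprints are rounded up to whole sprints (an assumption of this sketch).
    """
    if claimed_velocity <= 0:
        raise ValueError("velocity must be positive")
    if not 0 <= haircut < 1:
        raise ValueError("haircut must be in [0, 1)")

    reduced_velocity = claimed_velocity * (1 - haircut)
    best_case_sprints = math.ceil(total_points / claimed_velocity)
    worst_case_sprints = math.ceil(total_points / reduced_velocity)
    return best_case_sprints * cost_per_sprint, worst_case_sprints * cost_per_sprint


# Placeholder inputs: 210 points of scope, a claimed 30 points per sprint,
# and a sprint cost of 40,000 in whatever currency the contract uses.
low, high = budget_range(total_points=210, claimed_velocity=30, cost_per_sprint=40_000)
print(low, high)  # 280000 360000
```

With these inputs the best case is 7 sprints and the reduced-velocity case is 9, so the budget lands between 280,000 and 360,000. The result is a range anchored on sprint cost, and no cost-per-point figure is needed to get it.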
Two traps. Don't compare story points across teams. One team's 5-point story is not the same as another's. Compare at the sprint-cost level. And don't treat cost-per-point as a universal benchmark. Any figure you find online is an artifact of a specific team and rate structure. Using someone else's cost-per-point to budget your project is like using someone else's grocery bill to plan your restaurant.
Outcome-based pricing takes this further. Instead of billing for sprints or hours, the cost ties to defined story points of delivered work. You're paying for complexity that's been scoped and measured, not for time spent. If the vendor delivers faster, nobody is penalized. The incentives shift from "how long did this take?" to "did this get built?"
The Fraction estimator outputs story points for exactly this reason. Run a project description through it and you get features broken into tasks, with story point ranges and cost bands. That's the unit of currency for outcome-based pricing: defined work, not open-ended hours.
Story points aren't broken. The way most teams use them is.
They break when you convert them to hours. They break when you compare velocity across teams. They break when managers use them as a productivity metric for individual developers. And they break when they become a political tool rather than a planning one.
They work when you treat them as a language for complexity. When the team owns the calibration. When the discussion during planning poker matters more than the number on the card. And when the output feeds a budget conversation grounded in velocity, not a timesheet.
Story points work when you stop treating them as a translation layer for hours and start treating them as what they are: the most honest unit we have for saying "this is how hard the work is, regardless of who does it."
How do you calculate story points?
You don't calculate them with a formula. Story points are assigned through relative estimation: the team compares each piece of work to a reference story and asks, "Is this bigger, smaller, or about the same?" Most teams use a Fibonacci scale (1, 2, 3, 5, 8, 13, 21) where the gaps between numbers reflect increasing uncertainty. The key is consensus: the team discusses, debates, and agrees. Over time, the team's velocity (points completed per sprint) becomes the real metric for planning.
Can you convert story points to hours?
You can, but you shouldn't. The moment you define one story point as a fixed number of hours, you've eliminated the benefit of relative estimation. Different team members will complete the same 5-point story in different amounts of time, and that's expected. If stakeholders need a time-based answer, use velocity: "We complete 25 points per sprint, the project is 200 points, so roughly 8 sprints."
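The velocity-based answer can be sketched in Python as follows. The sprint history is invented for illustration, `forecast_sprints` is a hypothetical helper, and averaging recent sprints is one common way to derive velocity rather than something this article prescribes.

```python
import math
from statistics import mean


def forecast_sprints(backlog_points, recent_sprint_points):
    """Estimate how many sprints a backlog needs, using average velocity."""
    if not recent_sprint_points:
        raise ValueError("need at least one completed sprint")
    velocity = mean(recent_sprint_points)  # points completed per sprint
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    # Round up: a partly used sprint still has to be scheduled.
    return math.ceil(backlog_points / velocity)


history = [24, 26, 25]  # made-up points completed in the last three sprints
print(forecast_sprints(200, history))  # 8
```

The time-based answer comes from the team's own delivery history, so no per-point hour value is ever defined.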
What's the difference between story points and T-shirt sizing?
Both are forms of relative estimation. Story points use a numerical scale (usually Fibonacci) and are better for sprint-level planning where precision matters. T-shirt sizes (S, M, L, XL) are better for high-level roadmap conversations and early-stage scoping where you need directional sizing without false precision. Some teams use T-shirt sizing for epic-level planning and story points for sprint-level estimation.
How does AI estimate story points without a team?
AI estimation assigns story points by pattern-matching your project description against historical data from thousands of comparable projects. It doesn't know your team's velocity or your specific codebase, but it can identify that "user authentication with SSO" is consistently more complex than "static content page" across many builds. The result is a structured starting point, not a final sprint plan, that you can use for budgeting and vendor evaluation before a team is assembled.