<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://ahmednadar.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://ahmednadar.com/" rel="alternate" type="text/html" /><updated>2026-03-14T20:19:49+00:00</updated><id>https://ahmednadar.com/feed.xml</id><title type="html">Ahmed Nadar</title><subtitle>Rails developer and product builder in Toronto. I pair Ruby on Rails with AI tools to ship production apps at startup speed.</subtitle><author><name>Ahmed Nadar</name></author><entry><title type="html">Toronto deserves better than 311</title><link href="https://ahmednadar.com/toronto-deserves-better-than-311/" rel="alternate" type="text/html" title="Toronto deserves better than 311" /><published>2026-03-10T00:00:00+00:00</published><updated>2026-03-10T00:00:00+00:00</updated><id>https://ahmednadar.com/toronto-deserves-better-than-311</id><content type="html" xml:base="https://ahmednadar.com/toronto-deserves-better-than-311/"><![CDATA[<p>Three months ago I watched a woman on Bloor Street step around the same pothole I reported by email three weeks earlier. She looked at it, shook her head, and kept walking. She’s not going to call 311. Nobody is.</p>

<p>I know because I tried. I called 311 to report a cracked sidewalk in my neighbourhood. Fifteen minutes on hold. Then I described the location verbally to an operator who typed it into a form I couldn’t see. No photo. No GPS coordinates. No confirmation it went to the right department. I hung up and thought: that’s the last time I’m doing this.</p>

<p>I tried the city’s online reporting tool. Six minutes to fill out forms, describe the location in text, select from dropdowns that didn’t quite match my issue, and upload a photo through a clunky interface. Six minutes for a pothole.</p>

<p>Most people won’t spend six minutes. Most people won’t spend one minute. So the pothole stays. The graffiti stays. The broken streetlight stays. Not because nobody cares. Because the process punishes you for caring.</p>

<p>This is why I built SolveTO.</p>

<h2 id="the-math-behind-the-problem">The math behind the problem</h2>

<p>Toronto received 6,839 pothole reports in February 2026 alone, a fivefold increase from last year. The city repaired 257,000 potholes in 2025. The scale is massive.</p>

<p>But here’s the number that matters: each 311 phone report costs the city $12 to $15 in call centre staff time. That’s operator wages, phone infrastructure, data entry, and ticket routing. Multiply that by hundreds of thousands of calls per year.</p>

<p>In the UK, Buckinghamshire Council measured the exact same problem. Email-based reports cost them $12.50 each. When they switched to FixMyStreet, an online civic reporting platform, that dropped to $0.15. The City of Melbourne in Australia saw a 17% improvement in citizen satisfaction scores after adopting Snap Send Solve, a similar platform.</p>

<p>These aren’t hypothetical gains. Cities worldwide are replacing phone-based reporting with digital platforms and seeing real results. Toronto is behind.</p>

<h2 id="what-i-built">What I built</h2>

<p><a href="https://solveto.ca?utm_source=toronto-deserves-better&amp;utm_medium=blog">SolveTO</a> is a free civic reporting platform for Toronto. It’s been live since early February 2026, covering all 25 wards. It costs the city nothing.</p>

<p>Here’s how it works.</p>

<p>You open solveto.ca on your phone or desktop, take a photo of the issue, and submit it. That’s it. AI identifies what it is (pothole, graffiti, broken streetlight, illegal dumping, one of 24 categories), rates the severity, and writes a structured report. GPS pins your exact location and maps it to the correct ward and postal code. The report goes to 311@toronto.ca and your ward councillor simultaneously.</p>

<p><strong>Thirty seconds. No phone tree. No forms. No hold music.</strong> It’s that simple.</p>

<p>The report appears on a public live map that anyone can see. Other residents can verify the issue exists. When it’s fixed, anyone can submit a resolution photo. Everything is tracked, transparent, and public.</p>

<h2 id="what-the-live-map-shows-you">What the live map shows you</h2>

<p>The homepage is a full-screen interactive map of Toronto with every report plotted in real time. It refreshes every five minutes.</p>

<p>You can filter by any of the 24 issue categories, each showing a live count. Filter by time: last 24 hours, 48 hours, 7 days, 30 days. Filter by ward: click any of the 25 wards to see its boundary drawn on the map and all data filtered to that ward. Toggle between open and closed reports, each showing how many exist.</p>

<p>Every filter combination is encoded in the URL. So you can bookmark “all open potholes in Ward 11 from the last 7 days” and share that link with your councillor. The charts update live as you filter. The sidebar updates. The counts update. Everything is connected.</p>
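<p>A bookmarked filter link might look something like this (the parameter names here are illustrative, not SolveTO’s actual query schema):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>https://solveto.ca/?category=pothole&amp;ward=11&amp;status=open&amp;range=7d
</code></pre></div></div>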

<p>This isn’t a static dashboard. It’s a live accountability tool.</p>

<h2 id="what-councillors-and-the-city-get">What councillors and the city get</h2>

<p>For councillors, SolveTO provides something that doesn’t exist today: a real-time view of what’s broken in their ward.</p>

<p>Ward report cards show total reports, open vs resolved, resolution rate, average response time, and top issue types. There’s a downloadable PDF for council meetings and budget requests. Councillor effectiveness is publicly visible, ranked by responsiveness and resolution rate.</p>

<p>For city operations, every report arrives structured and categorized. AI identifies the responsible department (Transportation, Parks, Solid Waste, Toronto Hydro, Municipal Licensing, or Parking) so reports can be triaged instantly. Reports include photos, GPS coordinates, severity ratings, and safety risk flags. No phone call. No manual data entry.</p>

<p>Duplicate detection clusters reports at the same location within 50 metres. If three people report the same pothole, the city sees one cluster, not three separate work orders.</p>

<h2 id="the-features-that-matter">The features that matter</h2>

<p><strong>Camera-first reporting.</strong> Take a <a href="https://solveto.ca/report">photo</a> and AI from <a href="https://www.anthropic.com/?utm_source=toronto-deserves-better&amp;utm_medium=blog">Anthropic</a> handles the rest. No typing required unless you want to add a description. Works on any phone with a browser. No app to download.</p>

<p><strong>24 issue categories.</strong> Potholes, graffiti, illegal dumping, illegal signage, streetlights, tree issues, snow and ice, sidewalk damage, abandoned vehicles, parking violations, and 15 more. AI picks the right one.</p>

<p><strong>Face redaction.</strong> AI automatically detects and blurs faces in photos before they’re stored or delivered. Bystander privacy is protected by default, not as an afterthought.</p>

<p><strong>Duplicate detection.</strong> If someone already reported the same issue within 50 metres in the last 30 days, you’ll see it before submitting. This prevents redundant 311 tickets and helps prioritize repeat complaints.</p>

<p><strong>Community verification.</strong> Neighbours can confirm a report: “confirmed,” “still here,” or “fixed.” Democratic validation, not just individual complaints.</p>

<p><strong>Outside-Toronto alerts.</strong> If your GPS falls outside city limits, you’re warned before submitting. No wasted reports.</p>

<p><strong>Public analytics.</strong> Open at <a href="https://solveto.ca/analytics">solveto.ca/analytics</a>. KPIs, resolution rates, response times, SLA compliance. Ward leaderboards. Councillor effectiveness scores. All public.</p>

<p><strong>Works everywhere.</strong> Desktop and mobile. Add it to your home screen and it works like a native app. No app store needed. Works offline too: your report saves locally and syncs when you’re back online.</p>

<h2 id="whats-next">What’s next</h2>

<p>SolveTO is live and serving Toronto residents today. Anyone can report an issue right now and it reaches 311 and their ward councillor at no cost. That works. But it’s only half the picture.</p>

<p>Platforms like <a href="https://www.fixmystreet.com/">FixMyStreet Pro</a> in the UK and <a href="https://www.snapsendsolve.com/">Snap Send Solve</a> in Australia started the same way: citizens reporting, cities receiving. Then cities partnered with them. FixMyStreet Pro now serves 30+ UK councils. Snap Send Solve processes reports for 850+ organizations across Australia and New Zealand. Both operate as city partners, not volunteers. The city contracts the platform, gets direct integration with its internal systems, and the reporting pipeline becomes part of city operations.</p>

<p>That’s the model. SolveTO works on its own, but without city partnership it hits a ceiling. Reports go to 311 by email. We can’t connect directly to the city’s work order systems. We can’t close the loop and tell residents when their pothole is actually fixed. We can’t route reports straight into departmental queues. The platform is ready for all of that. The city just has to say yes.</p>

<p>If you’re a resident, use it. Report what’s broken. The more reports that land on city desks through SolveTO, the harder it becomes to ignore.</p>

<p>If you’re in city operations or on council, let’s talk. This isn’t a side project. It’s infrastructure that already works, built to integrate with yours.</p>

<p><a href="https://solveto.ca?utm_source=toronto-deserves-better&amp;utm_medium=blog">solveto.ca</a> — thirty seconds, that’s all it takes.</p>

<h2 id="by-the-numbers">By the numbers</h2>

<table>
  <thead>
    <tr>
      <th>What</th>
      <th>Number</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Issue categories</td>
      <td>24</td>
    </tr>
    <tr>
      <td>Toronto wards covered</td>
      <td>25</td>
    </tr>
    <tr>
      <td>Time to report</td>
      <td>~30 seconds</td>
    </tr>
    <tr>
      <td>AI analysis time</td>
      <td>~5 seconds</td>
    </tr>
    <tr>
      <td>Delivery per report</td>
      <td>311 + ward councillor</td>
    </tr>
    <tr>
      <td>Duplicate detection radius</td>
      <td>50 metres</td>
    </tr>
  </tbody>
</table>

<hr />

<p>If you live in Toronto, use <a href="https://solveto.ca?utm_source=toronto-deserves-better&amp;utm_medium=blog">SolveTO</a>. Report what you see. Report it again when it’s not fixed. Share it with your neighbours. Tell your councillor it exists. Bring it up at community meetings. The more residents who report, the more data the city can’t ignore.</p>

<p>If you work for the City of Toronto or any other Canadian city, SolveTO is ready to integrate with your systems today. I’m open for business. <a href="https://ahmednadar.com/contact/">Get in touch</a>.</p>]]></content><author><name>Ahmed Nadar</name></author><category term="solveto" /><category term="toronto" /><category term="civic-tech" /><category term="ai" /><summary type="html"><![CDATA[Reporting a pothole shouldn't take 15 minutes on hold. SolveTO replaces 311 with a 30-second photo-to-report flow that costs the city nothing.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://ahmednadar.com/assets/images/posts/og-toronto-deserves-better-than-311.png" /><media:content medium="image" url="https://ahmednadar.com/assets/images/posts/og-toronto-deserves-better-than-311.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Toronto wants AI for potholes… I already built it</title><link href="https://ahmednadar.com/toronto-wants-ai-for-potholes-we-already-built-it/" rel="alternate" type="text/html" title="Toronto wants AI for potholes… I already built it" /><published>2026-03-08T00:00:00+00:00</published><updated>2026-03-08T00:00:00+00:00</updated><id>https://ahmednadar.com/toronto-wants-ai-for-potholes-we-already-built-it</id><content type="html" xml:base="https://ahmednadar.com/toronto-wants-ai-for-potholes-we-already-built-it/"><![CDATA[<p>Today the Toronto Sun <a href="https://torontosun.com/news/local-news/toronto-ai-faster-pothole-repairs?utm_source=toronto-wants-ai-for-potholes&amp;utm_medium=blog">reported</a> that Mayor Chow’s executive committee is exploring AI to speed up pothole repairs. Councillor Paul Ainslie proposed a motion for an AI-driven “pothole blitz strategy” — claiming it could be live within a week at no cost.</p>

<p>I read that and thought: we launched this three weeks ago.</p>

<h2 id="what-solveto-does-right-now">What SolveTO does right now</h2>

<p><a href="https://solveto.ca?utm_source=toronto-wants-ai-for-potholes&amp;utm_medium=blog">SolveTO</a> is a free platform where anyone in Toronto can photograph a civic issue — pothole, broken sidewalk, graffiti, damaged infrastructure — and AI handles the rest.</p>

<p>Here’s the flow:</p>

<ol>
  <li><strong>Snap a photo.</strong> The app captures your location automatically.</li>
  <li><strong>AI analyzes it.</strong> Within seconds, it classifies the issue type, assesses severity, and generates a structured report.</li>
  <li><strong>Auto-routed.</strong> The report goes to 311 and the ward councillor’s office. No phone trees. No 45-minute hold times.</li>
  <li><strong>Public accountability.</strong> Every report is public. Resolution rates are tracked per ward. Councillor performance is visible to everyone.</li>
</ol>

<p>It’s been live since early February. We have reports filed across Toronto and every one of the 25 wards is covered.</p>

<h2 id="the-numbers-that-matter">The numbers that matter</h2>

<p>The Sun article cited 6,839 pothole reports in February alone — a fivefold increase from last year. The city repaired 257,000 potholes in 2025. The scale is massive and growing.</p>

<p>The bottleneck isn’t repair crews. It’s reporting. Calling 311, waiting on hold, describing the location, hoping someone logs it correctly. That’s the part AI can eliminate today.</p>

<p>I tried the city’s online reporting tool and it was a nightmare. I had to describe the location, take a photo, and fill out several forms. It took me six minutes to report a pothole.</p>

<p>SolveTO already eliminates it. A photo and 10 seconds is all it takes.</p>

<h2 id="what-makes-this-different-from-311">What makes this different from 311</h2>

<p>311 is reactive. You call, you wait, you describe. Someone logs it. Maybe it gets routed correctly.</p>

<p>SolveTO is proactive. Computer vision identifies the issue. GPS pins the location. The report is structured, categorized, and delivered before you’ve put your phone back in your pocket.</p>

<p>We also added community verification — neighbours can confirm an issue exists, which helps the city prioritize. And duplicate detection clusters reports at the same location so crews aren’t dispatched twice.</p>

<h2 id="the-real-problem-isnt-technology">The real problem isn’t technology</h2>

<p>Councillor Ainslie is right that AI can help. But the hard part isn’t building the tech. The hard part is getting citizens to use it and trust it.</p>

<p>That’s why SolveTO is designed as a public accountability tool, not just a reporting tool. Every report is visible. Every ward’s resolution rate is public. Councillors can’t quietly ignore reports when their constituents can see the scoreboard.</p>

<h2 id="whats-next">What’s next</h2>

<p>I’m reaching out to Councillor Ainslie’s office directly. If the city wants AI-powered pothole reporting, it exists. It’s free. It works today.</p>

<p>If you’re in Toronto and you see a pothole, try it: <a href="https://solveto.ca?utm_source=toronto-wants-ai-for-potholes&amp;utm_medium=blog">solveto.ca</a></p>]]></content><author><name>Ahmed Nadar</name></author><category term="solveto" /><category term="toronto" /><category term="ai" /><category term="civic-tech" /><summary type="html"><![CDATA[The city is exploring AI to fix pothole reporting. SolveTO has been doing it since February.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://ahmednadar.com/assets/images/og-default.png" /><media:content medium="image" url="https://ahmednadar.com/assets/images/og-default.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Why contractors lose $50K a year answering the phone too late</title><link href="https://ahmednadar.com/why-contractors-lose-50k-a-year-answering-the-phone-too-late/" rel="alternate" type="text/html" title="Why contractors lose $50K a year answering the phone too late" /><published>2026-03-01T00:00:00+00:00</published><updated>2026-03-01T00:00:00+00:00</updated><id>https://ahmednadar.com/why-contractors-lose-50k-a-year-answering-the-phone-too-late</id><content type="html" xml:base="https://ahmednadar.com/why-contractors-lose-50k-a-year-answering-the-phone-too-late/"><![CDATA[<p>It’s 2pm on a Tuesday. A homeowner in Mississauga just submitted a quote request for a kitchen renovation. They found the contractor’s website, liked the portfolio, filled out the form.</p>

<p>The contractor is on a roof three towns over. Phone is in the truck. He won’t see the notification until 5pm when he’s driving home. Or he’s busy working and doesn’t check his phone until the end of the day. By then the homeowner has submitted the same request to two other contractors. One of them replied in four minutes. They’re already booked for a Thursday walkthrough.</p>

<p>This happens every single day.</p>

<h2 id="the-numbers-are-brutal">The numbers are brutal</h2>

<p>Harvard Business Review and InsideSales.com studied 2.24 million leads across industries. What they found:</p>

<p>Responding within 5 minutes makes you <strong>21 times more likely</strong> to qualify a lead than waiting 30 minutes. Wait 24 hours and you’re more than <strong>60 times less likely</strong> to qualify it than if you’d responded within the first hour. At that point it’s basically over.</p>

<p>And the construction industry? The average response time is <strong>42 to 47 hours</strong>.</p>

<p>Let that sink in. Almost two full days to respond to someone who is ready to spend money right now.</p>

<p>Here’s the rest of the picture. <strong>78% of customers buy from whoever responds first.</strong> Not whoever has the best reviews. Not whoever has the lowest price. Whoever picks up the phone first. And <strong>85% of customers won’t even leave a voicemail.</strong> They just call the next contractor on the list.</p>

<p>For a remodeling contractor running a decent lead pipeline, slow follow-up costs over <strong>$50,000 a year</strong> in lost jobs. That’s not a guess. Work backward from average job values, lead volume, and conversion drops at each delay interval. The money is disappearing while the phone sits in the truck.</p>

<p>Only <strong>4.7% of companies</strong> across all industries respond to leads within 5 minutes. In contracting, that number is even lower.</p>

<h2 id="why-the-usual-fixes-dont-work">Why the usual fixes don’t work</h2>

<p>The typical advice is “set up an autoresponder” or “get a CRM.” Neither solves the real problem.</p>

<p>Autoresponders send something like “Thanks for reaching out! We’ll get back to you within 24 hours.” Boring. No value, no personality. The customer reads that and thinks “so you’re slow, got it” and keeps calling other contractors. A generic auto-reply doesn’t qualify the lead, doesn’t answer their question, and doesn’t make them feel like someone is actually paying attention.</p>

<p>CRMs are worse. They assume the contractor is going to sit down at a laptop, log into a dashboard, and manage a sales pipeline. Contractors are not at laptops. They’re on ladders. They’re under houses. They’re covered in drywall dust. The problem with speed-to-lead (STL) systems in contracting isn’t that people don’t know it matters. Everyone knows fast response wins jobs. The problem is <strong>access</strong>. The contractor physically cannot get to the tools that are supposed to help them.</p>

<h2 id="what-i-built">What I built</h2>

<p>I spent a weekend building a system focused on one thing: meeting contractors where they already are. Their phone. Specifically, Telegram.</p>

<p>Here’s what happens when a lead comes in.</p>

<p>A homeowner fills out a simple contact form. <strong>Within seconds</strong>, the contractor gets a Telegram notification on their phone with the lead details. Name, phone number, email, what service they need, how urgent it is, and their message. Everything they need to decide if this is worth pursuing.</p>
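<p>Under the hood, delivering that notification is a single call to the Telegram Bot API’s <code class="language-plaintext highlighter-rouge">sendMessage</code> method. A minimal sketch, in which the token, chat ID, and message text are all placeholders and the real payload formatting differs:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: push a lead summary to the contractor's phone via the
# Telegram Bot API. BOT_TOKEN and CONTRACTOR_CHAT_ID are placeholders.
curl -s "https://api.telegram.org/bot${BOT_TOKEN}/sendMessage" \
  -d chat_id="${CONTRACTOR_CHAT_ID}" \
  --data-urlencode text="New lead: kitchen reno, Mississauga
Jane Doe | 416-555-0199 | needs a quote this month"
</code></pre></div></div>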

<p>At the same time, AI sends the customer an <strong>intelligent email response immediately</strong>. Not a generic “thanks for reaching out.” An actual response that acknowledges their specific request, asks a relevant follow-up question, and makes them feel like someone is paying attention. The customer gets a real reply in their inbox instead of waiting in a void.</p>

<p>The contractor, still on the roof, glances at their phone. They see the lead summary and the urgency level. Emergency leads get flagged hard so nothing critical gets missed. They know exactly what this lead needs and can decide on next steps right from Telegram. No app to open. No dashboard to log into.</p>

<p>From Telegram, they can:</p>
<ul>
  <li>Type <code class="language-plaintext highlighter-rouge">/leads</code> to see all waiting leads</li>
  <li>Type <code class="language-plaintext highlighter-rouge">/done</code> to mark a lead as contacted</li>
  <li>Type <code class="language-plaintext highlighter-rouge">/book</code> to schedule an appointment</li>
  <li>Type <code class="language-plaintext highlighter-rouge">/close</code> to mark a job as won</li>
  <li>Type <code class="language-plaintext highlighter-rouge">/stats</code> to see today’s numbers</li>
</ul>

<p>Additionally, every morning they get a <strong>briefing</strong> with overnight leads, today’s scheduled jobs, and an AI-generated summary of what needs attention. If a lead has been sitting for two hours without contact, the system sends a follow-up reminder automatically.</p>

<p>The entire workflow lives inside an app they already have on their phone. Zero friction.</p>

<h2 id="before-and-after">Before and after</h2>

<p><strong>Before STL Agent:</strong> A lead comes in at 2pm. The contractor doesn’t see it until 5pm. They make a mental note to call back tomorrow. Tomorrow they forget. The customer hired someone else two days ago. The contractor never even knew they lost the job.</p>

<p><strong>After STL Agent:</strong> A lead comes in at 2pm. AI emails the customer within seconds with an intelligent qualifying response. The contractor gets a Telegram ping moments later with the lead details and urgency level. They glance at the summary between tasks, see the customer needs a kitchen reno, and call them back on their next break. The customer, who was about to call another contractor, already has a thoughtful email in their inbox. The contractor books the walkthrough that evening.</p>

<p>The gap between those two scenarios is tens of thousands of dollars a year.</p>

<h2 id="whats-coming-next">What’s coming next</h2>

<p>The current system handles web leads through Telegram. That’s the foundation. What’s coming:</p>

<p><strong>AI phone calls.</strong> When a lead calls and the contractor can’t answer, AI picks up, has a natural conversation, qualifies the project, and books the appointment. The contractor gets a transcript and summary on Telegram.</p>

<p><strong>Outbound SMS.</strong> Automated text follow-ups to leads who haven’t responded, timed based on engagement patterns.</p>

<p><strong>Voice transcription.</strong> Every voicemail and call automatically transcribed, summarized, and delivered to Telegram.</p>

<p>The goal is simple. No lead should ever have to wait.</p>

<h2 id="how-its-built">How it’s built</h2>

<p>Built with <a href="https://rapidfy.dev?utm_source=stl-agent&amp;utm_medium=blog">Rapidfy</a>, an AI agent that builds Rails applications in seconds. Claude powers the AI responses and lead qualification. The Telegram Bot API delivers notifications and receives commands. The whole thing runs on a single server with SQLite. It’s deliberately simple because reliability matters more than architecture when someone’s livelihood depends on it.</p>

<h2 id="take-a-look">Take a look</h2>

<p>If you’re a contractor losing leads to slow follow-up, or you work with contractors and see this problem every day, <a href="https://stl.ahmednadar.com">take a look at STL Agent</a>. It’s live and working.</p>

<p>The math is straightforward. If faster response wins more jobs, and the barrier to faster response is access, then the fix is meeting contractors where they already are. Everything else is noise.</p>

<hr />

<p><img src="https://ahmednadar.com/assets/images/posts/stl-agent-dashboard.png" alt="STL Agent Dashboard" />
<em>STL Agent Dashboard</em></p>

<p><img src="https://ahmednadar.com/assets/images/posts/stl-agent-telegram.png" alt="Telegram Notification" />
<em>Telegram Notification</em></p>

<p><img src="https://ahmednadar.com/assets/images/posts/stl-agent-telegram-morning-briefing.png" alt="Telegram Morning Briefing" />
<em>Telegram Morning Briefing</em></p>]]></content><author><name>Ahmed Nadar</name></author><category term="business" /><category term="contractors" /><category term="ai" /><summary type="html"><![CDATA[Contractors lose thousands every year because they can't respond fast enough. I built a system that fixes that.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://ahmednadar.com/assets/images/og-default.png" /><media:content medium="image" url="https://ahmednadar.com/assets/images/og-default.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Context window economics for AI agents</title><link href="https://ahmednadar.com/context-window-economics-for-ai-agents/" rel="alternate" type="text/html" title="Context window economics for AI agents" /><published>2026-02-24T00:00:00+00:00</published><updated>2026-02-24T00:00:00+00:00</updated><id>https://ahmednadar.com/context-window-economics-for-ai-agents</id><content type="html" xml:base="https://ahmednadar.com/context-window-economics-for-ai-agents/"><![CDATA[<p>Your agent has a 200k token context window. You might think that means 200k tokens of useful work. It doesn’t.</p>

<p>After model overhead, system prompts, tool definitions, autocompact buffer, memory files, SKILL files, and accumulated conversation history, you might have half that for actual reasoning. Add an MCP server or two and you’re down to a third.</p>

<p>Many developers building with agents never measure this. They add tools, extend prompts, wire up MCP servers, and wonder why the agent starts forgetting instructions halfway through a task. The answer is always the same: you’re taxing your agent’s brain and not tracking the bill.</p>

<h2 id="where-tokens-actually-go">Where tokens actually go</h2>

<p>Here’s a rough breakdown from my agent system:</p>

<table>
  <thead>
    <tr>
      <th style="text-align: left">Category</th>
      <th style="text-align: left">Approximate cost</th>
      <th style="text-align: left">Notes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left">Model overhead</td>
      <td style="text-align: left">~24k tokens</td>
      <td style="text-align: left">System prompt, safety instructions, built-in tool definitions</td>
    </tr>
    <tr>
      <td style="text-align: left">Your harness prompt</td>
      <td style="text-align: left">~3-10k tokens</td>
      <td style="text-align: left">Instructions, spec file, task list</td>
    </tr>
    <tr>
      <td style="text-align: left">MCP server (each)</td>
      <td style="text-align: left">~20-30k tokens</td>
      <td style="text-align: left">Tool schemas, protocol overhead, response data</td>
    </tr>
    <tr>
      <td style="text-align: left">Conversation history</td>
      <td style="text-align: left">Grows per turn</td>
      <td style="text-align: left">Every tool call + response accumulates</td>
    </tr>
    <tr>
      <td style="text-align: left"><strong>Available for reasoning</strong></td>
      <td style="text-align: left"><strong>What’s left</strong></td>
      <td style="text-align: left">This is what determines agent quality</td>
    </tr>
  </tbody>
</table>

<p>A 200k context window with two MCP servers and a detailed prompt might leave 100-120k for actual work. That sounds like a lot. Then the agent hits iteration 8 and the conversation history has consumed another 60k.</p>
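<p>A quick worked budget makes the erosion concrete (the per-item figures are the rough numbers from the table above, not exact measurements):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> 200,000  context window
 -24,000  model overhead
 -10,000  harness prompt
 -50,000  two MCP servers (~25k each)
 116,000  available at iteration 1

 -60,000  conversation history by iteration 8
  56,000  left for reasoning about the actual task
</code></pre></div></div>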

<p>I measured this by accident. My agent was performing well for 4-5 iterations, then quality would drop sharply. Features that should have taken one iteration started taking three. The agent would forget constraints from its instructions and start writing code that violated my coding standards.</p>

<p>The problem wasn’t the model. The problem was that by iteration 6, there wasn’t enough room left in the context window for the model to hold the instructions <em>and</em> reason about the code <em>and</em> remember what it had already done.</p>

<h2 id="the-mcp-tax">The MCP tax</h2>

<p>MCP servers are the biggest hidden cost in agent architectures. Each server adds three layers of overhead:</p>

<p><strong>Tool definitions.</strong> Every tool the server exposes gets serialized as a JSON schema in the context. A server with 10 tools might add 3,000-5,000 tokens just for the definitions. Before you ever call a tool.</p>

<p><strong>Protocol overhead.</strong> Handshake, capability negotiation, connection state. Small per-call, but it adds up across dozens of tool invocations per iteration.</p>

<p><strong>Response data.</strong> This is the killer. Tool outputs go directly into the context window. For most tools, responses are small (a file listing, a search result). For Puppeteer, a single screenshot costs roughly 3,000 to 6,000 tokens, depending on page size.</p>

<h3 id="what-mcp-puppeteer-actually-costs">What MCP Puppeteer actually costs</h3>

<p>I ran this experiment. One verification check, “are the routes healthy?”, in two formats.</p>

<p><strong>MCP Puppeteer (inside the agent loop):</strong></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Agent: [tool_call] puppeteer_navigate { url: "http://localhost:3000/" }
Server: { status: "navigated", title: "Rapidfy" }
Agent: [tool_call] puppeteer_screenshot {}
Server: { screenshot: "data:image/png;base64,iVBORw0KGgoAAAA..." }
         ^^^ 3,000-6,000 tokens for ONE screenshot
Agent: "The page looks correct, I can see the header and navigation..."
</code></pre></div></div>

<p>Repeat for 5-10 routes per iteration. Total: <strong>15,000-60,000 tokens per verification pass</strong>, plus the MCP overhead that’s always there whether you call a tool or not.</p>

<p><strong>Text summary (outside the agent loop):</strong></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## Route Verification (14:23:13)
10 passed, 0 failed (of 10 routes)
OK   / -- 200 (49750B)
OK   /apps -- 200 (12637B)
...
</code></pre></div></div>

<p>Total: <strong>~200 tokens.</strong> Same signal. The agent knows all routes are healthy. It doesn’t need to <em>see</em> the pages to know they render.</p>

<p>That’s a 75-300x difference for the same information.</p>

<h2 id="the-one-window-one-goal-principle">The one-window-one-goal principle</h2>

<p>Geoffrey Huntley’s <a href="https://ghuntley.com/ralph">harness philosophy</a> can be summarized in one line:</p>

<blockquote>
  <p>One window, one goal. Reset between iterations.</p>
</blockquote>

<p>This means:</p>
<ul>
  <li>Each agent invocation gets a <strong>fresh</strong> context window</li>
  <li>The agent does ONE thing (implement one feature, fix one bug)</li>
  <li>All state lives <strong>on disk</strong> (files, git), not in conversation history</li>
  <li>Tools are minimal: code editing and shell access, nothing more</li>
</ul>

<p>My harness implements this literally. Each iteration is a fresh invocation:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">OUTPUT</span><span class="o">=</span><span class="si">$(</span>build_prompt | claude <span class="nt">-p</span> <span class="nt">--dangerously-skip-permissions</span> 2&gt;&amp;1 <span class="se">\</span>
         | <span class="nb">tee</span> <span class="s2">"</span><span class="nv">$LOG</span><span class="s2">"</span> <span class="o">||</span> <span class="nb">echo</span> <span class="s2">"ERROR"</span><span class="si">)</span>
</code></pre></div></div>

<p>The prompt is piped in. The agent runs. Output is captured. The loop checks for progress and either continues (fresh invocation) or stops. No accumulated conversation history across iterations.</p>
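<p>That outer loop is only a few lines. Here’s a minimal sketch; the iteration cap and the progress check are illustrative, not the harness’s exact logic:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Minimal sketch of the outer loop. MAX_ITER and the progress
# check are illustrative, not the harness's real names.
MAX_ITER=15
for i in $(seq 1 "$MAX_ITER"); do
    BEFORE=$(git rev-parse HEAD)

    build_prompt | claude -p --dangerously-skip-permissions 2&gt;&amp;1 \
        | tee "logs/iter-$i.log"

    # All state lives on disk: stop when nothing new was committed.
    if [ "$(git rev-parse HEAD)" = "$BEFORE" ]; then
        echo "No progress in iteration $i, stopping"
        break
    fi
done
</code></pre></div></div>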

<p>This is why the circuit breaker from <a href="/7-patterns-for-long-running-agent-harnesses">Part 1</a> matters: each iteration starts clean, so the agent can’t spiral. And it’s why the progress file matters: since context resets between iterations, the agent needs something on disk to tell it where it left off.</p>

<h2 id="the-test-summary-injection-pattern">The test-summary injection pattern</h2>

<p>This is the single most impactful optimization in my harness. Instead of the agent running <code class="language-plaintext highlighter-rouge">bin/rails test</code> itself (which fills the context with 50-100 lines of raw test output), the harness pre-computes the results and injects a compact summary.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>build_prompt<span class="o">()</span> <span class="o">{</span>
    <span class="nb">local </span>prompt
    <span class="nv">prompt</span><span class="o">=</span><span class="si">$(</span><span class="nb">cat</span> <span class="s2">"</span><span class="nv">$PROMPT_FILE</span><span class="s2">"</span><span class="si">)</span>

    <span class="c"># Run tests OUTSIDE the agent, inject summary</span>
    ./test-summary.sh <span class="o">&gt;</span> /dev/null 2&gt;&amp;1 <span class="o">||</span> <span class="nb">true
    </span><span class="k">if</span> <span class="o">[</span> <span class="nt">-f</span> <span class="s2">".test-summary"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
        </span><span class="nv">prompt</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">prompt</span><span class="k">}</span><span class="s2">

## Current Test State (pre-computed, do not re-run unless you change code)
</span><span class="si">$(</span><span class="nb">cat</span> <span class="s2">".test-summary"</span><span class="si">)</span><span class="s2">"</span>
    <span class="k">fi

    </span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$prompt</span><span class="s2">"</span>
<span class="o">}</span>
</code></pre></div></div>

<p>The test-summary script extracts just the stats and failures from raw test output:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">RAW</span><span class="o">=</span><span class="si">$(</span>bin/rails <span class="nb">test </span>2&gt;&amp;1<span class="si">)</span> <span class="o">||</span> <span class="nb">true</span>

<span class="c"># Extract summary line (e.g. "34 runs, 84 assertions, 0 failures")</span>
<span class="nv">STATS</span><span class="o">=</span><span class="si">$(</span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$RAW</span><span class="s2">"</span> | <span class="nb">grep</span> <span class="nt">-E</span> <span class="s1">'^\d+ runs,'</span> | <span class="nb">tail</span> <span class="nt">-1</span><span class="si">)</span>

<span class="c"># Extract failure lines only (compact)</span>
<span class="nv">FAILURES</span><span class="o">=</span><span class="si">$(</span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$RAW</span><span class="s2">"</span> | <span class="nb">grep</span> <span class="nt">-E</span> <span class="s1">'(Failure|Error):$'</span> <span class="nt">-A</span> 2 | <span class="nb">head</span> <span class="nt">-30</span><span class="si">)</span>
</code></pre></div></div>
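<p>The last step writes the compact summary to the file <code class="language-plaintext highlighter-rouge">build_prompt</code> reads. A sketch of that assembly (the real file’s exact layout may differ):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Assemble the summary file injected on the next iteration.
{
    echo "$STATS"
    if [ -n "$FAILURES" ]; then
        echo ""
        echo "Failures:"
        echo "$FAILURES"
    fi
} &gt; .test-summary
</code></pre></div></div>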

<p>Full test output: 80 lines. Summary: 3-15 lines. That’s an 80-90% token reduction on test state, every single iteration.</p>

<p><strong>The instruction that saves the most tokens:</strong></p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gu">## Orientation</span>
<span class="p">-</span> If a "Current Test State" section is appended below, use it to orient —
  don't re-run tests unless you change code
<span class="p">-</span> Focus on failures listed in the summary first
</code></pre></div></div>

<p>Two lines. Without them, the agent re-runs the entire test suite on its own, wasting tokens on raw output that the harness already summarized. With them, the agent trusts the pre-computed results and only runs tests after making changes.</p>

<p>Over 15 iterations, this saves roughly 25,000-40,000 tokens. That’s the difference between an agent that finishes its task and one that starts forgetting its instructions on iteration 10.</p>

<h2 id="what-to-keep-inside-vs-outside-the-context">What to keep inside vs outside the context</h2>

<p>Here’s the framework I use:</p>

<h3 id="keep-inside-the-context-window">Keep inside the context window</h3>
<ul>
  <li><strong>Task definition:</strong> what to build, acceptance criteria, test commands</li>
  <li><strong>Relevant code:</strong> files the agent needs to read and modify (on demand, not pre-loaded)</li>
  <li><strong>Pre-computed state:</strong> test results, route health, as compact text</li>
  <li><strong>Conventions:</strong> coding standards, commit format, project rules (short list)</li>
</ul>

<h3 id="externalize">Externalize</h3>
<ul>
  <li><strong>Raw test output:</strong> run tests outside, inject a summary</li>
  <li><strong>Route verification:</strong> check outside, inject pass/fail counts</li>
  <li><strong>Screenshots:</strong> don’t. Use text-based checks instead</li>
  <li><strong>Full file contents:</strong> let the agent read files on demand rather than front-loading everything</li>
</ul>

<p>The general principle: <strong>if it can be computed outside the context window and summarized in under 20 lines of text, do it outside.</strong></p>

<h2 id="practical-strategies">Practical strategies</h2>

<p><strong>1. Pre-compute everything you can.</strong> Test results, route checks, linting output. Run it in the harness, summarize it, inject it as text. The agent should consume results, not produce them.</p>

<p><strong>2. Use compact formats.</strong> <code class="language-plaintext highlighter-rouge">10 passed, 0 failed</code> beats a full test log. <code class="language-plaintext highlighter-rouge">OK /apps -- 200 (12637B)</code> beats a screenshot. Design your summaries for scannability.</p>

<p><strong>3. Reset between iterations.</strong> Don’t accumulate conversation history across agent invocations. Each iteration starts with a fresh context window and a current state snapshot from disk.</p>

<p><strong>4. Measure your overhead.</strong> Count the tokens in your system prompt, tool definitions, and injected context. If overhead exceeds around 40% of your context window, you’re leaving performance on the table.</p>
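<p>You don’t need a tokenizer for a useful estimate; roughly four characters per token is a common heuristic for English text. A sketch (the file names are examples, not the harness’s real inputs):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Rough token estimate: ~4 characters per token (heuristic, not exact).
# File names are examples; point this at your actual prompt inputs.
for f in PROMPT.md .test-summary .route-summary; do
    [ -f "$f" ] || continue
    chars=$(wc -c &lt; "$f")
    printf '%-20s ~%d tokens\n' "$f" "$((chars / 4))"
done
</code></pre></div></div>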

<p><strong>5. Question every MCP server.</strong> For each MCP server in your agent’s config, ask: “Can this be replaced with a shell script that produces a text summary?” If yes, do it. The token savings compound across every iteration.</p>

<p><strong>6. Tell the agent what not to do.</strong> “Do not re-run tests unless you change code” saves more tokens than any optimization technique. Agents are eager. They’ll redo work to be safe. Explicit instructions to trust pre-computed state prevent this.</p>

<h2 id="the-optimization-that-matters">The optimization that matters</h2>

<p>The best agent optimization isn’t faster models or bigger context windows. It’s spending fewer tokens on overhead so more tokens go to actual thinking.</p>

<p>Every MCP server you add is a tax on your agent’s ability to reason about the problem. Every raw test log you inject is noise that pushes out signal. Every screenshot is thousands of tokens that could have been ten lines of text.</p>

<p>The agent that ships the best code isn’t the one with the most tools. It’s the one with the most tokens left for reasoning.</p>

<p><em>This concludes the series “Building agent harnesses that work.” 
Start from <a href="/7-patterns-for-long-running-agent-harnesses">Part 1 — 7 Patterns for long-running agent harnesses</a> or revisit <a href="/where-verification-actually-belongs-in-agent-harnesses">Part 2 — Where verification actually belongs in agent harnesses</a>.</em></p>]]></content><author><name>Ahmed Nadar</name></author><category term="ai" /><category term="agents" /><category term="context-window" /><category term="harness" /><summary type="html"><![CDATA[Every tool, MCP server, and progress file you add to an agent costs tokens. Most teams don't measure this. Here's how to think about context allocation.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://ahmednadar.com/assets/images/og-default.png" /><media:content medium="image" url="https://ahmednadar.com/assets/images/og-default.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Where verification actually belongs in agent harnesses</title><link href="https://ahmednadar.com/where-verification-actually-belongs-in-agent-harnesses/" rel="alternate" type="text/html" title="Where verification actually belongs in agent harnesses" /><published>2026-02-23T00:00:00+00:00</published><updated>2026-02-23T00:00:00+00:00</updated><id>https://ahmednadar.com/where-verification-actually-belongs-in-agent-harnesses</id><content type="html" xml:base="https://ahmednadar.com/where-verification-actually-belongs-in-agent-harnesses/"><![CDATA[<p>Here’s a failure mode that took me too long to catch: the agent marks a feature as “done” because the test passes and the route returns HTTP 200. But the page is blank. Or it shows “We’re sorry, but something went wrong” inside a 200 response. Or the layout is completely broken but the controller test doesn’t care because it never rendered the actual page.</p>

<p>The agent doesn’t know. It can’t see the page. It sees a status code and moves on.</p>

<p>This is the E2E verification gap, and there are two competing philosophies on how to close it.</p>

<h2 id="the-problem-http-200-does-not-mean-correct">The problem: HTTP 200 does not mean “correct”</h2>

<p>A Rails controller test can pass while the actual rendered page is broken. Here’s why:</p>

<ul>
  <li>Controller tests mock the request cycle. They don’t start a real server</li>
  <li>A route can return 200 with a Rails exception page in the body</li>
  <li>A missing partial renders an empty page but no 500 error</li>
  <li>A bad database query returns the layout with no content</li>
</ul>

<p>I had exactly this situation. A route was returning HTTP 200, all tests passed, and the agent committed the feature as complete. When I opened the browser, the page said “We’re sorry, but something went wrong” inside a perfectly valid 200 response. Rails wraps some errors in a 200 because the error page itself renders successfully.</p>

<p>Unit tests tell you the code works. They don’t tell you the <em>app</em> works.</p>

<h2 id="anthropics-recommendation-mcp-puppeteer">Anthropic’s recommendation: MCP Puppeteer</h2>

<p>Anthropic’s guide recommends giving agents browser access via <a href="https://modelcontextprotocol.io/">MCP</a> (Model Context Protocol). Specifically, they recommend the Puppeteer MCP server, which lets the agent:</p>

<ol>
  <li>Navigate to a URL</li>
  <li>Take a screenshot</li>
  <li>Analyze the screenshot visually</li>
  <li>Decide if the page looks correct</li>
</ol>

<p>The appeal is clear: the agent can <em>see</em> what it built. Broken page? The screenshot shows it. Missing content? It’s visible in the image.</p>

<p>I wired this into my harness as an opt-in flag:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">CLAUDE_MCP_FLAGS</span><span class="o">=</span><span class="s2">""</span>
<span class="k">if</span> <span class="o">[[</span> <span class="s2">"</span><span class="k">${</span><span class="nv">MCP_PUPPETEER</span><span class="k">:-}</span><span class="s2">"</span> <span class="o">==</span> <span class="s2">"true"</span> <span class="o">]]</span><span class="p">;</span> <span class="k">then
    </span><span class="nv">CLAUDE_MCP_FLAGS</span><span class="o">=</span><span class="s2">"--mcp-server puppeteer"</span>
<span class="k">fi</span>
</code></pre></div></div>

<p>I keep it <strong>off by default</strong>. Here’s why.</p>

<h2 id="geoffrey-huntleys-counterpoint-verification-outside-the-loop">Geoffrey Huntley’s counterpoint: verification outside the loop</h2>

<p>Geoffrey Huntley, creator of the <a href="https://ghuntley.com/ralph">Ralph loop</a>, takes a different position. When asked why he doesn’t use MCP Puppeteer for visual verification, his answer was one word: <a href="https://github.com/antithesishq/bombadil">Bombadil</a>.</p>

<p>His argument has three parts.</p>

<p><strong>1. Context window cost.</strong> MCP servers consume tokens just by being loaded: tool definitions, schemas, protocol overhead. Puppeteer adds base64 screenshot data on top of that. Huntley found that MCP servers can drop usable context from ~176k to ~120k tokens. That’s roughly 30% of the agent’s reasoning capacity gone before it writes a line of code. (I break down the full token budget in <a href="/2026/02/24/context-window-economics-for-ai-agents">Part 3</a>.)</p>

<p><strong>2. Role confusion.</strong> An agent that codes <em>and</em> visually inspects is doing two jobs in one context window. It’s like having a developer also be the QA tester. Same screen, same time, half their desk taken up by QA tools.</p>

<p><strong>3. Separation of concerns.</strong> Verification should be a gate <em>after</em> the agent runs, not a tool <em>inside</em> the agent’s loop. The agent codes. Something else verifies.</p>

<h2 id="my-experience-with-mcp-puppeteer">My experience with MCP Puppeteer</h2>

<p>I tried it. Here’s what happened.</p>

<p>Each screenshot costs roughly 3,000 to 6,000 tokens depending on page size. Over multiple iterations of verifying key routes, this adds up fast, even before you count the MCP overhead itself: tool schemas, protocol definitions, capability negotiation. Every MCP server you load eats context just by existing, whether you call its tools or not. (<a href="/2026/02/24/context-window-economics-for-ai-agents">Part 3</a> has the full token math.)</p>

<p>By iteration 5, the agent started forgetting parts of its instructions. By iteration 8, it was re-implementing features it had already built. The context window was so full of screenshot data that the actual task instructions were getting pushed out.</p>

<p>I turned it off and the agent immediately got better. Not because the model improved. Because it had room to think again. Don’t load MCP servers by default; they eat your context window.</p>

<h2 id="the-fundamental-problem">The fundamental problem</h2>

<p>Asking the agent to verify its own work is like asking an author to proofread their own book. They see what they <em>intended</em> to write, not what’s actually on the page.</p>

<table>
  <thead>
    <tr>
      <th> </th>
      <th style="text-align: left">MCP Puppeteer (inside the loop)</th>
      <th style="text-align: left">External verification (outside the loop)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Approach</strong></td>
      <td style="text-align: left">“Screenshot /apps, check layout”</td>
      <td style="text-align: left">“Explore the app, check invariants”</td>
    </tr>
    <tr>
      <td><strong>Context cost</strong></td>
      <td style="text-align: left">3-6k tokens per screenshot + MCP overhead</td>
      <td style="text-align: left">Zero, standalone process</td>
    </tr>
    <tr>
      <td><strong>Coverage</strong></td>
      <td style="text-align: left">Tests exactly what you script</td>
      <td style="text-align: left">Can discover paths you didn’t think of</td>
    </tr>
    <tr>
      <td><strong>Architecture</strong></td>
      <td style="text-align: left">Agent = coder + visual QA</td>
      <td style="text-align: left">Agent = coder. Verifier = QA</td>
    </tr>
    <tr>
      <td><strong>Failure mode</strong></td>
      <td style="text-align: left">Agent decides its own work looks fine</td>
      <td style="text-align: left">Independent judgment</td>
    </tr>
  </tbody>
</table>

<h2 id="what-i-actually-built">What I actually built</h2>

<p>Bombadil (the tool Huntley points to) is the right direction but not ready for me yet. It’s v0.2.1, Linux-only binaries, pre-1.0 API. So I built a pragmatic middle ground: a route verification script that follows Huntley’s philosophy. External, with zero context cost.</p>

<p><strong>Step 1: Discover routes programmatically.</strong></p>

<p>Not by parsing <code class="language-plaintext highlighter-rouge">rake routes</code> text output, which breaks when column alignment changes between Rails versions, but by querying the Rails router directly:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">ROUTES</span><span class="o">=</span><span class="si">$(</span>bin/rails runner <span class="s2">"
  Rails.application.routes.routes.select { |r|
    r.verb.to_s.include?('GET')
  }.map { |r|
    r.path.spec.to_s.gsub('(.:format)', '')
  }.reject { |p|
    p.include?(':') || p == '/up' || p.start_with?('/rails/')
  }.uniq.sort.each { |p| puts p }
"</span> 2&gt;/dev/null<span class="si">)</span>
</code></pre></div></div>

<p><strong>Step 2: Start a real server, hit every route.</strong></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">PORT</span><span class="o">=</span><span class="k">$((</span>RANDOM <span class="o">%</span> <span class="m">1000</span> <span class="o">+</span> <span class="m">9000</span><span class="k">))</span>
bin/rails server <span class="nt">-p</span> <span class="s2">"</span><span class="nv">$PORT</span><span class="s2">"</span> <span class="o">&gt;</span> /dev/null 2&gt;&amp;1 &amp;
<span class="nv">SERVER_PID</span><span class="o">=</span><span class="nv">$!</span>

<span class="c"># Wait for boot</span>
<span class="k">for </span>_ <span class="k">in</span> <span class="si">$(</span><span class="nb">seq </span>1 30<span class="si">)</span><span class="p">;</span> <span class="k">do
    </span>curl <span class="nt">-sf</span> <span class="s2">"http://localhost:</span><span class="nv">$PORT</span><span class="s2">/up"</span> <span class="o">&gt;</span>/dev/null 2&gt;&amp;1 <span class="o">&amp;&amp;</span> <span class="nb">break
    sleep </span>0.5
<span class="k">done</span>
</code></pre></div></div>
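<p>One detail worth guarding here: the throwaway server must die even if the script fails mid-run, or the next run inherits a stale process. A <code class="language-plaintext highlighter-rouge">trap</code> handles it (a minimal sketch):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Kill the verification server on any exit path, clean or not.
trap 'kill "$SERVER_PID" 2&gt;/dev/null' EXIT
</code></pre></div></div>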

<p><strong>Step 3: Check invariants, not screenshots.</strong></p>

<p>This is the key difference from MCP Puppeteer. Instead of asking “does this page look right?”, the script checks a few rules that all working pages must satisfy:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for </span>route <span class="k">in</span> <span class="nv">$ROUTES</span><span class="p">;</span> <span class="k">do
    </span><span class="nv">STATUS</span><span class="o">=</span><span class="si">$(</span>curl <span class="nt">-s</span> <span class="nt">-o</span> /tmp/body <span class="nt">-w</span> <span class="s1">'%{http_code}'</span> <span class="se">\</span>
             <span class="nt">--max-time</span> 5 <span class="s2">"http://localhost:</span><span class="nv">$PORT$route</span><span class="s2">"</span><span class="si">)</span>
    <span class="nv">BODY_SIZE</span><span class="o">=</span><span class="si">$(</span><span class="nb">wc</span> <span class="nt">-c</span> &lt; /tmp/body | <span class="nb">tr</span> <span class="nt">-d</span> <span class="s1">' '</span><span class="si">)</span>

    <span class="k">if</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$STATUS</span><span class="s2">"</span> <span class="o">=</span> <span class="s2">"200"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then</span>
        <span class="c"># Property: 200 responses must not contain error page content</span>
        <span class="k">if </span><span class="nb">grep</span> <span class="nt">-qiE</span> <span class="s2">"(Internal Server Error|something went wrong)"</span> /tmp/body<span class="p">;</span> <span class="k">then
            </span><span class="nb">echo</span> <span class="s2">"FAIL </span><span class="nv">$route</span><span class="s2"> -- 200 but error page content"</span>
        <span class="c"># Property: 200 responses must have substantial content</span>
        <span class="k">elif</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$BODY_SIZE</span><span class="s2">"</span> <span class="nt">-lt</span> 100 <span class="o">]</span><span class="p">;</span> <span class="k">then
            </span><span class="nb">echo</span> <span class="s2">"FAIL </span><span class="nv">$route</span><span class="s2"> -- 200 but body too small (</span><span class="k">${</span><span class="nv">BODY_SIZE</span><span class="k">}</span><span class="s2">B)"</span>
        <span class="k">else
            </span><span class="nb">echo</span> <span class="s2">"OK   </span><span class="nv">$route</span><span class="s2"> -- 200 (</span><span class="k">${</span><span class="nv">BODY_SIZE</span><span class="k">}</span><span class="s2">B)"</span>
        <span class="k">fi
    elif</span> <span class="o">[[</span> <span class="s2">"</span><span class="nv">$STATUS</span><span class="s2">"</span> <span class="o">=</span>~ ^<span class="o">(</span>301|302<span class="o">)</span><span class="nv">$ </span><span class="o">]]</span><span class="p">;</span> <span class="k">then</span>
        <span class="c"># Redirects are expected for auth-gated routes</span>
        <span class="nb">echo</span> <span class="s2">"OK   </span><span class="nv">$route</span><span class="s2"> -- </span><span class="nv">$STATUS</span><span class="s2"> (redirect)"</span>
    <span class="k">else
        </span><span class="nb">echo</span> <span class="s2">"FAIL </span><span class="nv">$route</span><span class="s2"> -- HTTP </span><span class="nv">$STATUS</span><span class="s2">"</span>
    <span class="k">fi
done</span>
</code></pre></div></div>

<p>The invariants are simple:</p>
<ul>
  <li>HTTP status must be 200, 301, or 302</li>
  <li>200 responses must not contain error page strings</li>
  <li>200 responses must have a body larger than 100 bytes</li>
  <li>Anything else is a failure</li>
</ul>

<p>These catch the catastrophic failures, the ones where the agent thinks everything is fine but the app is broken.</p>

<h2 id="how-it-integrates-into-the-harness">How it integrates into the harness</h2>

<p>The verification runs <strong>after</strong> the agent commits, not during the agent’s work. The agent never sees the verification tool. It sees the <em>results</em>, pre-computed and injected as text on the next iteration:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Post-commit route verification (non-blocking warning)</span>
<span class="k">if</span> <span class="o">[</span> <span class="nt">-x</span> <span class="s2">"./verify-routes.sh"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then</span>
    ./verify-routes.sh 2&gt;/dev/null <span class="o">||</span> <span class="se">\</span>
        <span class="nb">echo</span> <span class="s2">"WARNING: route verification failed"</span>
<span class="k">fi</span>
</code></pre></div></div>

<p>The results get injected into the next iteration’s prompt:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## Route Verification (02:08:13)
10 passed, 0 failed (of 10 routes)

OK   / -- 200 (49750B)
OK   /apps -- 200 (12637B)
OK   /users/cancel -- 302 (redirect)
OK   /users/sign_in -- 200 (14404B)
OK   /users/sign_up -- 200 (14980B)
</code></pre></div></div>

<p>Ten lines. About 200 tokens. The agent sees route health the same way it sees test results: pre-computed facts, not a tool to invoke. Compare that to thousands of tokens for a single MCP screenshot.</p>
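
<p>The injection step itself is a couple of lines. A minimal sketch, assuming the harness assembles each iteration’s prompt in a file (<code class="language-plaintext highlighter-rouge">/tmp/prompt.md</code> is a hypothetical path):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Append pre-computed route health to the next iteration's prompt.
# /tmp/prompt.md is a hypothetical path for this sketch.
{
    echo "## Route Verification ($(date +%H:%M:%S))"
    ./verify-routes.sh 2&gt;/dev/null
} &gt;&gt; /tmp/prompt.md
</code></pre></div></div>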

<h2 id="the-gotchas-i-hit-building-this">The gotchas I hit building this</h2>

<p><strong>Puma 7 removed the <code class="language-plaintext highlighter-rouge">-d</code> (daemonize) flag.</strong> I had to background the process with <code class="language-plaintext highlighter-rouge">&amp;</code> and manage the PID manually.</p>

<p><strong><code class="language-plaintext highlighter-rouge">bin/rails server</code> refuses to start if <code class="language-plaintext highlighter-rouge">tmp/pids/server.pid</code> exists.</strong> If a previous run crashed without cleanup, the next run fails silently. I use a separate PID file path: <code class="language-plaintext highlighter-rouge">-P tmp/pids/verify.pid</code>.</p>
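
<p>Together, the two workarounds look roughly like this. A sketch, not the exact script; port 3001 is an arbitrary choice for the example:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Puma 7 has no -d flag: background the server and track the PID ourselves.
# A separate PID file avoids colliding with a stale tmp/pids/server.pid.
bin/rails server -p 3001 -P tmp/pids/verify.pid &amp;
SERVER_PID=$!

# ... run the route checks against http://localhost:3001 ...

kill "$SERVER_PID" 2&gt;/dev/null
rm -f tmp/pids/verify.pid
</code></pre></div></div>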

<p><strong>Turbo’s <code class="language-plaintext highlighter-rouge">*_historical_location</code> routes return 13-14 byte bodies.</strong> These are redirect stubs by design, not errors. I filter them out to avoid false positives.</p>
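
<p>The filter is a single line. A sketch, assuming the route list sits in a plain-text file (<code class="language-plaintext highlighter-rouge">routes.txt</code> is hypothetical):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Drop Turbo's redirect stubs before checking body sizes;
# their ~13-byte bodies would otherwise trip the 100-byte minimum.
grep -v '_historical_location' routes.txt &gt; routes.filtered.txt
</code></pre></div></div>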

<p><strong>Auth-gated routes return 302, not 401.</strong> Devise redirects to sign-in instead of returning unauthorized. Expected behavior, not failures. The script counts 302 as a pass.</p>

<h2 id="the-principle">The principle</h2>

<p>Separation of concerns applies to agents too:</p>

<ul>
  <li>The coding agent codes.</li>
  <li>A separate tool verifies.</li>
  <li>The harness orchestrates both.</li>
</ul>

<p>Putting coding and verification in one context window is like asking a developer to write code and manually test it on a split screen. Half their monitor taken up by browser devtools. They <em>can</em> do it. But they’ll do both jobs worse than if they focused on one.</p>

<p><em><strong>Next</strong>: <a href="/context-window-economics-for-ai-agents">Part 3 — Context window economics for AI agents</a></em></p>]]></content><author><name>Ahmed Nadar</name></author><category term="ai" /><category term="agents" /><category term="verification" /><category term="harness" /><summary type="html"><![CDATA[Anthropic says use MCP Puppeteer for screenshots. Geoffrey Huntley says keep verification outside the loop. Who's right? Both, but for different reasons.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://ahmednadar.com/assets/images/og-default.png" /><media:content medium="image" url="https://ahmednadar.com/assets/images/og-default.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">7 Patterns for long-running agent harnesses</title><link href="https://ahmednadar.com/7-patterns-for-long-running-agent-harnesses/" rel="alternate" type="text/html" title="7 Patterns for long-running agent harnesses" /><published>2026-02-22T00:00:00+00:00</published><updated>2026-02-22T00:00:00+00:00</updated><id>https://ahmednadar.com/7-patterns-for-long-running-agent-harnesses</id><content type="html" xml:base="https://ahmednadar.com/7-patterns-for-long-running-agent-harnesses/"><![CDATA[<p>Most teams building with AI agents focus on the agent — the model, the prompt, the system instructions. But the thing that determines whether your agent <em>actually ships working code</em> isn’t the agent itself. It’s the <strong>harness</strong>, aka: “the loop around it”.</p>

<p>This harness decides what the agent sees, when it stops, what gets committed, and how failures are caught. It’s like the “controller” in a Rails application. Get it wrong and your agent writes code that looks correct but breaks everything because it was built on wrong assumptions.</p>

<p>I’m building Rapidfy, an agentic system that generates full Rails applications from a spec. The agent (an LLM with tools) writes the code. The harness (a bash loop) orchestrates: prepare the prompt, invoke the agent, check if progress was made, loop or stop.</p>

<p>Anthropic published a guide called <a href="https://docs.anthropic.com/en/docs/build-with-claude/agentic">“Effective Harnesses for Long-Running Agents”</a> that distills this into 7 key patterns. I applied them to my harness and found that the first four are easy; most developers nail them naturally. The last three are where agents quietly fail. Here’s what each pattern looks like in practice, and what goes wrong when you skip them.</p>

<h2 id="pattern-1-feature-checklist-json">Pattern 1: Feature checklist (JSON)</h2>

<p><strong>The pattern:</strong> Give the agent a machine-readable list of what to build, with pass/fail tracking.</p>

<p>My harness reads a JSON spec file with structured acceptance criteria. Each feature has a test command and a <code class="language-plaintext highlighter-rouge">passes</code> field:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"features"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"FEAT-001"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"title"</span><span class="p">:</span><span class="w"> </span><span class="s2">"User authentication"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"criteria"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
        </span><span class="p">{</span><span class="w">
          </span><span class="nl">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"User can sign up with email/password"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"test"</span><span class="p">:</span><span class="w"> </span><span class="s2">"bin/rails test test/integration/signup_test.rb"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"passes"</span><span class="p">:</span><span class="w"> </span><span class="kc">false</span><span class="w">
        </span><span class="p">}</span><span class="w">
      </span><span class="p">]</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">test</code> field is what makes this work. Without it, the agent decides <em>on its own</em> whether a feature is done, and agents are optimistic. They’ll mark a feature complete because the code compiles, not because it actually works. With an explicit test command, the harness can verify: run this command, did it pass? Yes or no.</p>

<p>Without the checklist, the agent builds features in whatever order it feels like. It skips hard ones. It marks things “done” based on vibes. You end up with 8 features “completed” and 3 of them broken.</p>
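
<p>The verification pass can be a few lines of bash. A minimal sketch, assuming the spec lives in <code class="language-plaintext highlighter-rouge">spec.json</code> and <code class="language-plaintext highlighter-rouge">jq</code> is available (both assumptions for this example):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Run every criterion's test command from the spec and report pass/fail.
jq -r '.features[].criteria[].test' spec.json | while read -r cmd; do
    if eval "$cmd" &gt;/dev/null 2&gt;&amp;1; then
        echo "PASS: $cmd"
    else
        echo "FAIL: $cmd"
    fi
done
</code></pre></div></div>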

<h2 id="pattern-2-progress-file">Pattern 2: Progress file</h2>

<p><strong>The pattern:</strong> Keep a persistent record of what’s done on disk, not in conversation history.</p>

<p>The harness maintains a markdown file with checkboxes. On each iteration, the agent finds the first unchecked task, works on it, and checks it off.</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">-</span> [x] FEAT-001: User authentication
<span class="p">-</span> [x] FEAT-002: Landing page with RUI components
<span class="p">-</span> [ ] FEAT-003: Dashboard with stats cards    &lt;-- next task
<span class="p">-</span> [ ] FEAT-004: CRUD for deals
</code></pre></div></div>

<p>Every iteration starts with a fresh context window. The agent doesn’t remember what it did last time. Without a file on disk, it has no idea what’s been built and what hasn’t. It will re-implement features that already exist, or skip features it thinks it already did. That’s why the progress file is the agent’s memory between sessions. Cheap to implement, critical to have.</p>
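
<p>Reading the file back is equally cheap. A sketch, assuming the file is named <code class="language-plaintext highlighter-rouge">PROGRESS.md</code> (a hypothetical name):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Find the first unchecked task; exit cleanly when everything is done.
NEXT_TASK=$(grep -m1 '^- \[ \]' PROGRESS.md)
if [ -z "$NEXT_TASK" ]; then
    echo "All tasks complete."
    exit 0
fi
echo "Next task: $NEXT_TASK"
</code></pre></div></div>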

<h2 id="pattern-3-one-feature-per-session">Pattern 3: One feature per session</h2>

<p><strong>The pattern:</strong> The agent does ONE thing per invocation, then exits.</p>

<p>The harness enforces this with a circuit breaker. After each invocation, it checks if the git HEAD changed (meaning the agent committed something). If three iterations pass with no commit, the harness exits:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="o">[[</span> <span class="s2">"</span><span class="si">$(</span>git rev-parse HEAD<span class="si">)</span><span class="s2">"</span> <span class="o">==</span> <span class="s2">"</span><span class="nv">$BEFORE</span><span class="s2">"</span> <span class="o">]]</span><span class="p">;</span> <span class="k">then
    </span><span class="nv">NO_CHANGE</span><span class="o">=</span><span class="k">$((</span>NO_CHANGE <span class="o">+</span> <span class="m">1</span><span class="k">))</span>
    <span class="nb">echo</span> <span class="s2">"No commit (</span><span class="nv">$NO_CHANGE</span><span class="s2">/3)"</span>
    <span class="o">((</span> NO_CHANGE <span class="o">&gt;=</span> 3 <span class="o">))</span> <span class="o">&amp;&amp;</span> <span class="o">{</span> <span class="nb">echo</span> <span class="s2">"Circuit breaker."</span><span class="p">;</span> <span class="nb">exit </span>1<span class="p">;</span> <span class="o">}</span>
<span class="k">else
    </span><span class="nv">NO_CHANGE</span><span class="o">=</span>0
<span class="k">fi</span>
</code></pre></div></div>

<p>I learned this one the hard way. Without a circuit breaker, the agent once ran 12 iterations without committing anything. It kept editing the same file over and over, convinced each edit was progress. Each iteration cost tokens and time, producing nothing. Twelve iterations of an LLM spinning on the same problem is expensive and pointless.</p>

<p>The circuit breaker forces the agent to either ship something or stop trying. Three attempts is generous. If it can’t make progress in three tries, a human needs to look at it.</p>

<h2 id="pattern-4-git-as-memory">Pattern 4: Git as memory</h2>

<p><strong>The pattern:</strong> Each task = one commit. The git log becomes a readable history of what the agent built.</p>

<p>The harness checks <code class="language-plaintext highlighter-rouge">git rev-parse HEAD</code> before and after each invocation. If HEAD changed, a commit happened, which means progress. The commit messages follow a convention:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[FEAT-006] Dashboard CRM stats, seed data, 7 system tests
[FEAT-005] Activity model + controller + views, 21 tests
[FEAT-004] Deal model + controller + views, pipeline stages, 20 tests
</code></pre></div></div>

<p>Incrementally committing each task to Git isn’t just a log; it’s a safety net. If the agent breaks something on iteration 8, you can <code class="language-plaintext highlighter-rouge">git diff HEAD~1</code> to see exactly what changed. You can revert one commit without losing everything. Without per-task commits, you get one giant diff at the end and no way to isolate what broke what.</p>
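
<p>In practice that safety net is two standard git commands:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># What did the last iteration actually change?
git diff HEAD~1

# Back out just the last task without touching earlier work.
git revert --no-edit HEAD
</code></pre></div></div>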

<hr />

<p>Those first four patterns are the easy ones. Most of us get them right because they’re just good engineering habits applied to automation. The next three are where it gets interesting.</p>

<hr />

<h2 id="pattern-5-e2e-testing">Pattern 5: E2E testing</h2>

<p><strong>The pattern:</strong> Not just unit tests. Does the app actually work when you open it in a browser?</p>

<p>A route returning HTTP 200 with a Rails error page in the body was a common failure mode for my agent. All unit tests passed. The agent marked the feature as done. But the page said “We’re sorry, but something went wrong”, wrapped in a valid 200 response that the agent never visually checked.</p>

<p>HTTP 200 does not mean “correct.” The gap between “tests pass” and “app works” is where agents silently fail. Closing that gap requires a verification step outside the agent’s loop that starts a real server, hits every route, and checks the responses for actual content, not just status codes.</p>
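
<p>The core of that step fits in a few lines. A sketch for a single route, assuming the verification server runs on port 3001 and “something went wrong” marks an error page (both assumptions):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hit one route on a real server and check the body, not just the status.
STATUS=$(curl -s -o /tmp/body -w '%{http_code}' http://localhost:3001/)
if [ "$STATUS" = "200" ] &amp;&amp; ! grep -qi "something went wrong" /tmp/body; then
    echo "OK   /"
else
    echo "FAIL / -- HTTP $STATUS"
fi
</code></pre></div></div>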

<p>I built a route verification script that does exactly this, and tried Anthropic’s MCP Puppeteer approach as an alternative. The tradeoffs between the two are more interesting than the script itself. <a href="/where-verification-actually-belongs-in-agent-harnesses">Full implementation and comparison in Part 2</a>.</p>

<h2 id="pattern-6-startup-health-check">Pattern 6: Startup health check</h2>

<p><strong>The pattern:</strong> Before starting new work, verify nothing is already broken.</p>

<p>Before each iteration, I run the full test suite, verify the app visually, and check route health. This extra verification exists to catch failures that unit tests miss. And if any test fails, the harness stops immediately:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="o">[[</span> <span class="s2">"</span><span class="nv">$MODE</span><span class="s2">"</span> <span class="o">==</span> <span class="s2">"build"</span> <span class="o">]]</span><span class="p">;</span> <span class="k">then
    </span><span class="nv">HEALTH</span><span class="o">=</span><span class="si">$(</span>bin/rails <span class="nb">test </span>2&gt;&amp;1<span class="si">)</span> <span class="o">||</span> <span class="nb">true
    </span><span class="k">if </span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$HEALTH</span><span class="s2">"</span> | <span class="nb">grep</span> <span class="nt">-qE</span> <span class="s1">'[1-9][0-9]* (failures|errors)'</span><span class="p">;</span> <span class="k">then
        </span><span class="nb">echo</span> <span class="s2">"Health check failed. Fix failures before continuing."</span>
        <span class="nb">exit </span>1
    <span class="k">fi
    </span><span class="nb">echo</span> <span class="s2">"Health check passed."</span>
<span class="k">fi</span>
</code></pre></div></div>

<p>Note the regex: <code class="language-plaintext highlighter-rouge">[1-9][0-9]* (failures|errors)</code>. It matches “1 failures”, “3 errors”, “12 failures” — but <em>not</em> “0 failures”. That distinction matters. <code class="language-plaintext highlighter-rouge">grep "failures"</code> would match the success line too.</p>

<p>While building this harness, I hit a wall where iteration 4 introduced a subtle bug in a model validation. The tests for that specific model still passed, but it broke a different controller that depended on the validation behavior. Iterations 5, 6, and 7 kept building on top of the broken foundation. Each one added code that <em>assumed</em> the validation worked correctly. By the time I noticed — three iterations later — I had to revert three commits and redo the work. Three iterations of wasted LLM calls, wasted tokens, wasted time.</p>

<p>A single <code class="language-plaintext highlighter-rouge">bin/rails test</code> before each iteration would have caught it immediately. The cost: 15 seconds of test runtime. The savings: three wasted iterations.</p>

<h2 id="pattern-7-prompt-efficiency">Pattern 7: Prompt efficiency</h2>

<p><strong>The pattern:</strong> Minimize token overhead so the agent has more context for actual reasoning.</p>

<p>When building this harness, I learned to pre-compute what I can outside the LLM call, then inject compact summaries into the prompt. Test results, route health, linting output — anything that can be summarized in a few lines of text shouldn’t be produced by the agent itself. Raw test output is 50-100 lines. A summary is 3-15 lines. That difference compounds across every iteration.</p>
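
<p>The pre-computation is a shell pipeline, not a prompt. A sketch, assuming minitest-style output and a hypothetical <code class="language-plaintext highlighter-rouge">/tmp/prompt.md</code> prompt file:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Compress 50-100 lines of test output into a few summary lines
# before it ever reaches the prompt.
SUMMARY=$(bin/rails test 2&gt;&amp;1 | grep -E '(runs|failures|errors)' | tail -n 3)
{
    echo "## Test Results"
    echo "$SUMMARY"
} &gt;&gt; /tmp/prompt.md
</code></pre></div></div>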

<p>This turned out to be the least obvious pattern but the most impactful. <a href="/context-window-economics-for-ai-agents">Full implementation and token math in Part 3</a>.</p>

<h2 id="what-i-learned">What I learned</h2>

<p>The first four patterns (checklist, progress file, one-feature-per-session, git as memory) are table stakes. You need them, but they’re not what makes or breaks your agent. They’re just good habits.</p>

<p>The last three are where the real value is. <strong>Startup health check</strong> catches regressions before they compound — five lines of bash that save entire iterations of wasted work. <strong>E2E verification</strong> catches the failures that unit tests miss. <strong>Prompt efficiency</strong> determines how long your agent stays useful before it starts forgetting its instructions.</p>

<p>The easy patterns give you a working agent. The hard patterns give you a reliable one. Parts 2 and 3 dig into the two hardest: verification and context economics.</p>

<p><em><strong>Next</strong>: <a href="/where-verification-actually-belongs-in-agent-harnesses">Part 2 — Where verification actually belongs in agent harnesses</a></em></p>]]></content><author><name>Ahmed Nadar</name></author><category term="ai" /><category term="agents" /><category term="rapidfy" /><category term="harness" /><summary type="html"><![CDATA[What I learned applying Anthropic's 7 harness patterns to my agent build system. The easy ones don't matter much — the hard ones are what separate shipping from spinning.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://ahmednadar.com/assets/images/og-default.png" /><media:content medium="image" url="https://ahmednadar.com/assets/images/og-default.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">How I went from one button to building entire Rails apps in 10 seconds</title><link href="https://ahmednadar.com/agentic-rapid-rails-app-in-second/" rel="alternate" type="text/html" title="How I went from one button to building entire Rails apps in 10 seconds" /><published>2026-02-10T00:00:00+00:00</published><updated>2026-02-10T00:00:00+00:00</updated><id>https://ahmednadar.com/agentic-rapid-rails-app-in-second</id><content type="html" xml:base="https://ahmednadar.com/agentic-rapid-rails-app-in-second/"><![CDATA[<p>Two years ago I had a simple idea. Build UI components made only for Rails. Not ported from React. Not wrapped around some JavaScript library. Just pure Rails native components.</p>

<p>It started with a button. Literally one button component.</p>

<p>Today it’s 32 components. Forms, steppers, tables, modals, navigation, dropdowns. All built with Rails 8, Turbo, and StimulusJS. No external JavaScript libraries. No outside frontend. They work with your Rails forms out of the box because they were built for Rails forms. See them all at https://rapidrails.cc/docs/</p>

<p>I’m using them in production right now, and they hold up well.
But something kept bugging me. A new idea!</p>

<p>Every week I see tools like v0 and Lovable letting developers describe what they want and get a working app back in seconds. Type a prompt, get a React app. It’s impressive. But if you’re a Rails developer, you get nothing. None of these tools generate Rails native code. None of them know about Turbo or Stimulus or Rails forms or ViewComponents. You’re on your own.</p>

<p>That felt wrong. Rails developers deserve the same thing.</p>

<p>So I asked myself a question. I already have 32 components that follow Rails conventions. What if an AI agent already knows how those components work, how they connect, and how Rails 8 expects things to be structured? What would happen if I pointed an agent at all of that and said “build me a blog”?</p>

<p>I tried it. And it worked.</p>

<p>Let me tell you what happens right now when you give Rapidfy a clear prompt.</p>

<p>You say “build me a blog with authors, posts, and comments.” Rapidfy creates the models. Generates the migrations. Runs them. Builds the views using RapidRails UI components. Makes two commits. The whole thing takes about 10 seconds and is ready for testing.
Ten seconds. That’s faster than running “rails new”.</p>

<p>You get a working Rails 8 application with real UI, real database tables, and real components. Not a prototype. Not a wireframe. A working app.</p>

<p>And everything it builds is Rails native. Turbo for page updates. Stimulus for interactivity. ViewComponents for the UI. No React anywhere. No npm install anything. Just Rails the way Rails is meant to be.</p>

<p>Why does this work?</p>

<p>It works because Rails is predictable. Convention over configuration means there are clear patterns for where things go and how they connect. Models go here. Views go there. Routes follow this pattern. Forms work this way. 
But Rails alone isn’t enough. The agent also needs to understand the UI layer. That’s where RapidRails UI makes the difference.
Every component follows a clear, consistent structure. A button works like a button. A stepper works like a stepper. The naming is predictable. The options are documented. The patterns repeat. An agent doesn’t have to guess how to use a modal or figure out five different ways to build a form.</p>

<p>And the documentation wasn’t built just for humans. It was built with AI agents in mind from the start. Every component has clear inputs, expected outputs, and usage examples that an agent can read and act on immediately. No ambiguity. No “it depends.” Just structure an agent can follow.</p>

<p>That predictability is exactly what AI agents need. When the framework is consistent and the components follow conventions, the agent makes fewer mistakes. React has a thousand ways to do everything. Rails has one good way. RapidRails UI makes sure the frontend follows that same philosophy.
That’s the advantage. The whole stack speaks the same language.</p>

<p>I’m not saying this is perfect. It’s not. Right now it needs a clear, specific prompt to do its best work. Complex apps with unusual requirements will trip it up. Edge cases exist. But it’s a start. And it’s a start that already works faster than most developers expect.</p>

<p>My goal is simple. I want Rapidfy to be for Rails what v0 is for React. You describe what you want. You get a working Rails app back. Built with Rails, for Rails.</p>

<p>I’m getting there.</p>

<p>If you want to see what 32 Rails native components look like, start here: rapidrails.cc/docs/
And if you want to follow along as Rapidfy gets better, I’ll be sharing demos and progress right here.</p>

<p>Meet Rapidfy 🚀</p>

<p><img src="https://ahmednadar.com/assets/images/posts/rapidfy_templates.png" alt="Rapidfy" />
<img src="https://ahmednadar.com/assets/images/posts/rapidfy_dashboard.png" alt="Rapidfy" /></p>]]></content><author><name>Ahmed Nadar</name></author><category term="rapidfy" /><category term="ai" /><category term="rapidrails" /><summary type="html"><![CDATA[Building entire Rails apps in 10 seconds with AI agents and RapidRails UI components]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://ahmednadar.com/assets/images/og-default.png" /><media:content medium="image" url="https://ahmednadar.com/assets/images/og-default.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Why focused UI components matter</title><link href="https://ahmednadar.com/why-focused-ui-components-matter/" rel="alternate" type="text/html" title="Why focused UI components matter" /><published>2026-02-03T00:00:00+00:00</published><updated>2026-02-03T00:00:00+00:00</updated><id>https://ahmednadar.com/why-focused-ui-components-matter</id><content type="html" xml:base="https://ahmednadar.com/why-focused-ui-components-matter/"><![CDATA[<p>I get this question a lot: “Why do I need a focused UI component library? Can’t I just use regular native forms?”
My answer is always the same: let me show and tell, or rather, present a before and after.</p>

<p>I came across this registration form from StockLive, an online livestock auction platform. You’ve probably seen forms like this in every industry. The goal is simple, get the user’s information and get them on board quickly.</p>

<p>It’s a single page. Eighteen fields. All visible at once. One big “Create Bidder Registration” button at the bottom.</p>

<p>You can see the form makes a few assumptions about the person filling it out:
Their internet connection is stable and nothing will be lost if the page reloads. They’ll know exactly what every field means. They won’t mind scrolling through all 18 fields to find the one they missed. They have the patience to fill everything out in one sitting. And if something goes wrong with validation, they’ll happily start over again.
Sound right?</p>

<p>In reality, every one of these assumptions is wrong. I tried registering. It took me almost 5 minutes. Five minutes for a registration form: that’s 300 seconds of friction between a potential bidder and their first auction.
Most of that time wasn’t spent typing. It was spent dealing with phone number formatting, address validation, misleading dropdown lists and postal code errors. One mistake and the form reset. Eventually I gave up trying to enter real data and just copied the placeholder values to get through. Tada.</p>

<p>If I, a developer who builds forms for a living, nearly abandoned this form, what happens to a cattle farmer in regional Australia who just wants to bid on livestock?</p>

<p>So what’s actually going wrong?
This isn’t about bad intentions. The team that built this form was trying to collect necessary information. But presenting it this way creates real problems for the people using it.
The form puts eighteen fields on a single screen, and that wall of inputs overwhelms people. The moment someone sees it, their first instinct is to close the tab, not fill it out. Researchers call this cognitive overload. I call it losing customers before they start.
There’s no indication of progress, no sense of how far along you are or how much is left. Without that, every field feels like it might be followed by ten more. Uncertainty kills motivation.
Sale selection sits next to personal details. Business information is mixed with agent contacts. PIC numbers live alongside email addresses. When everything is presented the same way, nothing stands out and nothing guides the user through.
When you fill out 18 fields, hit submit, and then discover three things are wrong, what then? Now you’re scrolling back up, hunting for red text, fixing things you thought were fine. Each error-submit cycle increases the chance someone walks away, because errors arrive too late.
Life interrupts. Phones ring. Browsers crash. If you lose your progress on a long form, you’re probably not starting over. No. You’re finding another way or another platform.</p>

<p>The second (smaller) image shows the same registration, rebuilt with RapidRails UI’s Steps component. Same information collected. Same fields. Completely different experience.
Instead of 18 fields on one page, the form is broken into six clear steps: Sale, Personal, Address, Business, Agent, and Review. A progress bar at the top shows exactly where you are.
Each step shows only 2 to 4 fields. Just the fields that belong together. Personal details on one screen. Address on the next. Business information after that.</p>

<p>What changes for the user?
Two or three fields at a time means the person filling out the form can actually think about what they’re entering. No scrolling and no scanning. Just answer what’s in front of you and move forward. This way users stay focused instead of overwhelmed.
The stepper at the top shows completed steps in green, the current step highlighted, and remaining steps ahead. This way people know exactly where they are. That certainty keeps them going. 
Made a mistake three steps back? Click that step and go straight there. No scrolling through a wall of fields. Just fix it and continue. And the data you already entered is waiting for you. Users have freedom to move around.
Name and email belong together. Street address and postal code belong together. Trading agent and selling agent belong together. When information is grouped logically, it makes sense. When it’s all jumbled on one page, it doesn’t.
The final step shows a summary of everything entered. Catch a typo in your ABN? Click back to the Business step, fix it and continue. This alone reduces support tickets from incorrect registrations.
Progress is saved as you go. If the browser crashes, the phone rings, or the connection drops, you come back and pick up where you left off. The data survives interruptions.</p>

<p>This isn’t just about prettier forms. It’s about what happens to a business’s numbers. Forms with high field counts have abandonment rates above 60%. That’s a lot. Breaking them into steps with progress indicators typically cuts that in half, if not more. That’s not a design opinion; that’s measurable.</p>

<p>For StockLive specifically, every abandoned registration is a bidder who didn’t make it to the auction. A bidder who didn’t bid. Revenue that didn’t happen. Not because the product is bad, but because the door was too hard to open.
And the fix isn’t complicated. It’s the same form, same data, and same validation. You’re just presenting it in a way that respects how people actually think and work.</p>

<p>Back to the original question: why does this need a component, not a hack?
You could build a multi-step form from scratch. Wire up the JavaScript to show and hide sections. Track which step is active. Handle navigation between steps. Manage state. Build the progress indicator. Make it accessible. Test it across browsers.</p>

<p>Or you could use a component that already does all of that.</p>

<p>RapidRails UI’s Steps component handles the stepper navigation, progress tracking, step validation, free navigation between completed steps, and the review summary. It’s built with Turbo and StimulusJS, uses no external JavaScript libraries, and works with your existing Rails forms. It’s accessible out of the box, a first-class citizen in your Rails views.</p>

<p>One component: drop it into your view, define your steps, done.
That’s what focused UI components are for. Not to impress other developers. To solve real problems for real users, consistently, without reinventing the wheel every time.</p>

<p>→ See the full StockLive case study: rapidrails.cc/docs/stocklive
→ Steps component documentation: rapidrails.cc/docs/steps</p>

<p><img src="https://ahmednadar.com/assets/images/posts/not-focused-ui-components.png" alt="Rapidfy" />
<img src="https://ahmednadar.com/assets/images/posts/focused-ui-components.png" alt="Rapidfy" /></p>]]></content><author><name>Ahmed Nadar</name></author><category term="ui" /><category term="components" /><category term="rapidrails" /><summary type="html"><![CDATA[18 form fields taught me a lot about why focused UI components matter]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://ahmednadar.com/assets/images/og-default.png" /><media:content medium="image" url="https://ahmednadar.com/assets/images/og-default.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">The price is not the cost</title><link href="https://ahmednadar.com/what-are-you-building-while-ai-speeds-up/" rel="alternate" type="text/html" title="The price is not the cost" /><published>2026-01-26T00:00:00+00:00</published><updated>2026-01-26T00:00:00+00:00</updated><id>https://ahmednadar.com/what-are-you-building-while-ai-speeds-up</id><content type="html" xml:base="https://ahmednadar.com/what-are-you-building-while-ai-speeds-up/"><![CDATA[<p>When a client requested ongoing support, I knew they’d need approximately 20 hours in the first quarter for integration work, setup, testing and training. So I’d say, “It will be 20 hours minimum per quarter. Which is $4,000.”</p>

<p>However, they hesitated. Meetings got scheduled. Budgets got reviewed. Scopes changed a couple of times. And three weeks later, still no answer.</p>

<p>So, I tried something different with the next client. Same situation, clear work scope, same amount of expected work. This time I said: “5 hours minimum per quarter. Which is $1,000.”
When I received a response a few hours later, I was surprised. They said yes!</p>

<p>Interestingly, that second client ended up needing 22 hours in the first quarter and paid $4,400. More than the first client would have paid!</p>

<p>What just happened? I used to think my job was to accurately estimate what a client needed and quote that number upfront. After losing a few opportunities, I learned it’s not.</p>

<p>That’s when I learned something new: the price is what they agree to, and the cost is what they actually spend. While they look and sound the same, they are different numbers. More importantly, the psychology around each is completely different.</p>

<p>When someone sees “$4,000 minimum,” they think, “that’s a big commitment. What if we don’t need that much work? What if it doesn’t work out? Let me check with finance.”</p>

<p>On the other hand, when someone sees “$1,000 minimum,” they think, “That’s reasonable. We can do this.”
The first framing invites scrutiny. The second invites action.</p>

<p>The work itself doesn’t change. It still takes 20 hours. The support still costs the same per hour. The total invoice ends up being almost the same. What changes is how they feel saying yes the first time.</p>

<p>I resisted this approach for a long time. It felt like I was underselling myself or playing games with pricing. But that’s not what’s happening.</p>

<p>A low minimum is not a discount; it’s a door. The cost walks through later, naturally, once they already trust you and see the value of the work.</p>

<p>So when you are pricing your work, don’t anchor on what they’ll spend. Anchor on what they’ll agree to. 
The price gets you in. The cost is what happens once you’re there.</p>]]></content><author><name>Ahmed Nadar</name></author><category term="pricing" /><category term="psychology" /><category term="rapidrails" /><summary type="html"><![CDATA[How framing the price differently can help you get more clients]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://ahmednadar.com/assets/images/og-default.png" /><media:content medium="image" url="https://ahmednadar.com/assets/images/og-default.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">What are you building while AI speeds up?</title><link href="https://ahmednadar.com/the-price-is-not-the-cost/" rel="alternate" type="text/html" title="What are you building while AI speeds up?" /><published>2025-12-16T00:00:00+00:00</published><updated>2025-12-16T00:00:00+00:00</updated><id>https://ahmednadar.com/the-price-is-not-the-cost</id><content type="html" xml:base="https://ahmednadar.com/the-price-is-not-the-cost/"><![CDATA[<p>A year ago, I fought with AI over an OK-ish commit message.
Now I run agent swarms.</p>

<p>That escalated quickly.</p>

<p>Every week in 2025, experts said “AI is improving, get ready.” They undersold it. We’re not improving; we’re in free-fall acceleration. The kind where you put your phone down for an hour and three new capabilities drop while you aren’t looking.</p>

<p>It’s mid-December 2025, and I coordinate agents with specialized skills. They have objectives. They complete tasks. They work while I think about what’s next.</p>

<p>Who predicted solo developers or creators could do this in early 2025? Nobody. Not even the people building the tools. They all say, “we don’t know how AI works!” Which is both terrifying and hilarious.</p>

<p>And the trade-off? Time. Learn one thing Monday; it’s ancient history by Friday. It’s like TikTok scrolling, except the algorithm is super-intelligence and nobody programmed a pause button. The genie is out of the bottle.</p>

<p>The train isn’t just moving; it’s breaking the speed limit. You’re either on it or watching from the platform. (Spoiler: the platform sells tickets to nowhere.)</p>

<p>But here’s the beautiful part: that same insane speed means you can build things this year (2 weeks left!) that were science fiction or a dream at the start of the year. You can help more people. Create more value. Ship products faster. Leave something behind.</p>

<p>Because if time flies, and AI flies faster, then builders who can keep up will build useful things (not slop) that once seemed impossible.</p>

<p>So… what are you building while the train is moving?</p>

<p><img src="https://ahmednadar.com/assets/images/posts/open-graph-AI-project.png" alt="Rapidfy" /></p>]]></content><author><name>Ahmed Nadar</name></author><category term="pricing" /><category term="psychology" /><category term="rapidrails" /><summary type="html"><![CDATA[How AI is speeding up the build process and what you can build while it's happening]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://ahmednadar.com/assets/images/og-default.png" /><media:content medium="image" url="https://ahmednadar.com/assets/images/og-default.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>