I planned 90 days. Turns out AI agents had different ideas.
Three weeks ago, I wrote about building Paveway from zero to feature-complete in 8 days. That felt impossibly fast at the time. Now? It’s beta-ready, and the product barely resembles what I started with. Simple daily logging became a 5-dimension career scoring system. I added Slack integration, an AI-powered mentor feature, and a complete organization management layer for engineering managers.
None of this is about cutting corners. It’s about what happens when AI tooling becomes an actual development partner instead of just autocomplete on steroids.
The Unexpected Acceleration
Back on October 12th, I sketched out what felt like a reasonable timeline: four weeks for foundation work, another four for the sophisticated features, final four for polish. Standard waterfall thinking, honestly. By week 2, I was running ahead. By week 3, the whole timeline looked ridiculous.
Here’s the thing though - the speed wasn’t really about typing faster. It was about making decisions with more confidence. Refactoring entire systems in an afternoon instead of spending a week planning it out. Deleting deprecated code without that nagging fear that you’ll need it later, because you know you can rebuild better versions quickly if needed.
Using Cursor this intensively changed something fundamental about how I approach building software. Not just faster execution - different thinking entirely.
The Pivot: From Daily Logs to the Paveway Index
The biggest shift happened around week 2. My original idea was pretty basic: log what you did today, let AI crunch the numbers, spit out your IC level. It worked, technically. But using it felt flat. One number doesn’t tell you much.
The lightbulb moment came from thinking about how my managers have actually evaluated me over the years. Nobody ever just asked “what level are you?” They’d look at multiple things - the complexity of what I owned, how much hand-holding I needed, my technical chops, whether I moved the needle on anything important, if I helped others get better.
That’s how I landed on the Paveway Index. Instead of one score, you get five numbers (each 1.0-6.0) across different dimensions:
1. Scope - Size and complexity of work you own
2. Autonomy - Level of independence and self-direction
3. Technical Depth - Expertise and problem-solving ability
4. Impact - Effect on users, team, and business
5. Mentorship - Ability to guide and elevate others
You might be crushing it on Technical Depth (say, 4.5) but barely registering on Mentorship (2.5). That’s actually useful - you know exactly where to focus your growth. Here’s what the type structure looks like:
export type DimensionName = 'scope' | 'autonomy' | 'technical_depth' | 'impact' | 'mentorship';

export interface DimensionScore {
  dimension: DimensionName;
  score: number; // 1-6
  confidence: number; // 0-1
  justification: string;
  evidenceCitations: EvidenceCitation[];
}
export interface ICEvaluation {
  id: string;
  userId: string;
  evaluationDate: string;
  evaluationType: EvaluationType;
  overallScore: number; // Paveway Index (1.0-6.0)
  mappedLevel: MappedLevel; // Entry-Level, Mid-Level, Senior, Staff, Principal, Distinguished
  confidenceScore: number;
  aiSummary: {
    summary: string;
    keyInsights: string[];
    recommendations: string[];
  };
  dataSourcesUsed: Record<string, boolean>;
}
The evaluation system runs on every reflection submission:
const DIMENSIONS: DimensionName[] = [
  'scope',
  'autonomy',
  'technical_depth',
  'impact',
  'mentorship',
];

export async function evaluateUserWithRubric(
  userId: string,
  contextNotes?: string
): Promise<ICEvaluationResult> {
  // Step 1: Gather all user data
  const userData = await gatherUserData(userId);
  const { dataSourcesUsed } = userData; // which sources (GitHub, Slack, etc.) were available

  // Step 2: Evaluate each dimension (sequential to avoid rate limits)
  const dimensionScores: DimensionScore[] = [];
  for (const dimension of DIMENSIONS) {
    const result = await evaluateDimension(dimension, userData, contextNotes);
    dimensionScores.push({
      dimension,
      score: result.score,
      confidence: result.confidence,
      justification: result.reasoning,
      evidenceCitations: result.evidenceCitations,
    });
    // Small delay to avoid rate limiting
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }

  // Step 3: Calculate Paveway Index (average of 5 dimensions)
  const overallScore =
    dimensionScores.reduce((sum, d) => sum + d.score, 0) / dimensionScores.length;

  // Step 4: Map to level
  const mappedLevel = mapScoreToLevel(overallScore);

  // Step 5: Calculate average confidence
  const confidenceScore =
    dimensionScores.reduce((sum, d) => sum + d.confidence, 0) / dimensionScores.length;

  // Step 6: Generate overall summary
  const aiSummary = await generateOverallSummary(dimensionScores);

  return {
    dimensions: dimensionScores,
    overallScore: parseFloat(overallScore.toFixed(2)),
    mappedLevel,
    confidenceScore: parseFloat(confidenceScore.toFixed(2)),
    aiSummary,
    dataSourcesUsed,
  };
}
Each dimension gets evaluated independently - confidence scores, evidence citations, the whole thing. Then the Paveway Index is just the average of those five scores.
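The `mapScoreToLevel` step isn't shown above, so here's a plausible sketch - the band thresholds are my assumption, not Paveway's actual cutoffs - along with a worked example of how five dimension scores average into one index:

```typescript
// Hypothetical reconstruction of mapScoreToLevel.
// The real thresholds aren't published; these evenly spaced bands are an assumption.
type MappedLevel =
  | 'Entry-Level'
  | 'Mid-Level'
  | 'Senior'
  | 'Staff'
  | 'Principal'
  | 'Distinguished';

function mapScoreToLevel(score: number): MappedLevel {
  if (score < 2.0) return 'Entry-Level';
  if (score < 3.0) return 'Mid-Level';
  if (score < 4.0) return 'Senior';
  if (score < 5.0) return 'Staff';
  if (score < 5.5) return 'Principal';
  return 'Distinguished';
}

// Worked example: the dimension scores from the article
// (strong Technical Depth at 4.5, weak Mentorship at 2.5).
const scores = [4.5, 3.0, 3.5, 4.0, 2.5];
const pavewayIndex = scores.reduce((sum, s) => sum + s, 0) / scores.length; // 3.5
const level = mapScoreToLevel(pavewayIndex); // 'Senior' under these assumed bands
```

The point of banding rather than rounding is that a 3.9 and a 4.1 land on different levels, which is exactly the kind of boundary a custom per-organization rubric would want to tune.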
How AI Agents Changed Everything
Let me get specific about what “AI-assisted development” actually looked like on this project. Because it’s not just “type faster with autocomplete.”
Complex Refactors Became Manageable
Week 3, I realized the whole “daily logs” thing had to go. The new “reflections” system was better, but it meant ripping out a bunch of code, creating new database tables, migrating data, updating references everywhere, deleting deprecated files. You know the drill.
Normally this is a multi-day affair. Maybe a week if you’re being careful. I knocked it out in an afternoon.
The way it worked: I’d describe what needed to happen, and Cursor would help me work through each file systematically. It’d spot all the references I needed to update, handle the imports, modify database queries while keeping everything type-safe. Something broke? It’d suggest the fix immediately. Pattern emerged? Applied consistently across similar files.
What’s wild is the confidence boost. I wasn’t scared to nuke deprecated code. Wasn’t stressing about the scale of the refactor. Just worked through it, knowing the AI would catch the edge cases I’d inevitably miss.
Database Migrations Got Way Less Scary
The database schema changed constantly as I figured out what the product actually needed to be. Every migration got documented and tested properly - Row Level Security policies, data transformations, constraints, indexes, all that.
Here’s one that moved data from the old system to the new dimension-based approach:
-- Add 'github_analysis' as a valid evaluation type
ALTER TABLE ic_evaluations
  DROP CONSTRAINT IF EXISTS ic_evaluations_evaluation_type_check;

ALTER TABLE ic_evaluations
  ADD CONSTRAINT ic_evaluations_evaluation_type_check
  CHECK (evaluation_type IN ('ai_triggered', 'ai_continuous', 'migrated_from_legacy', 'reflection', 'resume', 'github_analysis'));

-- Migrate experience_analyses (GitHub analyses) to ic_evaluations
INSERT INTO ic_evaluations (
  user_id,
  evaluation_date,
  evaluation_type,
  overall_score,
  mapped_level,
  confidence_score,
  ai_summary,
  submission_responses,
  data_sources_used
)
SELECT
  user_id,
  analyzed_at,
  'github_analysis',
  3.0,      -- Default middle score, will be recalculated
  'Senior', -- Default level
  0.7,      -- Default confidence
  jsonb_build_object(
    'summary', analysis_result->>'summary',
    'key_insights', COALESCE(analysis_result->'key_insights', '[]'::jsonb),
    'recommendations', COALESCE(analysis_result->'recommendations', '[]'::jsonb)
  ),
  NULL,
  jsonb_build_object('github', true)
FROM experience_analyses
WHERE analysis_result IS NOT NULL
ON CONFLICT DO NOTHING;
Didn’t write that from scratch, obviously. Cursor knows the schema, understands the constraints and data types, follows the migration patterns I’d been using. Generated something that followed best practices, had proper error handling, even added useful comments.
Component Iteration Became Stupidly Fast
UI work is where this stuff really pays off. I’d describe what I wanted - “dimension score card, big number for the score, progress bar for confidence, include the justification text, add a subtle gradient tied to the score value” - and get back a complete, type-safe component using the project’s patterns, shadcn/ui, Tailwind, everything.
But the real win? Iteration. “Make that confidence bar purple” - done. “Add icons for each dimension” - done. “Extract score calculation to a util” - done.
That feedback loop means you can actually experiment with UX instead of committing to your first idea because refactoring is too painful.
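That "subtle gradient tied to the score value" is exactly the kind of thing worth extracting into a util. A minimal sketch - the function name, color stops, and HSL choice are mine, not Paveway's actual implementation:

```typescript
// Map a 1.0-6.0 dimension score onto a hue from red (low) to green (high).
// The 1-6 range matches the Paveway Index; the specific colors are illustrative.
function scoreToHsl(score: number): string {
  const clamped = Math.min(6, Math.max(1, score)); // guard out-of-range scores
  const t = (clamped - 1) / 5;                     // normalize to 0..1
  const hue = Math.round(t * 120);                 // 0 = red, 120 = green
  return `hsl(${hue}, 70%, 45%)`;
}
```

A component can then use it directly in an inline style or CSS variable, so every score card stays visually consistent without repeating the mapping logic.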
What Actually Got Built
Reflections went from “log once a day” to “submit whenever, get evaluated immediately on all 5 dimensions.” Slack integration took about 2 days - /paveway commands, modal forms, daily reminders, the works. Here’s what the service layer looks like:
export class SlackService {
  private botToken: string;

  constructor(botToken: string) {
    this.botToken = botToken;
  }

  async postMessage(options: SlackMessageOptions): Promise<void> {
    const response = await fetch('https://slack.com/api/chat.postMessage', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${this.botToken}`,
      },
      body: JSON.stringify(options),
    });
    const data = await response.json();
    if (!data.ok) {
      throw new Error(`Slack API error: ${data.error}`);
    }
  }

  async openModal(options: SlackModalOptions): Promise<void> {
    const response = await fetch('https://slack.com/api/views.open', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${this.botToken}`,
      },
      body: JSON.stringify(options),
    });
    const data = await response.json();
    if (!data.ok) {
      throw new Error(`Slack API error: ${data.error}`);
    }
  }

  // Format helpers
  static createHeaderBlock(text: string): Block {
    return {
      type: 'header',
      text: {
        type: 'plain_text',
        text,
      },
    };
  }

  static createSectionBlock(text: string, markdown = true): Block {
    return {
      type: 'section',
      text: {
        type: markdown ? 'mrkdwn' : 'plain_text',
        text,
      },
    };
  }
}
The AI mentor feature pulls in your last 90 days of reflections, GitHub activity, and dimension scores to give you actual context-aware guidance. Not generic career advice - specific stuff based on your actual work.
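Assembling that 90-day context window can be as simple as a date filter over stored reflections. A hedged sketch - the `Reflection` shape and function name are hypothetical, since the actual mentor pipeline isn't shown:

```typescript
// Hypothetical minimal shape; the real reflections table has more fields.
interface Reflection {
  submittedAt: string; // ISO timestamp
  text: string;
}

// Keep only reflections from the trailing N-day window.
// 90 days matches what the mentor feature uses; the rest is an assumption.
function recentReflections(
  reflections: Reflection[],
  now: Date,
  windowDays = 90
): Reflection[] {
  const cutoff = now.getTime() - windowDays * 24 * 60 * 60 * 1000;
  return reflections.filter((r) => new Date(r.submittedAt).getTime() >= cutoff);
}
```

The filtered reflections, GitHub activity, and dimension scores would then be serialized into the prompt, which is what makes the guidance specific rather than generic.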
For managers, there’s a whole organization layer: team dashboards, member management, dimension distribution across your team, privacy controls (important), custom rubrics, growth planning, billing. Basically everything you’d need to help your team grow without hovering over them constantly.
GitHub integration does weekly analysis, triggers evaluations on commits and PRs via webhooks, analyzes code patterns. The learning system curates courses and articles based on which dimensions you’re weak in.
The Tech Stack
Nothing exotic: Next.js 16, React 19, TypeScript, Supabase for the database and auth, OpenAI’s API for evaluations, Stripe for billing, Slack and GitHub OAuth for integrations, Vercel for hosting, Sentry for error tracking. Boring, proven stuff. The kind of stack that lets you build product instead of fighting infrastructure.
Database schema kept evolving as I went. Row Level Security on everything (multi-tenancy from day one), composite indexes where it made sense, JSON columns for the AI-generated content that doesn’t fit neatly into relational tables.
Product Decisions That Actually Mattered
Database-First: Started every feature with the migration. Schema design, RLS policies, indexes - do it up front. Feels slow at first, but it saved me from so many emergency schema rewrites. Went from 4 tables to 15+ without any major drama.
Server Components Everywhere: Made them the default. Only went client-side when I actually needed interactivity. Subscription checks run on the server, data fetching is optimized, users see correct state immediately. No flash of wrong content.
Proprietary Metric: Calling it the “Paveway Index” instead of just “IC score” turned out to be smart. It’s memorable, shareable, and positions the product differently than “yet another IC level assessment tool.”
B2B From Day One: Built organization features alongside the individual ones instead of tacking them on later. Same core product, just an extra analytics layer for managers. Way easier than retrofitting multi-tenancy after launch.
Ruthless Cleanup: Deleted deprecated code aggressively before hitting beta. No “we’ll clean this up later.” Clean codebase means you can iterate fast without tripping over technical debt.
What Actually Made This Fast
Cursor’s AI:
Cmd+K for inline edits and refactors. Codebase-aware suggestions that follow patterns I’d already established. Terminal integration so I could run migrations and type-check without context switching.
What AI Didn’t Do:
Product decisions. UX design. Architecture tradeoffs. Business strategy. All still on me.
Tech Choices That Eliminated Work:
Supabase gave me auth, migrations, RLS, and realtime for free. shadcn/ui meant consistent, accessible components without design system bikeshedding. Next.js App Router defaulted to Server Components. Stripe handled subscription complexity, trials, B2B billing.
Tactics That Kept Velocity High:
Ship complete features - UI, API, database, admin tools, all in one go. No half-finished work sitting around. Delete deprecated code the moment it’s deprecated, not later. Write the database migration before touching any UI code.
What Got Shipped
Completely pivoted from daily logging to a 5-dimension evaluation system. Built the reflection workflow, Slack integration, AI mentor, organization dashboards for managers, GitHub integration, learning resources, billing. Database went from 4 tables to 15+. Full platform with auth, multi-tenancy, integrations, analytics, admin tools.
What’s Next
Beta testing to dial in the dimension scoring and evaluation accuracy. Custom rubric weights per organization (different companies value different things). Public profiles and an API for third-party integrations. Eventually: skill assessments, anonymized industry benchmarking, deeper integration with project management tools.
Takeaways
AI agents speed up execution but they don’t think for you. They let you refactor aggressively without being scared and catch the edge cases you’d miss. That’s valuable, but it’s not magic.
Ship complete features. Delete deprecated code the second it’s deprecated - technical debt grows fast. Start features with the database migration, not the UI. Beta-ready doesn’t mean perfect, it means usable.
For engineers: career growth is multi-dimensional. Technical depth matters, but so do scope, autonomy, impact, and mentorship. Track your actual work, quantify your impact, own your career data.
For managers: dimension-specific insights actually improve coaching conversations. But privacy and trust matter more than data - share aggregate insights up the chain, protect individual team member data.
Final Thoughts
Three weeks ago, I had a feature-complete product. Today it’s beta-ready and the product direction changed completely - from basic daily logging to a multi-dimensional career scoring system.
AI agents made complex refactors feel manageable instead of terrifying. Smart tech choices cut out entire categories of grunt work. Staying focused on complete features kept momentum high. The 90-day experiment moved way faster than I expected.
The real lesson isn’t that AI makes you faster. It’s that AI lets you execute on good decisions quickly, refactor without being paralyzed by fear, and iterate constantly without burning out. That’s where the actual acceleration comes from.
This is part 3 of the Paveway series. Read about the 90-day experiment and the first 8 days.