
31 Agent Skills, 12 Products, One Engineer

We run a 12-product software company with a single developer. Not by working harder. By giving AI agents recurring responsibilities and holding them to the same standards as employees.

February 20, 2026
31 agent skills · 12 live products · 4 auto hooks · 6 recurring audits

Most companies hire people to do repetitive work. We write a text file instead.

At Voxos, every recurring task that an AI agent can handle is encoded as a skill: a plain Markdown file that describes what to do, when to do it, and how to verify the result. These skills are version-controlled, code-reviewed, and deployed the same way we deploy software. They are not prompts. They are job descriptions.

We currently run 31 skills, 4 automatic hooks, and 6 recurring maintenance audits across 12 live products spanning three AWS accounts. One person manages all of it. This post is a complete inventory of every skill we use, what it does, and how often we use it.

What We Learned Building This

  1. Skills compound. Each new skill unlocks the next. A research pipeline makes blog posts possible. A bench tool makes the pipeline measurable. A deploy skill makes blog posts publishable. The system is more valuable than the sum of its parts.
  2. Recurring beats one-shot. The highest-value skills are the ones that run daily without being asked: engagement audits, session tracking, cost monitoring. One-time tasks are solved once. Recurring tasks are solved forever.
  3. Measurement makes trust possible. We don't trust our agent because it's smart. We trust it because every change must be proven with numbers. Before and after. Delta shown. No vibes.
  4. Hooks are the nervous system. The 4 automatic hooks that fire on session start and end are invisible but critical. They track every session, log every task, and catch orphaned processes. Without them, the skills are just documents.
  5. Security is a skill, not an afterthought. A monthly SOC 2 audit skill scans all three AWS accounts for encryption gaps, public S3 buckets, exposed secrets, and overprivileged IAM roles. A penetration testing skill probes every endpoint from the outside. These run on the same infrastructure as every other skill.
  6. The hardest skills to write are the simplest. /commit is three words: "stage, describe, commit." Getting the agent to produce the right commit message every time took more iterations than building the research pipeline.

Research & Intelligence

These are the skills that go find things out. They launch multi-agent pipelines, scrape the web, and synthesize findings into structured output.

Code & Development

The daily drivers. These skills handle the mechanical parts of writing, committing, testing, and auditing code.

Infrastructure & DevOps

The skills that keep the lights on. These handle deployment, security, and the AWS plumbing that a 12-product company requires.

Content & Frontend

The skills that produce user-facing output: blog posts, favicons, translations, and frontend quality checks.

Utility & Workflow

Small skills that remove friction from the daily workflow. Individually minor. Collectively, they eliminate hours of context-switching.


The Invisible Layer: Hooks

Skills are invoked by name. Hooks fire automatically. We run four of them, and they provide the telemetry that makes everything else accountable.

Session register fires when any Claude session starts. It creates a JSON tracking file with the session ID, process ID, working directory, model name, and start timestamp. Session end fires when the session closes, marking it as ended and parsing the transcript for token and turn counts.

Task check fires before exit and nudges if no tasks were logged during the session. If you did meaningful work but didn't track it, the agent won't let you leave without acknowledgment. Artifact push handles file uploads to the Scholar platform via presigned S3 URLs.

Together, these hooks create an audit trail for every session: when it started, what model ran, how many tokens were consumed, and what tasks were completed. We can reconstruct any session's cost and output from this data alone.
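As a sketch of what the session hooks do, the register/end pair can be modeled as two small functions that write and update a JSON tracking file. The directory path and field names here are illustrative; the post doesn't show the real schema.

```python
import json
import os
import time
from pathlib import Path

# Hypothetical tracking location -- the real hook's path is not shown.
TRACK_DIR = Path.home() / ".claude" / "sessions"

def register_session(session_id: str, model: str, cwd: str) -> Path:
    """Session-start hook: write a JSON tracking file for this session."""
    TRACK_DIR.mkdir(parents=True, exist_ok=True)
    record = {
        "session_id": session_id,
        "pid": os.getpid(),          # used later for orphan detection
        "cwd": cwd,
        "model": model,
        "started_at": time.time(),
        "status": "active",
    }
    path = TRACK_DIR / f"{session_id}.json"
    path.write_text(json.dumps(record, indent=2))
    return path

def end_session(session_id: str, tokens: int, turns: int) -> None:
    """Session-end hook: mark ended and attach transcript totals."""
    path = TRACK_DIR / f"{session_id}.json"
    record = json.loads(path.read_text())
    record.update(status="ended", ended_at=time.time(),
                  tokens=tokens, turns=turns)
    path.write_text(json.dumps(record, indent=2))
```

Because the record carries the process ID and timestamps, cost and output for any session can be reconstructed from the file alone.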

The Orphan Problem

When a terminal tab crashes, the session hook never fires. The tracking file stays marked as "active" indefinitely. Our cc-sessions.sh utility detects these orphaned sessions by checking if the recorded process ID is still alive. At the start of every new session, the agent checks for orphans and offers to resume them. Dead sessions get cleaned. Active work never gets lost.
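The liveness check itself is a one-liner on POSIX systems: sending signal 0 probes a process without affecting it. A minimal sketch of the orphan scan, assuming tracking files shaped like `{"status": ..., "pid": ...}`:

```python
import json
import os
from pathlib import Path

def pid_alive(pid: int) -> bool:
    """Signal 0 probes a process without touching it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False          # no such process
    except PermissionError:
        return True           # exists, but owned by another user
    return True

def find_orphans(track_dir: Path) -> list[Path]:
    """Tracking files marked active whose recorded PID is gone."""
    orphans = []
    for f in track_dir.glob("*.json"):
        rec = json.loads(f.read_text())
        if rec.get("status") == "active" and not pid_alive(rec["pid"]):
            orphans.append(f)
    return orphans
```

Anything this scan returns is a crashed session: still marked active, but with no living process behind it.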

Recurring Maintenance

Six tasks run on a cadence, from session-start to monthly, tracked in a single manifest file:

Session-start: triage the reminders file, verify check dates, remove resolved items, flag anything overdue.

Hourly: verify that memory files and project documentation match the current system state. Run golden path audits on all customer-facing frontends. Collapse completed milestones into summaries.

Daily: check CloudWatch for per-project API spend. Flag any project exceeding $10/month. Run the full engagement audit across DynamoDB, Stripe, CloudFront, and Lambda metrics.

Monthly: run the SOC 2 compliance scan across all three AWS accounts. Compare findings against the previous month's baseline. Surface regressions.

Token Budgeting

Every non-trivial task follows a lifecycle: estimate (predict token cost before starting), start (record the timestamp), complete (log actual tokens, files changed, and commit hash). This gives us estimation accuracy data over time.

Why tokens instead of hours? Because an AI agent cannot predict wall-clock time. It can reason about output volume. A config change is 2-5k tokens. A multi-file feature is 10-30k. A new pipeline stage is 50-100k. We calibrate against these benchmarks and track how often the estimate matches reality.
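The estimate-then-complete lifecycle can be sketched as a small ledger that accumulates (estimated, actual) pairs and reports a calibration ratio. The class and method names are illustrative, not the real tooling:

```python
import statistics

class TokenBudget:
    """Estimate -> complete lifecycle for tasks, plus a running
    estimation-accuracy ratio (actual / estimated; 1.0 is perfect)."""

    def __init__(self) -> None:
        self.history: list[tuple[int, int]] = []  # (estimated, actual)
        self._estimate: int | None = None

    def estimate(self, tokens: int) -> None:
        """Predict token cost before starting the task."""
        self._estimate = tokens

    def complete(self, actual: int) -> None:
        """Log actual token cost against the pending estimate."""
        self.history.append((self._estimate, actual))
        self._estimate = None

    def accuracy(self) -> float:
        """Median actual/estimated ratio across completed tasks."""
        return statistics.median(a / e for e, a in self.history)
```

Tracked over time, the median ratio shows whether the agent systematically over- or under-estimates against the 2-5k / 10-30k / 50-100k calibration bands.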


The Pattern

Every skill is a directory containing a single file: SKILL.md. The file has YAML frontmatter for metadata and a Markdown body for instructions. That's it. No framework. No SDK. No runtime dependency.

The instructions describe what the agent should do, how to verify the result, and what guardrails to respect. The agent reads the file when the skill is invoked and follows the instructions using whatever tools are available: file reads, shell commands, API calls, web searches.
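As a sketch, a minimal SKILL.md might look like the following. The frontmatter field names here are illustrative; the post doesn't show the real schema.

```markdown
---
name: commit
description: Stage changes, write a conventional message, commit.
trigger: /commit
---

1. Run `git status` and `git diff --staged` to see what changed.
2. Stage only the intended files; never `git add -A` blindly.
3. Write a one-line imperative summary under 72 characters.
4. Commit, then show the resulting hash for verification.
```

The body reads as instructions to an employee, not as a prompt: concrete steps, a guardrail, and a verification step at the end.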

The power of this approach is not in any single skill. It is in the accumulation. Each skill we write makes the next one cheaper to build, because the agent already knows the codebase, the deployment patterns, and the verification standards. Skill #31 took fifteen minutes to write. Skill #1 took an afternoon.

The Compound Effect

We started with /commit. Then we needed /commit-project for the monorepo. Then /finished to automate end-of-session cleanup. Then session hooks to track what happened. Then /engagement to measure whether any of it mattered. Each skill existed because the previous one created a gap. The system designed itself.

If you're building something similar, start with the task you do most often. For us, that was committing code. For you, it might be running tests, deploying a service, or triaging bug reports. Write the instructions in a Markdown file. Run it. Fix what breaks. Do it again tomorrow. The skill gets better every time because you refine the instructions, and the accumulation of skills creates an infrastructure that no individual prompt could replicate.

Try Scholar

The /research skill powers our Scholar platform. Pick any topic. Our multi-agent pipeline will search, extract, cross-reference, and synthesize findings into a sourced report. Every blog post on this site was produced using the same pipeline.

Voxos Engineering
Written by the Voxos team using the same agent skills described in this article.