AI Agents: Skills & Distribution

The defining insight of this sub-topic is that domain-knowledge distribution is a documentation problem, not a code problem. The evidence is the "skills as markdown" pattern (Corey Haines' marketingskills repo at 17.4K stars, consumable by any agent), combined with a hierarchical architecture in which every skill first references a foundational context file, bidirectional cross-referencing, and the emerging .agents/skills/ cross-platform convention. Distribution layers compete and consolidate: GitHub-backed Claude Teams marketplaces auto-sync multi-plugin bundles across every instance, Codex's in-app plugin model collapses context switching, Browserbase's researched web-agent skill catalog turns site navigation into reusable playbooks, and FieldTheory localizes X bookmarks as agent-readable context. The recurring operational pattern is the LLM-maintained knowledge base (raw/ → compiled wiki with backlinks, health checks that find inconsistencies, idea files shared instead of code) and its org-scale version: Single Brain ingests all company data every 15 minutes, months of accumulated proprietary data are the moat, and context amnesia ("scanning is not knowing") is the fundamental constraint a memory system must solve.
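
The wiki health check mentioned above can be illustrated with a small sketch. It assumes the wiki is held as a dictionary of page name to markdown text and that backlinks use a [[page]] syntax; both are assumptions made for illustration, since the source does not specify a link format. The check reports links that point at missing pages and pages that nothing links to, two kinds of inconsistency an LLM maintainer could be asked to repair.

```python
import re

# Assumed backlink syntax: [[page-name]]. The source does not specify one.
LINK = re.compile(r"\[\[([^\]]+)\]\]")

def health_check(wiki: dict[str, str]) -> dict[str, list[str]]:
    """Report links to missing pages and pages that nothing links to."""
    linked = set()
    broken = []
    for page, body in wiki.items():
        for target in LINK.findall(body):
            if target in wiki:
                linked.add(target)
            else:
                broken.append(f"{page} -> {target}")
    orphans = sorted(p for p in wiki if p not in linked)
    return {"broken_links": sorted(broken), "orphans": orphans}

# Hypothetical three-page wiki used only to exercise the check.
wiki = {
    "index": "See [[skills]] and [[memory]].",
    "skills": "Skills are markdown files. Back to [[index]].",
    "evals": "Binary criteria, scored per trial.",
}
report = health_check(wiki)
print(report)
```

In a real setup the report would be handed back to the LLM as a to-do list (write the missing page, link the orphan) rather than fixed by hand.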

New skill categories are expanding the library: prose refinement (stop-slop removes AI writing patterns via a skill file), real-time transcription (Codex Meeting Recorder at $0.017/min using GPT Realtime Whisper, with a live preview pane and Q&A during the meeting), frontend design (the impeccable skill for designer-level aesthetics, imagegen-frontend-web + image-to-code for 1:1 image conversion, Frontend App Builder for engineering-grade fidelity), plan optimization (iterative self-scoring that stops when gains become noise; Sean Geng's installable skill, effective with Fable 5), and project research (Hermes' /grill-me skill to uncover unknowns before starting development). Managed agent platforms mark a move up the stack: Anthropic's Claude Managed Agents (PaaS, $2.58 fulfillment cost on $1k of service, Linear SDK integration), the Advisor Strategy (Opus advisor + Sonnet/Haiku executor), the Monitor tool (event-driven background scripts replacing polling), and Symphony, which assigns a Codex agent to every open issue. Self-improving skill and eval loops are now operational, not frontier: define 3-5 binary criteria, run 10 trials, score, rewrite, retest (hook-writer 32/50 → 47/50 overnight); /autobrowse graduates a converged workflow into a reusable skill; ml-intern automates the post-training research loop end-to-end and beats Claude Code on GPQA. Skill discovery and human-visibility infrastructure (browser-tool benchmarking as a cost line item, Astropad remote desktop for "headless" Mac minis) round out the distribution stack.

Insights

Skills and Knowledge Distribution

The "skills as markdown" pattern is becoming a standard for AI agent extensibility — Corey Haines' marketingskills repo (17.4K stars) packages 36 marketing skills as .md files consumable by any agent (Claude Code, Codex, Cursor), proving domain knowledge distribution is a documentation problem, not a code problem (from marketing skills ai agents)
A hierarchical skill architecture with a foundational context file (product-marketing-context) that all other skills reference first ensures every AI marketing task starts grounded in the company's positioning and audience — not generic advice (from marketing skills ai agents)
Cross-referencing skills bidirectionally (copywriting ↔ page-cro ↔ ab-test-setup) creates an interconnected knowledge graph within the agent, so optimizing a landing page automatically pulls in copy principles and testing methodology (from marketing skills ai agents)
The .agents/skills/ directory convention (migrated from .claude/) is emerging as a cross-platform standard for agent skill storage, with CLI, plugin marketplace, submodule, and SkillKit competing as distribution layers (from marketing skills ai agents)
A GitHub-backed plugin marketplace for Claude Teams auto-installs multi-plugin bundles (sub-skills + agents) across every team member's instance and keeps them in sync from a single repo — centralized plugin management with zero manual per-instance intervention (from claude teams github plugin marketplace)
Codex's plugin model collapses context switching by bringing tools in-app: Chrome and Hyperframe operate as Codex plugins for AI creative workflows; install Slack/Gmail/Computer Use plugins and create custom skills for repeated workflows instead of learning external tools or separate dashboards (from chrome hyperframe codex plugins ai workflow, gpt codex frontend prototype workflow)
Imagegen-first prototyping: use GPT-5.5's image model to generate visual prototypes, then have the model implement code from those prototypes — annotate slides/docs directly in the Codex app and send visual instructions to the agent (from gpt codex frontend prototype workflow)
Multi-device always-on agent setup: a Mac mini as always-connected "home base" running 24/7 heartbeat threads (survive device switching), a MacBook as "satellite" for mobile work, the two added as connected devices with mutual SSH so threads continue and files are reachable from either machine (from codex multi device workflow setup)
The Hermes community is building a cross-platform corpus of real (not theoretical) use cases scraped from X, GitHub, Reddit, Hacker News, YouTube, blogs, and podcasts — community-driven user-story collection becomes a discovery/adoption resource for the agent ecosystem (from hermes agent community use cases)
Connect Google Workspace (Gmail/Calendar/Drive/Docs/Sheets) first when setting up a personal agent — without it the agent can't effectively manage a workflow; Discord channels can then be wired so the agent processes support tickets each morning and auto-organizes them (from hermes agent integrations superpowers)
Specialized professional roles like senior design architecture are being packaged as custom agents for Claude, expanding beyond general-purpose assistants into domain-specific professional workflows (from claude senior design architect agent)
Open source browser automation tools designed specifically for AI agents are emerging as key infrastructure for web interaction and data gathering in agentic workflows (from browser automation agent tooling)
Cheng Lou's @chenglou/pretext package enables developers to integrate AI into demo creation workflows — installable via npm/bun for immediate use (from pretext ai demo package)
Index source documents in a raw/ directory then let LLMs incrementally compile a wiki of .md files with summaries, backlinks, and categorized concepts — LLM writes and maintains all wiki data, you rarely touch it directly (from llm powered personal knowledge bases)
Share abstract "idea files" (gist format) instead of specific code — other people's agents read the idea and customize/build implementations for their specific needs, enabling knowledge distribution without code maintenance burden (from llm personal knowledge base workflow)
Run LLM health checks over wikis to find inconsistent data, impute missing information with web searches, and suggest new article candidates; a knowledge base whose agents maintain its quality compounds in value over time (from llm powered personal knowledge bases)
GStack autoplan skill generates architectural specs for upgraded systems (e.g., git wiki → SQLite GBrain) through single-line prompts, demonstrating agents as system architects not just implementers (from garry tan openclaw git wiki gstack)
Exo is an open source email client that uses Claude for automated inbox management — described as "Claude Code for your inbox," applying autonomous agent patterns to personal communication (from exo claude email client)
Single Brain architecture at Single Grain: unified vector DB ingesting all company data every 15 minutes; fleet of specialized agents (Alfred/ops, Arrow/sales, Oracle/SEO, Flash/content, Cyborg/recruiting) with a World Agent coordinator — 50+ daily cron jobs as the nervous system (from shared link without context)
Agent coordination conflicts are the biggest operational challenge: sales agent promises timelines SEO data contradicts, content agent uses deprioritized keywords, ops agent double-books time slots — required building explicit conflict resolution and security systems (from shared link without context)
DRI (Directly Responsible Individual) system applied to agent teams: spin up a temporary team around a specific goal, 90-day deadline, agents return to general pool when done, learnings absorbed into World Brain — failures improve the system too (from shared link without context)
Compounding curve for AI-native orgs: Month 1 terrible (hallucinations, 3am broken automations), Month 2 AutoResearch surfaces patterns humans missed (sales call keywords correlating with 3x close rates), Month 3 flywheel turns as accumulated data improves every agent's output (from shared link without context)
Months of continuous data ingestion creates a world model competitors need years to replicate — not because the tech is secret but because proprietary data accumulates in ways that can't be fast-forwarded; the data compounding IS the moat (from shared link without context)
agent-browser (Vercel Labs, 26K+ stars) lets AI agents scrape JavaScript-heavy sites, pages behind logins, and dynamic content using 82% fewer tokens than Playwright MCP, which works out to 5-6x more pages per session for knowledge base ingestion (1 / (1 - 0.82) ≈ 5.6) (from nick spisak shared link)
AI agents create decision checkpoints automatically: agent drafts a pricing proposal, human adjusts discount and adds reasoning note — the model's proposal is the structured prior, the human's edit is the judgment signal; tacit knowledge becomes observable (from ashugarg shared link)
Context amnesia is the fundamental agent problem: a 200K context window changes how much text the model can scan, not how much it "knows" — scanning is not knowing; without a memory system every session IS a first date (from nyk builderz shared link)
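
The hierarchical layout described in the first items of this section (a foundational context file read first, then bidirectionally cross-referenced skills) might be modeled as follows. The skill names come from the marketingskills examples above; the SKILLS registry, load_order, and one_way_links are hypothetical helpers written for this sketch and are not part of that repo. In practice each entry would be a .md file under .agents/skills/.

```python
# Hypothetical registry: skill name -> skills it cross-references.
FOUNDATION = "product-marketing-context"

SKILLS = {
    "product-marketing-context": [],
    "copywriting": ["page-cro"],
    "page-cro": ["copywriting", "ab-test-setup"],
    "ab-test-setup": ["page-cro"],
}

def load_order(task_skill, skills=SKILLS):
    """Foundation first, then the requested skill and everything reachable from it."""
    order, seen, queue = [FOUNDATION], {FOUNDATION}, [task_skill]
    while queue:
        name = queue.pop(0)
        if name in seen:
            continue
        seen.add(name)
        order.append(name)
        queue.extend(skills[name])
    return order

def one_way_links(skills=SKILLS):
    """Cross-references that are not mirrored in the other direction."""
    return sorted((a, b) for a, refs in skills.items() for b in refs
                  if a not in skills[b])

print(load_order("page-cro"))
print(one_way_links())
```

Asking for page-cro pulls in the context file, then copywriting and ab-test-setup, which is the "optimizing a landing page automatically pulls in copy principles and testing methodology" behavior. An empty result from one_way_links means every cross-reference is bidirectional.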

Writing and Prose Skills

The stop-slop GitHub repository (hardikpandya/stop-slop) provides a skill file specifically designed to identify and remove telltale AI writing patterns from prose, addressing the growing need to humanize AI-generated content (from stop slop ai writing pattern removal)
The project ships as a skill file, so it can be dropped into AI agent workflows for automated prose refinement (from stop slop ai writing pattern removal)
The Codex Meeting Recorder skill uses the GPT Realtime Whisper endpoint for live transcription at $0.017 per minute ($0.51 for a 30-minute meeting) and lets you ask questions about the transcript through the Codex interface while it is still being generated (from codex realtime meeting transcription gpt whisper)
Codex displays the live transcription in a preview pane and generates both a full transcript and a formatted version when the meeting ends; a local realtime option using Nemotron Speech Streaming is being considered as a cost-effective alternative (from codex realtime meeting transcription gpt whisper)
Use the 'impeccable' design skill in Codex for frontend design; it outperforms the popular uiuxpromax skills. Combine the imagegen-frontend-web skill with image2 for high-quality design references, then use the image-to-code skill for 1:1 image-to-webpage conversion (from codex design skills frontend development)
Use the Frontend App Builder skill (built into Codex) for engineering-grade workflows that require strict image restoration with precise color interpretation and icon alignment; install the Build Web Apps plugin for complete design implementation, including shadcn/ui components and browser acceptance testing (from codex design skills frontend development)
Use /grill-me skill in Hermes to systematically uncover project unknowns before starting development, saving all results to memory vault for reference (from hermes agent development workflow)
Sean Geng's plan-optimizer skill can be installed with one command and uses iterative self-scoring to keep the best plan version each cycle — works particularly well with Fable 5 which can break through previous scoring ceilings (from claude plan optimizer iterative improvement)
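
The source does not show how the plan-optimizer skill is implemented, so the following is only a sketch of the general idea it describes: score each revision, keep the best version, and stop once gains are too small to matter. The function optimize, its min_gain and patience thresholds, and the toy revise/score stand-ins are all assumptions; in a real skill the model itself would do the revising and scoring.

```python
def optimize(plan, revise, score, min_gain=0.5, patience=2, max_cycles=20):
    """Keep the best-scoring plan; stop after `patience` cycles without a real gain."""
    best, best_score, stale = plan, score(plan), 0
    for _ in range(max_cycles):
        candidate = revise(best)
        s = score(candidate)
        if s - best_score >= min_gain:
            best, best_score, stale = candidate, s, 0
        else:
            stale += 1  # gain is below the noise threshold
            if stale >= patience:
                break
    return best, best_score

# Toy stand-ins (hypothetical): a plan is a list of steps, and the score
# counts how many required steps it covers.
REQUIRED = ["scope", "risks", "tests"]

def toy_score(plan):
    return float(sum(step in plan for step in REQUIRED))

def toy_revise(plan):
    missing = [s for s in REQUIRED if s not in plan]
    return plan + [missing[0] if missing else "polish"]

best_plan, best_score = optimize(["scope"], toy_revise, toy_score)
print(best_plan, best_score)
```

The "polish" revisions never raise the score, so they are discarded and the loop exits after two stale cycles with the three-step plan.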

Managed Agent Platforms

Anthropic launched Claude Managed Agents as a Platform-as-a-Service for AI — pairs an agent harness tuned for performance with production infrastructure, taking deployments from prototype to launch in days (from claude managed agents production platform)
Claude Managed Agents economics: $2.58 fulfillment cost for $1k of service delivery (~99.7% margin potential); 4 user personas determine fit; live console exposes sessions, analytics, and per-agent costs (from claude managed agents breakdown economics)
The Managed Agents API integrates with Linear's Agents SDK — Claude one-shotted a complete deployment example (linear/claude-managed-agents-demo) for shipping custom agents on a Linear instance (from claude managed agents linear sdk integration)
Advisor strategy on the Claude Platform: pair Opus as advisor with Sonnet/Haiku as executor to get near-Opus intelligence at a fraction of the cost — splits reasoning from execution as a first-class platform pattern (from claude advisor strategy platform)
The Monitor tool lets Claude create background scripts that wake the agent only when needed. This replaces agent-side polling loops: the script follows logs for errors or polls PRs, and the agent spends tokens only when it is woken, which dramatically reduces token consumption (from claude monitor tool background scripts)
Symphony is an open-source orchestrator that assigns a Codex agent to every open issue in a task tracker — turns issue trackers into always-on agentic systems, shifting humans from doing to reviewing and directing (from symphony codex agent orchestrator)
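
The event-driven idea behind the Monitor tool item above can be shown in a few lines. This is not the Monitor tool's interface, which the source does not describe; watch, the ERROR pattern, and the wake callback are hypothetical names for a minimal sketch in which a cheap script scans a log stream and calls the expensive agent only on matching lines.

```python
import re

# Assumed wake-up condition; a real script would use whatever signals matter.
ERROR = re.compile(r"\b(ERROR|FATAL)\b")

def watch(lines, wake):
    """Scan a log stream cheaply; call `wake` only for lines that match."""
    woken = 0
    for line in lines:
        if ERROR.search(line):
            wake(line.rstrip("\n"))
            woken += 1
    return woken

# Hypothetical log lines; `events.append` stands in for waking the agent.
log = [
    "INFO build started\n",
    "INFO tests running\n",
    "ERROR test_login failed\n",
    "INFO build finished\n",
]
events = []
wake_count = watch(log, events.append)
print(wake_count, events)
```

Four lines pass through the script but the agent is invoked once, which is where the token savings come from.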

Self-Improving Skills and Eval Loops

Self-improving Claude Code skills: define 3-5 binary eval criteria, run the skill 10 times with varied inputs, evaluator scores every output, identifies failure patterns, rewrites the prompt, retests, keeps the winner — a hook-writer skill went 32/50 → 47/50 overnight (from claude code self improving skills automation)
The improvement loop method works for any creative skill (hooks, briefs, ad copy, scripts, reports) and ends manual prompt tweaking — ideal for DTC brands and agencies whose skills are great 70% of the time and unusable the other 30% (from claude code self improving skills automation)
/autobrowse skill (inspired by Karpathy's autoresearch): agent explores web pages via the Browserbase CLI, learns from failed attempts, iterates until it converges on a reliable workflow, then graduates the winning approach into a reusable browser skill once token usage is optimized (from autobrowse skill web automation agent)
ml-intern automates the post-training research loop: reads arXiv papers, walks citation graphs, pulls Hugging Face datasets, launches HF Jobs training when no local GPUs are available, monitors runs, diagnoses failures, retrains — beat Claude Code on GPQA (32% vs 22.99% in <10h) by finding OpenScience+NemoTron-CrossThink and running 12 SFT runs on Qwen3-1.7B (from ml intern automated research agent)
ml-intern can recognize low-quality data and generate replacements — wrote a script for 1100 synthetic healthcare data points, upsampled 50x, and beat Codex on HealthBench by 60%; full GRPO training with ablation loops runs autonomously (from ml intern automated research agent)
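
The improvement loop from the first item in this section (binary criteria, 10 runs, score, rewrite, retest, keep the winner) can be sketched as below. Everything here is a stand-in: run_skill formats a template instead of calling a model, rewrite returns a fixed alternative instead of analyzing failure patterns, and the three criteria are invented for the example. Only the shape of the loop reflects the source.

```python
def score(outputs, criteria):
    """Total passes: one point per (output, criterion) pair that passes."""
    return sum(1 for out in outputs for check in criteria if check(out))

def improve(prompt, run_skill, rewrite, criteria, inputs, rounds=3):
    """Score the current prompt, try a rewrite, keep whichever scores higher."""
    best = prompt
    best_score = score([run_skill(best, x) for x in inputs], criteria)
    for _ in range(rounds):
        candidate = rewrite(best)
        s = score([run_skill(candidate, x) for x in inputs], criteria)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score

# Hypothetical binary criteria for a hook-writing skill.
criteria = [
    lambda out: len(out) <= 60,
    lambda out: out.endswith("?"),
    lambda out: "you" in out.lower(),
]
inputs = [f"topic {i}" for i in range(10)]  # 10 varied inputs
run_skill = lambda prompt, topic: prompt.format(topic=topic)  # stand-in for the model
rewrite = lambda prompt: "Are you still ignoring {topic}?"   # stand-in for the rewriter

start = "In today's world, {topic} matters."
baseline = score([run_skill(start, x) for x in inputs], criteria)
best, best_score = improve(start, run_skill, rewrite, criteria, inputs)
print(baseline, best_score)
```

With 3 criteria and 10 inputs the maximum is 30; the source's 50-point scale corresponds to 5 criteria over the same 10 runs.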

Agent Skill Distribution and Discovery

FieldTheory CLI (npm install -g fieldtheory, then ft sync) downloads X bookmarks locally so agents can read them; ft viz for visualization, ft classify <url> for tagging — local bookmark graph as agent-readable context, no API limits (from fieldtheory x bookmarks cli tool)
Allie K. Miller's /ss screenshot skill: Claude lists newest files in your screenshots folder, grabs the most recent (or N most recent with /ss 4), and acts on the trailing argument — /ss huh (explain), /ss fix (debug error or design), /ss do this (reverse-engineer + remix); saves ~1 hour/week (from claude screenshot skill visual processing)
Browser-tool selection makes a massive difference in agent token usage and latency on the same task — benchmark before adopting; tool choice is now a cost/perf optimization line item, not a default (from browser tools agent cost benchmark)
Browserbase's open-source catalog of web-agent skills reframes reliable web automation as distributed operational knowledge: researched site playbooks can be reused by any agent instead of rediscovered per project (from browserbase web agent skills catalog)
Astropad Workbench provides high-performance remote desktop from iPad/iPhone for headless Mac Minis running agents — needed because agents still require human visibility into logs, stuck tasks, and outputs even when "headless" (from astropad workbench headless mac remote desktop)
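
The "benchmark before adopting" advice above could start from a harness like this one. The tool names and token counts are placeholders, not measurements, and each stand-in simply returns the tokens it would have consumed; real entries would drive actual browser tools on the same task and report their usage.

```python
import time

def benchmark(tools, task):
    """Run each candidate tool on the same task; record tokens used and wall time."""
    rows = []
    for name, run in tools.items():
        start = time.perf_counter()
        tokens = run(task)  # each stand-in returns the tokens it consumed
        rows.append({"tool": name, "tokens": tokens,
                     "seconds": time.perf_counter() - start})
    return sorted(rows, key=lambda r: r["tokens"])

# Hypothetical stand-ins with placeholder token counts.
tools = {
    "tool_a": lambda task: 10_000,
    "tool_b": lambda task: 1_800,
}
rows = benchmark(tools, "open pricing page and extract plan names")
for row in rows:
    print(row["tool"], row["tokens"])
```

Sorting by tokens puts the cheapest tool first; sorting by seconds instead would rank on latency, the other axis the source calls out.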

Voices

7 contributors

Tom Dörr

@tom_doerr

Follow for posts about GitHub repos, DSPy, and agents. Subscribe for top posts. DM to share your AI project (due to volume of DMs I'll prioritize subscribers).

195.1K followers 1 tweet