Securing the Skills Layer

We spent the last year getting serious about MCP security. Authentication, access control, input validation, transport, supply chain. Good. Necessary. But the protocol layer is only part of the story.

While we were locking down MCP servers, a new abstraction appeared above them: skills.

Skills are packaged bundles of instructions, scripts, resources, and MCP server configurations that agents load to perform specialized tasks ^[1]. A folder with a SKILL.md, maybe some scripts, maybe a reference to an MCP server. You install one, and your agent gets a new capability.

The convenience is real. So is the risk.

Skills are not MCP servers

MCP was designed with process isolation. A server runs in its own process, with scoped credentials and explicit tool invocation. The agent calls it; it responds.

Skills work differently. They run in-process. They share the agent’s memory, context window, credentials, and filesystem access. A skill doesn’t just respond to calls. It can alter how the agent thinks, what it does with other tools, and how it interprets future instructions.

That is a fundamentally different trust model. And most people installing skills today don’t realize it.

Skills vs MCP: two security models, one agent

The composition problem

Here’s what makes skills uniquely dangerous: they cross multiple threat categories at once.

A single malicious skill can combine supply chain compromise, tool poisoning, code execution, and credential harvesting into one artifact. An MCP server does one thing. A skill orchestrates many things, including MCP servers themselves.

When a skill configures or launches an MCP server, it can pass broader credentials than intended, misconfigure isolation, or inject parameters the user never approved. The skill becomes a bridge between the in-process world and the process-isolated world, and that bridge can be exploited.

What ClawHub showed us

In January 2026, Koi Security audited 2,857 skills on ClawHub and found 341 malicious entries ^[2]. The campaign, called ClawHavoc, used fake prerequisites to install Atomic Stealer malware on macOS systems ^[3]. Skills with names like solana-wallet-tracker and polymarket-trader looked legitimate but contained shell commands that harvested SSH keys, browser passwords, and API credentials.

ClawHavoc attack chain: from SKILL.md to credential theft

The attack pattern was simple: a skill’s documentation includes a “Prerequisites” section telling users to run a command. The command fetches a payload from an external server. The payload steals everything it can reach.

This works because skills collapse the boundary between documentation and execution. An agent treats markdown content as actionable instructions. A code block in a SKILL.md is not documentation; it’s a command.

Snyk’s research made this concrete: from SKILL.md to shell access in three lines of markdown ^[4]. No exploit needed. The architecture is the vulnerability.

Separately, Snyk’s broader ToxicSkills study found prompt injection in 36% of analyzed skills and 1,467 malicious payloads across the ClawHub ecosystem ^[5]. The problem is systemic, not isolated.

The ClawHub incidents are not sophisticated attacks. They’re social engineering packaged in a format that agents and humans both trust.

The skill marketplace model mirrors npm, PyPI, and browser extension stores in their early days. Low barriers to publishing, no verified publishers, no provenance attestation, no signing. ClawHub requires only a week-old GitHub account to publish. That’s it ^[2].

We’ve seen this movie before. Package registries and extension stores all went through a phase of rapid growth followed by exploitation. The difference here is that the consumer is not just a human developer reading code. It’s an AI agent executing instructions with system access.

When skills can modify the agent’s own behavior, inject instructions into future conversations, and persist across sessions through memory files, the blast radius of a single bad install grows dramatically.

What needs to happen

I think about this in terms of lifecycle: authoring, publishing, discovery, installation, runtime, update, deprecation. Every phase has gaps today.

Integrity: Skills need cryptographic signing. A SKILL.md is a plain text file with no verification mechanism. We need hash-based pinning, signed packages, and provenance attestation for authors.

Isolation: Skills need sandboxing. Today, every installed skill has full access to everything the agent can reach. We need capability-based permissions: network access, filesystem scope, MCP server access, all declared and enforced.

Supply chain: Marketplaces need admission control. Automated scanning of skill packages (static analysis, instruction analysis, dependency analysis), verified publishers, and rapid takedown mechanisms.

The skill-to-MCP bridge: When a skill configures an MCP server, that configuration should be validated against approved policies. Credential scoping should be enforced at the boundary. Runtime monitoring should flag unexpected connections.

None of this exists yet. We’re in the “move fast” phase. The “break things” part is already happening.

The bigger picture

MCP security gave us the tools to secure the protocol layer. But skills represent the composition layer above it, where instructions, code, and MCP configurations get bundled into a single installable artifact.

Securing one without the other leaves a gap that attackers are already exploiting.

The question isn’t whether the skills layer needs its own security model. It’s whether we’ll build one before the next ClawHavoc, or after.

References

[1] Anthropic, "Agent Skills Specification," agentskills.io, December 2025. agentskills.io

[2] Koi Security, "ClawHavoc: 341 Malicious Skills Found by the Bot They Were Targeting," February 2026. koi.ai

[3] The Hacker News, "Researchers Find 341 Malicious ClawHub Skills Stealing Data from OpenClaw Users," February 2026. thehackernews.com

[4] Snyk, "From SKILL.md to Shell Access in Three Lines of Markdown: Threat Modeling Agent Skills," 2026. snyk.io

[5] Snyk, "ToxicSkills: Malicious AI Agent Skills Supply Chain Compromise," 2026. snyk.io

[6] 1Password, "From Magic to Malware: How OpenClaw's Agent Skills Become an Attack Surface," 2026. 1password.com

[7] The Register, "It's Easy to Backdoor OpenClaw, and Its Skills Leak API Keys," February 2026. theregister.com