Gemini Executive Synthesis

An AI model and harness for penetration testing and security scanning, post-trained on CTF contests.

Technical Positioning

A specialized AI-powered cybersecurity tool for SMEs and mid-market companies, offering un-guard-railed pen-testing capabilities, unlike general-purpose LLMs or enterprise-gated solutions. It provides concrete, verifiable vulnerability findings through a CLI with local code scanning and sandboxed live system exploitation.

SaaS Insight & Market Implications

This product directly addresses a critical market gap: accessible, un-guard-railed AI-driven penetration testing for SMEs and mid-market. Current LLMs are either restricted or too generalized, leaving these segments vulnerable. By post-training on CTF data, the solution offers practical, exploit-driven vulnerability identification, moving beyond "vibes-based findings" to verifiable exploits. The CLI-based, local code scanning with context sent over TLS to an inference API balances security with utility. The "Pen test" mode, though gated, promises active adversarial testing in sandboxed environments, a significant differentiator. This targets a high-value problem: proactive security for underserved markets. The pricing model (free scan up to 2M tokens, then paid) lowers adoption barriers. The inherent safety concerns of such a powerful tool are acknowledged, indicating a strategic approach to responsible deployment.

Proprietary Technical Taxonomy

Raw Developer Origin & Technical Request

Hacker News Jun 21, 2026

Show HN: We post-trained a model that pen tests instead of refusing

Anthropic and OpenAI's publicly available models are explicitly guard-railed so that they refuse offensive tasks. And their cyber-focussed models are gated for enterprises. This leaves SMEs and mid market open to major vulnerabilities.

AI can be used as both an adversarial and defensive tool in the world of cyber. A worst case outcome is if only the adversaries have access.

Meanwhile, most existing AI cyber tools are just wrappers. The problem is that they still have all the guardrails on from the foundation model where they will inherit its refusals.

For this project we've post-trained a specific model on a decade of capture-the-flag contests. This won't be made available to anyone and everyone, but we do believe that responsible SMEs and midmarket companies also need access to these tools in order to identify key vulnerabilities in their systems; not just enterprises.

We have developed two modes that run over a CLI:

• Security scan: a read-only audit of your local codebase for vulnerabilities. It only reports what it can tie to a specific file and line, so you're not wading through vibes-based findings.
• Pen test: an active adversarial mode that will try to break a live system in a sandboxed environment. It proves each vulnerability by running the exploit and showing the request it sent and the response your code gave back, not a confidence score. Currently gated.

To show what the scan does, we pointed it at Bank of Anthos and it found an integer overflow in the transfer path: amount is an int, and amount + fee can overflow negative, so the balance check passes and you move funds you don't have. Plus the usual auth and secrets issues. (Bank of Anthos is Google's open-source bank. It's a known app and some of it is intentionally weak, which is the point: you can clone it and re-run the scan yourself instead of trusting a screenshot.)

The base model is a Kimi K2.6 (open weights). We didn't pretrain from scratch. We post-trained it ourselves, SFT on CTF writeups, then RL with verifiable rewards against actual exploit checks.

How the harness works:

Along with the model we built the harness to support this. The harness runs on a multi-agent swarm: an orchestrator splits the job across subagents running in parallel, each owning a slice, then synthesising one report.

The CLI is a local binary (brew/curl). It reads your code locally, then sends context to our inference API over TLS; tcpdump it and you'll see exactly what leaves and where. Install is free, and you can run a scan for free up to 2m tokens, then need to pay for tokens beyond this.

For full disclosure, this is a product part of Cosine (YC W23).

Up for debate: tool safety, e.g. domain verification is one method that proves control but not necessarily permission. How would you gate a pen-test tool given that?
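The integer overflow finding described in the post can be illustrated with a minimal sketch. This is not Bank of Anthos code and not the tool's output: the function names, the 32-bit width, and the sample values are assumptions chosen for illustration. Python integers do not overflow, so the hypothetical wrap_int32 helper simulates the two's-complement wraparound that a fixed-width int addition would produce in a language such as Java.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def wrap_int32(value: int) -> int:
    """Simulate signed 32-bit wraparound (hypothetical helper for illustration)."""
    return (value + 2**31) % 2**32 - 2**31


def naive_can_transfer(balance: int, amount: int, fee: int) -> bool:
    """Balance check as described in the post: the sum is computed in a 32-bit int."""
    total = wrap_int32(amount + fee)  # may wrap to a negative number
    return total <= balance


def checked_can_transfer(balance: int, amount: int, fee: int) -> bool:
    """One possible fix: validate inputs and reject sums that exceed the int range."""
    if amount <= 0 or fee < 0:
        return False
    total = amount + fee  # exact, because Python ints are arbitrary precision
    if total > INT32_MAX:
        return False
    return total <= balance


if __name__ == "__main__":
    balance, amount, fee = 100, INT32_MAX, 1
    print(wrap_int32(amount + fee))                    # -2147483648
    print(naive_can_transfer(balance, amount, fee))    # True: the check is bypassed
    print(checked_can_transfer(balance, amount, fee))  # False: the transfer is rejected
```

With a balance of 100, the naive check accepts a transfer of INT32_MAX because the wrapped total is negative and therefore "less than" the balance. The checked variant rejects it by refusing any sum outside the 32-bit range before comparing.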

Developer Debate & Comments

lacoolj • Jun 21, 2026

Inb4 govt intervention

skiing_crawling • Jun 20, 2026

Any generic abliterated or uncensored open weight model (such as a qwen variant) will happily comply with requests like this.

jjcm • Jun 20, 2026

IMO the most interesting thing about this is Kimi K2.6, an extremely capable model, can be relatively easily post-trained to allow pen tests.

This in its own right proves that the defenses of Fable and others are temporary blocks, and AI based hacking is going to be effectively available to all parties regardless of stop gaps, as long as open models exist.

luminati • Jun 20, 2026

Relevant: https://news.ycombinator.com/item?id=48016224

What's the difference between this vs running shannon on aws/bedrock fully airgapped in my vpc? I've got some pretty great results with shannon [no subprocessor and can pay via aws credits]. Even better using claude code token [effectively free with our $200/mo cc subscription].

I tried kimi but it generally spins its wheels extensively in its thinking tokens. kimi2.7 is an attempt at reducing this. But doing finetuning means you will always be behind the latest.

As a side note - I think it's very unprofessional and very shitty to not mention kimi2.6 at all in your marketing copy. And I feel that you posted that in this hn post begrudgingly since the hn crowd would have flagged that. Confirmed with a google search too: https://www.google.com/search?q=kimi+site%3Aargusred.com

All around your marketing website you keep mentioning - 'A model lab built it'. A finetune does not maketh you a model lab - some humility please :)

Finally - doesn't Kimi's licensing prohibit you from not mentioning them? Didn't cursor run into the same issue?

jrflowers • Jun 20, 2026

Show HN: We told Claude to generate a marketing page for a theoretical pentesting model

mkaszkowiak • Jun 20, 2026

What was your approach to benchmarking an adversarial agent?

This is an open problem that I came across (in a different domain), as the search space can be really wide. It's hard to measure results for non-trivial tasks.

Would be really interested if you can share your eval approach :)

Catloafdev • Jun 20, 2026

Why create an offensive tool rather than a repo-scanning tool?

I can't think of any way to safely release an offensive tool publicly.

cortesoft • Jun 20, 2026

> This won't be made available to anyone and everyone, but we do believe that responsible SMEs and midmarket companies also need access to these tools in order to identify key vulnerabilities in their systems; not just enterprises.

So this is the same policy that Anthropic and OpenAI have, it is just based on your criteria rather than theirs.

andai • Jun 20, 2026

Fantastic. Could you share more details on what it was like post-training a model?

Frequently Asked Questions

Market intelligence mapped to An AI model and harness for penetration testing and security scanning, post-trained on CTF contests.

What is the technical positioning of An AI model and harness for penetration testing and security scanning, post-trained on CTF contests?

Based on our AI analysis of the original developer request, its primary technical positioning is: A specialized AI-powered cybersecurity tool for SMEs and mid-market companies, offering un-guard-railed pen-testing capabilities, unlike general-purpose LLMs or enterprise-gated solutions. It provides concrete, verifiable vulnerability findings through a CLI with local code scanning and sandboxed live system exploitation.

What is the general sentiment around An AI model and harness for penetration testing and security scanning, post-trained on CTF contests?

We have tracked 37 direct responses and active debates regarding this specific topic originating from Hacker News.

What are the foundational technologies related to An AI model and harness for penetration testing and security scanning, post-trained on CTF contests?

Our proprietary extraction maps An AI model and harness for penetration testing and security scanning, post-trained on CTF contests, to adjacent architectural concepts including post-trained model, pen tests, guard-railed, offensive tasks.

Cross-Market Term Frequency

Quantifies the cross-market adoption of foundational terms like CLI and tokens by tracking occurrence frequency across active SaaS architectures and enterprise developer debates.