
Why I Built Human-in-the-Loop Instead of Full Automation

Keep humans in control when building AI security tools. Full automation sounds impressive until your reputation tanks from false positives.

Chudi Nnorukam
Dec 28, 2025 4 min read


I could have built BugBountyBot to submit findings automatically. The technical barrier isn’t high—an API call to HackerOne after validation passes.

I didn’t build it that way. Here’s why.

The Temptation of Full Automation

Full automation is seductive:

  • Speed: Submit findings as fast as you find them
  • Scale: Hunt 24/7 without human bottlenecks
  • Ego: “I built a system that hunts bugs while I sleep”

Every conversation about BugBountyBot eventually hits this question: “Why not just auto-submit?”

The False Positive Problem

In security research, reputation is everything. A single bad submission can:

  • Get your report closed as “Not Applicable”
  • Add a negative signal to your profile
  • Cost you access to private programs
  • Waste triager time (they remember)

The math is brutal: one false positive can undo five true positives in terms of reputation impact.

Automated systems optimize for recall—finding everything possible. But bug bounty rewards precision. A 90% precision rate sounds good until you realize that means 1 in 10 submissions is garbage.
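
To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. The +1/-5 weights are illustrative, borrowed from the "one false positive undoes five true positives" heuristic above, not measured platform data:

```python
# Rough reputation math: why precision beats recall in bug bounty.
# Weights are illustrative: +1 per accepted report, -5 per false positive
# (the "one false positive undoes five true positives" heuristic).

def net_reputation(submissions: int, precision: float,
                   gain_per_valid: float = 1.0, cost_per_fp: float = 5.0) -> float:
    """Estimate net reputation impact for a batch of submissions."""
    valid = submissions * precision
    false_positives = submissions * (1 - precision)
    return valid * gain_per_valid - false_positives * cost_per_fp

# Full automation: high volume, ~70% precision.
print(net_reputation(submissions=100, precision=0.70))  # 70 - 150 = -80
# Human-in-the-loop: lower volume, ~95% precision.
print(net_reputation(submissions=40, precision=0.95))   # 38 - 10 = +28
```

Under these made-up weights, the high-volume pipeline digs a hole while the slower, reviewed pipeline compounds in your favor.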

Platform Requirements

This isn’t just my opinion—platforms explicitly require human oversight.

From HackerOne’s Automation Policy:

“Automated tools must have human review before submission. Fully automated submission systems are prohibited.”

From Intigriti’s Terms:

“Researchers are responsible for the quality and accuracy of all submissions, including those assisted by automated tools.”

From Bugcrowd:

“Automated scanning that results in excessive false positives may result in account suspension.”

Build full automation, and you’re violating ToS. Not a gray area.

The Liability Question

When your bot submits a finding, who’s responsible?

  • If it’s valid: You get credit
  • If it’s invalid: You get blamed
  • If it causes harm: You’re liable

There’s no “my AI did it” defense. The liability is asymmetric—downside is yours, and “scale” just multiplies it.

Compare this to human-in-the-loop:

  • You review each finding before submission
  • You apply judgment about timing and context
  • You own the decision, not just the consequence

What Humans Do Better

Automation excels at:

  • Pattern matching at scale
  • Consistent testing methodology
  • 24/7 availability
  • Memory across sessions

Humans excel at:

  • Context understanding - Is this behavior intentional?
  • Impact assessment - Is this actually a security issue?
  • Communication - Can I explain this clearly?
  • Timing judgment - Is now the right time to submit?

The optimal system uses AI for the first set and humans for the second.

The Human-in-the-Loop Architecture

BugBountyBot’s design:

[Recon Agent] → [Testing Agent] → [Validator Agent]
                                         ↓
                               [Confidence ≥ 0.85?]
                                  ↓            ↓
                                 Yes           No
                                  ↓            ↓
                        [Queue for Review]  [Log & Learn]
                                  ↓
                           [Human Review]
                                  ↓
                       [Approve / Reject / Edit]
                                  ↓
                          [Reporter Agent]

The human checkpoint is after validation but before submission. You’re not reviewing raw signals—you’re reviewing high-confidence findings with full evidence.

This is the leverage point: AI handles the 80% grind, humans handle the 20% that requires judgment.
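
As a minimal sketch of that checkpoint (the names here, Finding, route, human_review, submit_report, are hypothetical illustrations, not BugBountyBot's actual internals), the gate reduces to a confidence threshold that routes findings into a human review queue or a learning log, and the only path to the Reporter agent runs through an explicit human decision:

```python
# Minimal sketch of the evidence-gated checkpoint described above.
# Names (Finding, route, human_review, submit_report) are hypothetical
# illustrations, not BugBountyBot's actual code.

from dataclasses import dataclass, field

CONFIDENCE_GATE = 0.85  # validator score required to reach a human

@dataclass
class Finding:
    title: str
    confidence: float               # Validator Agent's score, 0.0 to 1.0
    evidence: list[str] = field(default_factory=list)

review_queue: list[Finding] = []    # humans pull from here
learning_log: list[Finding] = []    # low-confidence signals feed learning

def route(finding: Finding) -> None:
    """Validator output goes to review or to the log, never straight to submission."""
    if finding.confidence >= CONFIDENCE_GATE:
        review_queue.append(finding)
    else:
        learning_log.append(finding)

def human_review(finding: Finding, approved: bool) -> None:
    """The only path to the Reporter Agent runs through an explicit human decision."""
    if approved:
        submit_report(finding)

def submit_report(finding: Finding) -> None:
    # Stand-in for the Reporter Agent: drafts and submits only after approval.
    print(f"Submitting: {finding.title} (confidence {finding.confidence:.2f})")
```

The property that matters is structural: there is no code path from the Validator Agent to the Reporter Agent that skips human_review.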

The Numbers That Matter

Metric             | Full Automation | Human-in-the-Loop
Submissions/day    | High            | Medium
Precision          | ~70%            | ~95%
Reputation trend   | Declining       | Stable/Growing
Platform standing  | At risk         | Solid
Sustainable?       | No              | Yes

Optimizing for submissions per day is the wrong metric. Optimize for accepted findings per month, reputation over time, and access to better programs.

When Full Automation Makes Sense

There are legitimate use cases:

  • Internal security testing - Your own infrastructure, no reputation at stake
  • Private engagements - Client agreed to automated testing
  • Research environments - Sandboxed, no real submissions

But for public bug bounty programs? Human-in-the-loop is the only sustainable architecture.

The Deeper Point

The goal isn’t maximum automation. The goal is maximum valuable output with acceptable risk.

Human-in-the-loop is how you get there. It’s not a compromise—it’s the architecture that lets you scale without catastrophic failure modes.

Build for sustainability. Your future self will thank you.



Written by Chudi Nnorukam

I design and deploy agent-based AI automation systems that eliminate manual workflows, scale content, and power recursive learning. Specializing in micro-SaaS tools, content automation, and high-performance web applications.

Related: Building a Semi-Autonomous Bug Bounty System | Portfolio: BugBountyBot

FAQ

Can bug bounty hunting be fully automated?

Technically yes, but practically no. Platforms require human oversight, false positives damage your reputation, and you carry the liability for anything your automation submits. Semi-autonomous hunting with human approval is the sustainable approach.

What decisions require human judgment in bug bounty?

Severity assessment (is this actually impactful?), business logic context (is this intended behavior?), submission timing (is this the right moment?), and evidence quality (is this reproducible enough?).

How do you balance automation speed with human oversight?

Automate the tedious phases (recon, initial testing) and queue validated findings for human review. You review 10-20 high-confidence findings instead of grinding through 1000 raw signals.

What happens when automated submissions go wrong?

Your reputation tanks, programs stop accepting your reports, and platforms may ban you. One researcher's automated false positives got them permanently blocked from HackerOne's private programs.

Is human-in-the-loop slower than full automation?

For individual submissions, yes. For sustainable output over months and years, no. The time saved by not dealing with reputation damage, disputes, and bans far exceeds the review overhead.
