
How I built a production-grade malware analysis infrastructure that automatically detonates suspicious files, captures behavioral patterns, and generates threat intelligence reports - all running in my homelab.
Published on January 30, 2026 by Kyle S
cape malware sandbox automation soar proxmox yara mitre-attack threat-intelligence n8n
Let’s be real - manually analyzing malware samples is exhausting. You download a suspicious file, spin up a VM, detonate it, watch for behavior, take notes, extract IOCs, write a report, and then do it all over again for the next sample. Each analysis takes 1-2 hours if you’re thorough. And if you’re a SOC analyst dealing with 10-20 suspicious files a week? Good luck.
I decided to stop doing this the hard way. Over 5 days (and several 2am debugging sessions), I built a fully automated malware analysis pipeline that handles everything from sample submission to report publishing. Now I can queue up 25 samples, walk away, and come back to 25 complete analysis reports with behavioral patterns, MITRE ATT&CK mappings, and extractable IOCs.
This isn’t a toy project. It’s production infrastructure running in my homelab using CAPEv2 sandbox technology, Proxmox virtualization, network isolation, and workflow automation. And it actually works.
GitHub Repository: https://github.com/kyhomelab/malware-analysis-pipeline
Published Analysis Reports: Analysis Catalog (12 samples with Gist reports)
I was tired of the manual grind. Every time I encountered a suspicious file - whether from a phishing email simulation, a CTF challenge, or just curiosity - the analysis process was the same tedious workflow. And the bigger problem? There was no way I could scale this to handle real SOC workloads.
Security operations teams deal with dozens of suspicious files daily. Without automation, you’re forced to prioritize - which means some samples never get analyzed. That’s a risk I wasn’t comfortable accepting.
I wanted to answer some specific questions:
Could I build something that actually runs itself? From sample submission to report publishing, zero manual intervention. Just trigger a webhook and let the pipeline handle the rest.
Could I safely detonate real malware? Network isolation is critical. The sandbox needs to fool malware into thinking it has internet access while blocking all actual C2 communication. One misconfiguration and I’m potentially contributing to a botnet.
Could I generate threat intelligence that’s actually useful? Raw logs are worthless. I needed structured IOCs (domains, IPs, file hashes), behavioral patterns mapped to MITRE ATT&CK, and detection signatures I could feed into a SIEM.
This project became the answer to all three.
The pipeline connects four distinct layers: external sample sources, my local SOC automation platform, the isolated analysis environment, and public report sharing.
Here’s the high-level flow:
MalwareBazaar → n8n Webhook → CAPE Sandbox → Windows 10 VM (Isolated) → INetSim
↓
GitHub Gist Reports
External Sample Sources:
Automation Layer (n8n):
Isolated Analysis Environment (Proxmox):
The Windows VM is deliberately vulnerable - Windows Defender disabled, no security updates, user-mode monitoring agents installed. It’s designed to let malware run freely while capturing every action it takes.
This was the part that kept me up at night. If I mess up the network configuration, I could accidentally let malware reach real C2 servers. That’s not just embarrassing - it’s irresponsible.
My defense-in-depth strategy:
Layer 1: VLAN Segregation The analysis network (10.66.66.0/24) lives on its own isolated segment (the vmbr1 bridge in Proxmox), completely separate from my home network (192.168.1.0/24). No routing between them exists at the Proxmox level.
Layer 2: No Gateway The Windows analysis VM has no default gateway configured. Even if malware tries to route packets out, there’s nowhere for them to go.
Layer 3: INetSim Fake Internet INetSim runs on the CAPE host (10.66.66.1) and responds to ALL DNS queries with its own IP. When malware tries to reach evil-c2-server.com, DNS resolves to 10.66.66.1, and INetSim returns a fake HTTP response. The malware thinks it’s talking to its C2, but it’s actually talking to INetSim on the CAPE host.
Layer 4: Proxmox Firewall Rules Explicit firewall rules block any traffic from the analysis VLAN to the internet or home network. Only local traffic within 10.66.66.0/24 is permitted.
After configuring this, I tested it by intentionally trying to break out. I connected to the Windows VM, attempted to ping Google, browse websites, and use VPN tools. Everything failed. The isolation works.
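Manual breakout attempts are one check. Another is to review the destination IPs seen in a packet capture and confirm none of them left the analysis network. Here is a minimal sketch of that idea, assuming you already have the destination addresses as a list of strings. The function names are hypothetical and not part of the repository; the subnet is the one from this post.

```python
import ipaddress

# Analysis subnet from this post; change it to match your own lab.
ANALYSIS_NET = ipaddress.ip_network("10.66.66.0/24")


def is_contained(dst_ip: str, net: ipaddress.IPv4Network = ANALYSIS_NET) -> bool:
    """Return True if a destination address stays inside the analysis network."""
    return ipaddress.ip_address(dst_ip) in net


def escaped_destinations(observed: list[str]) -> list[str]:
    """Return every observed destination that falls outside the analysis network."""
    return [ip for ip in observed if not is_contained(ip)]
```

An empty list from `escaped_destinations` is what you want to see. Anything else means some layer of the isolation let traffic address the outside world and deserves a closer look.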
To test the pipeline, I pulled 25 samples from MalwareBazaar representing major threat families:
Each sample got a clean Windows 10 VM, 4 minutes of execution time, and complete behavioral monitoring.
Here’s the reality check: YARA signature detection only caught 28% of samples (7/25). That’s… not great. Families like AgentTesla, Emotet, and Qakbot all ran without triggering YARA rules.
But behavioral analysis? It flagged malicious behavior in 80% of samples, including ones where YARA failed. You can obfuscate and pack a binary all you want, but you can’t hide what it does when it runs.
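For anyone who wants to check the percentages, the arithmetic is short. This sketch assumes the 80% figure refers to all 25 samples, which works out to 20 samples; that count is derived here, not a number reported separately.

```python
def rate(hits: int, total: int) -> float:
    """Detection rate as a percentage, rounded to one decimal place."""
    return round(100 * hits / total, 1)


TOTAL_SAMPLES = 25
YARA_HITS = 7
# Assumption: 80% of all 25 samples, i.e. 20 samples.
BEHAVIORAL_HITS = round(0.80 * TOTAL_SAMPLES)

yara_rate = rate(YARA_HITS, TOTAL_SAMPLES)
behavioral_rate = rate(BEHAVIORAL_HITS, TOTAL_SAMPLES)
```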
Top Behavioral Signatures Observed:
One of the nastiest samples I analyzed scored 9/10 and triggered 36 behavioral signatures. Here’s what it did in 4 minutes:
All of this was captured automatically. The CAPE report included:
You can see the complete report here: WhiteSnake Stealer Analysis
I initially tried building this in Shuffle SOAR. Spent 4 hours fighting Docker version incompatibilities before admitting defeat. Rebuilt the entire thing in n8n in 2 hours. Sometimes the right tool is the one that actually works.
Workflow 1: Sample Submission
Webhook Trigger → Validate SHA256 → Download from MalwareBazaar →
Extract ZIP (7zip) → Submit to CAPE API → Return Task ID
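The first two steps of Workflow 1 can be sketched in plain Python. This is an illustration of the logic, not an export of the n8n workflow: the function names are hypothetical, and the request is only built, never sent. The form fields (`query=get_file`, `sha256_hash`) follow MalwareBazaar's public API; any authentication header your abuse.ch account requires is deliberately left out.

```python
import re
import urllib.parse
import urllib.request

SHA256_RE = re.compile(r"[0-9a-fA-F]{64}")
MB_API = "https://mb-api.abuse.ch/api/v1/"


def is_sha256(value: str) -> bool:
    """Return True if the value is exactly 64 hexadecimal characters."""
    return bool(SHA256_RE.fullmatch(value.strip()))


def build_download_request(sha256: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that asks MalwareBazaar for a sample."""
    if not is_sha256(sha256):
        raise ValueError(f"not a SHA256 hash: {sha256!r}")
    body = urllib.parse.urlencode(
        {"query": "get_file", "sha256_hash": sha256.strip().lower()}
    ).encode()
    return urllib.request.Request(MB_API, data=body, method="POST")
```

Rejecting malformed hashes before any network call keeps junk webhook input from ever reaching MalwareBazaar or the CAPE API.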
Workflow 2: Report Publishing
Schedule (Every 5 Minutes) → Query CAPE for Completed Tasks →
Retrieve Analysis Summary → Format Report → Publish to GitHub Gist
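The last two steps of Workflow 2 come down to turning a summary into text and wrapping it in the JSON body that GitHub's create-gist endpoint (`POST /gists`) expects. A rough sketch follows. The summary keys (`task_id`, `sha256`, `score`, `signatures`) are assumptions made for illustration and are not CAPE's actual report schema.

```python
import json


def format_report(summary: dict) -> str:
    """Render a short Markdown report from a (hypothetical) analysis summary."""
    lines = [
        f"# Analysis report: task {summary['task_id']}",
        "",
        f"- SHA256: `{summary['sha256']}`",
        f"- Score: {summary['score']}/10",
        "",
        "## Behavioral signatures",
    ]
    lines += [f"- {name}" for name in summary.get("signatures", [])]
    return "\n".join(lines) + "\n"


def build_gist_payload(summary: dict, public: bool = True) -> str:
    """Build the JSON body for GitHub's create-gist endpoint."""
    filename = f"cape-task-{summary['task_id']}.md"
    payload = {
        "description": f"CAPE analysis for {summary['sha256']}",
        "public": public,
        "files": {filename: {"content": format_report(summary)}},
    }
    return json.dumps(payload)
```

Keeping formatting separate from payload building makes it easy to preview a report locally before anything is published.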
The n8n workflows handle all the messy parts:
Let’s talk about the problems I hit, because tutorials never do.
Problem: Running apt install postgresql just… stopped. No progress, no error, nothing.
Solution: The installer was waiting for interactive prompts I couldn’t see. Set DEBIAN_FRONTEND=noninteractive before installation and it went through cleanly.
export DEBIAN_FRONTEND=noninteractive
apt-get install -y postgresql
Problem: UEFI firmware (OVMF) caused the VM to boot-loop endlessly.
Solution: CAPE’s documentation said to use OVMF, but in my testing, SeaBIOS (legacy BIOS) worked perfectly. Changed the Proxmox VM BIOS setting and it booted immediately.
Problem: Installed Python 3.10 on Windows, but the CAPE agent crashed on startup.
Solution: CAPE’s agent is 32-bit only. I had installed 64-bit Python. Uninstalled, grabbed the x86 installer from python.org, reinstalled. Problem solved.
Problem: Standard Linux unzip command failed on MalwareBazaar samples with “unsupported compression” errors.
Solution: MalwareBazaar uses AES-encrypted ZIPs. Standard unzip doesn’t support that. Switched to 7zip:
7z x -pinfected sample.zip
Worked perfectly.
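If you drive the extraction from a script instead of typing the command, it helps to build the argument list separately from running it. A small sketch, assuming `7z` is on the PATH; the function names and the output-directory handling are illustrative additions, while `-p`, `-o`, and `-y` are standard 7-Zip switches.

```python
import subprocess
from pathlib import Path


def build_extract_cmd(archive: Path, out_dir: Path, password: str = "infected") -> list[str]:
    """Build the 7z argument list; -p and -o take their values with no space."""
    return ["7z", "x", f"-p{password}", f"-o{out_dir}", "-y", str(archive)]


def extract_sample(archive: Path, out_dir: Path) -> None:
    """Extract an AES-encrypted sample ZIP into out_dir, raising if 7z fails."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_extract_cmd(archive, out_dir), check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues, and `check=True` turns a failed extraction into an exception instead of a silent empty directory.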
Problem: CAPE API kept returning 401 Unauthorized despite using the correct token.
Solution: I was using Authorization: Bearer <token> because that’s the standard. CAPE wants Auth-Key: <token> instead. Read the docs, kids.
After running 25 samples through analysis, here’s the practical value this infrastructure delivers:
Every completed analysis generates structured IOCs I can immediately push to a SIEM or threat intel platform:
These IOCs are already formatted for consumption by MISP, TheHive, or Wazuh.
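To give a feel for what IOC extraction involves, here is a minimal regex-based sketch that pulls IPv4 addresses, domains, and SHA256 hashes out of raw text. It does not reflect how CAPE structures its reports, and the patterns are deliberately simple: the domain pattern will also match file names such as `sample.zip`, so real use would need an allowlist or TLD check on top.

```python
import re

OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4_RE = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")
SHA256_RE = re.compile(r"\b[0-9a-fA-F]{64}\b")
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b", re.IGNORECASE
)


def extract_iocs(text: str) -> dict[str, list[str]]:
    """Return de-duplicated, sorted IOCs found in free-form text."""
    return {
        "ips": sorted(set(IPV4_RE.findall(text))),
        "domains": sorted({d.lower() for d in DOMAIN_RE.findall(text)}),
        "sha256": sorted({h.lower() for h in SHA256_RE.findall(text)}),
    }
```

Lowercasing and de-duplicating up front matters more than it looks: most SIEM and threat intel imports treat `Evil.com` and `evil.com` as different indicators unless you normalize first.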
The behavioral analysis provides everything needed to write detection rules:
I went from “we saw a suspicious file” to “here’s a Suricata rule that detects this C2 communication pattern” in under 10 minutes.
When an alert fires and produces a suspicious file, the pipeline provides immediate answers:
The automation means SOC analysts don’t need to be malware reverse engineering experts. The pipeline does the heavy lifting.
After running this for a few weeks, here’s what I learned that actually improved my security operations skills:
YARA is great when it hits, but a 28% detection rate means you need behavioral analysis as a backup. Watching what a program does (API calls, file access, network requests) is more reliable than trying to match static signatures.
Modern malware is heavily packed, obfuscated, and polymorphic. But it still has to call CreateProcess to spawn processes, it still has to modify the registry for persistence, and it still has to open network sockets for C2. That behavior is hard to hide.
Manual analysis: 1-2 hours per sample, 3-5 samples per day max. Automated pipeline: 4 minutes per sample, 200+ samples per day capacity.
That’s not just faster - it’s a completely different capability. Now I can analyze entire malware campaigns, track family evolution over time, and identify behavioral patterns across dozens of samples.
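The capacity figure is easy to sanity-check. This back-of-the-envelope sketch counts only the 4-minute execution window plus the roughly 30-second snapshot revert mentioned later in this post. It ignores download, submission, and report-generation overhead, so treat the result as an upper bound, not a measurement.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(exec_minutes: float, revert_seconds: float = 30, vms: int = 1) -> int:
    """Upper bound on daily throughput: one cycle = execution time + snapshot revert."""
    cycle_seconds = exec_minutes * 60 + revert_seconds
    return int(vms * SECONDS_PER_DAY // cycle_seconds)


single_vm_ceiling = samples_per_day(4)
```

With one VM the ceiling comes out at 320 samples per day, which leaves comfortable headroom above the 200+ figure once real-world overhead is added back in.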
72% of samples actively checked for virtualization. Many used anti-VM techniques like checking for VMware tools, VirtualBox drivers, specific registry keys, or unusual hardware configurations.
If malware detects the sandbox, it just exits without showing its true behavior. The layers do different jobs here: VLAN segregation, no gateway, and firewall rules keep the sample contained, while INetSim makes the network look alive enough that the sample keeps running.
Reverting the Windows VM to a clean snapshot between samples takes 30 seconds. Manually rebuilding a Windows VM takes 20 minutes minimum.
Snapshots also prevent cross-contamination. If one malware sample persists through some weird technique, the next sample starts from a guaranteed clean state.
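CAPE normally handles reverts itself, but it can be handy to script one by hand when debugging a stuck VM. Here is a sketch using Proxmox's `qm rollback` and `qm start` commands, to be run on the Proxmox host. The VM ID and snapshot name are placeholders you supply, and the function names are hypothetical.

```python
import subprocess


def build_revert_cmds(vmid: int, snapshot: str, start_after: bool = False) -> list[list[str]]:
    """Build the qm commands that roll a VM back to a named snapshot."""
    cmds = [["qm", "rollback", str(vmid), snapshot]]
    # Only needed when the snapshot was taken without RAM state (VM powered off).
    if start_after:
        cmds.append(["qm", "start", str(vmid)])
    return cmds


def revert_vm(vmid: int, snapshot: str, start_after: bool = False) -> None:
    """Run the revert commands in order, stopping at the first failure."""
    for cmd in build_revert_cmds(vmid, snapshot, start_after):
        subprocess.run(cmd, check=True)
```

Leave `start_after` off if your snapshot includes RAM state, since the rollback then resumes the VM on its own.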
This pipeline is functional, but there’s always room for improvement:
Custom YARA Rules: The 28% detection rate needs improvement. I’m writing custom YARA rules for families the existing ruleset missed (AgentTesla, Emotet, Qakbot).
VirusTotal Enrichment: Before submitting to CAPE, check VirusTotal for existing analysis. If 40 AV engines already flagged it, I can skip the resource-intensive sandbox detonation.
Multi-VM Parallel Processing: Currently one sample at a time. With 3-4 Windows VMs running in parallel, I could analyze roughly 3-4x more samples per day.
MISP Integration: Automatically create MISP events with extracted IOCs, allowing correlation with existing threat intelligence.
TheHive Case Creation: High-severity detections (score 9+) should automatically create investigation cases in TheHive.
Memory Forensics: Run Volatility 3 on memory dumps from every sample to catch fileless malware and injected payloads.
Office Macro Analysis: Enable Office VM for analyzing weaponized documents (malicious Excel/Word files with macros).
Machine Learning Family Classification: Train a model on behavioral patterns to identify malware families even when YARA fails.
Cluster Analysis: Correlate samples by infrastructure (shared C2 servers), code reuse, or behavioral similarity to attribute campaigns.
Real-Time Dashboard: Live metrics showing analysis throughput, detection rates, and trending malware families.
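As an example of how small some of these additions could be, the VirusTotal pre-check from the list above boils down to one decision function. This sketch assumes you have already fetched the file report and pulled out a stats dictionary shaped like VirusTotal v3's `last_analysis_stats` (a `malicious` count among other keys). The function name and the default threshold of 40 are illustrative, with the threshold taken from the figure mentioned above.

```python
def should_detonate(stats: dict, skip_threshold: int = 40) -> bool:
    """Return False when enough AV engines already flag the file as malicious."""
    return stats.get("malicious", 0) < skip_threshold
```

A file VirusTotal has never seen yields no stats at all, so passing an empty dict sends it to the sandbox, which is the safe default.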
Building this pipeline required combining skills from multiple security domains:
Malware Analysis:
Infrastructure & Virtualization:
Automation & Integration:
Security Operations:
Five days of work turned a 2-hour manual process into a 4-minute automated workflow. That’s not just convenience - it’s a fundamental capability upgrade.
Now when I encounter a suspicious file, I don’t sigh and block off my afternoon. I submit the hash to a webhook and go work on something else. Five minutes later, I have a complete analysis report with behavioral patterns, IOCs, and MITRE ATT&CK mappings.
This is the difference between reactive security (analyzing whatever you have time for) and proactive security (analyzing everything and building a comprehensive threat intelligence database).
The entire infrastructure runs in my homelab on hardware I already owned. The only costs were time and some late nights debugging why PostgreSQL wouldn’t install.
If you’re dealing with malware samples manually, stop. Build automation. Your future self will thank you.
Ready to build your own? The complete infrastructure code, setup scripts, troubleshooting guides, and documentation are all open source in the repository.
Project Links:
Example Analysis Reports:
Technologies Used: CAPEv2 | Proxmox | n8n | YARA | Volatility | INetSim | Python | Bash | PostgreSQL | KVM
Questions about the architecture? Want to discuss malware analysis techniques or automation strategies? Let’s connect!