25 Malware Samples Later: Detection Engineering Lessons from My Analysis Pipeline

In my previous post, I walked through how I built an automated malware analysis pipeline using CAPEv2, Proxmox, and n8n. Infrastructure is one thing. What actually matters is what you learn from the data it produces.

I’ve now run 25 real-world malware samples through the pipeline. The results confirmed some things I expected, surprised me in a few areas, and fundamentally changed how I think about detection engineering. This post breaks down those findings.

The Detection Gap: 28% vs 80%

The two most important numbers from this analysis: YARA signature detection caught 28% of samples. Behavioral detection caught 80%.

That’s not a slight edge. Behavioral analysis detected nearly 3x as many threats as static signatures alone. And the 20% that behavioral analysis missed? Those were heavily packed samples that crashed during detonation or employed aggressive anti-sandbox techniques that prevented meaningful execution.
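The arithmetic behind that gap is simple enough to check by hand. A minimal sketch, assuming the percentages above correspond to 7 and 20 detections out of 25 samples (the counts are derived from the stated rates, not taken from pipeline output):

```python
TOTAL_SAMPLES = 25


def detection_rate(detected: int, total: int = TOTAL_SAMPLES) -> float:
    """Return the share of samples detected, as a percentage."""
    return 100.0 * detected / total


# Counts implied by 28% and 80% of 25 samples (an assumption, see above).
yara_hits = 7
behavioral_hits = 20

yara_rate = detection_rate(yara_hits)
behavioral_rate = detection_rate(behavioral_hits)
ratio = behavioral_hits / yara_hits  # how many times as many detections

print(f"YARA: {yara_rate:.0f}%  behavioral: {behavioral_rate:.0f}%  ratio: {ratio:.2f}x")
```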

This isn’t a criticism of YARA. Signatures are fast, cheap to run at scale, and critical for known-threat triage. But if your detection strategy relies primarily on signatures, this dataset suggests you’re structurally blind to roughly 70% of what’s actually hitting your environment.

The takeaway isn’t “stop using signatures.” It’s to layer your detections and assume signatures will fail for anything novel.

The Five Most Common Behavioral Patterns

Across all 25 samples, five behavioral patterns dominated:

Behavior	Prevalence	MITRE Technique
System discovery commands	88%	T1082, T1083
Code obfuscation / packing	80%	T1027
VM/sandbox evasion checks	72%	T1497
Process injection	60%	T1055
Browser credential theft	48%	T1555.003

System Discovery is Nearly Universal

22 out of 25 samples ran some form of system reconnaissance before doing anything else. The pattern was consistent: whoami, systeminfo, ipconfig, tasklist, wmic calls - mapping out the environment before deciding whether to deploy the real payload.

This is useful for defenders. System discovery commands from unexpected parent processes are a reliable early-warning signal. A cmd.exe spawned by outlook.exe running systeminfo is almost never legitimate. These detections are low-noise and high-fidelity.
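As a rough illustration of that parent-child logic, here is a minimal Python sketch. The `ProcessEvent` shape and the `SUSPICIOUS_PARENTS` list are hypothetical stand-ins for whatever your EDR or Sysmon pipeline provides; only the command names come from the samples above.

```python
from dataclasses import dataclass

# Recon commands observed across the samples
DISCOVERY_COMMANDS = {"whoami", "systeminfo", "ipconfig", "tasklist", "wmic"}
# Hypothetical list of parents that should not normally launch recon tooling
SUSPICIOUS_PARENTS = {"outlook.exe", "winword.exe", "excel.exe"}


@dataclass
class ProcessEvent:
    """Hypothetical process-creation record (not a real EDR schema)."""
    parent_image: str
    image: str
    command_line: str


def _basename(path: str) -> str:
    """Lower-cased file name from a Windows or POSIX style path."""
    return path.replace("/", "\\").rsplit("\\", 1)[-1].lower()


def _strip_exe(name: str) -> str:
    return name[:-4] if name.endswith(".exe") else name


def is_suspicious_discovery(event: ProcessEvent) -> bool:
    """True when a flagged parent spawns a command line containing a recon command."""
    if _basename(event.parent_image) not in SUSPICIOUS_PARENTS:
        return False
    tokens = event.command_line.replace('"', " ").split()
    names = {_strip_exe(_basename(token)) for token in tokens}
    return bool(names & DISCOVERY_COMMANDS)
```

The same idea translates directly to a Sigma or KQL rule; the Python version is only meant to make the matching logic explicit.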

VM Evasion is the Norm, Not the Exception

72% of samples actively checked whether they were running in a sandbox. Techniques ranged from simple (checking for VMware tools processes, querying MAC address OUI prefixes) to sophisticated (timing-based checks using GetTickCount, checking for human interaction artifacts like recent documents and browser history).

This has direct implications for sandbox design. My pipeline uses INetSim to simulate internet services and the CAPEv2 agent with anti-evasion patches, but the arms race is real. Samples that detected the sandbox simply exited cleanly - no malicious behavior to analyze.
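One cheap way to audit a sandbox guest for the simplest of these giveaways is to check its MAC address against OUI prefixes commonly associated with virtualization vendors. The sketch below is illustrative only: the prefix table is a small, incomplete sample, and the function names are hypothetical rather than part of CAPEv2.

```python
from typing import Optional

# Small, incomplete sample of OUI prefixes commonly associated with hypervisors
VM_OUI_PREFIXES = {
    "00:05:69": "VMware",
    "00:0c:29": "VMware",
    "00:50:56": "VMware",
    "08:00:27": "VirtualBox",
    "52:54:00": "QEMU/KVM",
}


def normalize_mac(mac: str) -> str:
    """Convert a MAC like 00-0C-29-AB-CD-EF to 00:0c:29:ab:cd:ef."""
    hex_digits = "".join(c for c in mac.lower() if c in "0123456789abcdef")
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ":".join(hex_digits[i:i + 2] for i in range(0, 12, 2))


def vm_vendor_for_mac(mac: str) -> Optional[str]:
    """Return the hypervisor vendor if the OUI is in the table, else None."""
    return VM_OUI_PREFIXES.get(normalize_mac(mac)[:8])
```

If the guest's NIC returns a vendor here, a sample doing the same lookup will see it too, which is a reasonable argument for overriding the default MAC in the VM config.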

Process Injection Remains King

60% of samples used some form of process injection. The breakdown:

Process hollowing (T1055.012) - 40% of injectors. Spawn a legitimate process suspended, hollow it out, and inject the payload.
CreateRemoteThread (T1055.001) - 35% of injectors. Classic DLL injection into running processes.
LOLBAS abuse (T1218) - 25% of injectors. Using legitimate Windows binaries as proxy executors.

The favorite injection targets? explorer.exe, svchost.exe, and RegAsm.exe. All blend into normal system activity, which is exactly the point.

For detection, monitor for process creation anomalies: svchost.exe without -k parameters, processes with unusual parent-child relationships, and high-entropy memory regions in legitimate processes.
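Two of those checks are easy to sketch in Python. The functions below are a minimal illustration, assuming you already have the process image path, its command line, and a dump of the memory region as bytes; how you collect those is outside the sketch, and any entropy threshold you apply on top is a tuning decision, not something derived from this dataset.

```python
import math
from collections import Counter


def svchost_missing_k(image: str, command_line: str) -> bool:
    """True when a process named svchost.exe was started without a -k argument."""
    name = image.replace("/", "\\").rsplit("\\", 1)[-1].lower()
    if name != "svchost.exe":
        return False
    return "-k" not in command_line.lower().split()


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for uniform data, 8.0 at most)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return sum((n / total) * math.log2(total / n) for n in counts.values())
```

Packed or encrypted payloads tend to sit close to the 8.0 ceiling, while ordinary code and data regions usually score noticeably lower.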

Notable Detections: The Highlight Reel

XWorm RAT - The Kitchen Sink

XWorm was the most feature-rich sample in the dataset. Keylogging, screen capture, credential theft, persistence via scheduled tasks AND registry run keys (redundant persistence is a red flag), plus encrypted C2 over a non-standard port. Detection surface was wide - almost too many indicators to miss.
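The redundant-persistence heuristic can be expressed as a simple grouping step. This is a sketch under assumptions: the `(process, mechanism)` tuples and the mechanism labels are hypothetical, standing in for whatever your sandbox report or EDR emits.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def redundant_persistence(
    events: Iterable[Tuple[str, str]], min_mechanisms: int = 2
) -> Dict[str, Set[str]]:
    """Return processes that set up at least `min_mechanisms` distinct persistence mechanisms."""
    by_process: Dict[str, Set[str]] = defaultdict(set)
    for process, mechanism in events:
        by_process[process].add(mechanism)
    return {p: m for p, m in by_process.items() if len(m) >= min_mechanisms}


# Hypothetical events: (process name, persistence mechanism label)
observed = [
    ("sample.exe", "scheduled_task"),
    ("sample.exe", "run_key"),
    ("updater.exe", "run_key"),
]
flagged = redundant_persistence(observed)
```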

WhiteSnake Stealer - Surgical Precision

Unlike XWorm’s sprawling approach, WhiteSnake was focused. It hit browser credential stores, cryptocurrency wallets, and VPN configs, exfiltrated over HTTPS, and self-deleted. Total execution time: under 90 seconds. The lesson here is that speed kills - if your detection pipeline has even a 2-minute delay, this sample completes its entire kill chain before you see the first alert.

RisePro - The Evasion Outlier

RisePro was the only sample that successfully evaded most behavioral detections. Heavy packing, delayed execution, environment fingerprinting, and incremental payload delivery meant the sandbox saw minimal activity during the analysis window. It is a clean example of what lands in the 20% that behavioral detection misses - and why extending detonation times and running multiple analysis passes matter.

MITRE ATT&CK Tactic Distribution

Mapping all observed behaviors to ATT&CK tactics across the dataset:

Discovery            ██████████████████████    88%
Defense Evasion      █████████████████████     84%
Execution            ████████████████████      80%
Credential Access    ███████████████           60%
Collection           ██████████████            56%
Persistence          █████████████             52%
Command and Control  ████████████              48%
Exfiltration         ███████████               44%
Privilege Escalation ███████                   28%
Lateral Movement     ██                        8%

Two observations:

Discovery and Defense Evasion dominate. Nearly every sample’s first priority is understanding the environment and avoiding detection. If you’re building detections, these tactics are your highest-ROI targets because they’re nearly universal across malware families.

Lateral Movement is rare in commodity malware. Only 8% showed lateral movement capability. This makes sense - most of these samples are stealers and RATs designed to compromise individual endpoints, not move through networks. Lateral movement detection is still critical for targeted attacks, but your commodity malware detections should focus elsewhere.
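For anyone reproducing the tactic chart, a text bar chart like the one above can be generated so that bar length is strictly proportional to prevalence. A minimal sketch, assuming one block per sample out of 25 (the percentages are the ones reported above; the rendering choices are mine):

```python
TOTAL_SAMPLES = 25

# Tactic prevalence as reported in the chart above
TACTIC_PERCENT = {
    "Discovery": 88,
    "Defense Evasion": 84,
    "Execution": 80,
    "Credential Access": 60,
    "Collection": 56,
    "Persistence": 52,
    "Command and Control": 48,
    "Exfiltration": 44,
    "Privilege Escalation": 28,
    "Lateral Movement": 8,
}


def render_bar(percent: float, total: int = TOTAL_SAMPLES) -> str:
    """One block per sample, so 88% of 25 samples gives 22 blocks."""
    return "█" * round(percent * total / 100)


def render_chart(tactics: dict) -> str:
    """Render tactics in descending order of prevalence, one per line."""
    width = max(len(name) for name in tactics)
    ordered = sorted(tactics.items(), key=lambda kv: kv[1], reverse=True)
    return "\n".join(
        f"{name:<{width}} {render_bar(pct):<{TOTAL_SAMPLES}} {pct}%"
        for name, pct in ordered
    )


print(render_chart(TACTIC_PERCENT))
```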

Network IOCs: The Underrated Detection Layer

PCAP analysis from the isolated network produced 42 unique network IOCs across the dataset: C2 domains, callback IPs, and exfiltration endpoints. Several patterns emerged:

DNS over HTTPS to cloudflare-dns.com and dns.google for C2 resolution (bypassing corporate DNS monitoring)
HTTPS on non-standard ports (8443, 4443, 9443) for C2 traffic
Telegram Bot API callbacks for data exfiltration (3 samples used Telegram as a C2 channel)
Discord webhook abuse for exfiltrating stolen credentials

These aren’t traditional IOCs that age out quickly. The patterns - DNS-over-HTTPS for resolution, messaging platform APIs for exfil - are stable detection opportunities that work across malware families.
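Those four patterns can be turned into a simple URL tagger. The sketch below assumes you have full request URLs available (for example from sandbox HTTP logs or a TLS-intercepting proxy); the tag names and the `classify_url` function are hypothetical, while the hosts and ports are the ones listed above.

```python
from typing import List
from urllib.parse import urlsplit

DOH_HOSTS = {"cloudflare-dns.com", "dns.google"}
NONSTANDARD_TLS_PORTS = {8443, 4443, 9443}
DISCORD_HOSTS = {"discord.com", "discordapp.com"}


def classify_url(url: str) -> List[str]:
    """Return the list of pattern tags that a request URL matches."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    tags = []
    if host in DOH_HOSTS:
        tags.append("doh_resolver")
    if parts.scheme == "https" and parts.port in NONSTANDARD_TLS_PORTS:
        tags.append("https_nonstandard_port")
    # Telegram Bot API requests use paths of the form /bot<token>/<method>
    if host == "api.telegram.org" and parts.path.startswith("/bot"):
        tags.append("telegram_bot_api")
    # Discord webhooks live under /api/webhooks/<id>/<token>
    if host in DISCORD_HOSTS and parts.path.startswith("/api/webhooks/"):
        tags.append("discord_webhook")
    return tags
```

A match is a lead, not a verdict: plenty of legitimate software talks to these services, so the tags are most useful when correlated with the process that made the request.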

Three Detection Engineering Principles, Reinforced

1. Detect the Behavior, Not the Tool

Every sample in this dataset will eventually get new hashes, new C2 infrastructure, and new packing. The behaviors stay consistent. System discovery commands from unexpected parents, process injection patterns, and credential store access are durable detections that survive malware updates.

2. Speed Matters More Than You Think

WhiteSnake completed its entire kill chain in under 90 seconds. If your SIEM ingestion pipeline has a 5-minute delay, that sample has already exfiltrated credentials and self-destructed before your first alert fires. Real-time or near-real-time detection isn’t a luxury - it’s the difference between catching a breach and investigating one.
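The timing argument is plain subtraction, but it helps to see it against a few pipeline delays. A minimal sketch, assuming the first alert fires one pipeline delay after the sample starts executing and treating 90 seconds as the upper bound on the kill chain; the 10-second case is a hypothetical near-real-time pipeline, the other two delays are the ones mentioned in this post.

```python
def response_margin_seconds(kill_chain_seconds: float, pipeline_delay_seconds: float) -> float:
    """Seconds left to respond once the first alert fires (negative means too late)."""
    return kill_chain_seconds - pipeline_delay_seconds


KILL_CHAIN_SECONDS = 90  # upper bound reported for WhiteSnake

for delay in (10, 120, 300):
    margin = response_margin_seconds(KILL_CHAIN_SECONDS, delay)
    verdict = "in time" if margin > 0 else "too late"
    print(f"delay {delay:>3}s -> margin {margin:>4}s ({verdict})")
```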

3. Layer Everything

No single detection approach caught everything. YARA missed 72% of samples. Behavioral analysis missed 20%. Network IOCs missed samples that successfully evaded detonation. But layered together, the combined detection rate approaches 95%. Defense in depth isn’t a buzzword - it’s a mathematical necessity.
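Combined coverage is just the union of what each layer catches, divided by the total. The sketch below uses a made-up five-sample example purely to show the calculation; the sample IDs and per-layer sets are hypothetical and do not reproduce the dataset's actual per-sample results.

```python
from typing import Dict, Set


def coverage(detected: Set[str], all_samples: Set[str]) -> float:
    """Percentage of all_samples that appear in detected."""
    return 100.0 * len(detected & all_samples) / len(all_samples)


def layered_coverage(layers: Dict[str, Set[str]], all_samples: Set[str]) -> float:
    """Percentage of samples caught by at least one layer."""
    union = set().union(*layers.values())
    return coverage(union, all_samples)


# Hypothetical toy data, not the real per-sample results
samples = {"a", "b", "c", "d", "e"}
layers = {
    "yara": {"a"},
    "behavioral": {"a", "b", "c"},
    "network": {"c", "d"},
}

per_layer = {name: coverage(hits, samples) for name, hits in layers.items()}
combined = layered_coverage(layers, samples)
```

In this toy case no single layer exceeds 60%, yet the union reaches 80%, because each layer catches something the others miss.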

What I’m Building Next

This analysis is feeding directly into detection content for my Unified SOC Lab. Specifically:

KQL detection rules based on the behavioral patterns identified here
YARA rules tuned for the packing and obfuscation techniques observed
Sigma rules for the process injection patterns, portable across SIEM platforms
Extended detonation profiles for samples flagged as evasion-heavy

The pipeline produced 12 published analysis reports (available in the Analysis Catalog), 42 network IOCs, and 200+ behavioral signature matches. That’s not a toy dataset - it’s a working threat intelligence feed built entirely from homelab infrastructure.

Final Thought

Running your own malware analysis pipeline isn’t just a resume project. It fundamentally changes how you think about detection. When you watch 25 samples execute in real time and see the patterns emerge, you stop thinking in terms of IOCs and start thinking in terms of behaviors. That shift - from reactive signature matching to proactive behavioral detection - is what separates a SOC analyst from a detection engineer.

The infrastructure post covered the “how.” This post covered the “so what.” The data speaks for itself.