Reprogrammed

Lets Make Malware – Static Detection Bypasses

Preamble

Over the prior few weeks, we have covered several different methods of executing shellcode and malicious DLLs. This week, we will begin to take what we have learned so far and improve upon it. There are quite a few ways to do this, but to start, we will look at static detections and, of course, bypassing them.

What are static detections

Static detections are the detections that happen prior to execution. The simplest are hashes of the file itself, such as an MD5 or SHA-1 digest. These work well for known malicious files, but they are also the easiest detections to bypass. By changing a single byte within the file, we change the digest. Once the digest changes, the detection no longer catches us. Easy!
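To see why hash-based detections are so brittle, here is a minimal sketch using Python's standard hashlib module. The byte string is only a stand-in for a file's contents, and the helper names (digests, flip_one_bit) are made up for illustration.

```python
import hashlib


def digests(data: bytes) -> dict:
    """Return the MD5, SHA-1, and SHA-256 hex digests of data."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def flip_one_bit(data: bytes, index: int = 0) -> bytes:
    """Return a copy of data with the lowest bit of one byte flipped."""
    changed = bytearray(data)
    changed[index] ^= 0x01
    return bytes(changed)


# Stand-in for a file's contents (an assumption for this example).
original = b"stand-in for the contents of a file"
modified = flip_one_bit(original)

# Print each digest before and after the one-bit change.
for name, value in digests(original).items():
    print(name, value, "->", digests(modified)[name])
```

Every digest changes completely even though the two inputs differ by a single bit, which is why a hash only ever matches an exact copy of a known file.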

Thankfully, we now have YARA rules, which allow for more granular detections. Instead of simply checking for known malicious files, we can now look for known malicious strings and byte sequences. Because of this, we can no longer simply change an arbitrary byte to bypass the detection. We need to know what is being detected.

What is being detected

Unless we have the ruleset being used in hand, there is no surefire way of knowing what part of a malicious file will be detected. However, we can make some reasonable assumptions. Whenever we write a YARA rule, or any other type of detection, we want to avoid false positives as much as possible.

Too many files being blocked, or worse removed, becomes disruptive. This means we want to detect things that are unique. “Definitely not malware.ps1” is probably not something we will see legitimate software named. So, a strange filename could be something we detect on.

Likewise, if we know strings like “Executing payload” and “Disabling Anti-Virus” exist within the file, we can build detections for those as well. The same logic can be applied to byte sequences too, of course.
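From the defender's side, the idea can be sketched in a few lines of Python. This is not YARA and not how any particular product works. It is only a toy string matcher, and the rule names and the match_signatures helper are invented for this example. The two strings are the ones mentioned above.

```python
# Toy signature set: rule name -> byte string to look for.
# The rule names are hypothetical; the strings come from the text above.
SIGNATURES = {
    "suspicious_status_string": b"Executing payload",
    "av_tamper_string": b"Disabling Anti-Virus",
}


def match_signatures(data: bytes) -> list:
    """Return the sorted names of all signatures found in data."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern in data)


# Stand-in for the raw bytes of a file on disk.
sample = b"\x00\x01print('Executing payload')\x00\x02"
print(match_signatures(sample))
```

A real rule would normally combine several strings or byte sequences with a condition, precisely to keep false positives down.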

With that in mind, as malware developers, we can begin to bypass static detections by removing or obfuscating strings.

Obfuscating Strings

There are a variety of ways we can obfuscate strings to bypass static detections. We could simply encode them using something like Base64, Base32, or hexadecimal. These will work in most cases. However, the encoded values are still strings themselves, so they are easy to detect and easy to decode.

For example, in Python, we could do something like this:

[Figure: Python Base64 String Obfuscation]
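The caveat about encoded strings being easy to detect and decode is worth seeing from the analyst's side. The sketch below is a rough illustration, not a real scanner: the 16-character minimum, the regular expression, and the find_base64_strings helper are all assumptions chosen for this example.

```python
import base64
import re
from typing import List

# Runs of 16 or more Base64 alphabet characters, optionally padded with "=".
# The minimum length is an arbitrary threshold to cut down on noise.
B64_RUN = re.compile(rb"[A-Za-z0-9+/]{16,}={0,2}")


def find_base64_strings(data: bytes) -> List[str]:
    """Return ASCII strings recovered from Base64-looking runs in data."""
    found = []
    for match in B64_RUN.finditer(data):
        candidate = match.group()
        if len(candidate) % 4 != 0:
            continue  # valid padded Base64 is always a multiple of 4 long
        try:
            decoded = base64.b64decode(candidate, validate=True)
            found.append(decoded.decode("ascii"))
        except ValueError:
            continue  # not valid Base64, or not ASCII once decoded
    return found


# Stand-in for file contents containing one encoded string.
sample = b"\x00\x01" + base64.b64encode(b"Executing payload") + b"\x00junk"
print(find_base64_strings(sample))
```

A handful of lines is enough to recover the original string, which is why plain encoding on its own only defeats the most literal string matches.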

Another way we can obfuscate strings is through concatenation. This means simply building our string in increments. Again, in Python, we can do it like so:

[Figure: Python String Concatenation]

In C and C++, we will want to use C-style strings. In other words, we want to use null-terminated character arrays. An example:

[Figure: C-Style Strings]

There are, of course, still other ways to obfuscate strings, such as converting ordinal or Unicode/ASCII values into characters.

One thing to note is we want to obfuscate strings provided as function arguments as well. These are commonly overlooked.

Shellcode

So, now that we have obfuscated the strings, what else can we do? We can begin to look at our shellcode. If we generate our shellcode using MSFVenom or Cobalt Strike, there is a good chance static detections will catch us. These two, in particular, are heavily signatured. Shellcode in general is also commonly detected by matching particular byte sequences.

So, how do we get around that? We can use the same methods as we did for obfuscating strings, either encoding or encrypting the shellcode.

Sidenote: There are also projects that allow us to manipulate the shellcode directly, typically by adding in junk instructions or replacing certain instructions with others.

Shellcode is generally provided as hexadecimal, so that option is out, but we could Base64-encode it. This may work on occasion, but AV vendors have largely caught on to this.

That leaves us with a few other options. We could create something custom and map certain bytes to words in English (or any other language for that matter). This works well in a lot of cases. For instance, we could map byte 0xFC to the word “Hungry”.

We could also encode the shellcode into formats like UUIDs or MAC addresses. These also tend to work well. The SGN encoder (Shikata-ga-nai, Japanese for “it cannot be helped”), not to be confused with the original Shikata-ga-nai encoder in MSFVenom, is also an option.

Lastly, we can encrypt the shellcode. Typically, this means AES-256 in CBC mode, but we can use anything here. We generally do not want to use weaker encryption, although even weaker schemes can still work at times.

Other than AES-256 CBC (or even GCM), we could also use XOR or RC4 encryption.

One trick with XOR, because it is a weaker scheme, is to brute force the (multibyte) key when decrypting our shellcode instead of including it in our malware. This has a side benefit in (potentially) evading heuristic analysis (e.g., sandboxes) because nothing malicious happens while the key is being brute forced.

This happens, as we will discuss further in a later post, because sandboxes cannot run indefinitely. At some point, the AV must decide whether to allow the file to run, and it must make this decision fairly quickly.

The benefit to using RC4 encryption is that it can be decrypted in memory, so our shellcode is never in plaintext until it absolutely needs to be. So, there are benefits to using different encryption schemes.

One downside to using encryption is the resulting higher entropy.

Entropy

Entropy is a measure of randomness. Words in a language follow certain patterns and reuse the same letters frequently. They are not very random and therefore have low entropy. Encrypted data, on the other hand, is the exact opposite. It looks highly random and has high entropy.
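To make this concrete, here is a small sketch that computes Shannon entropy in bits per byte, where 0 means every byte is identical and 8 means all 256 byte values are equally likely. The sample text and the use of os.urandom as a stand-in for encrypted data are assumptions made for the example.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# English text: a small alphabet with heavily repeated letters.
text = b"the quick brown fox jumps over the lazy dog " * 100
# Random bytes: used here as a stand-in for well-encrypted data.
random_like = os.urandom(4096)

print("text:  ", round(shannon_entropy(text), 2))
print("random:", round(shannon_entropy(random_like), 2))
```

The text sample lands well below the random sample, which sits close to the 8 bits per byte maximum. Tools that report per-file or per-section entropy are measuring essentially this.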

High entropy on its own is not bad or suspicious. Encryption has many valid uses. However, when combined with other factors, it can be used to detect malware.

So, of course, there is a tradeoff that must be considered when encrypting our shellcode. That said, one thing we can do when encrypting our shellcode is also include a large array of English words to counteract the higher entropy. This should bring our overall entropy back down to reasonable levels.

Debug Symbols

Finally, we must talk about debug symbols. As Bing Chat (GPT-4) summarizes it:

Debug symbols are additional information related to the symbols (e.g. function, variable) of a program that are compiled into a binary file (e.g. executable, shared object, DLL).

They can help debug a program by mapping the binary code to the source code and showing the function calls, variable names, and data types. Debug symbols are stored in different types of files, such as PDB, DBG, SYM, etc. Debugging symbols may not match the optimized code in release builds.

Similarly, when discussing C#/.NET, we want to obfuscate function, variable, and class names, as well as the compiler-generated GUID.

Luckily, removing debug symbols is very easy. We can use the -s option for GCC, the -ldflags "-s" option for Go’s compiler, or the strip utility on an already-built binary.

Conclusion

With that, we have covered some methods for bypassing static detections. This is, by far, the easiest kind of detection to bypass, and doing so usually means we can at least get our malware executing. Once that process starts, however, we will see we have other layers to get through. But we will save that for next time!