File System Friend or Foe? - How to Tell if a File is Malicious or Not
An unlucky upshot of running your own website or online store is that, sooner or later, hackers will add it to their ‘juicy list of prey’. Once in their list, hackers will continuously scan and probe your site for weaknesses, trying to find a way to further their illicit goals.
As a hoster, you’ll have to sift through many thousands of web server files, checking whether any malicious code got in during a suspected breach. This can get tricky, as legitimate software can seem malicious when it’s not, and deleting files by mistake can break your website.
In this article, I’ll describe techniques to help you identify the difference between good files and bad files, that is, between clean ones and infected ones.
Malware authors often use several techniques to hide malicious code. I like to classify them into two kinds:
- Obfuscation through encoding and encryption techniques.
- Obfuscation through legitimate appearances.
I’ll look at them each in turn.
Obfuscation through encoding and encryption techniques
Obfuscation is the deliberate act of creating source or machine code that is difficult for humans to understand. It’s a way of jumbling up code to make it visually unfriendly and difficult to read. The intention is to mask exactly what the code is doing.
Obfuscation has two uses:
- Developers use it for legitimate reasons, to protect software code from reverse engineering and copying of functionality. This is common in, for example, commercial (paid) software.
- Hackers use it for illicit purposes, to disguise malware or exploitation code, and prevent its detection by antivirus software.
Obfuscation for legitimate purposes
Below is an example of some obfuscated PHP code.
If you are a hoster and you see this code sitting in your file system, your first instinct would be to assume that it’s malware. But that instinct is not reliable: obfuscation alone does not tell you whether code is malicious.
Remember, as I said before, it’s not just malware writers who use obfuscation. Honest developers can obfuscate, too, to protect their code. So let’s de-obfuscate this oh-so-suspicious-looking file, and see what it does. The de-obfuscated version looks like this:
As I hope you can see, this code does no harm, but is a normal part of some software protecting its Secret Sauce Functionality.
Obfuscation for illicit purposes
Here’s another suspicious looking file.
This example looks like the previous, except that the content body varies. Let’s also de-obfuscate this to find out what it’s doing ‘behind the curtain’.
An attacker injected code, probably during a recent, undetected server compromise. They did it to execute malicious OS commands and take control of a server.
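De-obfuscation is safest when nothing is executed. As a minimal sketch, assuming the file uses the common eval(gzinflate(base64_decode('...'))) wrapping (an assumption; the samples above may use a different scheme), the layers can be peeled off statically. The function name peel_layers is hypothetical.

```python
import base64
import re
import zlib

# Matches base64_decode('...') and notes whether gzinflate( wraps it.
WRAPPED = re.compile(
    r"(gzinflate\(\s*)?base64_decode\(\s*['\"]([A-Za-z0-9+/=]+)['\"]\s*\)"
)

def peel_layers(php_source, max_layers=10):
    """Statically unwrap nested Base64 payloads; nothing is ever executed."""
    current = php_source
    for _ in range(max_layers):
        match = WRAPPED.search(current)
        if match is None:
            break  # no further encoded layer found
        raw = base64.b64decode(match.group(2))
        if match.group(1) is not None:
            # PHP's gzinflate() reads a raw DEFLATE stream (wbits=-15 in zlib).
            raw = zlib.decompress(raw, -15)
        current = raw.decode("utf-8", errors="replace")
    return current
```

The result is only text to read. Whether it is benign licensing code or a command shell is still a judgment you make by following what the decoded code does.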
Obfuscation through legitimate appearance
I’ve gone through obfuscation and looked at both sides of the coin in the world of obfuscated files. Hackers don’t rely only on encoding and encryption to evade antivirus detection. They also dress malware up as legitimate files, so that it can slip past the eyes of malware researchers.
Malware Case Study: PHP
Now, I want you to look at the following code segments from a recent fake ionCube malware outbreak. One is normal and benign, the other malicious.
Normal ionCube file
The ionCube fakes are similar in appearance to authentic ionCube files. Comparing these two examples, you can see the attackers have gone to great lengths to obscure the malware in a way that makes it seem like a genuine ionCube-encoded file. You may notice the differences only if you look closely, as there are minor modifications. For example, the original function name _il_exec becomes il_exec, and there are extra preg_replace and fopen functions used which are not present in the genuine file. The malware authors made sure the bad file looks like the normal one in both structure and code, to fool the eyes of even experienced malware researchers.
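The differences listed above can be turned into a rough triage check. This is a sketch only: the indicators come solely from this one outbreak as described here, the function name fake_ioncube_indicators is hypothetical, and a clean result does not prove a file is genuine.

```python
import re

# Indicators taken only from the differences described in this article.
RENAMED_LOADER = re.compile(r"\bil_exec\b")  # does not match inside _il_exec
EXTRA_CALLS = ("preg_replace", "fopen")

def fake_ioncube_indicators(php_source):
    """Return the names of the indicators found in the given source text."""
    findings = []
    if RENAMED_LOADER.search(php_source):
        findings.append("il_exec without leading underscore")
    for name in EXTRA_CALLS:
        if re.search(r"\b" + name + r"\s*\(", php_source):
            findings.append(name + "() call")
    return findings
```

Because the underscore counts as a word character, the word-boundary match leaves the genuine _il_exec name alone and only fires on the renamed variant.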
Fake ionCube file
Malware Case Study: JavaScript
A large number of websites use JavaScript. It’s supported by all modern web browsers, making it a playground for hackers in the malware industry. JavaScript-based malware is a little more difficult to spot than PHP-based malware, even with little or no obfuscation.
I’ll now take a look at an example of a coinhive-based crypto-miner JavaScript infection.
JavaScript-based crypto-miner injected into a legitimate HTML file
The JavaScript segment embedded into the legitimate HTML file looks like an ordinary piece of JavaScript code. But, in reality, it’s a JavaScript-based crypto-miner injected into the HTML. Such pages generate revenue for hackers in the form of cryptocurrency when the script runs in the browsers of visitors to the website.
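For unobfuscated infections of this kind, a simple scan of the script tags can already help. The sketch below assumes the injection references Coinhive by name, either as a script URL or through the CoinHive JavaScript object. Obfuscated miners will not match, and the helper name find_miner_markers is hypothetical.

```python
from html.parser import HTMLParser

class ScriptCollector(HTMLParser):
    """Collects external script URLs and inline script bodies."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.snippets = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            src = dict(attrs).get("src")
            if src:
                self.snippets.append(src)

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script and data.strip():
            self.snippets.append(data)

def find_miner_markers(html_text):
    """Return script URLs or bodies that mention Coinhive (case-insensitive)."""
    collector = ScriptCollector()
    collector.feed(html_text)
    collector.close()
    return [s for s in collector.snippets if "coinhive" in s.lower()]
```

Only script content is inspected, so a page that merely mentions Coinhive in its visible text is not flagged.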
If, by casually glancing through a file, you feel you won’t be able to spot a JavaScript malware infection, you need to analyze the actual workings of the JavaScript code. You don’t need to worry if you are already a customer of Imunify360, because it has a beautiful user-friendly dashboard, armed with advanced anomaly and heuristic-based detection capabilities for such malware on the backend. If you come across an infection on your domain, this is how it will look in your dashboard.
Why all this fuss?
Yes, you may well be thinking that. Why am I going to all this fuss to explain how to distinguish good files from bad? Because I want you to understand that in the world of web-based malware, appearances can be deceptive. You can mistake a legitimate file for a malicious one, or a malicious file for a legitimate one, and both mistakes can disrupt the smooth running of your CMS and your business.
In identifying the difference between good and bad files, the ability to decode or de-obfuscate obfuscated code, and the ability to read through the flow of the software, each play a vital role. But there are other tactics you can use to help you identify ‘good’ files.
File Integrity Checking
It is important to know if there has been any tampering with your website’s core CMS files. One way to do this is by checking the integrity of files. File integrity checking uses cryptographic hashing functions, such as MD5 and SHA1, to compare a file’s checksum against that of a known healthy version of the same file. You can do this on all core CMS files to verify their integrity.
For example, if anything or anyone modifies a core CMS file, even by a single byte, the computed hash values change to completely different ones. This makes it much easier to spot changes in files.
As an example, I’ll show how hash values can vary even for small changes.
- The MD5 hash value for the text ‘Hi’ (2 characters) is C1A5298F939E87E8F962A5EDFC206918
- The MD5 hash value for the text ‘Hi.’ (3 characters) is 7B00B6D52A564B31999EB3DE9CE0980B
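The same comparison can be reproduced with Python’s standard hashlib module, and extended into a small integrity check. This is a sketch under assumptions: the manifest of known-good digests (for example, computed from a fresh download of the same CMS version) is something you must supply, the helper names are hypothetical, and the file check defaults to SHA-256, which is my choice rather than one of the algorithms named above.

```python
import hashlib
from pathlib import Path

def file_digest(path, algorithm="sha256"):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_tampered(root, known_good):
    """known_good maps a relative path to its expected hex digest."""
    changed = []
    for relative_path, expected in known_good.items():
        target = Path(root) / relative_path
        if not target.is_file() or file_digest(target) != expected.lower():
            changed.append(relative_path)
    return changed

# One extra character produces a completely different digest.
print(hashlib.md5(b"Hi").hexdigest())
print(hashlib.md5(b"Hi.").hexdigest())
```

Missing files are reported along with modified ones, since a deleted core file is just as worth investigating.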
Maintaining Proper Logging of Actions
Logging events that happen in the server can help you identify when bad things happen to a file. You can gather data from access and error logs, such as:
- actions or modifications performed on files;
- requests from foreign and unknown IP addresses that made changes to files;
- strange exploit codes passed through HTTP Requests.
These details help to investigate suspicious files.
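As a rough illustration of the last point, the sketch below scans access log lines for request URLs that carry typical exploit fragments. It assumes the widely used combined log format. The hint list is illustrative rather than complete, and the function name suspicious_requests is hypothetical.

```python
import re
from urllib.parse import unquote_plus

# Client IP, time-stamp, method, URL and status from a combined-format line.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+)[^"]*" (?P<status>\d{3})'
)

# Illustrative fragments only; real exploit traffic is far more varied.
EXPLOIT_HINTS = ("base64_decode", "eval(", "../", "<?php", "system(")

def suspicious_requests(lines):
    """Yield (ip, time, url, hints) for requests whose URL contains a hint."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # not in the expected format
        decoded = unquote_plus(match.group("url")).lower()
        hints = [hint for hint in EXPLOIT_HINTS if hint in decoded]
        if hints:
            yield match.group("ip"), match.group("time"), match.group("url"), hints
```

The URL is percent-decoded before matching, because attackers routinely encode their payloads. POST bodies are not written to access logs, so this only covers what appears in the request line.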
Using Publicly Available Blacklist sources
You can use publicly available blacklists, such as Google’s, VirusTotal, the Spamhaus database, etc., to verify the legitimacy of files, or of unknown URLs appended to files.
Based on Positional Traits of Suspicious code
Most of the time, hackers append malicious code near the top or bottom of a valid segment of code. It’s rarely inserted within valid code because it may break it, defeating the purpose, which is to run undetected.
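This positional habit suggests a cheap first-pass check: look for suspicious calls only in the first and last few lines of a file. The sketch below is a heuristic under assumptions. The list of function names and the window of five lines are my own illustrative choices, and edge_hits is a hypothetical name.

```python
import re

# Calls that often appear in injected PHP; legitimate code can use them too.
SUSPICIOUS = re.compile(r"\b(eval|base64_decode|gzinflate|str_rot13|assert)\s*\(")

def edge_hits(php_source, window=5):
    """Return (line_number, 'top' or 'bottom') for suspicious calls near the edges."""
    lines = php_source.splitlines()
    total = len(lines)
    hits = []
    for index, line in enumerate(lines):
        near_top = index < window
        near_bottom = index >= total - window
        if (near_top or near_bottom) and SUSPICIOUS.search(line):
            hits.append((index + 1, "top" if near_top else "bottom"))
    return hits
```

A hit is a reason to open the file and read it, not a verdict.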
Version Control
If you use version control tools such as Git, you can compare the current versions of suspicious files with previous versions. Keeping your files within a version control system makes life easier in these sorts of situations.
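If the site’s files live in a Git repository, the comparison can be scripted. The sketch below shells out to the standard git status --porcelain and git diff commands. It assumes git is installed and that repo_dir points at a working tree, and the function names are hypothetical.

```python
import subprocess

def parse_porcelain(output):
    """Turn `git status --porcelain` output into (status, path) pairs."""
    entries = []
    for line in output.splitlines():
        if len(line) > 3:
            # Two status characters, one space, then the path.
            entries.append((line[:2].strip(), line[3:]))
    return entries

def changed_files(repo_dir):
    """List modified, added, deleted and untracked files in the working tree."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    )
    return parse_porcelain(result.stdout)

def diff_against_head(repo_dir, path):
    """Show how one file differs from the last committed version."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "diff", "HEAD", "--", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Untracked files (status ??) deserve as much attention as modified ones, since a dropped web shell is usually a brand-new file.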
Using the ‘Last Modified’ details of files
To detect suspicious files, you can also use the ‘last modified’ details of files, file permission changes, and content size changes. Running the stat command on suspicious files is one of the ways to pull out this information. The next example shows the extensive ‘modified details’ information given in the output of stat for a file.
The line with Access/Uid/Gid shows the read/write/execute permissions of the file, and its user and group ownership. The last three lines are the time-stamps of the file.
- Access refers to the last time the file was read.
- Modify refers to the last time the actual contents of the file were modified.
- Change refers to the last time the file’s metadata changed, for example its permissions, ownership, or name. It is also updated whenever the contents are modified.
Many malicious scripts are able to keep the access and modify time-stamps unchanged, but the change time-stamp will always reflect if the file was altered. If a file is modified via FTP, all three time-stamps are changed to the same date.
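The same time-stamps are available from Python through os.stat, which makes it easy to flag files whose change time-stamp is noticeably newer than their modify time-stamp. This is a sketch for Linux-like systems (on Windows, st_ctime means creation time instead). The function names and the 60-second tolerance are my own assumptions.

```python
import os
import time

def timestamp_report(path):
    """Collect the fields that stat prints, using os.stat."""
    info = os.stat(path)
    return {
        "mode": oct(info.st_mode & 0o777),
        "size": info.st_size,
        "access": time.ctime(info.st_atime),
        "modify": time.ctime(info.st_mtime),
        "change": time.ctime(info.st_ctime),
    }

def change_outruns_modify(path, tolerance_seconds=60):
    """True when the change time-stamp is much newer than the modify time-stamp."""
    info = os.stat(path)
    return info.st_ctime - info.st_mtime > tolerance_seconds
```

A positive result is a hint rather than proof: a harmless chmod or chown also moves the change time-stamp forward, so treat flagged files as candidates for a closer look.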
Conclusion
Distinguishing between good and bad files, especially obfuscated ones, is never easy. It takes time and good investigative skills, at least for those of us without prior experience. Hackers are always doing their best to hide malicious files from webmasters or admins. That’s why we at Imunify360 have some well-trained eyes and minds working to help you with the highest precision in these sorts of situations. If you need help to protect your web servers, feel free to ping us. We’ll be more than happy to help.