Addressing an FP: 2016950 - ET MALWARE Possible Win32/Hupigon ip.txt with a Non-Mozilla UA

A customer recently reported some False Positives on a signature and it made for an interesting case to track down.

The report mentioned that 2016950 was triggering on benign HTTP traffic and was actually one of the “Top 5” signatures generating alerts within the environment.

This report prompted a review of the rule.

TL;DR: Do you have FPs that are causing you problems? Instead of just disabling that rule, report it and Emerging Threats can tune the rule for you! FPs can be publicly submitted here on the Discourse site or privately via https://feedback.emergingthreats.net.

Seek First to Understand, Then to Make the Tune.

In reviewing a rule there are several items to consider, though the primary consideration is often whether the rule is still producing true positive alerts. Other considerations include whether the malware family is still active within the malware ecosystem, the age of the rule, the rate of false positives, etc.

2016950, as indicated by the “created_at” metadata, was created just over a decade ago and is currently on its 7th revision. It was last updated a bit over a year ago and internal tooling indicates this rule was created by none other than ET’s OG Will Metcalf.

sid:2016950; rev:7; metadata:created_at 2013_05_31, updated_at 2022_06_27;

This context is very helpful to keep in mind as tunes are considered. Here is a 10-year-old rule that has already undergone several revisions, which leads us to believe that the signature has been prone to FPs for some time.

Looking at the actual detection logic, it can be determined that the rule detects outbound HTTP requests to any URI ending in /ip.txt where the HTTP User-Agent does not contain “Mozilla”, with a couple of negations.

http.uri; content:"/ip.txt"; nocase; endswith; fast_pattern; 
http.header; content:!"%E5%A4%A7%E4%BC%97%E7%82%B9%E8%AF%84"; 
http.user_agent; content:!"Mozilla"; 
http.host; dotprefix; content:!".malwaredomainlist.com";
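
To make the logic above easier to reason about, here is a minimal Python sketch of what those four conditions check. It is only an approximation for illustration, not how Suricata evaluates the rule: the `Request` type and the `original_logic_matches` function are hypothetical names, the hosts are placeholders, and the parts of the rule not quoted above (flow, direction, and so on) are ignored.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical, simplified view of one HTTP request."""
    uri: str
    host: str
    user_agent: str
    raw_headers: str = ""


def original_logic_matches(req: Request) -> bool:
    # http.uri; content:"/ip.txt"; nocase; endswith;
    uri_ok = req.uri.lower().endswith("/ip.txt")
    # http.header; content:!"%E5%A4%A7%E4%BC%97%E7%82%B9%E8%AF%84";
    header_ok = "%E5%A4%A7%E4%BC%97%E7%82%B9%E8%AF%84" not in req.raw_headers
    # http.user_agent; content:!"Mozilla";  (case sensitive, no nocase)
    ua_ok = "Mozilla" not in req.user_agent
    # http.host; dotprefix; content:!".malwaredomainlist.com";
    # dotprefix prepends a "." so the bare domain is excluded as well.
    host_ok = ".malwaredomainlist.com" not in "." + req.host.lower()
    return uri_ok and header_ok and ua_ok and host_ok


# Placeholder requests, not captured traffic.
print(original_logic_matches(Request("/ip.txt", "example.invalid", "SERVER2_03")))
print(original_logic_matches(Request("/ip.txt", "example.invalid", "Mozilla/5.0")))
```

Under these assumptions the first request matches and the second does not. Any non-browser client that fetches a file named ip.txt from any host satisfies every condition, which goes some way toward explaining why the rule is noisy.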

Finding True Positives

In order to tune any signature, a good sample of true positive traffic is helpful. This establishes the baseline of what is possible with any tunes. Creating false negatives (FNs) is a very real outcome of tuning rules. Having true positive traffic allows any changes to the rule to be tested to ensure that FNs are not introduced.

In this case, due to the age of the rule and the high number of false positives (even within Emerging Threats’ own dataset), finding true positive traffic proved to be difficult.

Pivot from the reference sample

Within the rule, there is a single MD5 reference. When a rule includes an MD5 reference, a link to VT, Malware Bazaar, etc., it’s generally just the sample which was used to create the rule. To explain that further, the sample would have been executed in a sandbox, the network traffic from that sandbox execution analyzed, and the rule created based on it. It is important to realize that the MD5 is a “reference sample” and not the definitive reference. The intent is to cover multiple unique samples with a single rule. Even though there is an MD5 reference in the rule, it’s possible, and even desirable, that other samples will generate true positive traffic and be covered by the same signature.

reference:md5,4d23395fcbab1dabef9afe6af81df558;

In this case, even after attempting to sandbox the sample, 4d23395fcbab1dabef9afe6af81df558 did not produce any network traffic, nor was there any traffic stored within Emerging Threats’ internal dataset for this sample. :frowning:

VirusTotal

In looking at the sample within VirusTotal, there is a unique artifact which allowed for pivoting. This sample opened, wrote to and then deleted a file called DELME.BAT.

A quick pivot within VT surfaced 79 other samples which produced the same behavior, several of which were first uploaded to VT within the last year. The search used was:

behaviour_files:"C:\\WINDOWS\\DELME.BAT"

After sandboxing all of the samples, several produced traffic which matched the existing rule!

| md5sum | first seen on VT | compile date |
| --- | --- | --- |
| 077b4fa1df1e7daa8a92215461add816 | 2023-09-15 09:27:21 | 2008-08-03 09:11:21 |
| a256c7a6c4f0f46219aabfbce1976980 | 2023-09-15 09:17:10 | 2008-08-03 09:11:21 |
| ec7735fea5c1fd350dcdc28673084914 | 2023-09-15 09:17:10 | 2008-08-03 09:11:21 |
| 1aa9dd3138670d24948dd046a1dfe540 | 2023-09-15 09:15:51 | 2008-08-03 09:11:21 |
| 30bd44741c26ffc4b752112affc8849a | 2019-05-01 05:35:05 | 1992-06-19 22:22:17 |
| 4a20b3500310b8b442c2387480f4e280 | 2014-04-06 09:25:21 | 1992-06-19 22:22:17 |
| 16436fac049aac6cf8d176b5add4de76 | 2013-06-12 23:10:37 | 2007-11-11 07:14:33 |
| 249af293e2820a66977dacbbbeb0b5d2 | 2012-01-21 11:10:52 | 2008-09-12 05:14:23 |
| 52d45e6e6b3ab80a9912549e5b0fdfd1 | 2012-01-20 15:58:06 | 2008-05-16 15:56:06 |
| 1bada85493cd78d525886978d2e795c7 | 2008-05-06 15:24:25 | 2008-05-06 13:13:42 |

Several other samples produced traffic which matched a different signature for detecting Hupigon.

The true positive samples that matched the signature in question produced traffic that varied only in the HTTP Host header.

image

Emerging Threats Internal Data

VirusTotal proved useful, though buried in the mass of false positives within the Emerging Threats data there are true positives as well. In fact, the second sample reviewed ended up being a true positive of high value. 93f8587d7f977a64142b0979c534e8fd (first submitted to VT way back on 2007-03-30 09:35:13) produced the following network traffic:

image

This network traffic is different from the previous true positive samples in at least four ways:

  1. The URI includes a base directory that contains /ip.txt.
  2. The HTTP version is different (HTTP/1.0 vs HTTP/1.1).
  3. The HTTP User-Agent is different (RAV1.23 vs SERVER2_03).
  4. The Cache-Control header has been replaced with a Pragma header.

As explained by the Mozilla Developer Docs, the switch between the Cache-Control and Pragma headers is an effect of the different HTTP versions: Pragma is the HTTP/1.0 mechanism, while Cache-Control was introduced with HTTP/1.1.

As the current version of the rule alerts on both types of network traffic produced by the Hupigon samples, these differences need to be considered when making any rule modifications.

Finding False Positives

Collecting false positives is an important step to help ensure we are “tightening” the signature in a way that avoids incorrectly alerting on them. Many false positives were located using Emerging Threats internal data. This process was generally as simple as looking at all the HTTP traffic produced and seeing if it was consistent with the true positive samples.

One example of a false positive is 09428675e4d527fbff579570522b2e53, which produced a request for /ip.txt.

While this traffic might be malicious, it certainly isn’t Hupigon. This sample actually triggered another rule, 2805265 - ETPRO ADWARE_PUP W32/Chistudi Checkin.

The Tuning Process

Taking a careful look at the traffic produced by the true positive samples and comparing it to the existing rule, there is at least one option to reduce false positives. The ordering and number of the HTTP headers is consistent across all samples:

  1. User-Agent
  2. Host
  3. Either Cache-Control or Pragma with a value of no-cache

Using a combination of the http.header and http.header_names buffers, we can enforce this logic:

http.header; content:"|3a 20|no-cache|0d 0a|"; endswith; http.header_names; content:"|0d 0a|User-Agent|0d 0a|Host|0d 0a|"; content:"|0d 0a 0d 0a|"; within:17; pcre:"/(?:Cache-Control|Pragma)\x0d\x0a\x0d\x0a$/";
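
As a rough illustration of what that addition checks, the Python sketch below rebuilds the two buffers from a list of headers and applies the same four tests. It is a simplified model, not Suricata itself. It assumes http.header_names starts with a CRLF and ends with a double CRLF, that http.header ends with the CRLF of the last header, and that only the first occurrence of the User-Agent/Host pair needs checking. The function names, User-Agent strings, and hosts are all hypothetical.

```python
import re

ORDER = "\r\nUser-Agent\r\nHost\r\n"
TAIL = re.compile(r"(?:Cache-Control|Pragma)\r\n\r\n$")


def header_buffer(headers):
    """Approximate http.header: 'Name: value' lines, each ending in CRLF."""
    return "".join(f"{name}: {value}\r\n" for name, value in headers)


def header_names_buffer(headers):
    """Approximate http.header_names: CRLF, names joined by CRLF, double CRLF."""
    return "\r\n" + "\r\n".join(name for name, _ in headers) + "\r\n\r\n"


def tuned_logic_matches(headers):
    raw = header_buffer(headers)
    names = header_names_buffer(headers)
    # content:"|3a 20|no-cache|0d 0a|"; endswith;
    if not raw.endswith(": no-cache\r\n"):
        return False
    # content:"|0d 0a|User-Agent|0d 0a|Host|0d 0a|";
    start = names.find(ORDER)
    if start == -1:
        return False
    # content:"|0d 0a 0d 0a|"; within:17;
    after = start + len(ORDER)
    if "\r\n\r\n" not in names[after:after + 17]:
        return False
    # pcre:"/(?:Cache-Control|Pragma)\x0d\x0a\x0d\x0a$/";
    return TAIL.search(names) is not None


# Placeholder header sets modelled on the ordering described above.
http11_style = [("User-Agent", "ExampleAgent"), ("Host", "example.invalid"),
                ("Cache-Control", "no-cache")]
http10_style = [("User-Agent", "ExampleAgent"), ("Host", "example.invalid"),
                ("Pragma", "no-cache")]
other_client = [("Host", "example.invalid"), ("User-Agent", "curl/8.0"),
                ("Accept", "*/*")]

for sample in (http11_style, http10_style, other_client):
    print(tuned_logic_matches(sample))
```

The value 17 appears to come from the longer of the two header names: "Cache-Control" is 13 bytes, and the closing double CRLF adds 4 more.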

Tuning Results

After testing this modification, the signature continued to fire on the true positive samples, while the discovered false positive samples no longer alerted. This rule will now be at “Revision 8” as it continues providing good true positive alerts on recently uploaded samples.
