==Research Phase==
1. Identify the SID(s)
A submitter reported that SID 2044171 was causing False Positive alerts. Grep the Proofpoint Emerging Threats ruleset for the SID and keep the matching rule in your notes.
alert http $HOME_NET any -> $EXTERNAL_NET any (msg:"ET MALWARE NewsPenguin CnC Checkin"; flow:established,to_server; http.method; content:"GET"; http.uri; content:"/search:"; startswith; fast_pattern; pcre:"/[a-z0-9]{12}/Ri"; reference:url,blogs.blackberry.com/en/2023/02/newspenguin-a-previously-unknown-threat-actor-targets-pakistan-with-advanced-espionage-tool; classtype:trojan-activity; sid:2044171; rev:1; metadata:affected_product Windows_XP_Vista_7_8_10_Server_32_64_Bit, attack_target Client_Endpoint, created_at 2023_02_10, deployment Perimeter, former_category MALWARE, signature_severity Major, updated_at 2023_02_10;)
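If grep is not at hand, the same lookup can be sketched in Python. This is only an illustration: the function name find_rules_by_sid and the two-rule sample text are hypothetical stand-ins for a real ruleset file, and the match is deliberately exact so that a longer SID sharing the same digits is not returned.

```python
import re

def find_rules_by_sid(rules_text, sid):
    """Return every line in rules_text whose sid option equals sid."""
    # Require "sid:<number>;" so that 2044171 does not also match 20441710.
    pattern = re.compile(r"\bsid\s*:\s*" + str(int(sid)) + r"\s*;")
    return [
        line.strip()
        for line in rules_text.splitlines()
        if pattern.search(line)
    ]

# Hypothetical two-rule excerpt standing in for a downloaded ruleset file.
sample_rules = "\n".join([
    'alert http any any -> any any (msg:"Example A"; sid:2044171; rev:1;)',
    'alert http any any -> any any (msg:"Example B"; sid:20441710; rev:1;)',
])

matches = find_rules_by_sid(sample_rules, 2044171)
for rule in matches:
    print(rule)
```

In practice, rules_text would be the contents of the ruleset file you downloaded, read with open(...).read().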
2. Understand Intent Behind Rule(s)
It’s easy to get lost in the False Positive report before understanding the reported rules themselves. Look away from the report and focus on breaking down SID:2044171!
For starters, let’s break it down into its Suricata rule format parts (6.1. Rules Format — Suricata 6.0.0 documentation)
Rule Format Section | Definition
Action | Determines what happens when the signature matches
Header | Defines the protocol, IP addresses, ports and direction of the rule
Rule Options | Defines the specifics of the rule
Remember, we want to understand what the rule means so that we understand its False Negative and/or False Positive potential.
Rule Format Section
Action
This determines what happens when the signature matches.
In SID:2044171, the action == alert, which indicates that an alert is generated whenever traffic matches the rule.
Header
This defines the protocol, IP addresses, ports and direction of the rule.
In SID:2044171, the header == http $HOME_NET any -> $EXTERNAL_NET any.
This says the Suricata rule will look at HTTP traffic that flows from a defined internal source (on any port) to an external destination (on any port).
Rule Options
This defines the specifics of the rule.
In SID:2044171, the rule specifics == (msg:"ET MALWARE NewsPenguin CnC Checkin"; flow:established,to_server; http.method; content:"GET"; http.uri; content:"/search:"; startswith; fast_pattern; pcre:"/[a-z0-9]{12}/Ri"; reference:url,blogs.blackberry.com/en/2023/02/newspenguin-a-previously-unknown-threat-actor-targets-pakistan-with-advanced-espionage-tool; classtype:trojan-activity; sid:2044171; rev:1; metadata:affected_product Windows_XP_Vista_7_8_10_Server_32_64_Bit, attack_target Client_Endpoint, created_at 2023_02_10, deployment Perimeter, former_category MALWARE, signature_severity Major, updated_at 2023_02_10;)
Whoa! That’s not easy on the eyes.
I’m going to break down the rule options and organize them in the order that best expresses the rule’s intent.
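The action / header / rule options split described above can also be done mechanically. The following is a minimal Python sketch, not a full Suricata rule parser: split_rule is a hypothetical helper, it assumes the first "(" in the rule opens the options, and the rule string is an abbreviated copy of SID:2044171 with the reference and metadata options left out for readability.

```python
import re

def split_rule(rule):
    """Split a Suricata rule into its action, header, and list of options."""
    head, _, options = rule.strip().partition("(")
    action, _, header = head.strip().partition(" ")
    # Drop the closing parenthesis, then split on semicolons that are
    # outside double quotes and not escaped with a backslash.
    options = options.rstrip().rstrip(")")
    parts = re.findall(r'(?:[^;"\\]|\\.|"(?:[^"\\]|\\.)*")+', options)
    return action, header.strip(), [p.strip() for p in parts if p.strip()]

# Abbreviated copy of SID:2044171 (reference and metadata omitted).
rule = (
    'alert http $HOME_NET any -> $EXTERNAL_NET any '
    '(msg:"ET MALWARE NewsPenguin CnC Checkin"; flow:established,to_server; '
    'http.method; content:"GET"; http.uri; content:"/search:"; startswith; '
    'fast_pattern; pcre:"/[a-z0-9]{12}/Ri"; classtype:trojan-activity; '
    'sid:2044171; rev:1;)'
)

action, header, options = split_rule(rule)
print("Action:", action)
print("Header:", header)
for option in options:
    print("Option:", option)
```

Printing one option per line makes it easier to regroup them by purpose (message, metadata, direction, HTTP matching), which is the order used in the rest of this section.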
More on Rule Options
Let’s break down the rule options.
User Friendly Message
msg:"ET MALWARE NewsPenguin CnC Checkin";
Aside from the SID, the msg provides a human-friendly label for the rule. We understand that we are looking at NewsPenguin malware activity.
More User Friendly Metadata
reference:url,blogs.blackberry.com/en/2023/02/newspenguin-a-previously-unknown-threat-actor-targets-pakistan-with-advanced-espionage-tool;
classtype:trojan-activity;
sid:2044171;
rev:1;
metadata:affected_product Windows_XP_Vista_7_8_10_Server_32_64_Bit, attack_target Client_Endpoint, created_at 2023_02_10, deployment Perimeter, former_category MALWARE, signature_severity Major, updated_at 2023_02_10;)
Fields like reference, classtype, sid, rev, and metadata help us describe the rule. We want to pay attention to the reference. References best describe the rule’s intent! The rest of the rule options attempt to best articulate it in Suricata’s rule language…which is not always perfect.
The blog notes that NewsPenguin connects to a hardcoded server, updates[.]win32[.]live:443/search:<unique_identifier>, where <unique_identifier> is 12 characters long. And so, the rule’s intent is to match on…
- HTTP GET requests
- A URI that contains “/search:” followed IMMEDIATELY by <unique_identifier>
- A <unique_identifier> that is 12 characters long and alphanumeric
After reading the msg and the blog references, we should have a good understanding about why this rule was created – the rule’s intent. Now, let’s review how this research was practically applied / implemented using Suricata.
Direction Keyword
flow:established,to_server;
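This keyword restricts which part of a connection the rule inspects. Here, established limits matching to established connections, and to_server limits it to traffic sent from the client to the server. Flow helps Suricata (and Snort!) tell the client side of the connection from the server side. See: Snort Blog: Flow matters.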
HTTP Keywords
http.method; content:"GET";
This sticky buffer applies the content match that follows it to the HTTP method, so the rule looks at HTTP GET requests only.
http.uri; content:"/search:"; startswith; fast_pattern; pcre:"/[a-z0-9]{12}/Ri";
This sticky buffer inspects the normalized URI buffer, not the raw URI buffer. This nuance is important for knowing which Suricata PCRE modifiers you’re allowed to use.
The content /search: must exist in the normalized URI buffer, and because of startswith, strictly at the beginning. This content is also designated as the fast_pattern.
Spoiler: This restriction to the beginning of the URI was never specified by the blog! This seems like a rule writer’s interpretation of the malware’s techniques. If the malware typically deploys this URI at the top level directory, then this rule alerts. However, if the malware sometimes deploys this URI after several directories, then this rule wouldn’t detect it. Missing samples because of an overly strict restriction is an example of a False Negative.
Then, there exists a PCRE which looks for 12 consecutive alphanumeric characters.
This PCRE has modifiers (or flags). More on this topic, [Suricata PCRE Modifiers]. From this Suricata documentation, we get the definition of the flags used:
- /R: Match relative to the last pattern match. It is similar to distance:0;
- /i: pcre is case insensitive
To summarize, if the normalized URI buffer starts with /search: and matches on the PCRE, then the rule is satisfied. The PCRE can occur…
- immediately afterwards like /search:12341234abcd
- or, with unexpected characters before the PCRE like /search:/blah/blah/12341234abcd1234
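To make the behaviour summarized above concrete, here is a rough Python approximation of the rule as written. It is a sketch under assumptions, not Suricata’s actual matching engine: rule_as_written_matches is a hypothetical helper, re.IGNORECASE stands in for the /i flag, and starting the search after the prefix stands in for the /R flag. The sample URIs other than the two listed above are made up for illustration.

```python
import re

PREFIX = "/search:"
# Same pattern as the rule's PCRE; re.IGNORECASE stands in for the /i flag.
ID_PATTERN = re.compile(r"[a-z0-9]{12}", re.IGNORECASE)

def rule_as_written_matches(method, uri):
    """Approximate SID 2044171: GET, URI starts with /search:, then the PCRE."""
    if method != "GET" or not uri.startswith(PREFIX):
        return False
    # /R makes the PCRE search begin after the previous content match,
    # but the pattern is unanchored, so it may match anywhere after that point.
    return ID_PATTERN.search(uri, len(PREFIX)) is not None

samples = [
    ("GET", "/search:12341234abcd"),                 # intended match
    ("GET", "/search:/blah/blah/12341234abcd1234"),  # unintended match
    ("GET", "/search:?q=averylongsearchterm"),       # hypothetical benign URI
    ("GET", "/a/b/search:12341234abcd"),             # missed because of startswith
    ("POST", "/search:12341234abcd"),                # wrong method
]

for method, uri in samples:
    print(method, uri, rule_as_written_matches(method, uri))
```

The second and third samples satisfy this approximation even though neither looks like /search:<unique_identifier>, and the fourth sample is skipped purely because of startswith.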
Spoiler: This rule will cause False Positives. To prevent them, we must force the PCRE to match IMMEDIATELY after the /search: match. Also, the PCRE does not strictly enforce the unique_identifier’s length to be 12. It accepts any 12 consecutive alphanumeric characters and doesn’t care if MORE exist.
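The stricter behaviour called for here can be sketched in Python as follows. This is an illustration only: strict_matches is a hypothetical helper, and it assumes the identifier is the last thing in the URI, which the blog does not state explicitly. In rule terms, this would roughly correspond to anchoring the PCRE with ^ and $ (something like /^[a-z0-9]{12}$/Ri), but treat that as an assumption to confirm against the Suricata documentation and test against real traffic.

```python
import re

PREFIX = "/search:"
# Exactly 12 alphanumeric characters, case-insensitive like the /i flag.
STRICT_ID = re.compile(r"[a-z0-9]{12}", re.IGNORECASE)

def strict_matches(method, uri):
    """GET request whose URI is /search: plus exactly 12 alphanumerics."""
    if method != "GET" or not uri.startswith(PREFIX):
        return False
    # fullmatch requires the identifier to start immediately after the
    # prefix and to end the URI (assumption: nothing follows the identifier).
    return STRICT_ID.fullmatch(uri[len(PREFIX):]) is not None

samples = [
    "/search:12341234abcd",                 # 12 characters right after prefix
    "/search:/blah/blah/12341234abcd1234",  # extra path before the identifier
    "/search:12341234abcd1234",             # 16 characters, too long
]

for uri in samples:
    print(uri, strict_matches("GET", uri))
```

Only the first sample passes, which matches the intent taken from the blog: the identifier follows /search: immediately and is 12 characters long.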
Big Takeaway:
This rule was written using a blog’s research on CnC activity. After reviewing the rule against the blog that inspired it, we see there are gaps which could cause False Negatives (misses) and False Positives (alerts on unintended matches).