Handling False Positive Reports as A Rule Writer! Special Guests: PCREs, Dalton, Dalton’s Flowsynth

Hi All,

This post reviews how you may want to troubleshoot False Positive alerts from a Suricata Rule. If you already have an established troubleshooting flow, then I suggest you stick around for the later discussion about Suricata PCRE content anchoring and PCRE modifiers.

By the end of this blog post, you should know how to use Dalton while troubleshooting False Positive alerts. You should also be aware of Emerging Threats’ MUST-DOs regarding Suricata PCREs:

1. Add static content related to the previous match. In Emerging Threats, we frequently ask how our PCREs are “statically anchored” to previous content.
2. Use PCRE anchors when possible (^ or $) to anchor the PCRE to the beginning or end of a buffer.
3. Use capture groups sparingly, especially when they are not needed.

Table of Contents

  1. About False Positives, True Positives, and PCREs
  2. False Positive Troubleshooting Process
    A ==Research Phase===
    1.1. Identify the SID(s)
    1.2. Understand Intent Behind Rule(s)
    1.2.1 Rule Format Section
    1.2.1.1 Action
    1.2.1.2 Header
    1.2.1.3 Rule Options
    B ==Decision Phase==
    1.1. Validate False Positive Claims
    C ==Tune and Tweaking Phase==
    1.1. Get PCAPs for Testing
    1.2. Update and Test Rule against Dalton (repeatedly!)
    1.2.1. Run 1: Test original 2044171 against fp.pcap, tp1.pcap, tp2.pcap.
    1.2.2. Run 2: Test updated 2044171 where /search: can appear anywhere in the URI, not just in the beginning.
    1.2.3. Run 3: Test updated 2044171 where <unique_identifier> occurs immediately after /search:
    PSA about PCRE Anchoring
    1.2.4. Run 4: Review Applicable PCRE Modifiers

Shouts and Greetz

Everything I share here was acquired by standing alongside senior analysts in the field, from spam detection to network traffic analysis. Their mentorship really helped shape my own gut for testing and validating results. Thank you everyone, always! :hotdog:

Also, thank you to @samjenk for submitting the False Positive report that inspired this thread!


About False Positives, True Positives, and PCREs

What are False Positives and True Positives?

Given a Suricata rule, the rule should match on traffic that reflects the rule’s intent and then alert. Whenever a rule matches that intended traffic and alerts, the alert is considered a True Positive.

However, there are times when the rule alerts on unintended traffic. This is called a False Positive alert.

Consider this simplified, exaggerated example. Within your organization, your users are receiving Google Form URLs which are phishing for PII. You want to create a rule with the intent to identify this phishing activity.

A not-so clever idea pops up – what if you created a PHISHING rule that alerts whenever your network sees a POST request to a Google Form URL!

Will this rule alert on the POST requests to the phishing Google Form URLs? Yes.

However, this rule would also alert on POST requests to the benign Google Form URLs. (oh no! :slightly_frowning_face: ) This was not your rule’s intent, and so these occurrences count as False Positives.

Psst – What are False Negatives?
False Negatives occur when the rule does not detect traffic when it definitely should have.
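
The three outcomes above can be pinned down in a few lines of Python. This is only a sketch: the function name classify_outcome and its two boolean inputs are illustrative assumptions, not anything Suricata provides.

```python
def classify_outcome(alerted: bool, matches_intent: bool) -> str:
    """Label one piece of traffic from the rule's behavior and the rule's intent."""
    if alerted and matches_intent:
        return "True Positive"
    if alerted and not matches_intent:
        return "False Positive"
    if not alerted and matches_intent:
        return "False Negative"
    return "True Negative"


# POST to a phishing Google Form: the rule alerts, as intended.
print(classify_outcome(alerted=True, matches_intent=True))
# POST to a benign Google Form: the rule alerts, but should not have.
print(classify_outcome(alerted=True, matches_intent=False))
```

In the Google Form example, the second call is the case that lands in your False Positive reports.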

What are PCREs?

PCRE stands for Perl Compatible Regular Expressions. These expressions are used to match on patterns. Suricata also has modifiers to assist with PCRE matching.
https://docs.suricata.io/en/suricata-6.0.0/rules/payload-keywords.html?highlight=regular%20expressions#pcre-perl-compatible-regular-expressions


Handling False Positive Reports as A Rule Writer

Disclaimer: There are many ways in which rule writers handle False Positive reports. This post summarizes my process. If you would like to share your own approach, please do consider sharing it in the Discourse!

After receiving a False Positive report from a submitter, I would do the following phases…

==Research Phase===

  1. Identify SID(s) of Rules within the Report

  2. Understand Intent Behind Rule(s)

==Decision Phase==

  1. Validate False Positive Claims. Does the reported False Positive activity align or not align with the rule’s intent?

If the activity does not align with the rule’s intent – then you do have a False Positive case to fix.

==Tune and Tweaking Phase==

  1. Get PCAPs for Testing.

  2. Tune and Test Rule against Dalton (repeatedly!)


==Research Phase===

1. Identify the SID(s)

A submitter reported 2044171 was causing False Positive alerts. From the ruleset, Proofpoint Emerging Threats Rules, grep for the SID and keep it in your notes.

alert http $HOME_NET any -> $EXTERNAL_NET any (msg:"ET MALWARE NewsPenguin CnC Checkin"; flow:established,to_server; http.method; content:"GET"; http.uri; content:"/search:"; startswith; fast_pattern; pcre:"/[a-z0-9]{12}/Ri"; reference:url,blogs.blackberry.com/en/2023/02/newspenguin-a-previously-unknown-threat-actor-targets-pakistan-with-advanced-espionage-tool; classtype:trojan-activity; sid:2044171; rev:1; metadata:affected_product Windows_XP_Vista_7_8_10_Server_32_64_Bit, attack_target Client_Endpoint, created_at 2023_02_10, deployment Perimeter, former_category MALWARE, signature_severity Major, updated_at 2023_02_10;)

2. Understand Intent Behind Rule(s)

It’s easy to get lost in the False Positive report before understanding the reported rules themselves. Look away from the report and focus on breaking down SID:2044171!

For starters, let’s break it down into its Suricata rule format parts (6.1. Rules Format — Suricata 6.0.0 documentation)

Rule Format Section | Definition
Action | Determines what happens when the signature matches.
Header | Defines the protocol, IP addresses, ports, and direction of the rule.
Rule Options | Defines the specifics of the rule.

Remember, we want to understand what the rule means so that we understand its False Negative and/or False Positive potential.

Rule Format Section

Action

This determines what happens when the signature matches.

In SID:2044171, the action == alert, which means an alert is generated whenever the rule’s conditions match the traffic.

Header

This defines the protocol, IP addresses, ports and direction of the rule.

In SID:2044171, the header == http $HOME_NET any -> $EXTERNAL_NET any.

This says the Suricata rule will look at HTTP traffic that goes from a defined internal source (from any port) to an external destination (on any port).

Rule Options

This defines the specifics of the rule.

In SID:2044171, the rule specifics == (msg:"ET MALWARE NewsPenguin CnC Checkin"; flow:established,to_server; http.method; content:"GET"; http.uri; content:"/search:"; startswith; fast_pattern; pcre:"/[a-z0-9]{12}/Ri"; reference:url,blogs.blackberry.com/en/2023/02/newspenguin-a-previously-unknown-threat-actor-targets-pakistan-with-advanced-espionage-tool; classtype:trojan-activity; sid:2044171; rev:1; metadata:affected_product Windows_XP_Vista_7_8_10_Server_32_64_Bit, attack_target Client_Endpoint, created_at 2023_02_10, deployment Perimeter, former_category MALWARE, signature_severity Major, updated_at 2023_02_10;)

Whoa! That’s not easy on the eyes.

I’m going to break down the rule options and organize them in the order that best expresses the rule’s intent.
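
The action / header / rule options split described above can be sketched in Python. This is a rough illustration, not how Suricata parses rules: the helper name split_rule is hypothetical, the regex assumes the simple "action header (options)" shape, and the reference and metadata options are trimmed from the rule text for brevity.

```python
import re

# SID 2044171 with reference and metadata trimmed for readability.
RULE = (
    'alert http $HOME_NET any -> $EXTERNAL_NET any '
    '(msg:"ET MALWARE NewsPenguin CnC Checkin"; flow:established,to_server; '
    'http.method; content:"GET"; http.uri; content:"/search:"; startswith; '
    'fast_pattern; pcre:"/[a-z0-9]{12}/Ri"; classtype:trojan-activity; '
    'sid:2044171; rev:1;)'
)


def split_rule(rule: str) -> dict:
    """Split a rule into action, header, and options (illustrative only)."""
    match = re.match(r"^(\w+)\s+(.+?)\s+\((.*)\)\s*$", rule)
    if match is None:
        raise ValueError("rule is not in 'action header (options)' form")
    action, header, options = match.groups()
    return {"action": action, "header": header, "options": options}


parts = split_rule(RULE)
# The header has six whitespace-separated fields.
protocol, src, src_port, direction, dst, dst_port = parts["header"].split()
print(parts["action"])
print(protocol, src, src_port, direction, dst, dst_port)
```

Printing parts["options"] gives the long option string that the next section breaks down piece by piece.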

More on Rule Options

Let’s break down the rule options.

User Friendly Message

msg:"ET MALWARE NewsPenguin CnC Checkin";

Aside from the SID, the msg provides a human friendly label for the rule. We understand that we are looking at NewsPenguin malware activity.

More User Friendly Metadata

reference:url,blogs.blackberry.com/en/2023/02/newspenguin-a-previously-unknown-threat-actor-targets-pakistan-with-advanced-espionage-tool;
classtype:trojan-activity;
sid:2044171;
rev:1;
metadata:affected_product Windows_XP_Vista_7_8_10_Server_32_64_Bit, attack_target Client_Endpoint, created_at 2023_02_10, deployment Perimeter, former_category MALWARE, signature_severity Major, updated_at 2023_02_10;)

Fields like reference, classtype, sid, rev, and metadata help us describe the rule. We want to pay attention to the reference. References best describe the rule’s intent! The rest of the rule options attempt to best articulate it in Suricata’s rule language…which is not always perfect.

In the blog, note that NewsPenguin connects to a hardcoded server, updates[.]win32[.]live:443/search:<unique_identifier>, where <unique_identifier> is 12 characters long. And so, the rule’s intent is to match on…

  • HTTP GET requests
  • A URI that contains “/search:” followed IMMEDIATELY by <unique_identifier>
  • <unique_identifier> is 12 characters long and alphanumeric

After reading the msg and the blog references, we should have a good understanding about why this rule was created – the rule’s intent. Now, let’s review how this research was practically applied / implemented using Suricata.

Direction Keyword

flow:established,to_server;

This keyword restricts the rule to a particular state and direction of the connection. Here, established limits matching to established connections, and to_server limits it to traffic sent from the client to the server. See: Snort Blog: Flow matters.

HTTP Keywords

http.method; content:"GET";

This sticky buffer and content match restrict the rule to HTTP GET requests only.

http.uri; content:"/search:"; startswith; fast_pattern; pcre:"/[a-z0-9]{12}/Ri";

This sticky buffer helps parse the normalized URI buffer (and not the raw URI buffer. This nuance is important for knowing which Suricata PCRE modifiers you’re allowed to use.)

The rule requires /search: in the normalized URI buffer, and startswith forces it to sit strictly at the beginning. This content is also the rule’s fast_pattern.

Spoiler: This restriction at the beginning of the URI was never specified by the blog! This seems like a rule writer’s interpretation of the malware’s techniques. If the malware typically deploys this URI at the top level directory, then this rule alerts. However, if the malware sometimes deploys this URI after several directories, then this rule wouldn’t detect it. Missing samples due to restrictions is an example of False Negative misses.

Then, there exists a PCRE which looks for 12 consecutive alphanumeric characters.

This PCRE has modifiers (or flags). More on this topic, [Suricata PCRE Modifiers]. From this Suricata documentation, we get the definition of the flags used:

  • /R: Match relative to the last pattern match. It is similar to distance:0;
  • /i: pcre is case insensitive

To summarize, if the normalized URI buffer matches on /search: and matches on the PCRE, then the rule is satisfied. The PCRE can occur…

  • immediately afterwards like /search:12341234abcd
  • or, with unexpected characters before the PCRE like /search:/blah/blah/12341234abcd1234.

Spoiler: This rule will cause False Positives. To prevent them, we must force the PCRE to match IMMEDIATELY after the /search: match. Also, the PCRE does not strictly enforce the unique_identifier’s length of 12. It accepts any 12 alphanumeric characters and doesn’t care if MORE exist.
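
To see why both URIs above satisfy the rule, here is a rough Python model of the original URI logic. It is a sketch under assumptions: the function name is hypothetical, startswith is modeled with str.startswith, and /R is modeled by running the regex only on the text after the content match. Real Suricata matching has more moving parts.

```python
import re

PREFIX = "/search:"


def original_rule_matches(uri: str) -> bool:
    """Rough model of the original rule's http.uri content + PCRE logic."""
    # startswith: the content must sit at the very beginning of the buffer.
    if not uri.startswith(PREFIX):
        return False
    # /R: the PCRE is evaluated on whatever follows the content match.
    remainder = uri[len(PREFIX):]
    # No anchors, so 12 alphanumerics anywhere in the remainder will do (/i).
    return re.search(r"[a-z0-9]{12}", remainder, re.IGNORECASE) is not None


for uri in ("/search:12341234abcd", "/search:/blah/blah/12341234abcd1234"):
    print(uri, original_rule_matches(uri))
```

Both calls report a match, even though only the first URI reflects the behavior described in the blog.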

Big Takeaway:

This rule was written from a blog’s research on CnC activity. After reviewing the rule against the blog, we see gaps which could cause False Negatives (misses) and False Positives (alerts on unintended matches).


==Decision Phase==

3. Validate False Positive Claims

A submitter reported that …

It looks to be effectively matching on any uri that fits the pattern:

^/search:.*[a-z0-9]{12}/i

It’s alerting on HTTP GET requests to this page, which we believe is benign:

latindictionary.wikidot.com/search:site/html/2128ebb4588a73ef6eac84f0f330ca13c59a9eec-836761755241636984

I think the problem is that, after the rule matches on a GET request for a uri starting with “/search:”, it then looks for the 12-character alphanumeric anywhere after that. Reading the blog post for this threat [0], I think the threat actor’s actual behavior was more like this:

<bad domain>/search:<12-character alphanumeric irrespective of case>

Based on our understanding of the rule, this report is valid! We should update this rule.


==Tune and Tweaking Phase==

4. Get PCAPs for Testing

We have confirmed that we need to fix this rule. It’s time to get PCAPs for testing.

At minimum we would need…

  • Traffic that generates False Positive alerts from our SID.
    • Let’s call this PCAP, fp.pcap when we get it.
    • Why do we want fp.pcap? This traffic helps validate the submitter’s report. Also, we will run our tweaked rule against this traffic; the tweaked rule should no longer alert on it.
  • Traffic that generates True Positive alerts from our SID.
    • Let’s call this PCAP, tp.pcap, when we get it.
    • Why do we want tp.pcap? As we tweak our rule to prevent False Positive alerts, we want to ensure the rule still creates True Positive alerts on malicious NewsPenguin Activity.

We have many approaches to gathering PCAPs. Here are common options…

A) Find PCAPs shared by other researchers

B) Use PCAPs generated from sandbox submissions

C) Create PCAPs with Flowsynth (GitHub - secureworks/flowsynth: a network packet capture compiler) by manually creating content

D) Capture traffic with our local Wireshark

For this post, we will use Option B to get our fp.pcap and Option C to get our tp.pcap.

Get fp.pcap

The submitter reported that our SID alerts on an HTTP GET request to the URI latindictionary.wikidot.com/search:site/html/2128ebb4588a73ef6eac84f0f330ca13c59a9eec-836761755241636984

Let’s use sandbox tools like any.run or tria.ge (Triage | Behavioral Report) to generate a PCAP. This PCAP will have the following HTTP GET request…

GET /search:site/html/2128ebb4588a73ef6eac84f0f330ca13c59a9eec-836761755241636984 HTTP/1.1
Host: latindictionary.wikidot.com
Proxy-Connection: keep-alive
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Accept-Encoding: gzip, deflate

:tada:After you download this PCAP, I suggest naming it fp.pcap (12.5 KB) :tada:


Creating tp1.pcap and tp2.pcap with FlowSynth

In this section, we will review how to use Dalton’s FlowSynth tool to create PCAPs with provided HTTP Requests.

Get tp1.pcap

Sometimes, open source resources have PCAPs that match a SID. For example, if you use any.run, you are able to filter public submissions by Suricata SID (see: Retrospective malware analysis - ANY.RUN Blog). Unfortunately, I did not find any existing PCAPs on open source options.

If we do not have PCAPs, sometimes we have to create them ourselves. Remember, the rule’s reference, BlackBerry’s research blog, describes NewsPenguin traffic: an HTTP GET request that contains…

  • A URI with /search: somewhere in the URI
  • IMMEDIATELY after /search:, a “<unique_identifier>”
  • <unique_identifier> is 12 characters long and alphanumeric
    e.g. updates[.]win32[.]live:443/search:<unique_identifier>

Let’s reuse the HTTP GET request generated above, edit it to match this NewsPenguin traffic, and generate a PCAP with Dalton’s Flowsynth! Take the previous HTTP GET request and carefully update it with the known info about NewsPenguin traffic.

Generated True Positive HTTP GET Request.

GET /search:12341234abcd HTTP/1.1
Host: example.org
Proxy-Connection: keep-alive
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Accept-Encoding: gzip, deflate

Notes about this generated HTTP Request:

*Notice that we used Host: example.org. This is a domain name reserved for testing and documentation; see RFC 2606: Reserved Top Level DNS Names. Purely my style choice when creating the PCAP.

The following screenshots show how to create a PCAP using FlowSynth, GitHub - secureworks/flowsynth: a network packet capture compiler.

  1. In FlowSynth’s Build Packet Capture, set the Network Layer values so that a $HOME_NET address makes the GET request to the evil server.

  2. Set the Transport Layer to TCP and Destination Port = 80, which are typical HTTP settings.

  3. Set the Payload data. We paste the HTTP GET Request data and then click Generate FlowSynth.

:tada:Congrats! You now have tp1.pcap (836 Bytes) :tada:

Get tp2.pcap

The previous HTTP GET Request is an example of when /search: occurs in the beginning of the URI. Remember, we mentioned that /search: could also exist elsewhere in the URI. We should make another PCAP that reflects this and save it. Use the following data to create another PCAP.

GET /blah/blah/search:12341234abcd HTTP/1.1
Host: example.org
Proxy-Connection: keep-alive
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Accept-Encoding: gzip, deflate

:tada:Save this second generated PCAP as tp2.pcap (802 Bytes):tada:
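
Since the tp1.pcap and tp2.pcap requests differ only in their URI, a small helper can generate the payload text to paste into the Payload box. This is a sketch, not part of Dalton or Flowsynth: build_get_request is a hypothetical name, only a few of the headers shown above are kept, and how the web form treats line endings is something to confirm in your own Dalton instance.

```python
def build_get_request(uri: str, host: str = "example.org") -> str:
    """Return a minimal HTTP/1.1 GET request with CRLF line endings."""
    lines = [
        f"GET {uri} HTTP/1.1",
        f"Host: {host}",
        "Accept-Encoding: gzip, deflate",
        "",  # the empty line that terminates the header block
        "",
    ]
    return "\r\n".join(lines)


tp1_request = build_get_request("/search:12341234abcd")
tp2_request = build_get_request("/blah/blah/search:12341234abcd")
print(tp2_request)
```

Keeping every other header identical between the two requests means that any difference in alerting comes from the URI alone.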


5. Update and Test Rule against Dalton (repeatedly!)

Let’s recap:

  • Someone told ET that 2044171 is causing False Positive alerts.
  • We understand the rule intent behind 2044171. From our analysis, we see how its current implementation could lead to False Negatives and False Positives.
  • Going forward, we know this SID needs updates.
  • We have gathered PCAPs for testing that reflect False Positive and True Positive traffic.
    • fp.pcap, this PCAP represents the SID firing on a benign URI.
    • tp1.pcap, this PCAP represents NewsPenguin traffic the SID should fire on, where the URI contains /search: at the beginning.
    • tp2.pcap, this PCAP represents NewsPenguin traffic the SID should fire on, where the URI contains /search: but not at the beginning!

Now, time to start running 2044171 against our PCAPs!

Remember, our goals are to ensure 2044171 does not alert for fp.pcap AND it alerts on tp1.pcap and tp2.pcap.

Every time you do a run, consider what results you expect. If the test results meet our goals, then we can stop! If not, you must persist and keep testing. :muscle::muscle::muscle:

:ghost: (Ideally you’d tweak for performance too, but that’s a topic for another time.) :ghost:


Run 1: Test original 2044171 against fp.pcap, tp1.pcap, and tp2.pcap.

Why are we doing this test?

To affirm our known expectations before we deviate into new behavior.

Expected results for this test…

  • Will 2044171 alert on fp.pcap? Yes
  • Will 2044171 alert on tp1.pcap? Yes
  • Will 2044171 alert on tp2.pcap? No

Rule for Testing
This is the grep’ed 2044171 from the ruleset.

alert http $HOME_NET any -> $EXTERNAL_NET any (msg:"ET MALWARE NewsPenguin CnC Checkin"; flow:established,to_server; http.method; content:"GET"; http.uri; content:"/search:"; startswith; fast_pattern; pcre:"/[a-z0-9]{12}/Ri"; reference:url,blogs.blackberry.com/en/2023/02/newspenguin-a-previously-unknown-threat-actor-targets-pakistan-with-advanced-espionage-tool; classtype:trojan-activity; sid:2044171; rev:1; metadata:affected_product Windows_XP_Vista_7_8_10_Server_32_64_Bit, attack_target Client_Endpoint, created_at 2023_02_10, deployment Perimeter, former_category MALWARE, signature_severity Major, updated_at 2023_02_10;)

Below are screenshots showing how you’d use Dalton to test the rule against three PCAPs for Suricata results.

A) Submit Job page.

  1. Add the PCAPs accordingly with the Browse button. Then, select “Create Separate jobs for each PCAP” so that an individual log will be generated for each PCAP. This makes troubleshooting per PCAP easier. Also, select “Use Custom Rules” to test our specific SID only.
  2. Select which Suricata version you’d like to use.
  3. Select “Use custom rules”. In the textbox, paste 2044171.
  4. At the bottom of the page, click Submit.

B) Recent Jobs page.

This redirected page lists the jobs launched from Dalton. Focus on the ones with Status:Queued because these are our PCAPs. Click the Job IDs to see results.

C) Report for Job ID

The report page has multiple tabs and we want to focus on Alerts and Debug.

  • Alerts Tab. This tab reports whether the Suricata rule alerted on the PCAP. But wait, which PCAP?
  • Debug Tab. This tab logs info like what PCAP was used to generate the report.

Test Results
From this screenshot, we see that test results matched our expectations.

Apologies! That screenshot is not very readable. Here’s another screenshot, but this time of just one report:

Notice that this tells us that fp.pcap does cause our rule to alert.


Run 2: Test updated 2044171 where /search: can appear anywhere in the URI, not just in the beginning.

Why are we doing this test?
We want this new behavior to prevent False Negatives, or misses. The original rule strictly enforced /search: at the beginning, so it was not flexible if the malware decided to change its URI pattern.

Expected results for this test…

  • Will 2044171 alert on fp.pcap? Yes # Remember, this is not what we want! But, this change will get us alerts on tp2.pcap.
  • Will 2044171 alert on tp1.pcap? Yes
  • Will 2044171 alert on tp2.pcap? Yes

Rule for Testing

Note that I removed references and metadata so that we can visually declutter our rule.

alert http $HOME_NET any -> $EXTERNAL_NET any (msg:"ET MALWARE NewsPenguin CnC Checkin"; flow:established,to_server; http.method; content:"GET"; http.uri; content:"/search:"; nocase; fast_pattern; pcre:"/[a-z0-9]{12}/Ri"; classtype:trojan-activity; sid:2044171; rev:2;)

Notes about Rule Updates

  • This new rule drops the usage of startswith;.
  • Also, let’s force the content match to be case-insensitive by adding >> nocase;
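
The effect of dropping startswith and adding nocase can be approximated in Python. This is a simplified model with assumed names (run2_rule_matches is hypothetical): content with nocase is modeled as a case-insensitive literal search anywhere in the URI, and /R as running the PCRE on the text after each content match.

```python
import re

CONTENT = re.compile(re.escape("/search:"), re.IGNORECASE)  # content + nocase
PCRE = re.compile(r"[a-z0-9]{12}", re.IGNORECASE)           # /i


def run2_rule_matches(uri: str) -> bool:
    """Rough model of the Run 2 rule: content anywhere, then a relative PCRE."""
    for content_match in CONTENT.finditer(uri):
        # /R: evaluate the PCRE on what follows this content match.
        if PCRE.search(uri[content_match.end():]):
            return True
    return False


print(run2_rule_matches("/blah/blah/search:12341234abcd"))
```

Under this model tp2.pcap's URI now matches, while fp.pcap's URI still matches too, which is why Run 3 is needed.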

Test Results

After testing in Dalton, the results were as expected. Great! Now, we will take this rule and transform it once again to fit the next run.


Run 3: Test updated 2044171 where <unique_identifier> occurs immediately after /search:

Why are we doing this test?
We want this new behavior to prevent False Positives.

Expected results for this test…

  • Will 2044171 alert on fp.pcap? No
  • Will 2044171 alert on tp1.pcap? Yes
  • Will 2044171 alert on tp2.pcap? Yes

Notes about Rule Updates
Remember, in the original rule, it allowed for URIs like…
/search:site/html/2128ebb4588a73ef6eac84f0f330ca13c59a9eec-836761755241636984

But, the rule’s true intent was to match on URIs where

  1. /search: content appears right next to the identifier, like
  • /search:12341234abcd
  • /blah/blah/search:12341234abcd
  2. The identifier is exactly 12 characters, as the blog mentioned; the PCRE should strictly enforce that. And so, matches like the following are NOT allowed:
  • /blah/blah/search:12341234abcd1234

About PCRE Testing

Let’s walk through updating the PCRE.

Using a tool like regex101.com, we can test our PCREs before using Dalton.

In the Test String box, you may put URIs that reflect your PCAPs.

Here are the Test Strings

# fp.pcap
/search:site/html/2128ebb4588a73ef6eac84f0f330ca13c59a9eec-836761755241636984

# tp1.pcap
/search:12341234abcd

# tp2.pcap
/blah/blah/search:12341234abcd

# Content to test if PCRE strictly enforces 12 alphanumeric matches only
/search:12341234abcdefg
/blah/blah/search:12341234abcdefg

Now, let’s begin updating the PCRE so that it behaves accordingly. (Note, I dropped the PCRE modifiers for now as we will discuss them more in the next Run).

Our initial PCRE is pcre:"/[a-z0-9]{12}/".

Require the PCRE to match the static content /search: followed by the identifier pattern.

After the content match of /search: the PCRE should appear immediately >> pcre:"/\/search:[a-z0-9]{12}/"

In regex101, the PCRE does not match the fp.pcap URI. Great! Also, it matches the tp1.pcap and tp2.pcap – Double great!

Hmm, the regex matches the last two URIs though. Let’s continue updating this PCRE.

Restrict PCRE match to URIs with strictly 12 alphanumeric characters.

To ensure the URI has only 12 alphanumeric characters after the /search: match, we can use $. This anchors the pattern to the end of the string. And so, we have

pcre:"/\/search:[a-z0-9]{12}/" >> pcre:"/\/search:[a-z0-9]{12}$/"
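
The same progression can be replayed outside regex101 with a short Python script. Treat it as a sketch: Python's re module stands in for PCRE here, the labels and the matching_labels helper are made up for this example, and the forward slash is left unescaped because Python has no regex delimiters.

```python
import re

TEST_URIS = {
    "fp.pcap": "/search:site/html/2128ebb4588a73ef6eac84f0f330ca13c59a9eec-836761755241636984",
    "tp1.pcap": "/search:12341234abcd",
    "tp2.pcap": "/blah/blah/search:12341234abcd",
    "too long (1)": "/search:12341234abcdefg",
    "too long (2)": "/blah/blah/search:12341234abcdefg",
}

STEPS = {
    "initial": r"[a-z0-9]{12}",
    "static content added": r"/search:[a-z0-9]{12}",
    "end anchor added": r"/search:[a-z0-9]{12}$",
}


def matching_labels(pattern: str) -> list[str]:
    """Return the labels of every test URI the pattern matches."""
    return [label for label, uri in TEST_URIS.items() if re.search(pattern, uri)]


for step, pattern in STEPS.items():
    print(f"{step:22} {matching_labels(pattern)}")
```

Only the last pattern leaves just the tp1.pcap and tp2.pcap URIs matching.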

This PCRE works! We used the PCRE to inspect the URI buffer.

…but we can inspect the URI buffer with more than sticky buffers and PCREs. We could also use Suricata Payload Keywords, specifically isdataat (see 4.6. Payload Keywords in the Suricata 4.1.9 documentation).

In our rule, we are trying to look ahead of our /search: match and confirm that nothing exists beyond the 12 characters we expect. We can use isdataat, which checks if there “is data at” a particular byte in the buffer.

In this case, we want to check if there is NOT data at the 13th byte after our match e.g.

isdataat:!13,relative;

We can add this to our rule alongside our updated PCRE.
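
As a mental model, the negated relative check can be written out in Python. This is a simplification with a hypothetical function name: it only looks at the first /search: in the URI and treats the check as "fewer than 13 bytes follow the match".

```python
PREFIX = "/search:"


def isdataat_not_13_relative(uri: str) -> bool:
    """Simplified model of isdataat:!13,relative; after the /search: match."""
    start = uri.lower().find(PREFIX)
    if start == -1:
        return False  # no content match, so nothing to be relative to
    bytes_after_match = len(uri) - (start + len(PREFIX))
    # Negated check: pass only when fewer than 13 bytes follow the match.
    return bytes_after_match < 13


print(isdataat_not_13_relative("/search:12341234abcd"))     # 12 bytes follow
print(isdataat_not_13_relative("/search:12341234abcdefg"))  # 15 bytes follow
print(isdataat_not_13_relative("/search:abc"))              # 3 bytes follow
```

The last call also passes, which shows that isdataat only caps the length; the PCRE is still what checks that exactly 12 alphanumeric characters are present.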

Rule for Testing
alert http $HOME_NET any -> $EXTERNAL_NET any (msg:"ET MALWARE NewsPenguin CnC Checkin"; flow:established,to_server; http.method; content:"GET"; http.uri; content:"/search:"; nocase; fast_pattern; isdataat:!13,relative; pcre:"/\/search:[a-z0-9]{12}$/"; classtype:trojan-activity; sid:2044171; rev:2;)

Test Results
After testing in Dalton, the results were as expected. Great! Now, we will take this rule and transform it once again to fit the next run.

PSA about PCRE Anchoring

There’s an Emerging Threats joke that “PCREs are evil”. Why? PCREs have the potential of negatively impacting performance despite their powerful matching usage (Suricata and PCRE performance | Inliniac).

How do we fight evil PCREs? Emerging Threats recommends the following MUST-DOs:

1. Add static content related to the previous match.

- This comes up constantly in Emerging Threats: we frequently ask how our PCREs are “anchored” to previous content.

2. Use anchors when possible (^ or $) to anchor PCRE from beginning or end of a buffer.

3. Use capture groups sparingly, especially when they are not needed.

So, let’s reflect – is our PCRE evil?

content:"/search:"; nocase; fast_pattern; isdataat:!13,relative; pcre:"/\/search:[a-z0-9]{12}$/"

  1. Has reference to static content in previous matches.
  2. Uses end of line anchor
  3. Does not unnecessarily use capture groups

Conclusion: Not too evil!


Run 4: Review Applicable PCRE Modifiers

Why are we doing this test?
We want to consider which PCRE Modifiers can further improve the SID’s performance.

Expected results for this test…

  • Will 2044171 alert on fp.pcap? No
  • Will 2044171 alert on tp1.pcap? Yes – possibly with faster perf?
  • Will 2044171 alert on tp2.pcap? Yes – possibly with faster perf?

About PCRE Modifiers
Whoa! Before you call it done, we should further improve this PCRE’s performance with PCRE modifiers, 6.7. Payload Keywords — Suricata 6.0.0 documentation

In the original rule, the following modifiers were used…

  • /i pcre is case insensitive
  • /R Match relative to the last pattern match. It is similar to distance:0;

Can we simply re-add these and call this rule done? No, as they may conflict with how the updated rule matches.

What happens if we include these modifiers back into the rule one by one?

  • Add /i, test rule >> Rule alerts!
  • Add /R, test rule >> Rule doesn’t alert?!

If we read the Suricata PCRE Modifier Examples, we get an idea of when /R works. /R is meant to be used as a match relative to the last pattern match.

In the Suricata docs, it provides three examples.

If you look at the second example, it uses /R successfully. Why? Because its PCRE, /html?$/UR, does not repeat index. itself; it only describes what comes after the previous content match.

If you look at the third example, it does not use /R. This is because its PCRE DOES contain index. (the previous content match), so it has to be evaluated against the whole buffer rather than relative to that match.

And so, if we reflect on our rule which contains …

http.uri; content:"/search:"; nocase; fast_pattern; isdataat:!13,relative; pcre:"/\/search:[a-z0-9]{12}$/";

…then it makes sense that /R would not work.
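
A small Python experiment makes the failure concrete. It is only a model: /R is approximated by running the regex on the slice of the URI that follows the content match, and the variable names are made up for this example.

```python
import re

PCRE = r"/search:[a-z0-9]{12}$"
URI = "/search:12341234abcd"

content_end = URI.find("/search:") + len("/search:")

# Without /R: the PCRE sees the whole http.uri buffer.
whole_buffer = re.search(PCRE, URI)

# With /R: the PCRE only sees what follows the content match,
# and "/search:" is no longer there to be matched a second time.
after_content = re.search(PCRE, URI[content_end:])

print(whole_buffer is not None)
print(after_content is not None)
```

The relative search has only 12341234abcd left to look at, so a PCRE that itself starts with /search: can never match there.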

Applying Modifiers
Hmm, what other modifiers could we use instead? From the docs, we see these relevant ones…

  • /i pcre is case insensitive

  • /U: Makes pcre match on the normalized uri. It matches on the uri_buffer just like uricontent and content combined with http_uri. U can be combined with /R. Note that R is relative to the previous match so both matches have to be in the HTTP-uri buffer. Read more about HTTP URI Normalization.

  • /A: A pattern has to match at the beginning of a buffer. (In pcre ^ is similar to A.)

  • /E Ignores newline characters at the end of the buffer/payload.

Let’s analyze which ones we should apply.

  • The /i modifier looks useful. It would enforce case insensitive matches.

  • The /U is redundant. Our http.uri sticky buffer already ensures that we are looking at the normalized URI buffer.

  • The /A modifier would restrict our PCRE to match only if “/search:” appears at the beginning! Remember, we want “/search:” to match anywhere in the URI.

  • The /E will not be useful because the buffer does not have a newline at the end. It might be useful in situations like
    pcre:"/^Host\x3a\x20[^\r\n]+[\r\n]+$/Hmi"; which could be replaced with
    pcre:"/^Host\x3a\x20[^\r\n]+$/HEmi"

And so, we’ve confirmed that /i is the best modifier as it adds to the PCRE and is not redundant.
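
The reasoning about /i and /A can be checked with Python's re module as a stand-in. This is an approximation: re.IGNORECASE plays the role of /i, and re.match (which only matches at the start of the string) plays the role of /A; the variable names are illustrative.

```python
import re

PATTERN = r"/search:[a-z0-9]{12}$"

# /i: without it, an upper-case identifier is missed.
without_i = re.search(PATTERN, "/search:12341234ABCD")
with_i = re.search(PATTERN, "/search:12341234ABCD", re.IGNORECASE)

# /A: forcing the match to begin at the start of the buffer loses tp2.pcap.
anchored = re.match(PATTERN, "/blah/blah/search:12341234abcd")
unanchored = re.search(PATTERN, "/blah/blah/search:12341234abcd")

print(without_i is not None, with_i is not None)
print(anchored is not None, unanchored is not None)
```

Only the case-insensitive, unanchored combination keeps both the upper-case variant and the tp2.pcap URI matching.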

The big takeaways – reviewing PCRE modifiers helps you understand if you’ve optimized your PCRE already. If not, you may want to consider using the flags.

For this blog, we will not be reviewing the rule profiling numbers for this PCRE modifier addition. If you are interested in understanding rule profiling, I advise you review this topic 11.9. Rule Profiling — Suricata 7.0.2-dev documentation and apply it as you observe your Dalton results. If you would like a discussion on Rule Profile, please let us know and we will work on this topic.

Rule for Testing – Final Rule Revision!

alert http $HOME_NET any -> $EXTERNAL_NET any (msg:"ET MALWARE NewsPenguin CnC Checkin"; flow:established,to_server; http.method; content:"GET"; http.uri; content:"/search:"; nocase; fast_pattern; isdataat:!13,relative; pcre:"/\/search:[a-z0-9]{12}$/i"; classtype:trojan-activity; sid:2044171; rev:3;)

Yay! Time to update 2044171 in the ruleset.
