Rule Performance and QA

Hi :slight_smile:
Measuring Suricata’s performance — and testing rules for optimization — is often a grueling process. Anyone who’s tried it has probably found themselves asking: “Why am I seeing 200k checks before MPM with no matches?”, “Why does Rule Profiling only report ticks?”, “Was I condemned to this mortal plane just to suffer?” Because of this, we regularly get questions about how we measure performance on the ETPro ruleset and our daily rule releases. In this post, I’ll share the internal workflows and strategies we use for performance testing. We don’t claim this to be the definitive or best method — it’s simply the approach that works for us right now. If you have suggestions for improvement or want to dig deeper into any part of our process, we’d be happy to continue the discussion here on the forum or in our Discord.

Suricata Rule Performance Overview

A core focus of the Suricata project from OISF has been performance optimization. Suricata’s module-based architecture is built to leverage hardware acceleration, taking advantage of modern NIC features like TSO, GRO, and checksum offloading to reduce CPU overhead. This efficient use of hardware resources allows Suricata to sustain high-speed data analysis under heavy traffic loads. These optimizations enable more efficient packet processing and support the high throughput required for modern 10-gigabit-plus networks.

Suricata represented a major leap forward in IDS performance compared to earlier single-threaded engines such as Snort and Zeek. The trade-off, however, is that measuring performance becomes more complex. Because Suricata is highly parallelized and hardware-sensitive, results can vary significantly: even small differences in hardware, traffic mix, loaded rules, or competing system processes can produce very different performance outcomes — sometimes even across multiple runs on the same machine.

Our Testing Environment

Our researchers can submit rules for testing on an orchestrator platform that tracks their submissions and measures performance against a variety of PCAP samples. Rules may be tested across multiple engines of both Suricata and Snort, and against multiple PCAPs in parallel. Each PCAP/engine combination is treated as a separate “job.” For example, testing a rule on Suricata 5.0 and Suricata 7.0 with three PCAPs generates six total jobs (three PCAPs on each engine). These jobs are then dispatched to dedicated worker servers for processing.
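
To make the fan-out concrete, here is a minimal sketch of how each PCAP/engine combination expands into an independent job. The engine and PCAP names are placeholders, not our orchestrator's actual data model:

```python
from itertools import product

# Placeholder engine and PCAP names, not our orchestrator's real data model.
engines = ["suricata-5.0", "suricata-7.0"]
pcaps = ["benign-na.pcap", "benign-eu.pcap", "sandbox-malware.pcap"]

# Every engine/PCAP combination becomes its own job, so 2 engines x 3 PCAPs
# fans out into 6 jobs that workers can process independently.
jobs = [{"engine": e, "pcap": p} for e, p in product(engines, pcaps)]
print(len(jobs))  # 6
```

Treating each combination as its own job is what lets the orchestrator hand them to separate workers and run them in parallel.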

To ensure consistency, we maintain a rack of worker servers that are used exclusively for Suricata and Snort testing — no other processes are ever run on these machines. Each worker is configured with identical hardware. When a worker receives jobs, it executes each one inside a fresh Docker container running the specified engine. The container spins up Suricata or Snort in a clean environment, processes the assigned PCAP, and dumps the resulting logs. Each container is pinned to dedicated CPU cores, minimizing cross-job interference.

For further consistency, Suricata containers are run in “single” run mode, which sacrifices some raw speed but produces more reliable and repeatable results. All containers are based on Ubuntu images (18, 20, or 22, depending on engine requirements) and are configured with Hyperscan as the pattern matcher.
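
To illustrate, a single job launch on a worker might look roughly like the sketch below. The Docker and Suricata options shown (--rm, --cpuset-cpus, --runmode single, --set mpm-algo=hs, -S, -r, -l) are standard, but the image name, mount paths, and core range are placeholders rather than our production configuration:

```python
import subprocess

def run_job(image, pcap_path, rules_path, log_dir, cpu_cores="0-3"):
    """Run one PCAP/engine job in a throwaway container pinned to dedicated cores.

    Sketch only: the image, mount paths, and core range are placeholders.
    """
    cmd = [
        "docker", "run", "--rm",
        "--cpuset-cpus", cpu_cores,            # pin the container to specific cores
        "-v", f"{pcap_path}:/input.pcap:ro",   # PCAP under test
        "-v", f"{rules_path}:/test.rules:ro",  # candidate rule(s)
        "-v", f"{log_dir}:/logs",              # collect eve.json, profiling output, etc.
        image,
        "suricata",
        "--runmode", "single",   # slower, but more repeatable profiling numbers
        "--set", "mpm-algo=hs",  # use Hyperscan as the pattern matcher
        "-S", "/test.rules",     # load only the rules under test
        "-r", "/input.pcap",     # read the PCAP offline
        "-l", "/logs",           # write logs to the mounted directory
    ]
    subprocess.run(cmd, check=True)
```

One caveat: the rule profiling report used in the next section is only produced when Suricata is built with profiling support and the rules profiling section is enabled in suricata.yaml.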

Measurement

Our current strategy for measuring performance is intentionally simple. We rely on Suricata’s built-in rule profiling report, and then assign each rule in the report a score according to the following standard:

# Total ticks
over 1,000,000 - 1
over 2,000,000 - 2
over 4,000,000 - 3
over 10,000,000 - 4
over 100,000,000 - 6

# Total checks
over 100 - 1
over 1,000 - 2
over 2,000 - 3
over 3,000 - 4
over 10,000 - 6
over 100,000 - 9

The scores from total ticks and total checks are added together to give the rule's total score. For example, a rule with 3 million ticks (score 2) and 8 thousand checks (score 4) would have a total score of 6. These thresholds are calibrated to the typical results we observe in our environment and would differ on other hardware.
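
Expressed as code, the standard above looks something like the following. The thresholds come straight from the tables; the function and variable names are just illustrative, not our internal tooling:

```python
# Thresholds taken from the scoring standard above; names are illustrative.
TICK_SCORES = [
    (100_000_000, 6),
    (10_000_000, 4),
    (4_000_000, 3),
    (2_000_000, 2),
    (1_000_000, 1),
]

CHECK_SCORES = [
    (100_000, 9),
    (10_000, 6),
    (3_000, 4),
    (2_000, 3),
    (1_000, 2),
    (100, 1),
]

def _bucket(value, table):
    """Return the score for the first threshold the value exceeds."""
    for threshold, score in table:
        if value > threshold:
            return score
    return 0

def rule_score(total_ticks, total_checks):
    """Total score = ticks score + checks score, per the standard above."""
    return _bucket(total_ticks, TICK_SCORES) + _bucket(total_checks, CHECK_SCORES)

# Worked example from the text: 3,000,000 ticks (2) + 8,000 checks (4) = 6
assert rule_score(3_000_000, 8_000) == 6
```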

Our QA Process

Before a rule goes out in our team’s daily release, it must pass three quality assurance checks.

1 - Static Validation:
Each rule is first run through a series of static checks to confirm that it meets basic syntax requirements and conforms to the Emerging Threats style guide. These checks enforce our in-house standards for formatting and consistency across the ruleset.
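
Purely as an illustration (these are not the actual Emerging Threats style-guide checks, which are internal), a static validation pass might begin with simple checks like:

```python
import re

# Illustrative checks only, not the actual Emerging Threats style-guide tooling.
REQUIRED_OPTIONS = ("msg:", "sid:", "rev:", "classtype:")

def static_check(rule: str) -> list[str]:
    """Return a list of problems found in a single rule string."""
    problems = []
    if not re.match(r"^(alert|drop|reject|pass)\s", rule):
        problems.append("rule does not start with a recognized action")
    for option in REQUIRED_OPTIONS:
        if option not in rule:
            problems.append(f"missing {option.rstrip(':')} option")
    return problems
```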

2 - Rule Performance Testing:
Next, rules are tested against the IDS engines we actively support — Snort 2.9.13+, Suricata 5.0, and Suricata 7.0.3 — using a curated set of 16 PCAPs drawn from both internal and external sources. These PCAPs cover a spectrum from benign internet traffic (reflecting common patterns in North America and Europe) to highly malicious traffic captured from Proofpoint’s internal sandbox environments.

Researchers review the alerts and performance scores each rule generates across these PCAPs. A score above 3 generally triggers a rewrite; in some cases the rule is released anyway, with its behavior explicitly documented in the "Performance" metadata tag. If a rule's performance cost is exceptionally high, we may ship it "disabled" (commented out). Rules that fire on benign traffic are always rewritten before release.

3 - Dynamic Validation:
Finally, rules are loaded into the supported engines in their default states, without any PCAP input, as a sanity check. If a rule produces errors, warnings, or unexpected performance results in this stage, it is rewritten.
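
A minimal sketch of this stage, assuming Suricata's -T test mode, which loads the configuration and rules and then exits without processing traffic. The config path and the warning/error heuristic here are illustrative; a Snort job would do the same with snort -T:

```python
import subprocess

def sanity_check(rules_path: str) -> bool:
    """Load the rules in Suricata's test mode and flag any errors or warnings.

    Illustrative: the config path and the warning/error heuristic are examples,
    not our exact implementation.
    """
    result = subprocess.run(
        ["suricata", "-T", "-c", "/etc/suricata/suricata.yaml", "-S", rules_path],
        capture_output=True,
        text=True,
    )
    output = (result.stdout + result.stderr).lower()
    return result.returncode == 0 and "warning" not in output and "error" not in output
```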

For reference, the engine versions currently available on our testing platform:

Suricata 7.0.10
Suricata 7.0.9
Suricata 7.0.8
Suricata 7.0.7
Suricata 7.0.6
Suricata 7.0.5
Suricata 7.0.3
Suricata 7.0.0
Suricata 6.0.20
Suricata 6.0.19
Suricata 6.0.16
Suricata 6.0.15
Suricata 6.0.13
Suricata 6.0.8
Suricata 6.0.7
Suricata 6.0.5
Suricata 6.0.0
Suricata 5.0.10
Suricata 5.0.9
Suricata 5.0.3
Suricata 5.0.0
Snort 2.9.20
Snort 2.9.19
Snort 2.9.18.1
Snort 2.9.17.1
Snort 2.9.17
Snort 2.9.16.1
Snort 2.9.15.1
Snort 2.9.14.1
Snort 2.9.13

If you have any questions or suggestions about our strategy for testing rule performance on ET, we're always happy to discuss here on the forum or in our team's Discord.
- Will
