Since the release of NitroEDB version 5.0, a significant impact on the performance was measured due to the introduction of "differentiated subfields". Insertion rates appeared to increase by a factor of ten, and some queries realized that much improvement and more. Whereas, previous benchmarks against Oracle and Microsoft's SQL Server showed marked advantages to using NitroEDB, and subsequent work improved upon that, the new version 5.0 introduced a new level of performance. This study is internal to NitroSecurity and intended to benchmark the performance using characteristics of network security databases for the purposes of measuring against competitors' databases in the security space.
MySQL has been found to be a good database for use by numerous network security product manufacturers. Its speed relative to other commercial enterprise databases is excellent, and its flexibility to support network security data structures is one of its salient features.
The benchmark test between NitroEDB and MySQL was designed to measure certain characteristics found commonly in the network security data space. The schema used is a subset of that commonly used by network security information systems. Areas of interest that were measured include:
Query speeds — Jump to the results!
Insertion rates — Jump to the results!
Memory used — Jump to the results!
Required storage sizes — Jump to the results!
While the main "Alert" table in the NitroSecurity products is somewhat larger and uses fourteen (14) separate indexed fields, the table used in this benchmark was defined to allow isolation of various individual parameters when compared against MySQL.
One of the most important functions required of a database by a security monitor is its ability to return queries as rapidly as possible against large data sets. This allows an analyst or automated systems processes to "drill down", respond in "real time", and/or take remediation action when required. In the absence of rapid query responses, the notion of a "real time" response or "drill down" capability is lost. A flexible database is able to use certain indexed fields in multiple ways to support a wide variety of queries. As the variety or classes of queries increase, the number of indexed fields required to return more optimal results must also increase, thus insertion rates are reduced.
Data accumulated and analyzed by network security monitors is derived from multiple devices (firewalls, IPS, IDS, etc.), and is generated at significant rates based upon network speed and traffic characteristics. Databases that support such monitors must be able to store, index, and query this data while keeping up with the constant traffic flow.
A critical relationship exists between the insertion rate of a database and query speeds across different kinds of queries. More indexed fields cause significantly slower insertion rates. The larger the number of indexed fields that can be rapidly maintained, the faster the queries will be that use those indexed fields. If a database has a slow insertion rate, fewer indexed fields can be maintained in order for the database to provide insertion rates acceptable to supported applications.
The amount of memory used to perform a task forms a balance with the speed of the queries, the size of the database, and the memory available on a system after necessary system and application memory is consumed. While important, its negative effect is felt when an excessive amount of memory is used or the required database operations are slowed down because of a memory limitation. A database that uses less memory to perform the same task, therefore, is more desirable.
The size required to store information is important, but not as significant as the other parameters measured. In some databases, however, the maximum size is a limitation and restricts the total number of records. When this occurs, multiple databases must be maintained in order to perform analysis against the complete data set. This greatly increases deployment complexity and query times.
Windows XP machine
CPU3.4GHz Intel Pentium IV - HT
Memory2GB
Drive250GB External Firewire drive @ 7200rpm
OSMicrosoft Windows XP - service pack 2
Linux machine
CPUDual Xeon 3.0 GHz - HT
Memory4GB
DriveRAID 1
OSLinux 2.6
All efforts were made to ensure that MySQL was able to optimize its insertion and query rates to their maximum. Tests were performed with each of MySQL's engines so that the engine giving maximum results for security type situations was used. The literature was then searched to validate the choices made. The following methods and options were used:
MySQL version 5.0.17. This is the latest version available as of 15 Dec. 2005.
MyISAM
Very fast B-tree disk based table with index compression
Native C with level O2 optimization in compilation
This interface is documented to be as much as twice as fast as the ODBC interface.
Time + IPSource + IPDest + EventType
All combinations of these fields were tried. MySQL "EXPLAIN" chose this one.
100 rows were inserted at a time
Various numbers from 1 to 100 were tried. This gave considerable improvements, but improvements began to give diminishing returns at about 100.
This was recommended by those optimizing MySQL queries. The default is 26MB.
Did not use the DELAYED keyword on insertion because it was found to reduce the insertion rate.
NitroEDB provides multiple options that support network security traffic, many of which were used in this benchmark. These include:
NitroEDB version 5.2.
This provides the ability to use a compound index for multiple purposes and still provide rapid insertions characteristic of a regularly increasing field (i.e. time).
This table was defined to become circular at 2 billion records, so the actual circular feature was not encountered except for the support of the inferred primary index.
The data and index files were defined to cache information up to 400MB total.
This field type was used for the IP addresses. It is convenient for use in network traffic and otherwise is stored and acts like a 4 byte integer field.
In a typical security database, there are a few tables (usually only one or two) that store the largest amount of data and are the focus of analysis queries. Other tables are joined to this primary table for purposes of further describing the output or retrieving auxiliary information. These larger tables become the "bottleneck" of the collection and analysis functions. This test, therefore, used a table named "Event" to represent events generated from security devices. The table's construct is as follows:
CREATE TABLE Event (
IDINT UNSIGNED AUTO_INCREMENT,
IPSourceINT UNSIGNED,
IPDestINT UNSIGNED,
TimeDATETIME,
EventTypeENUM ('Firewall_Drop','Firewall_Reject','IDS_Alert','IPS_Alert','IPS_Drop'),
DeviceIDTINYINT UNSIGNED,
ProtocolENUM ('','ICMP','IGMP','SMTP','TCP','UDP'),
PRIMARY KEY (ID),
INDEX TimeIPSourceIPDestEventType (Time,IPSource,IPDest,EventType));
Note that there are only two indexed fields. In the normal case, there would be more to support the various types of queries. A field is indexed based upon a balance between its affect on the insertion rates and the desire to perform analysis queries as rapidly as possible. The index shown in the schema, TimeIPSourceIPDestEventType, was chosen because it was indicated to be the most optimal index for the benchmark queries chosen by both NitroEDB and MySQL using their respective schema/query analysis tools to support Query 1 and Query 3 used in this benchmark. It is also the index of choice for NitroEDB for Query 2.
Another indexed field was selected as the optimal index for MySQL against Query 2. A second schema was used with this index for MySQL for the benchmark. The second schema used for MySQL was:
CREATE TABLE Event (
IDINT UNSIGNED AUTO_INCREMENT,
IPSourceINT UNSIGNED,
IPDestINT UNSIGNED,
TimeDATETIME,
EventTypeENUM ('Firewall_Drop','Firewall_Reject','IDS_Alert','IPS_Alert','IPS_Drop'),
DeviceIDTINYINT UNSIGNED,
ProtocolENUM ('','ICMP','IGMP','SMTP','TCP','UDP'),
PRIMARY KEY (ID),
INDEX IPSourceIPDestEventTypeTime (IPSource,IPDest,EventType,Time));
While there are numerous queries required of a network security database, there is a rather significant subset that is required in analyses and which become a bottleneck because of the time normally required to execute them. These queries normally have some variant of the SQL "GROUP BY" clause and the SQL set functions (e.g. COUNT). Three typical questions in this group are:
SELECT COUNT(*),IPSource FROM Event
WHERE Time >= '12/01/2005 00:00:00' AND
Time < '12/31/2005 00:00:00'
GROUP BY IPSource
ORDER BY 1 DESC
SELECT COUNT(*),IPDest FROM Event
WHERE Time >= '12/01/2005 00:00:00' AND
Time < '12/31/2005 00:00:00' AND
IPSource = '83.97.100.232' AND
Event.Type = 'IPS_Drop'
GROUP BY IPDest
ORDER BY 1 DESC
For MySQL, this would be 5 different queries, one for each day, as follows:
SELECT COUNT(*) FROM Event
WHERE Event.Time >= '12/01/2005 00:00:00' AND
Event.Time < '12/02/2005 00:00:00'
SELECT COUNT(*) FROM Event
WHERE Event.Time >= '12/02/2005 00:00:00' AND
Event.Time < '12/03/2005 00:00:00'
SELECT COUNT(*) FROM Event
WHERE Event.Time >= '12/03/2005 00:00:00' AND
Event.Time < '12/04/2005 00:00:00'
SELECT COUNT(*) FROM Event
WHERE Event.Time >= '12/04/2005 00:00:00' AND
Event.Time < '12/05/2005 00:00:00'
SELECT COUNT(*) FROM Event
WHERE Event.Time >= '12/05/2005 00:00:00' AND
Event.Time < '12/06/2005 00:00:00'
For NitroEDB, this would be one query as follows:
SELECT Event.Time, COUNT(*) FROM Event
WHERE Event.Time >= '12/01/2005 00:00:00' AND
Event.Time < '12/06/2005 00:00:00'
GROUP BY Event.Time[DAYS]
ORDER BY Event.Time
Multiple database sizes at 13 million records (13M) and 100 million records (100M) were used in the Windows and Linux operating systems to give a spectrum of conditions.
Each query was executed multiple times to get the optimal results. A special schema was created for the column, MySQL2, which was designed to give optimal results for query 2. Since the MySQL insertion rate was so slow for this later schema, only 13 million records were initially generated with this schema for a benchmark to determine query speeds against smaller data sets. The complete test against 100 million records was also completed for MySQL2. Time is given in seconds. (Smaller is better)
|
Query |
DB Size |
Op Sys |
MySQL1 (sec) |
MySQL2 (sec) |
NitroEDB (sec) |
Ratio (MySQL:Nitro) |
|
1. Top IPSource
|
100M |
Win XP |
1,622.23 (27:02.23) |
|
0.08 |
20,278 : 1 |
|
2. Top IPDest targeted by IPSource |
100M |
Win XP |
1,643.66 (27:23.66) |
|
0.03 |
54,788 : 1 |
|
3. Event activity for past 5 days |
100M |
Win XP |
902.4 (15:02.4) |
|
0.01 |
90,240 : 1 |
|
1. Top IPSource
|
13M |
Win XP |
41.33 |
24.16
|
0.08 |
517 : 161 : 1 |
|
2. Top IPDest targeted by IPSource |
13M |
Win XP |
23.84 |
0.01 |
0.01 |
2,384 : 1 : 1 |
|
3. Event activity for past 5 days |
13M |
Win XP |
27.39 |
48.0 |
0.01 |
2,739 : 4,800 : 1 |
|
1. Top IPSource
|
100M |
Linux |
627.0 |
* |
0.10 |
6,270 : ? : 1 |
|
2. Top IPDest targeted by IPSource |
100M |
Linux |
641.11 |
0.34 |
0.03 |
21,370 : 11 : 1 |
|
3. Event activity for past 5 days |
100M |
Linux |
415.15 |
|
0.01 |
41,505 : 1 |
* Unable to complete the query. The query was attempted twice. It was stopped the first time after 24 hours - the second time after 6 hours.
These represent the average insertions rates measured in records per second over the entire load of the table. The second schema used for MySQL caused an extremely slow insertion rate that became increasingly slower as the table grew in size. (Larger insertion rate is better)
|
DB Size |
Op Sys |
MySQL1 (rec/sec) |
MySQL2 (rec/sec) |
NitroEDB |
Ratio (MySQL:Nitro) |
|
100M |
Win XP |
13,175 (2 hr 6 min) |
|
57,012 (29 min 4 sec) |
4.2 : 1 |
|
13M |
Win XP |
13,948 (15 min 32 sec) |
258 (14 hr 33 min) |
67,357 (3 min 13 sec) |
4.8 : 261 : 1 |
|
100M |
Linux |
14,854 (1 hr 52 min) |
388 (71 hr 35 min) |
83,402 (19 min 20 sec) |
5.6 : 147 : 1 |
The amount of memory used is somewhat irrelevant in this test other than to ensure that neither database used excessive amounts. An "index_buffer_size" of 500MB was specified for MySQL to get maximum performance for query responses and other operations as recommended in optimization recommendations for MySQL.
|
DB Size |
Op Sys |
MySQL |
NitroEDB |
|
100M |
Win XP |
502 MB |
438 MB |
This measurement represents the storage size of the physical files in bytes used by each database to store its information. Storage sizes on the two operating systems are similar, so only Windows XP is shown. The special primary index used by NitroEDB requires significantly less storage than other database fields of the same type. Because of this, the storage size ratio between the two databases is expected to become closer as more regularly indexed fields are added to the schema of both databases. (Smaller is better)
|
DB Size |
Op Sys |
MySQL1 |
MySQL2 |
NitroEDB |
Ratio (MySQL:Nitro) |
|
100M |
Win XP |
7,868,000,000 |
|
4,142,482,572 |
1.85 : 1.00 |
|
13M |
Win XP |
|
962,474,848 |
539,522,476 |
1.78 : 1.00 |
The performance advantage of NitroEDB over MySQL in the tested area appeared greater than expected. As long as MySQL was defined with either monotonically increasing (i.e. automatically incremented identify fields) or regularly increasing (i.e. fields beginning with "Time/Date"), the insertion rates were good but still approvimately 1/5 as fast as NitroEDB. Of course, in order to provide sufficiently responsive queries for the wide variety of cases required in security work, this will not be the only type of index used. When other than regularly increasing indexed fields are used (as with the second MySQL schema), insertion rates slowed down considerably. This reduction was observed as soon as the allocated index buffer for MySQL was exceeded. Thus, at 13 million records, with only an increasing and a random indexed fields, insertion rates had slowed down to 192 records/second under Windows XP and were continuing to slow even further as the population increased (i.e. started being moved to disk). Faster insertion rates for both MySQL and NitroEDB were measured on the faster Linux platform being tested. The NitroEDB differentiated subfields appear to provide significant improvements for insertion rates of what would normally be randomly indexed fields. They also allow an indexed field to support a wider variety of queries as was shown when MySQL was measured with two different schemas to optimize two different classes of queries.
The query improvements for the random index over 13 million records and 100 million records were noticeable in MySQL, however, giving excellent results for the second query at the expense of much slower insertion rates. In this case, it took almost 72 hours to load the 100 million record table. With both populations, it appears that sufficient index buffers were available to contain the required nodes of the index in memory for Query 2 giving good query responses. It is safe to predict that the difference in insertion rates and query speeds will significantly widen as the database population increases.
Differentiated subfields allows NitroEDB to provide extremely high insertion rates across all indexed fields used in the sample (and in the NitroSecurity ESS production database which currently uses 14 indexed fields) while exhibiting excellent query performance.
NitroEDB provides insertion rates at 260 or more times that of MySQL when the standard random indexed fields are used. There is reliable evidence to show that as more random indexed fields are incorporated into a schema that MySQL will grow increasingly slower compared to NitroEDB.
NitroEDB provides query rates of up to 90,000 times faster than MySQL in large data sets. In smaller data sets, where the index can be contained entirely within memory, MySQL provides much better performance, but still slower (by as much as 5,000 times slower) than most of those measured for NitroEDB.