Content Aware SIEM™ (CA-SIEM) represents a new generation of Security Information and Event Management (SIEM) capabilities that extend the value and benefits of SIEM by providing visibility into the contents of applications, documents and protocols. Without content awareness, SIEM is only able to act upon the surface details provided by logs. This limits the effectiveness of key SIEM functionalities—including threat detection, incident response, and compliance reporting—because the data being used for analysis lacks sufficient context to make informed, relevant decisions.
As a result, SIEM systems have started to evolve: context information from add-on systems such as Identity Management, Vulnerability Assessment, Configuration Management systems, and others has been used to enhance the security events collected and correlated by the SIEM. While these systems provide a great deal of value to SIEM, the events themselves are still myopic, limited to the summary data provided by the source log files.
Consider the following example:
An email being sent by an admin user to an outside address, on the surface, does not represent a threat. However, if the email contains sensitive company information, either within the body of the email or in an attachment, the activity could indicate insider theft.
However, extending visibility into the actual payload of applications and protocols on the network generates a massive increase in event load, which translates directly into a performance impact on current SIEMs. Query responses from most SQL-based data stores begin to slow rapidly after just a few million rows of stored data, and content information can quickly generate billions of rows. Even with highly optimized databases or flat-file data stores, current systems lack the scale and performance to deliver the real-time results needed to be valuable as rapid response operations systems.
Luckily, while content awareness poses architectural challenges to most SIEM platforms, it is a technology that is available today. NitroSecurity's NitroView Enterprise Security Manager (ESM), which was built from the start to handle massive volumes of diverse data, logs and content, is the first commercially available Content Aware SIEM.
Security information and event management (SIEM) is sometimes defined as a set of technologies for log data collection, aggregation, normalization, retention, analysis and workflow. The analysis of security information includes the collection of "context data" (additional details and metadata that are relevant for understanding the impact of collected log entries, such as DNS names or geo-location information, but are not defined within the original log) and the correlation, visualization and reporting of the newly contextualized information. In addition, SIEM products support related workflow, either directly or through integration with external case management systems, so that this newly contextualized information can be used directly to support the various operations of the Security Operations Center (SOC) or Computer Incident Response Team (CIRT). Similarly, Gartner defined SIEM as "[technology] used to analyze security event data in real time for internal and external threat management, and to collect, store, analyze and report on log data for regulatory compliance and forensics" (source: SIEM research note, 2009).
Companies use SIEM tools as enabling technology for both Security Operations Centers (SOCs), where the SIEM is a key component of security monitoring functions, and also for compliance reporting in support of various regulatory requirements, including NERC, HIPAA, PCI, and the Sarbanes-Oxley Act (SOX).
SIEM solutions originated from simple event management and aggregation tools, evolving with industry trends until they became advanced security centralization solutions that help organizations optimize various aspects of security management, including incident, risk, policy and vulnerability management. As SIEM evolved, it also expanded to provide greater visibility into new areas of the enterprise by widening the scope of data fed into the SIEM: from log sources such as IDS, IPS, hosts and firewalls to new information sources designed to provide additional context to an event, such as vulnerability assessment and identity information.
As security and compliance requirements grow even more demanding, however, the effectiveness of the SIEM is under pressure. Just as SEM became SIEM by adding information context and historical analysis to event management, the need for visibility into actual application usage is now driving a further evolution, towards a SIEM that is both context and content aware. The new Content Aware SIEM (CA-SIEM) combines context and content with log and event data to provide the next generation of threat detection, incident response, remediation, risk management and compliance services (see Appendices for examples and use cases).
As we mentioned before, "context" in relation to security management is a matter of data enrichment. Data from various log, event and flow sources is fed into SIEM platforms in order to convert raw data into actionable information about threats, attacks, malware issues and other security relevant information. Such information is then used to make decisions - by SIEM operators and their managers as well as by the SIEM itself - to investigate, block, reconfigure, etc.
One of the ways SIEM performs such enrichment is by adding "context information." Context information is the additional information required to make the limited details available within an event or log more meaningful. Context information does not come from the logs themselves, but originates in the surrounding IT environment, in other information systems inside or outside the organization.
Context Information can be explained using the analogy of an online retailer: if the book's title, price, and ISBN make up the "log record," then where it was purchased, the store's address and phone number, or the author's name are all part of the book's context. The book's contents, by contrast, would be an example of Content Information, which is discussed later.
One of the simplest examples of Context Information is name and, where possible, full identity resolution: DNS names, Windows host names and/or network username details are added to the logs. While the log file may already provide IP addresses, the added context of a human-readable name makes the log more meaningful. DNS names are normally not present in logs and have to be obtained by querying a DNS server. The SIEM tool gathers Context Information from a variety of sources, including:
An example of log data enrichment is given in Figure A.

Figure A: Enriching the surface data found in logs
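The enrichment shown in Figure A can be sketched as a lookup against a locally retained context store, keyed on the IP addresses the log already contains. This is a minimal illustration, not NitroView's implementation: the names CONTEXT_STORE, Event and enrich_event, and the host and department values, are hypothetical, and a real SIEM would refresh the store from DNS, directory and asset systems rather than hard-code it.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical local copy of Context Information, periodically refreshed from
# external systems (DNS, directory services, asset inventory). Values are illustrative.
CONTEXT_STORE: dict[str, dict[str, str]] = {
    "10.0.4.17": {"hostname": "fin-db-02.corp.example", "department": "finance"},
    "10.0.9.33": {"hostname": "hr-laptop-118.corp.example", "department": "hr"},
}


@dataclass
class Event:
    src_ip: str
    dst_ip: str
    action: str
    context: dict[str, str] = field(default_factory=dict)


def enrich_event(event: Event, store: dict[str, dict[str, str]] = CONTEXT_STORE) -> Event:
    """Attach locally cached context for each IP address already present in the log."""
    for role, ip in (("src", event.src_ip), ("dst", event.dst_ip)):
        # Unknown addresses simply get no extra context.
        for key, value in store.get(ip, {}).items():
            event.context[f"{role}_{key}"] = value
    return event


raw = Event(src_ip="10.0.9.33", dst_ip="10.0.4.17", action="login_success")
print(enrich_event(raw).context)
```

Keeping the store local, rather than querying DNS or the directory for every event, reflects the point that Context Information must be retained so it can be accessed quickly during correlation.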
Access to Context Information requires the SIEM to connect to external systems and then to retain the Context Information locally, so that it can be accessed quickly and used for event analysis, correlation, reporting, and other SIEM functions. Context Information is extremely useful for advanced forms of correlation, such as correlating user role with access hours, or matching a server under attack with a particular department in the company. Consider the following examples:
Classic event correlation rule: if a user fails to log in 10 times and then immediately logs in successfully, raise an alert "Password guessing successful."
Context-aware rule: if a user belongs to the "IT" group but not to the "IT support" subgroup, and logs into a server in the "finance" group, raise a medium severity alert "Possible unauthorized access."
The benefit of context-aware correlation is obvious in the above example, as it makes an "unauthorized access" alert more relevant to the security analyst. It also allows the creation of simple, effective correlation rules that can be derived directly from corporate security policy.
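Both rules above can be sketched in a few lines to show the difference: the classic rule only counts log events, while the context-aware rule needs group membership that the log does not carry. This is a hedged sketch; LoginEvent, USER_GROUPS, SERVER_GROUPS and the two functions are hypothetical names, and the failure streak is tracked per user only.

```python
from dataclasses import dataclass


@dataclass
class LoginEvent:
    user: str
    server: str
    success: bool


FAILURE_THRESHOLD = 10  # "fails to log in 10 times"

# Hypothetical context tables; in practice these would be synced from
# identity management and asset inventory systems.
USER_GROUPS = {"alice": {"IT"}, "bob": {"IT", "IT support"}}
SERVER_GROUPS = {"gl-app-01": "finance"}


def password_guessing_alerts(events):
    """Classic rule: 10+ consecutive failed logins by a user, then a success."""
    failures = {}
    alerts = []
    for ev in events:
        if ev.success:
            if failures.get(ev.user, 0) >= FAILURE_THRESHOLD:
                alerts.append(("Password guessing successful", ev.user))
            failures[ev.user] = 0  # a success resets that user's failure streak
        else:
            failures[ev.user] = failures.get(ev.user, 0) + 1
    return alerts


def unauthorized_access_alerts(events, user_groups=USER_GROUPS, server_groups=SERVER_GROUPS):
    """Context-aware rule: IT (but not IT support) user logs into a finance server."""
    alerts = []
    for ev in events:
        groups = user_groups.get(ev.user, set())
        if (ev.success
                and "IT" in groups
                and "IT support" not in groups
                and server_groups.get(ev.server) == "finance"):
            alerts.append(("medium", "Possible unauthorized access", ev.user))
    return alerts


events = [LoginEvent("mallory", "web-01", False)] * 10 + [LoginEvent("mallory", "web-01", True)]
events += [LoginEvent("alice", "gl-app-01", True), LoginEvent("bob", "gl-app-01", True)]
print(password_guessing_alerts(events))    # [('Password guessing successful', 'mallory')]
print(unauthorized_access_alerts(events))  # [('medium', 'Possible unauthorized access', 'alice')]
```

Note that the context-aware rule is a direct translation of a policy statement ("only IT support may touch finance servers"), which is the property the paragraph above highlights.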
As mentioned before, log data and context data provide the major input into a SIEM product. Log data shows what activities were occurring, succeeding, failing or being attempted. However, what if we'd like to learn more about the specifics of those activities? For example, what was in that downloaded file? What was that email about?
Learning these details becomes more and more important as threats become more complex and as they "move up the stack," exploiting vulnerabilities at the application and session layers as well as business logic flaws. Current SIEM products cannot answer these questions, and require other expensive security products such as Data Leakage Prevention (DLP) and Database Activity Monitoring (DAM) to be deployed and integrated alongside the SIEM. However, as SIEM evolves to become content aware, the situation is changing, allowing the integration of key DAM and DLP features into the SIEM. Following the above book analogy:
The book's content provides the ultimate level of detail: the full text that makes up the book's interior. While title, ISBN and price are part of the book's "log record," and the added context of the author's other published titles might be obtained from outside sources to add relevance, what is the book about? Does it contain profanity? Does it mention a particular historical figure, or a certain character? How many chapters, words and characters does the book contain? How many times does the word 'engineering' appear in the text, and of those, how many occur immediately following the word 'social'? Only the analysis of the book's content will answer these questions.
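Several of the questions above can only be answered by reading the text itself. As a toy illustration of the analogy (not of how a CA-SIEM indexes content), the hypothetical function analyze_text below counts characters and words, and counts 'engineering' both overall and when it immediately follows 'social'. It assumes a deliberately simple tokenizer that lowercases the text and drops punctuation.

```python
import re


def analyze_text(text):
    """Answer a few of the content questions posed above about a body of text."""
    # Simple tokenizer: lowercase runs of letters and apostrophes; punctuation is dropped.
    words = re.findall(r"[a-z']+", text.lower())
    engineering = 0
    after_social = 0
    for i, word in enumerate(words):
        if word == "engineering":
            engineering += 1
            if i > 0 and words[i - 1] == "social":
                after_social += 1
    return {
        "characters": len(text),
        "words": len(words),
        "engineering": engineering,
        "social engineering": after_social,
    }


sample = "Social engineering targets people; software engineering targets code."
print(analyze_text(sample))
# {'characters': 69, 'words': 8, 'engineering': 2, 'social engineering': 1}
```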
This analogy also explains the value of content compared to the value of log data: while knowing the ISBN and title is very useful and gives an initial impression of the book, it is impossible to make a qualified and informed decision about the book until you read its contents. If it's a novel, do you like the story? If it's a reference, does it contain the information that you need? Should you buy it? Without content awareness, you would be ignoring the age-old advice not to judge a book by its cover.
Imagine a SIEM product having access to the actual content of a conversation. It's easy to see the almost endless opportunities this would provide to the security analyst. To use another analogy, consider law enforcement: tracing a phone call, or examining call records to learn "who called whom," provides a certain value to an investigation, but the evidence is largely circumstantial. In contrast, recording the contents of the phone call provides hard evidence: a clear and incontrovertible record of the conversation, including any incriminating statements. Admittedly, knowing who called whom is key for catching known offenders, such as common criminals and terrorists. However, knowing what they actually spoke about allows the investigation to go to the next level. The more granular collection and analysis of a conversation (the Content Information of the phone call), based upon the initial transaction (dialing a number to place a call to a certain destination), allows incident response efforts to be initiated or escalated in a more focused, efficient, and effective way. Using SIEM for information security is no different: content truly matters, and facilitates all levels of threat detection and investigation (see 'Appendix B: Content Analysis in CA-SIEM'). However, adding content information to SIEM runs the risk of overwhelming both the tools and the analyst with too much information. The SIEM has to be engineered to support massive amounts of data in a scalable manner.
So what is "content" as it relates to SIEM? For the purpose of this discussion we define content as the payload of an application, i.e., what is actually being communicated, transferred, and shared over the network. Logs describe the fact that an activity has taken place on a system or network. Content is what defines the actual nature of the activity. For example:
The SIEM typically collects data from a variety of sources to provide a base of forensic detail (for detection of events as well as for subsequent investigations), and collects additional Context Information to support other relevant assessments, such as risk and severity. However, with current SIEMs, the range of data sources collected is still relatively limited. Typical data sources include:
The following log types are rarely sent into a SIEM, even though their use is on the rise. While collection of information from these sources is considered good practice, it also represents new challenges to the SIEM.
However, even though these logs come from applications and databases, they seldom represent actual application activity or data access. They still represent surface detail: the "title and ISBN" information that is available in the log.
Before Content Aware SIEM was created by NitroSecurity, a typical SIEM product was aware of attributes such as source IP address, destination IP address, TCP or UDP port, username, attack type and number of bytes transferred. As discussed earlier, this information is insufficient, especially for event correlation, which is the SIEM's primary mechanism for threat detection. Correlation rules are limited to fairly simple patterns that match known attacks, such as a "brute force" login:
Event correlation rule: if a user fails to log in 10 times and then immediately logs in successfully, raise an alert "Password guessing successful."
Or perhaps a slightly more sophisticated rule:
Event correlation rule: if an IDS alert against a system is followed by a new user being added to the same system, raise an alert "System compromise."
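The "System compromise" rule is a simple ordered-sequence correlation on a single host. The sketch below assumes a time window, which the rule as stated does not define; the one-hour default, the event kind strings and the names SecurityEvent and compromise_alerts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SecurityEvent:
    timestamp: float  # seconds since epoch
    host: str
    kind: str         # hypothetical kinds: "ids_alert", "user_added"


def compromise_alerts(events, window_seconds=3600.0):
    """Alert when a user is added to a host within window_seconds after an IDS alert on it.

    The window is an assumption; the rule in the text only says "followed by".
    """
    last_ids_alert = {}  # host -> timestamp of the most recent IDS alert
    alerts = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.kind == "ids_alert":
            last_ids_alert[ev.host] = ev.timestamp
        elif ev.kind == "user_added":
            t = last_ids_alert.get(ev.host)
            if t is not None and ev.timestamp - t <= window_seconds:
                alerts.append(("System compromise", ev.host, ev.timestamp))
    return alerts


events = [
    SecurityEvent(100, "web-01", "ids_alert"),
    SecurityEvent(400, "web-01", "user_added"),
    SecurityEvent(500, "db-01", "user_added"),  # no prior IDS alert on db-01
]
print(compromise_alerts(events))  # [('System compromise', 'web-01', 400)]
```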
With the addition of context, correlation rules can leverage user roles, IP address information, and asset data for correlation. For example:
Context rule: if a non-admin user accesses a system after hours, raise an alert "Possible fraudulent activity."
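The after-hours rule combines a log attribute (the access time) with context (the user's role). In this minimal sketch, the 08:00 to 18:00 business hours, the USER_ROLES table and the function name after_hours_alert are assumptions for illustration only. Users missing from the role table are treated as non-admin.

```python
from datetime import datetime, time

# Assumed business hours; the rule in the text does not define "after hours".
BUSINESS_START = time(8, 0)
BUSINESS_END = time(18, 0)

# Hypothetical role lookup, e.g. synced from an identity management system.
USER_ROLES = {"root_ops": "admin", "jdoe": "analyst"}


def after_hours_alert(user, accessed_at, roles=USER_ROLES):
    """Return an alert message for non-admin access outside business hours, else None."""
    is_admin = roles.get(user) == "admin"  # unknown users count as non-admin
    in_hours = BUSINESS_START <= accessed_at.time() < BUSINESS_END
    if not is_admin and not in_hours:
        return "Possible fraudulent activity"
    return None


print(after_hours_alert("jdoe", datetime(2010, 3, 2, 23, 15)))      # Possible fraudulent activity
print(after_hours_alert("root_ops", datetime(2010, 3, 2, 23, 15)))  # None
print(after_hours_alert("jdoe", datetime(2010, 3, 2, 10, 0)))       # None
```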
Adding content awareness to SIEM creates unique advantages for the security analyst, allowing the analysis of new content and the correlation of that content information with logs and context information. This results in better visibility and control over the entire IT environment, not just the network. Logs allow the SIEM to see the events occurring on systems and network devices; adding content information explains the nature of these events. From simply knowing that an email was sent, we move to knowing what it was about. From knowing that an FTP connection was established, we move to knowing what file was copied.
Content data can be analyzed using Content Aware SIEM. In addition to correlating, summarizing, filtering and reporting on events, we can now filter email contents, correlate SQL keywords with other information and perform other analysis tasks. Examples of how content data can add value to correlation rules are shown below:
Event correlation rule: if 1,000 emails originating from within the company are sent, raise an alert "anomalous email activity."
Context rule: if 1,000 emails originating from a non-SMTP host within the company are sent to 1,000 unique addresses, raise an alert "possible spam bot," with increased severity.
Content rule: if 1,000 emails originating from a non-SMTP host within the company, with a 'reply to' address in an outside domain, are sent to 1,000 unique addresses with the words 'account' and 'password' in the body of the email, raise a critical alert "possible spam bot," with maximum severity.
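The three email rules form tiers: each adds a condition drawn from a richer kind of data. The sketch below applies them to a batch of emails assumed to have already been collected from within the company over some time window, and returns the most specific tier that fires. The Email record, SMTP_HOSTS, COMPANY_DOMAIN and classify_outbound_batch are hypothetical names, the keyword test is a plain case-insensitive substring match, and the reply-to check ignores subdomains.

```python
from dataclasses import dataclass

THRESHOLD = 1000
# Hypothetical context: sanctioned mail servers and the company's own domain.
SMTP_HOSTS = {"mail-01.corp.example"}
COMPANY_DOMAIN = "corp.example"


@dataclass
class Email:
    sender_host: str
    recipient: str
    reply_to: str
    body: str  # decoded message content


def classify_outbound_batch(emails):
    """Return (tier, alert) for the most specific rule that fires, or None."""
    # Event rule: volume alone.
    if len(emails) < THRESHOLD:
        return None
    result = ("event", "anomalous email activity")
    # Context rule: non-SMTP source host and unique recipients (increased severity).
    suspicious = [e for e in emails if e.sender_host not in SMTP_HOSTS]
    if len(suspicious) >= THRESHOLD and len({e.recipient for e in suspicious}) >= THRESHOLD:
        result = ("context", "possible spam bot")
        # Content rule: outside reply-to plus credential keywords (critical, maximum severity).
        phishy = [
            e for e in suspicious
            if not e.reply_to.endswith("@" + COMPANY_DOMAIN)
            and "account" in e.body.lower()
            and "password" in e.body.lower()
        ]
        if len(phishy) >= THRESHOLD and len({e.recipient for e in phishy}) >= THRESHOLD:
            result = ("content", "possible spam bot")
    return result


bots = [Email("ws-042.corp.example", f"user{i}@victim.example", "help@evil.example",
              "Please confirm your account password") for i in range(1000)]
print(classify_outbound_batch(bots))  # ('content', 'possible spam bot')
```

Only the content tier can distinguish a credential-harvesting campaign from, say, a misconfigured workstation sending bulk notifications; the first two tiers would alert on both.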
Additional value is recognized when logs, context, and content information are correlated together. Such rules can be used for either security or compliance use cases, within the SOC environment or for automated security monitoring. Using content information alone will improve the value of correlation, but the real value of content correlation is achieved when individual rules are combined to create tiered or "composite" rules, which provide an even greater incident detection capability. For example:
Content rule: if outbound application contents contain sensitive terms (e.g., "quarterly report"), raise an alert "possible breach of non-disclosure."
Composite rule: if user role is "Finance" and an IM conversation topic is "quarterly report" and the IM program is not on the corporate approved list, raise a critical alert "possible fraudulent activity."
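A composite rule can reuse the content rule as one of its conditions and then layer context (user role) and log data (which IM program was used) on top. In this hedged sketch the conversation "topic" is approximated by a keyword match on the decoded content; OutboundSession, SENSITIVE_TERMS, APPROVED_IM_PROGRAMS, ROLE_BY_USER and both function names are hypothetical.

```python
from dataclasses import dataclass

SENSITIVE_TERMS = ("quarterly report",)                    # example term from the text
APPROVED_IM_PROGRAMS = {"corp-messenger"}                  # hypothetical approved list
ROLE_BY_USER = {"kim": "Finance", "lee": "Engineering"}    # hypothetical identity context


@dataclass
class OutboundSession:
    user: str
    application: str  # application name as identified from the traffic
    is_im: bool
    content: str      # decoded application payload


def content_rule(session):
    """Content rule: outbound contents mention a sensitive term."""
    text = session.content.lower()
    if any(term in text for term in SENSITIVE_TERMS):
        return "possible breach of non-disclosure"
    return None


def composite_rule(session, roles=ROLE_BY_USER):
    """Composite rule: content match + Finance role + unapproved IM program."""
    if (content_rule(session) is not None
            and session.is_im
            and roles.get(session.user) == "Finance"
            and session.application not in APPROVED_IM_PROGRAMS):
        return ("critical", "possible fraudulent activity")
    return None


s = OutboundSession("kim", "unapproved-im", True, "Sending the Quarterly Report draft now")
print(content_rule(s))    # possible breach of non-disclosure
print(composite_rule(s))  # ('critical', 'possible fraudulent activity')
```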
Such rules allow much more comprehensive security monitoring, compared to a legacy SIEM. Bringing this level of visibility into the Content Aware SIEM also brings us one step closer to a true "single pane of glass" CISO dashboard. Previously, the daily operations involved in security monitoring entailed watching the logs; now the views can include the actual contents of network communication and application activity, adding new tools to the arsenal of a SOC analyst, security manager and CSO.
The CA-SIEM provides improved threat detection capability against both inside and outside sources. As networks and operating systems grow more secure, cyber threats continue to move "up the stack" of the OSI model to take advantage of application vulnerabilities. Monitoring content allows the CA-SIEM to detect these new application- and database-level attacks, and even fraud and business logic abuse. Sending application-level traffic to the CA-SIEM for correlation and analysis allows for the detection of a much broader range of attacks. Single-purpose monitoring tools (including legacy SIEM, as well as standalone DAM, DLP and Deep Packet Inspection solutions) cannot provide this breadth of visibility and correlation.
Insider attacks, in contrast, are often simple cases of authorized access abuse. While these are not complex, blended attacks with sophisticated attack vectors, they are often more difficult to detect because they involve authorized users accessing information that is mostly within the realm of expected behavior. Total visibility of the authorized user activity - from file transfers to emails to various queries to IM to social networks - enables security analysts to separate legitimate actions from dangerous mistakes and from actual insider abuse.
Finally, if malicious activity is detected, the CA-SIEM provides more relevant data, which is available for live forensic analysis as well as for reconstructing a complete picture of the incident. SIEM log repositories have long been used for forensic analysis, but now we can retain not just the logs, but also users' documents and communication records, which are often even more useful than logs during investigations (see 'Appendix C: Use Cases' for more examples).
With all of the added value that content awareness brings to security information management, why aren't all SIEMs content aware? The issue is one of both performance and complexity: as each event is examined in more detail, the volume of raw event data associated with it increases exponentially. Fully decoding packets to expose content information, whether from applications or databases, means that the volume of information that must be stored, retrieved, analyzed, and visualized within the user interface quickly becomes unmanageable.
Just as a book's contents are much larger than its title page, adding content information into the SIEM increases the event load by many times. This massive increase in event load translates directly into a performance impact on the SIEM: the data stores used by most SIEM products begin to slow rapidly after just a few million rows of stored data, and content information can quickly generate billions of rows. This is why most SIEM vendors are unable to add content information into their platforms. Even with highly optimized databases or flat-file data stores, these systems will quickly become unusable. The only way to support this level of event detail for sustained periods of time is to implement a highly scalable, purpose-built database, which would require hundreds of man-years of research and development to produce. NitroSecurity, with a history in database development, is able to use the Nitro Extreme Database (NitroEDB), which provides the scalability and performance necessary to support content information. [1]
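A back-of-envelope calculation shows how quickly row counts reach the billions. The inputs below are hypothetical, not measurements from any product: a sustained 2,000 events per second, with each event assumed to expand into 10 stored content records once payloads are decoded.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def rows_per_day(events_per_second, rows_per_event=1.0):
    """Rows written per day at a sustained event rate (hypothetical inputs)."""
    return events_per_second * rows_per_event * SECONDS_PER_DAY


log_rows = rows_per_day(2_000)                          # logs only
content_rows = rows_per_day(2_000, rows_per_event=10)   # assumed content expansion
print(f"{log_rows:,.0f} log rows/day, {content_rows:,.0f} content rows/day")
# 172,800,000 log rows/day, 1,728,000,000 content rows/day
```

Under these assumptions, even the log-only stream passes "a few million rows" within the first hour, and the content stream exceeds a billion rows in a single day.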
Another challenge faced by the CA-SIEM is how to present the many types of new information to the operator in a way that remains meaningful and useful. With only log data (the book's title and ISBN), each event can be clearly defined within the confines of a User Interface (UI). With context information added (the book's availability, reader score, and other metadata), the UI can begin to grow complex, but the amount of information being visualized is still manageable. However, with content information (the full contents of the book), interface complexity becomes a real issue. Once the interface goes beyond logs and context into document presentation, even displaying simple query results becomes a significantly harder design problem. Avoiding "SIEM operator overload" becomes critical for CA-SIEM success, and requires flexible control over how this detail is presented within the interface. Some methods of maintaining usability include:
While CA-SIEM represents a new generation of SIEM capabilities, these systems are available today. Information Security professionals no longer have to settle for a mix of very complex and poorly integrated solutions. For example, feeding an enterprise SIEM with data from one vendor's DAM solution and content information from another vendor's DLP solution will NOT produce a content-aware SIEM; it will simply produce a SIEM with many different types of new logs. A content-aware SIEM can natively understand, integrate and correlate log, context and content data in one tool.
The original Security Event Managers (SEM) started by supporting IDS logs. Bringing in other third-party logs grew the SEM into a Security Information Management (SIM) system, which then evolved further to incorporate contextual information from other sources such as VA and IAM tools, finally becoming what we refer to today as a "Security Information and Event Management" system, or SIEM. Each evolution increased the load placed on the system: how fast events or logs needed to be collected, how much storage was required to support data retention over time, and how quickly the data could be analyzed and accessed in order to produce actionable information.
With SIEM now becoming aware of content information, the strain of information management is being seen again — this time exponentially due to the depth of event detail being analyzed and retained. Not every SIEM can support these extreme data management requirements. Even second-generation SIEM platforms can barely handle the logs, much less the context around them, and there is absolutely no chance that adding content will be possible without sacrificing performance and usability. Only SIEM tools built from the start to handle massive volumes of diverse data, logs and content, can evolve to the next level, and become a true Content Aware SIEM.
Please download this whitepaper to access appendices (registration required).