"... the ability to reduce the time to true incident identification to a number that is measured in seconds, versus minutes, hours or even longer"
— Rocky DeStefano, CEO, Decurity
 


Content Aware SIEM™ Defined

Updated: Jan 4, 2010


 

Content Aware SIEM™ (CA-SIEM) represents a new generation of Security Information and Event Management (SIEM) capabilities that extend the value and benefits of SIEM by providing visibility into the contents of applications, documents and protocols. Without content awareness, SIEM is only able to act upon the surface details provided by logs. This limits the effectiveness of key SIEM functionalities—including threat detection, incident response, and compliance reporting—because the data being used for analysis lacks sufficient context to make informed, relevant decisions.

As a result, SIEM systems have started to evolve: context information from add-on systems such as Identity Management, Vulnerability Assessment, Configuration Management systems, and others has been used to enhance the security events collected and correlated by the SIEM. While these systems provide a great deal of value to SIEM, the events themselves are still myopic, limited to the summary data provided by the source log files.

Consider the following example:

An email being sent by an admin user to an outside address, on the surface, does not represent a threat. However, if the email contains sensitive company information, either within the body of the email or in an attachment, the activity could indicate insider theft.

However, extending visibility into the actual payload of applications and protocols on the network generates a massive increase in event load, which translates directly into a performance impact on current SIEMs. Query responses from most SQL-based data stores begin to slow rapidly after just a few million rows of stored data, and content information can quickly generate billions of rows. Even with highly optimized databases or flat-file data stores, current systems lack the scale and performance to deliver the real-time results needed to be valuable as rapid response operations systems.

Luckily, while content awareness presents architectural challenges to most SIEM platforms, it is a technology that is available today. NitroSecurity's NitroView Enterprise Security Manager (ESM), which was built from the start to handle massive volumes of diverse data, logs and content, is the first commercially available Content Aware SIEM.


What is SIEM?

Security information and event management (SIEM) is sometimes defined as a set of technologies for log data collection, aggregation, normalization, retention, analysis and workflow. The analysis of security information includes the collection of "context data" (additional details and metadata, such as DNS names or geo-location information, that are relevant for understanding the impact of collected log entries but are not defined within the original log) and the correlation, visualization and reporting of the newly contextualized information. In addition, SIEM products support related workflow either directly or through the integration of external case management systems, so that this newly contextualized information can be used directly to support the various operations of the Security Operations Center (SOC) or Computer Incident Response Team (CIRT). Similarly, Gartner defined SIEM as "[technology] used to analyze security event data in real time for internal and external threat management, and to collect, store, analyze and report on log data for regulatory compliance and forensics" (source: SIEM research note 2009).

Companies use SIEM tools as enabling technology for both Security Operations Centers (SOCs), where the SIEM is a key component of security monitoring functions, and also for compliance reporting in support of various regulatory requirements, including NERC, HIPAA, PCI, and the Sarbanes-Oxley Act (SOX).

SIEM solutions originated from simple event management and aggregation tools, evolving with industry trends until they became advanced security centralization solutions that help organizations optimize various aspects of security management, including incident, risk, policy and vulnerability management. As SIEM evolved, it also provided greater visibility into new areas of the enterprise by expanding the scope of data that is fed into the SIEM: from log sources such as IDS, IPS, hosts and firewalls to new information sources designed to provide additional context to an event, such as vulnerability assessment information and identity information.

As security and compliance requirements grow even more demanding, however, the effectiveness of the SIEM is under pressure. Where SEM added information context and historical analysis to event management to become SIEM, the need for visibility into actual application usage is driving a further evolution, towards a new SIEM that is both context and content aware. The new Content Aware SIEM (CA-SIEM) combines context and content together with log and event data to provide the next generation of threat detection, incident response, remediation, risk management and compliance services (see Appendices for examples and use cases).


What is context?

As we mentioned before, "context" in relation to security management is a matter of data enrichment. Data from various log, event and flow sources is fed into SIEM platforms in order to convert raw data into actionable information about threats, attacks, malware issues and other security relevant information. Such information is then used to make decisions - by SIEM operators and their managers as well as by the SIEM itself - to investigate, block, reconfigure, etc.

One of the ways SIEM performs such enrichment is by adding "context information." Context information is the additional information required to make the limited details available within an event or log more meaningful. Context information does not come from the logs themselves, but originates in the surrounding IT environment, in other information systems inside or outside the organization.

Context Information can be understood through the analogy of an online book retailer: if the book title, price, and ISBN make up the "log record," then where it was purchased, the store address, the phone number and the author name are all parts of the book's context. Of course, the book's contents would be an example of Content Information, which is discussed later.

One of the simplest examples of Context Information is name and, where possible, full identity resolution: DNS names, Windows host names and/or network username details are added to the logs. While the log file may have already provided IP addresses, the added context of a human-readable name makes the log more meaningful. Normally, DNS names are not present in logs, but have to be obtained by queries to a DNS server. The SIEM tool gathers Context Information from a variety of sources, including:

  • Windows name services, DNS and NIS servers: to map addresses to names
  • Defined asset groups: Internal or external status of an IP address; Logical or physical meta-groups
  • WHOIS servers: WHOIS information for external addresses shows who owns them and where they are located
  • Geo-location: show the physical location of the system
  • Asset and owner information for internal addresses
  • Active directory and LDAP servers: to map user names to actual user identities
  • Entitlement servers: to obtain a user's entitlements
  • Asset management systems: to gather information about systems, their ownership, and the compliance relevance of each system or group of systems
  • Attack and exploit information: to gather additional details about the log data
  • Vulnerability assessment information

An example of log data enrichment is given in Figure A.

Figure A: Enriching the surface data found in logs
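The enrichment step described above can be sketched in a few lines of Python. This is only an illustration, not NitroView's implementation: the lookup tables, field names (dst_ip, user, dst_host and so on) and sample values are all hypothetical stand-ins for real DNS, directory and asset management queries.

```python
# Hypothetical lookup tables standing in for DNS, asset management,
# and directory (Active Directory / LDAP) queries.
DNS_NAMES = {"10.1.4.22": "fin-db-01.example.internal"}
ASSET_GROUPS = {"10.1.4.22": "finance"}
USER_DIRECTORY = {"jdoe": {"full_name": "Jane Doe", "groups": ["IT"]}}


def enrich(event: dict) -> dict:
    """Return a copy of a raw log event with context fields added."""
    enriched = dict(event)  # leave the original log record untouched
    ip = event.get("dst_ip")
    enriched["dst_host"] = DNS_NAMES.get(ip, "unknown")
    enriched["dst_asset_group"] = ASSET_GROUPS.get(ip, "unknown")
    user = USER_DIRECTORY.get(event.get("user"), {})
    enriched["user_full_name"] = user.get("full_name", "unknown")
    enriched["user_groups"] = user.get("groups", [])
    return enriched


# A raw log record carries only surface details: an address and a username.
raw_event = {"dst_ip": "10.1.4.22", "user": "jdoe", "action": "login_success"}
enriched_event = enrich(raw_event)
```

The raw record is copied rather than modified, so the original log entry stays available for forensics while the enriched copy feeds correlation.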


Access to Context Information requires the SIEM to connect to external systems and then to retain the Context Information locally, so that it can be accessed quickly and used for event analysis, correlation, reporting, and other SIEM functions. Context Information is extremely useful for performing advanced forms of correlation, such as correlating user role with access hours or matching the server under attack with a particular department in the company. Consider the following examples:

Classic event correlation rule: if a user tries to log in 10 times and fails, but then immediately logs in successfully, raise an alert "Password guessing successful."

Context-aware rule: if a user belongs to an "IT group," but not to the "IT support subgroup," and logs into a server in the "finance" group, raise a medium severity alert "Possible unauthorized access."

The benefit of context-aware correlation is obvious in the above example, as it makes an "unauthorized login" alert more relevant to the security analyst. It also allows creation of very simple and effective correlation rules, which can be directly derived from corporate security policy.
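The two rules above might be expressed roughly as follows. This is a minimal sketch under stated assumptions: the event keys ("user", "outcome", "server"), the group names and the function names are hypothetical, and "immediately" is simplified to "the next login event for that user" rather than a real time window.

```python
def password_guessing(events, threshold=10):
    """Classic rule: `threshold` consecutive failed logins, then a success.

    `events` is a time-ordered list of dicts with the hypothetical keys
    "user" and "outcome" ("fail" or "success").
    """
    alerts = []
    failures = {}  # user -> count of consecutive failed logins
    for event in events:
        user = event["user"]
        if event["outcome"] == "fail":
            failures[user] = failures.get(user, 0) + 1
        else:
            if failures.get(user, 0) >= threshold:
                alerts.append((user, "Password guessing successful"))
            failures[user] = 0  # a success resets the failure streak
    return alerts


def unauthorized_access(event, user_groups, server_groups):
    """Context-aware rule: IT user outside IT support logs into a finance server.

    `user_groups` maps a username to a set of group names and
    `server_groups` maps a server name to its asset group; both are
    context data that the SIEM would have gathered from external systems.
    """
    groups = user_groups.get(event["user"], set())
    server_group = server_groups.get(event["server"])
    if "IT" in groups and "IT support" not in groups and server_group == "finance":
        return ("medium", "Possible unauthorized access")
    return None
```

Note that the context-aware rule needs no counters or sequences at all; it reads almost directly as a sentence of security policy, which is the point made above.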


What is content?

As mentioned before, log data and context data provide the major input into a SIEM product. Log data shows what activities were occurring, succeeding, failing or being attempted. However, what if we'd like to learn more about specifics of those activities? For example, what was that downloaded file? What was that email about?

Learning these details becomes more and more important as threats become more complex and as they "move up the stack," exploiting vulnerabilities at the application and session layers as well as business logic flaws. Current SIEM products cannot answer these questions, and require other expensive security products such as Data Leakage Prevention (DLP) and Database Activity Monitoring (DAM) to be deployed and integrated alongside the SIEM. However, as SIEM evolves to become content aware, the situation is changing, allowing the integration of key DAM and DLP features into the SIEM. Following the above book analogy:

The book's content provides the ultimate level of detail: the full text that makes up the book's interior. While title, ISBN, price are part of the book "log record," and the added context of the author's other published titles might be obtained from outside sources to add relevance, what is the book about? Does it contain profanity? Does it mention a particular historical figure, or a certain character? How many chapters, words and characters does the book contain? How many times does the word 'engineering' appear in the text, and of those, how many occur immediately following the word 'social'? Only the analysis of the book's content will answer these questions.

This analogy also explains the value of content compared to the value of log data: while knowing the ISBN and title is very useful and gives an initial impression of the book, it is impossible to make a qualified and informed decision about the book until you read its contents. If it's a fiction novel, do you like the story? If it's a reference, does it contain the information that you need? Should you buy it? Without content awareness, you would be ignoring the age-old lesson: never judge a book by its cover.
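The counting question from the book analogy (how often 'engineering' appears, and how often it directly follows 'social') is easy to answer once the content itself is available. The following sketch is purely illustrative; the function name and the sample text are invented for this example.

```python
import re


def count_term_after(text, term, preceding):
    """Count occurrences of `term`, and how many directly follow `preceding`."""
    words = re.findall(r"[a-z']+", text.lower())
    total = words.count(term)
    following = sum(
        1
        for prev, cur in zip(words, words[1:])
        if cur == term and prev == preceding
    )
    return total, following


sample = "Social engineering beats software engineering. Engineering is social."
total, after_social = count_term_after(sample, "engineering", "social")
# total == 3, after_social == 1
```

None of this is possible from the "log record" alone: the title and ISBN say nothing about which words appear inside the book.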

Imagine a SIEM product having access to the actual content of the conversation. It's easy to understand the almost endless opportunities that this would provide to the security analyst. Using another analogy, consider law enforcement: tracing a phone call, or examining call records to know "who called who" provides a certain value to an investigation, but is largely circumstantial. In contrast, actually recording the contents of the phone call provides hard evidence, with a clear and incontrovertible record of the conversation, and all (if any) incriminations. Admittedly, knowing who called who is key for catching known offenders: common criminals and terrorists, for example. However, knowing what they actually spoke about allows the investigation to go to the next level. The more granular collection and analysis of a conversation (the Content Information of the phone call) based upon the initial transaction (dialing a number to place a call to a certain destination) allows the initiation or escalation of incident response efforts to be more focused, efficient, and effective. Using SIEM for information security is no different: content truly matters, and facilitates all levels of threat detection and investigation (see 'Appendix B: Content Analysis in CA-SIEM'). However, adding content information to SIEM runs the risk of overwhelming the tools as well as the analyst with too much information. The SIEM has to be engineered to support massive amounts of data in a scalable manner.

So what is "content" as it relates to SIEM? For the purpose of this discussion we define content as the payload of an application, i.e., what is actually being communicated, transferred, and shared over the network. Logs describe the fact that an activity has taken place on a system or network. Content is what defines the actual nature of the activity. For example:

  • Email contents, including attachments
  • Social network communication
  • Document contents
  • Database queries and the size and/or subject matter of their responses
  • IM conversation contents

What Legacy SIEM is Missing

The SIEM typically collects data from a variety of sources to provide a base of forensic detail (for detection of events as well as for subsequent investigations), and collects additional Context Information to provide other relevant assessments, such as risk and severity. However, with current SIEMs, the range of data sources collected is still relatively limited. Typical data sources include:

  • Firewalls: firewall logs contain information on connections allowed or blocked by the firewall
  • Netflow: similarly, netflow records contain information about systems connecting to other systems and include characteristics such as: session time, duration, protocol, packet and byte count.
  • Router logs: often disabled, router logs are similar to firewall logs and netflow since they contain connectivity information, as well as access 'permit' and 'deny' events related to network access.
  • NIDS/NIPS alerts: these alerts and logs contain information about attacks detected or prevented by the systems as well as information about suspicious activities.
  • Operating system logs such as Windows event logs or Unix/Linux syslog: they contain information about the routine system operations, system access as well as various errors and failures.
  • Email server logs: these logs contain information about sent and received email, the addresses of the sender and the recipients, errors, message sizes and other parameters of email messages.
  • Proxy logs: these logs contain information about internet connectivity through the proxy server: what sites the user went to and sometimes which actions the user performed on each site.
  • Vulnerability Assessment results: results of a standard VA scan can define specific host and application vulnerabilities, and provide a detailed inventory of discovered assets.

The following log types are rarely sent into a SIEM, even though their use is on the rise. While collection of information from these sources is considered good practice, it also represents new challenges to the SIEM.

  • Database logs: native database logs contain queries and database administration commands.
  • Application logs: such logs contain literally anything that a developer will put in them, with no standards, limitation or restrictions.
  • IAM: Identity and Access Management systems provide user policy context within SIEM.
  • CMDB: Configuration Management Database systems provide configuration policy context to SIEM.

However, despite the source of these log files being applications and databases, they seldom represent actual application activity or data access. They still represent the surface detail; the "title and ISBN" information that is available in the log.


How Content Awareness Brings Value to SIEM

Before Content-Aware SIEM was created by NitroSecurity, a typical SIEM product was aware of things like source IP address, destination IP address, TCP or UDP port, username, attack type, number of bytes transferred, etc. As discussed earlier, this information is insufficient — especially when considering event correlation, which is the SIEM's primary mechanism for threat detection. Correlation rules are limited to fairly simple patterns that match known attack patterns, such as a "brute force" login:

Event correlation rule: if a user tries to log in 10 times and fails, but then immediately logs in successfully, raise an alert "Password guessing successful"

Or perhaps a slightly more sophisticated rule:

Event correlation rule: if an IDS alert against a system is followed by a new user being added to a system, raise an alert "System compromise"

With the addition of context, correlation rules can leverage user roles, IP address information, and asset data for correlation. For example:

Context rule: if a non-admin user accesses a system after hours, alert "Possible fraudulent activity."
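As a rough sketch, the "System compromise" sequence rule and the after-hours context rule could look like this in Python. The event keys, the 30-minute window and the 08:00 to 18:00 business hours are assumptions made for illustration; the rules above do not specify them.

```python
from datetime import datetime, timedelta


def system_compromise(events, window=timedelta(minutes=30)):
    """Sequence rule: an IDS alert against a host, then a user added on it.

    `events` are dicts with hypothetical keys "type", "host" and "time".
    The `window` is an assumed limit on how far apart the two events may be.
    """
    last_ids_alert = {}  # host -> time of the most recent IDS alert
    alerts = []
    for event in sorted(events, key=lambda e: e["time"]):
        if event["type"] == "ids_alert":
            last_ids_alert[event["host"]] = event["time"]
        elif event["type"] == "user_added":
            seen = last_ids_alert.get(event["host"])
            if seen is not None and event["time"] - seen <= window:
                alerts.append((event["host"], "System compromise"))
    return alerts


def after_hours_access(event, admins, start_hour=8, end_hour=18):
    """Context rule: a non-admin user accesses a system outside business hours."""
    hour = event["time"].hour
    is_after_hours = hour < start_hour or hour >= end_hour
    if event["user"] not in admins and is_after_hours:
        return "Possible fraudulent activity"
    return None
```

The first rule uses only log data (event type, host, timestamp); the second cannot be evaluated without the context of who counts as an admin.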

Adding content awareness to SIEM creates unique advantages for the security analyst, allowing the analysis of new content and the correlation of that content information with logs and context information. This results in better visibility and control over the entire IT environment, not just the network. Logs allow the SIEM to see the events occurring on systems and network devices; adding content information explains the nature of these events. From simply knowing that an email was sent, we move to knowing what it was about. From knowing that an FTP connection was established, we move to knowing what file was copied.

Content data can be analyzed using Content Aware SIEM. Instead of only correlating, summarizing, filtering and reporting on events, we can now filter email contents, correlate SQL keywords with other information and perform other analysis tasks. Examples of how content data can add value to correlation rules are shown below:

Event correlation rule: if 1,000 emails originating from within the company are sent, raise an alert "anomalous email activity."

Context rule: if 1,000 emails originating from a non-SMTP host within the company are sent to 1,000 unique addresses, raise an alert "possible spam bot," with increased severity.

Content rule: if 1,000 emails originating from a non-SMTP host within the company, with a 'reply to' address in an outside domain, are sent to 1,000 unique addresses with the words 'account' and 'password' in the body of the email, raise a critical alert "possible spam bot," with maximum severity.
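The way each tier narrows the previous one can be sketched as a single function. This is an illustration only: the message keys ("to", "reply_to", "body"), the function name, and the "low" and "high" severity labels are assumptions, since the rules above name only "increased" and "maximum" severity. The function is assumed to receive the emails sent by one internal host within one observation window.

```python
def classify_email_burst(src_host, emails, smtp_hosts, company_domain,
                         threshold=1000):
    """Apply the event, context and content rules, returning the strongest match.

    Returns a (severity, alert) tuple, or None if no rule fires.
    """
    # Event rule: log data alone, just a count of sent emails.
    if len(emails) < threshold:
        return None
    result = ("low", "anomalous email activity")

    # Context rule: the sender is not a known SMTP host and the
    # recipients are all distinct.
    unique_recipients = {message["to"] for message in emails}
    if src_host not in smtp_hosts and len(unique_recipients) >= threshold:
        result = ("high", "possible spam bot")

        # Content rule: outside 'reply to' address plus telltale words in the body.
        def suspicious(message):
            body = message["body"].lower()
            reply_to = message["reply_to"].lower()
            outside = not reply_to.endswith("@" + company_domain)
            return outside and "account" in body and "password" in body

        if all(suspicious(message) for message in emails):
            result = ("critical", "possible spam bot")
    return result
```

Each tier reuses the evidence of the tier before it, so the same burst of email is reported once, at the highest severity the available information supports.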

Getting the Most out of Content Correlation

Additional value is recognized when logs, context, and content information are correlated together. Such rules can be used for either security or compliance use cases, within the SOC environment or for automated security monitoring. Using content information alone will improve the value of correlation, but the real value of content correlation is achieved when individual rules are combined to create tiered or "composite" rules, which provide an even greater incident detection capability. For example:

Content rule: if outbound application contents contain sensitive terms (e.g., "quarterly report"), raise an alert "possible breach of non-disclosure."

Composite rule: if user role is "Finance" and an IM conversation topic is "quarterly report" and the IM program is not on the corporate approved list, raise a critical alert "possible fraudulent activity."

Such rules allow much more comprehensive security monitoring, compared to a legacy SIEM. Bringing this level of visibility into the Content Aware SIEM also brings us one step closer to a true "single pane of glass" CISO dashboard. Previously, the daily operations involved in security monitoring entailed watching the logs; now the views can include the actual contents of network communication and application activity, adding new tools to the arsenal of a SOC analyst, security manager and CSO.
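A composite rule of the kind described above can be sketched as a combination of smaller checks. Everything named here is hypothetical: the event keys, the approved program list, and the "default" severity used where the content rule above does not state one.

```python
SENSITIVE_TERMS = ["quarterly report"]      # example term from the rule above
APPROVED_IM_PROGRAMS = {"corp-messenger"}   # hypothetical approved list


def has_sensitive_content(payload_text):
    """Content check: does the payload mention any sensitive term?"""
    text = payload_text.lower()
    return any(term in text for term in SENSITIVE_TERMS)


def evaluate(event):
    """Return (severity, alert) for the strongest matching rule, or None.

    `event` is a dict with the hypothetical keys "user_role", "protocol",
    "program", "direction" and "payload".
    """
    sensitive = has_sensitive_content(event["payload"])

    # Composite rule: context (role), log data (protocol, program)
    # and content (topic) combined.
    if (
        sensitive
        and event["user_role"] == "Finance"
        and event["protocol"] == "im"
        and event["program"] not in APPROVED_IM_PROGRAMS
    ):
        return ("critical", "possible fraudulent activity")

    # Content rule on its own: sensitive terms leaving the company.
    if sensitive and event["direction"] == "outbound":
        return ("default", "possible breach of non-disclosure")
    return None
```

Evaluating the composite rule first means the more specific, higher-severity alert wins when both rules match the same event.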

Detection Capability

The CA-SIEM provides improved threat detection capability, from both inside and outside sources. As networks and operating systems grow more secure, cyber threats continue to move "up the stack" of the OSI model to take advantage of application vulnerabilities. Monitoring content allows the CA-SIEM to detect these new application- and database- level attacks, and even fraud and business logic abuse. Sending application-level traffic to the CA-SIEM for correlation and analysis allows for the detection of a much broader range of attacks. Single-purpose monitoring tools — including legacy SIEM, as well as standalone DAM, DLP and Deep Packet Inspection solutions — cannot provide this breadth of visibility and correlation.

Insider attacks, in contrast, are often simple cases of authorized access abuse. While these are not complex, blended attacks with sophisticated attack vectors, they are often more difficult to detect because they involve authorized users accessing information that is mostly within the realm of expected behavior. Total visibility of the authorized user activity - from file transfers to emails to various queries to IM to social networks - enables security analysts to separate legitimate actions from dangerous mistakes and from actual insider abuse.

Finally, if malicious activity is detected the CA-SIEM provides more relevant data, which is available for live forensic analysis as well as for restoring a complete picture of the incident. SIEM log repositories have long been used for forensics analysis, but now we can retain not just the logs, but also the documents and communication records of users, which are often even more useful than logs during the investigations (see 'Appendix C: Use Cases,' for more examples).


 

Why legacy SIEMs are not Content Aware

With all of the added value that content awareness brings to security information management, why aren't all SIEMs content aware? The issue is one of both performance and complexity; as each event is examined in more detail, the volume of raw event data associated with it increases exponentially. Therefore, by fully decoding packets to expose the content information - whether from applications or databases - the volume of information that needs to be stored, retrieved, analyzed, and visualized within the user interface quickly becomes unmanageable.

More Data Volume — Performance is Key!

Just as a book's contents are much larger than its title page, adding content information into the SIEM increases the event load by many times. This massive increase in event load translates directly to a performance impact on the SIEM: the data stores used by most SIEM products begin to slow rapidly after just a few million rows of stored data, and content information can quickly generate billions of rows. This is why most SIEM vendors are unable to add content information into their platforms. Even with highly optimized databases or flat-file data stores, these systems will quickly become unusable. The only way to support this level of event detail for sustained periods of time is to implement a highly scalable purpose-built database, which would require hundreds of man years of research and development to produce. NitroSecurity, with a history in database development, is able to use the Nitro Extreme Database (NitroEDB), which provides the scalability and performance necessary to support content information. [1]

More Complexity — Usability is Key!

Another challenge faced by the CA-SIEM is how to present the many types of new information to the operator, in a way that remains meaningful and useful. With only log data (the book's title and ISBN), each event can be clearly defined within the confines of a User Interface (UI). With context information added (the book's availability, reader score, and other metadata), the UI can begin to grow complex, but the amount of information being visualized is still manageable. However, with content information (the full contents of the book), interface complexity becomes a real issue. Going beyond logs and context into document presentation, even simple query results can raise the challenges of design to a new level. Avoiding "SIEM operator overload" becomes critical for CA-SIEM success, and requires flexible control over how this detail is presented within the interface. Some methods of maintaining usability include:

  • Provide normalization of information to allow higher-level activity to be presented clearly, while allowing users to "drill in" for greater depth as needed.
  • Allow easy-to-use "rules" to be generated, so that the most important content information can be quickly identified.
  • Expose content information to the CA-SIEM's correlation system, so that content becomes a consideration when identifying potential threats — preserving the content information for forensics, but filtering it from initial views within the UI.

Conclusion

Content Aware SIEM is available today

While CA-SIEM represents a new generation of SIEM capabilities, these systems are available today. Information Security professionals no longer have to settle for a mix of very complex and poorly integrated solutions. For example, enterprise SIEM software getting data from another vendor's DAM solution, and providing content information from yet another vendor's DLP solution will NOT produce a content-aware SIEM; it will simply produce a SIEM with many different types of new logs. A content-aware SIEM can natively understand, integrate and correlate log, context and content data in one tool.

CA-SIEM is the natural evolution of SIEM, but only if built on a datastore that can handle the data

The original Security Event Managers (SEM) started by supporting IDS logs. Bringing in other third party logs grew the SEM into a Security Information Management (SIM) system, which then evolved further to incorporate contextual information from other sources such as VA and IAM tools, finally becoming what we refer to today as a "Security Information and Event Management" system, or SIEM. Each evolution increased the event load placed on the system, in how fast events or logs needed to be collected, how much storage was required to support data retention over time, and how quickly the data could be analyzed and accessed, in order to produce actionable information.

With SIEM now becoming aware of content information, the strain of information management is being seen again — this time exponentially due to the depth of event detail being analyzed and retained. Not every SIEM can support these extreme data management requirements. Even second-generation SIEM platforms can barely handle the logs, much less the context around them, and there is absolutely no chance that adding content will be possible without sacrificing performance and usability. Only SIEM tools built from the start to handle massive volumes of diverse data, logs and content, can evolve to the next level, and become a true Content Aware SIEM.

 


 

Appendix A:

A Summary of Content Aware SIEM Capabilities

Please download this whitepaper to access appendices (registration required).

Appendix B:

Content Analysis in Content Aware SIEM

Please download this whitepaper to access appendices (registration required).

Appendix C:

Use Cases

Please download this whitepaper to access appendices (registration required).


