Machine Learning Support for Cyber Threat Attribution at FireEye

FireEye Stories 2020-11-19

This is an extended version of the article “The Future Of Work Now: Cyber Threat Attribution At FireEye” previously published by Thomas H. Davenport and Steven Miller in Forbes on May 28, 2020.

Steven Stone is the Director of Adversary Pursuit at FireEye, the intelligence-led security company. This group is part of the company’s Advanced Practices Team—highly skilled experts focused on determining the identity, actions, and next steps for cyber threat groups actively operating against FireEye clients. A key part of their work is to determine whether newly found cyber threat group activity is attributed to a new, unique entity, or is actually the activity of an existing entity they have previously been tracking.

Explaining the strategic importance of their efforts, Stone says, “If we can attribute the cyber threat activity we are monitoring to a particular threat group, even if we do not yet know the identity of that threat group, it helps in remediating, responding to, and most importantly, prioritizing our efforts.” This intelligence-led prioritization of response and remediation gives defenders a critical advantage as they attempt to intercept and remove a threat group that has breached an organization before that group can accomplish its mission and harm the victim organization.

When a new cyber threat group or cluster shows up on the FireEye global cyber threat tracking “radar screen”, it is given a unique tracking number as an uncategorized group or cluster (UNC). If Stone’s team is able to determine that this UNC is in fact the same as a previous threat cluster they have been tracking, they can draw upon all their existing knowledge to merge the two groups together and anticipate what will happen next. They can also see how that existing threat cluster is evolving. Stone adds, “When we can identify that it is warranted to merge a new unidentified threat cluster with a pre-existing entity, we become much more familiar with the situation. This guides us in what to look for, and what response and remediation steps to take.”

In contrast, if it is not possible to decide if the perpetrating cyber threat entity is a familiar entity, “then it is as if you are feeling around in the dark. You do not know where to look. It is hard to focus your response efforts. It is more difficult to anticipate what the perpetrator will do next. Ultimately, it makes it challenging to know how seriously to take a given threat.”

That is why it is so important for Stone and the Advanced Practices Group to be able to assess the similarity of one cyber threat cluster to another. “When we can determine that one uncategorized threat cluster, usually a more recently identified cluster, is actually the same as an existing entity that we are already tracking,” Stone explains, “it changes everything.” It makes incident response much more efficient and more directed.

Stone emphasizes, “This is why we put so much emphasis on assessing the similarity of a recently appearing unidentified threat cluster with all pre-existing threat clusters” (either other uncategorized though uniquely numbered clusters, or previously identified Advanced Persistent Threat groups which are usually state-sponsored, or Financial Threat groups focused on monetary gains).

“Understanding the degree of similarity across the many UNCs that we track was a big challenge for us,” said Stone. “Until 2018, our method for comparing UNCs was purely manual, and corresponding to that, our approach to the decision of whether or not two UNCs could be merged and considered to be the same entity was purely manual—and required the focus of our top experts.”

The Need for an Intelligent Tool to Support Similarity Comparisons of Uncategorized Cyber Threat Groups

Because FireEye is tracking thousands of UNCs, and also sizable numbers of other specifically identified threat groups, it was impossible for an expert analyst, or even for a team of expert analysts, to keep all of these entities in mind at once, and even more difficult to consider these comparisons over long periods of time. The work was consuming a lot of the group’s effort and was a big challenge for their workflow.

“This is the problem we threw Machine Learning (ML) against. We wanted intelligent, automated tooling to help us systematically and objectively make this comparison of how similar one UNC is to all other UNCs, as well as to the entities in other attribution categories. We needed to do this much more efficiently, given that we are always dealing with new UNCs, and new incoming telemetry on existing UNCs. We also wanted to make the entire process of making these determinations more systematic and more data-driven.”

FireEye network, endpoint, and email security controls deployed across the globe are built to allow telemetry to flow back to a central source, where the process of analyzing massive amounts of telemetry information can be centralized, standardized, automated and scaled. Stone notes that the decision to operate via this approach has been the key to the company’s success, as it can use the telemetry data across global client sites to monitor the cyber threat situation across the entire world. He observes, “This has been absolutely game-changing for us—it’s one thing to see a single threat at a single client and an entirely different capability to see all threats against all clients.”

Having vast quantities of telemetry information available and knowing how to systematically harness it for complex comparisons of threat clusters are two different matters. Stone explains, “The fact that we have all of this telemetry and metadata and supporting external intelligence data is great. But this is also a problem, and a big challenge for us. It is the number one challenge my team deals with. How do you do the type of highly detailed, complex work we do in a “bucket” of data that is this big? When you can look at telemetry and supporting metadata across millions of endpoints and millions of email boxes in real time, you have to be very smart about how you go about it.”

This is another area where ML is put to use: managing the scale of information the team has to deal with, as “you cannot hire someone smart enough to look at all this data all the time. That person does not exist. The bucket size of the data we have to process is way too big…and our experts want a bigger bucket.”

Domain Experts and Data Science Experts Team Up to Create the New Machine Learning Tool

Stone and his Advanced Practices Team faced a dual challenge: finding a better way to compare uncategorized threat clusters, and harnessing the vast amount of global telemetry data that streams back to the company to help with these comparisons. They came at it from two ends. On one end, there was his team of highly specialized cyber threat attribution analysts with deep expertise in identifying, tracking and pursuing UNCs, as that is the crux of their job. On the other end, there was the FireEye Data Science Team. They did not have the same level of in-depth domain expertise in cybersecurity threat analysis. But they had deep expertise in data science, including the math, analytics modeling and computing required to build ML models and to test and validate them against available datasets, none of which Advanced Practices had as a capability.

Based on their domain expertise, the cyber threat attribution analysts identified almost 50 important dimensions of a cyber threat. They worked with the Data Science Team to update and reorganize the data set on all the UNCs they had been tracking in order to describe each UNC in terms of these dimensions.

Even with the data on each UNC organized in this way, there were too many dimensions and associated metrics for the expert analysts in Stone’s group to work with and make sense of. Stone observed, “Our analytic challenge was how to use these dozens of dimensions across thousands of threat entities we track to construct an overall composite score resulting in a single dimensional metric of similarity. Oh and do this incorporating anything new we learned since the last time we did this.”

Continuing their collaboration with the FireEye Data Science Team, the analysts developed a modeling framework based on established methods used by the Natural Language Processing and ML research communities to assess the degree of similarity of different text documents. At a conceptual level, they approached the ML model as follows:

They viewed each Uncategorized Threat Cluster (UNC) as a “document”.
They viewed each of the dimensions they used to characterize a cyber attack as a “topic” within the document.
They viewed the specifics within each of the dimensions as a “term” within the “topic”. For example, if Malware was a topic, then the specific types of malware used would be the analogous term within the topic.
They viewed the number of occurrences of each specific element of the cyberattack (e.g. number of times a particular method was used, or a particular type of malware was used) as the frequency count for the term.
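The mapping above can be sketched in a few lines of Python. This is only an illustration of the “document / topic / term / frequency” idea, not FireEye's actual data model: the cluster names, topic names, term values and the `build_documents` helper are all invented for the example.

```python
from collections import Counter

# Hypothetical observations: (cluster id, dimension or "topic", specific "term").
# Every identifier and value here is invented for illustration.
observations = [
    ("UNC-A", "malware", "backdoor_x"),
    ("UNC-A", "malware", "backdoor_x"),
    ("UNC-A", "malware", "dropper_y"),
    ("UNC-A", "method", "spearphishing"),
    ("UNC-B", "malware", "backdoor_x"),
    ("UNC-B", "method", "credential_theft"),
]


def build_documents(rows):
    """Group observations into {cluster: {topic: Counter(term -> count)}}."""
    documents = {}
    for cluster, topic, term in rows:
        topics = documents.setdefault(cluster, {})
        # Each repeated observation raises the frequency count for that term.
        topics.setdefault(topic, Counter())[term] += 1
    return documents


documents = build_documents(observations)
print(documents["UNC-A"]["malware"])
```

Each cluster ends up as one “document” whose “topics” hold term counts, which is the form that text-similarity methods expect as input.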

Figure 1: Illustrative example of using the metaphor of 'documents' to organize FireEye summary information on cyber threat groups and clusters for machine learning analysis of similarity

Figure 2: Overall model flow of the FireEye ATOMICITY tool for analysing the degree of similarity across cyber threat groups and clusters. Micro level similarities are computed for each “topic” and further aggregated to form a macro level composite similarity measure
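The micro-to-macro flow described in Figure 2 can be illustrated with a small sketch. The article does not specify which similarity measure or aggregation rule ATOMICITY uses, so the choices here are assumptions: cosine similarity on raw term counts for each topic, and a weighted average to form the composite score. The topic names, weights and function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count mappings (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def composite_similarity(doc_a, doc_b, weights):
    """Return (macro score, per-topic micro scores) for two clusters.

    `weights` maps topic -> importance; the weighted average is an assumed
    aggregation rule, not one confirmed by the article.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    micro = {
        topic: cosine(doc_a.get(topic, Counter()), doc_b.get(topic, Counter()))
        for topic in weights
    }
    macro = sum(weights[topic] * micro[topic] for topic in weights) / total
    return macro, micro


# Hypothetical clusters, each a mapping of topic -> term counts.
unc_a = {
    "malware": Counter({"backdoor_x": 2, "dropper_y": 1}),
    "method": Counter({"spearphishing": 1}),
}
unc_b = {
    "malware": Counter({"backdoor_x": 1}),
    "method": Counter({"credential_theft": 1}),
}
macro, micro = composite_similarity(unc_a, unc_b, {"malware": 1.0, "method": 1.0})
print(round(macro, 4), micro)
```

Keeping the per-topic scores alongside the composite lets an analyst see which dimensions drive a high or low overall score. A real system would likely reweight terms first (for example with TF-IDF) so that very common terms count for less. That step is left out here to keep the sketch short.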

Their big insight was the analogy of mapping their specific need to assess the similarity of cyber attack threat clusters to ML based Natural Language Processing methods for automatically assessing the similarity of text documents. This insight would never have occurred without intensive back-and-forth interaction between the team of threat analysis domain experts and the team of data science experts.

The ATOMICITY Tool Supports Both ML and Human Learning

The tool the team developed for evaluating the similarity of cyber threat clusters was named ATOMICITY. To evaluate and validate it, the team ran it on historical information covering all the previous decisions FireEye expert threat attribution analysts had made to merge unidentified threat clusters. These prior decisions were based on human expert determination that two separate cyber threat entities were actually the same entity. The review helped in two ways. First, it helped the Data Science Team to refine and improve the underlying model and analytic methods. Second, it helped the threat attribution analysts to sharpen and deepen their understanding of their own expert reasoning, as they had to meticulously compare their prior decisions with the results of the new similarity analysis tool.

Stone commented, “The review, validation and revision of the ATOMICITY tool continues as an ongoing effort that is built into our work process. We review every single instance of machine-generated output and our individual assessments on the same topic. This allows us to check both our analysts and our machines in a predictable and recurring manner.” As they continue to use the ATOMICITY tool, the ML system and the human experts help to train one another in ways that improve learning on both sides.

The new ML based ATOMICITY tool has changed the way Stone’s group does threat cluster similarity comparisons, and more broadly changed the way FireEye works across the company. For new threat cluster groups that are identified, the ATOMICITY tool automatically generates a list of the 10 most similar existing threat cluster groups.
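Producing such a shortlist is a straightforward ranking step once each existing cluster has a similarity score against the new one. A minimal sketch follows, assuming the scores are already computed. The cluster identifiers, the score values and the `most_similar` helper are made up for the example.

```python
import heapq


def most_similar(scores, k=10):
    """Return the k (cluster, score) pairs with the highest similarity, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Hypothetical composite scores of 25 existing clusters against one new cluster.
scores = {f"UNC-{i:04d}": round(1 / (i + 1), 4) for i in range(25)}

shortlist = most_similar(scores)
for cluster, score in shortlist:
    print(cluster, score)
```

`heapq.nlargest` avoids sorting the full set of scores, which matters little for 25 entries but is a reasonable habit when ranking against thousands of tracked clusters.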

The expert analysts use the ATOMICITY output as a starting point. The team no longer spends huge amounts of work time on the essential and very complicated front-end task of selecting which existing UNC candidates to compare against, or of checking whether new information has come in to challenge old assessments. They now proceed directly to the higher-value step of considering the “short list” of candidates in more detail. The analysts use the outputs of the ATOMICITY tool, together with their human expertise, to probe why two groups (the new threat cluster and any of the 10 items on the list of most similar existing clusters) are or are not the same, with deeper attention and higher priority given to candidates that have a higher similarity measure.

“Now, with this tool,” explains Stone, “we can more objectively, more comprehensively and much more efficiently compare across our entire data set of existing unidentified threat clusters in a timely way, and generate the similarity metric relative to the new threat cluster entity of interest.”

Because the ATOMICITY tool can automatically run the similarity analysis across the entire data set of thousands of uncategorized threat clusters, the FireEye Advanced Practices Team can do this analysis much more frequently. This has given the company a type of visibility it never had before. The team can now see how the clusters in its entire “universe” of thousands of tracked uncategorized threat clusters are moving towards or away from one another over time, based on changes in the value of the composite similarity metric. This new view provides powerful and useful insights to Stone’s team, as well as to other FireEye analysis teams, to understand the evolution of the global cyber threat landscape.
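One simple way to picture this “moving towards or away” view is to compare two snapshots of pairwise similarity scores taken at different times. The sketch below is an assumption about how such a comparison could be done, not a description of ATOMICITY itself. The cluster pairs, score values and the `similarity_drift` helper are invented.

```python
def similarity_drift(previous, current):
    """Change in similarity for each cluster pair present in both snapshots.

    Positive values mean the pair moved closer together; negative values
    mean the pair moved apart.
    """
    shared = previous.keys() & current.keys()
    return {pair: current[pair] - previous[pair] for pair in shared}


# Hypothetical composite similarity scores from two analysis runs.
previous = {("UNC-A", "UNC-B"): 0.20, ("UNC-A", "UNC-C"): 0.70}
current = {
    ("UNC-A", "UNC-B"): 0.55,
    ("UNC-A", "UNC-C"): 0.65,
    ("UNC-B", "UNC-C"): 0.10,  # new pair, no earlier score to compare with
}

drift = similarity_drift(previous, current)
# Pairs sorted from fastest converging to fastest diverging.
ranked = sorted(drift.items(), key=lambda item: item[1], reverse=True)
for pair, change in ranked:
    print(pair, round(change, 2))
```

Pairs that appear only in the newer snapshot are skipped here, since there is no earlier score to measure movement against.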

Because they can now understand the degree of similarity across threat clusters at both a macro level (the overall composite score) and at more micro levels (based on the details used to derive the overall score), they can do something else that they could not do previously. That is, they can track and visualise the spread of very specific cyber attack techniques across the “universe” of unidentified threat clusters. They can see things like whether a particular threat cluster is becoming an “arms supplier” of a particular attack technique to other clusters. They can see if the specific cyber attack approaches being used across a part of the universe, or across the entire universe, are changing, and if so, how.

ML for Human Augmentation and Support

Stone emphasizes that the ML based ATOMICITY tool does not replace any of the expert threat attribution analysts on his team, but instead augments and expands what they do. He clarifies, “We do not allow the ML based tool to make the critical decision of whether or not two unidentified threat clusters should be merged. Only our team of expert threat attribution analysts can make that type of decision, using the supporting analysis information from the ATOMICITY tool.” He also notes, “The fact that ATOMICITY provides more hard data on core topics back to our analysts really resounds with the analysts who are the users—hard data rules everything in our world.”

The ATOMICITY tool has enabled FireEye to automate selective, essential parts of their end-to-end work process for threat attribution in ways that have greatly improved their efficiency. This in turn has enabled the expert analysts in the Advanced Practices group to recover a substantial portion of their time and mental capacity, and use it for doing the types of special investigative projects where humans are far superior to automated ML models. For both of these reasons, Stone notes, “the use of this ML-based tool in our group has paid off for itself many times over.” He adds, “The fact that we have been able to give time back to some of our analysts due to efficiency improvement, and that we have been able to move towards becoming more systematic and data-driven in our decisions, has been a positive contribution to our company environment.”

This efficiency improvement and the resulting recouping of human expert time has also made it possible to keep improving the ML-based tool. Stone comments, “The analysts in my team are now spending time on preparations for creating the next version of ATOMICITY. Their ongoing involvement in this process creates substantive buy-in for the effort. Our employees see that this tool is resourced as a core, ongoing company effort, and not just a one-time science project.” This creates an ongoing cycle of continuous learning and improvement, for the human analysts as well as for the ML-based system.

Expanding Use Cases for ATOMICITY

As FireEye has gained experience and confidence in using the ATOMICITY tool, it has found new use cases for it within the company and for its customers. For example, sometimes cyber threat clusters that were being tracked by FireEye dropped “off the radar” for a while, and then suddenly reappeared. And it is quite common for a new cyber threat entity to suddenly appear for the first time “on the radar” as a new kid on the block. In both types of situations, a wider set of analysts within FireEye, beyond Stone’s group, are using outputs enabled by the ATOMICITY tool to examine what happened and probe for possible explanations. In addition, FireEye now includes content in special communications with its customers that incorporates analysis findings on uncategorized threat clusters, drawn from the work of its threat attribution experts using the ATOMICITY tool.

For years, a subset of FireEye customers have been asking the company to share with them the ongoing interim analysis of uncategorised cyber threat clusters (the UNCs). Prior to introducing and validating ATOMICITY, FireEye was reluctant to discuss this type of information with external customers. While the firm believed that it had the best team of experts in the world doing this type of threat attribution and related similarity comparison analysis, it was reluctant to share the interim results externally, as it was challenging to quantify, justify and explain the approaches underlying the analysis. Additionally, the purely manual process left little bandwidth for customer communications. The company’s experience with ATOMICITY changed its thinking. FireEye now has the confidence and capacity to share some of its assessments on uncategorized cyber threat clusters because it knows it has a much stronger foundation and methodology for those assessments.

Can ML-based models be used in a more direct way to support the analyst in predicting what the threat entity will do next, or in a future time period? That is a goal for FireEye, and the company is already taking steps in this direction. Stone’s view is that this type of prediction is beyond the current state of the art, and that, to the best of his knowledge, no commercial or government entity currently has the ability to make this type of prediction. FireEye wants to eventually work out practical and explainable methods for predicting what a threat entity will do in the future, even if it is a type of behaviour that has not yet been observed in the existing data on that entity.

Strategy for Scoping ML Applications

Stone believes that ML-based tools are most useful in helping his Advanced Practices Team with their broad base of high-volume work that has to be addressed. He elaborates, “Purposely, our strategy has not been to use ML-based tools to search for “the needle in the haystack”. We are not attempting to develop ML-based tools specifically to help us to search for that extremely rare event lurking in the shadows (for which, by the way, there is very little or no data to feed to the model). Our experience is that human experts are far superior—now and for the foreseeable future—for doing this type of early stage, ill-defined exploration, and that the data driven ML-based tools are more productively deployed across our broader based areas where we are deluged with continuous streams of data.”

He notes that some other commercial and government organisations take a different view, and put a lot of effort into developing ML-based systems for scenarios that are essentially like trying to identify that needle in the haystack, or finding those weak signals of that unknown rare event lurking in the shadows. He says that from what he understands, “these types of ML efforts have a hard time justifying their expense and upkeep”.

Another important aspect of ML development and deployment strategy for the Advanced Practices Group is “no auto-magic”. The team will not use ML-based systems to generate analysis or make recommendations where they cannot understand what the ML-based system is doing, or how it arrives at the conclusions of its analysis. Stone comments, “We insist that our data science experts as well as our threat attribution experts are able to understand how the ML-based models work and generate their outputs. This way, our human threat attribution experts can safely and reliably use the outputs of the supporting ML system.”

Reflections on Big Changes

We asked Stone to reflect on the big changes he has experienced during the course of his professional career, starting out as a military intelligence analyst, and later as a cyber threat intelligence analyst. He shared the following story:

“Years ago, when I would sometimes teach courses at one of the U.S. military schools for training intelligence analysts, we would teach them how to investigate many different sources of information to find one useful piece of information. The assumption was that nothing was available, and by cleverly finding one piece of useful information, it was an intelligence breakthrough. And if you could find an additional one or two pieces of useful information—that was considered to be a big intelligence gathering success.

These days, I still sometimes teach at that same school for military intelligence analysts. It is a totally different situation. We start from an entirely different set of assumptions—that there are mountains and oceans of data available to sort through. Now we have an entirely different type of challenge. It is easy to find data. The hard challenge is to know how to sift through the vast quantities of available data to find the subset of what is useful and important for the intelligence task at hand. The entire game of intelligence analysis, and similarly cyber threat analysis, has changed. It has gone from how to find any useful pieces of information—on the assumption that so little information was available—to how to sift through vast amounts of information and know what to discard in order to discover and retain the subset of it that really matters for the task at hand.

This is an entirely different skillset and mindset. It could not be more different.”