R&D is a fundamental part of what we do here at JUMPSEC. When we begin a research project, it’s usually driven by a challenge or opportunity that we’ve encountered. We hope this post gives you some insight into the creative process we apply when developing a solution to a particular problem, and into what’s coming up in the future!
One of the primary challenges that our security analysts encounter is where and how best to use their time. Monitoring and reviewing the constant influx of data and alerts produced by our clients’ networks whilst also finding the time to keep on top of trending and emerging threats is no mean feat, and not particularly conducive to a healthy work-life balance…
Part of the problem is that finding the right information is an increasingly complex task. Cyber security is unusual in comparison to many more established fields in that the best sources of information come at a grass-roots level, not through more typical academic channels or publications. This means that the most useful insights come from security companies and researchers. Unfortunately, this brings our analysts into contact with the sales and marketing materials that are everywhere on the internet today (we recognise the irony here). Few of these sources produce truly objective research content that can be applied without a specific paid tool or product. On the other hand, much of the technical research content produced is not particularly current or innovative. While sometimes interesting, it is not always useful or actionable.
We distilled the problem into three key areas:
- Sourcing information likely to be insightful
- Consuming information effectively and efficiently
- Streamlining the manual effort required to act on the information
If you ask any security professional where they get their threat intelligence, few worth their salt will advocate a paid-for data feed. Social platforms are often the place where the most current and actionable information can be found, and so seem like a good place to start.
On social media, useful security information comes in two forms – links to external documents or pages (such as security blogs or papers), and post threads where a researcher will talk through a process or topic over multiple tweets.
There are multiple data points to look at when identifying what a good source of threat intelligence looks like:
- The user making the post
- Engagement (likes and comments) with the post
- Historical engagement with other posts
- The type and name of attachment
- The destination website of any links provided
- Text analysis of the post(s)
- Text analysis of the destination or downloadable
By building the profile of what good and bad looks like in each of these areas and using artificial intelligence to learn from the data, it is possible to not only highlight posts that have already begun to circulate and become known within the security community, but also anticipate which new posts are likely to contain insightful information – enabling analysts to react early.
There are several further levels of detail to explore in each of these areas (which we will save for later). But here are some examples of hypotheses that can be used to quantify good or bad in each of the areas:
- A reputable user will likely see engagement from other reputable users.
- A less reputable user will see engagement primarily from their own company or affiliates, and less from the wider community. Cross referencing the user with platforms such as LinkedIn can help to highlight where engagement with the post is not genuine.
- Some destination sites will be more likely to feature marketing-type materials than others, indicating that they may be less useful.
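As a rough illustration of how hypotheses like these could be turned into a number, here is a minimal sketch. It uses hand-set weights rather than a learned model, and every name in it (`Post`, `score_post`, `REPUTABLE_USERS`, `MARKETING_HEAVY_DOMAINS`, the example accounts and domains) is hypothetical and not taken from our platform.

```python
from dataclasses import dataclass, field

# Hypothetical reference data; in practice this would be curated or learned.
REPUTABLE_USERS = {"researcher_a", "researcher_b"}
MARKETING_HEAVY_DOMAINS = {"vendor-blog.example"}


@dataclass
class Post:
    author: str
    author_company: str
    link_domain: str
    # (user, company) pairs for accounts that liked or commented on the post
    engagers: list = field(default_factory=list)


def score_post(post: Post) -> float:
    """Return a score in [0, 1]; higher suggests a more promising source."""
    if not post.engagers:
        return 0.0
    total = len(post.engagers)
    reputable = sum(1 for user, _ in post.engagers if user in REPUTABLE_USERS)
    in_house = sum(1 for _, company in post.engagers
                   if company == post.author_company)
    # Reward engagement from reputable users, penalise in-house engagement.
    # The 0.5 and 0.25 weights are illustrative assumptions only.
    score = reputable / total - 0.5 * (in_house / total)
    if post.link_domain in MARKETING_HEAVY_DOMAINS:
        score -= 0.25
    return max(0.0, min(1.0, score))


genuine = Post(
    author="researcher_x",
    author_company="acme",
    link_domain="research.example",
    engagers=[("researcher_a", "labs_one"), ("researcher_b", "labs_two"),
              ("someone", "other"), ("colleague", "acme")],
)
promo = Post(
    author="vendor_rep",
    author_company="vendorco",
    link_domain="vendor-blog.example",
    engagers=[("rep1", "vendorco"), ("rep2", "vendorco")],
)

genuine_score = score_post(genuine)
promo_score = score_post(promo)
```

In this sketch the post with engagement from the wider community scores higher than the one whose engagement comes only from the author's own company. A learned model would replace the fixed weights with ones fitted to labelled examples.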
The best threat intelligence relates to the current Tactics, Techniques, and Procedures (TTPs) used by threat actors, as opposed to more time-limited Indicators of Compromise (IoCs), which drive more reactive behaviours.
However, staying up to date with the latest TTPs is incredibly time-consuming. The most useful materials are typically dense research papers, which take considerable effort to read, understand, and apply. To overcome this, we built a Named Entity Recognition platform to aggregate open-source cyber threat data and enable more effective and efficient application of threat intelligence to security operations.
The platform can aggregate, digest, and produce consolidated, graph-based outputs based on even the richest research papers in the public domain, to present only relevant and useful data to the user, in more easily digestible formats. For example, graph-based outputs can illustrate the ways an organisation is likely to be attacked based on its threat profile, as attributed to threat actors with an interest in targeting that organisation.
To reduce the amount of time spent manually sourcing and reading dense research papers profiling attacker techniques and malicious campaign activities, the platform scrapes a variety of sources for articles and reports – such as social media, RSS feeds, and search engines. These large bodies of natural language are filtered and passed through a deep learning model trained to extract cyber entities such as APT names, techniques, and malware names.
The deep learning model solves the problem most commonly known as Named Entity Recognition or Entity Extraction in order to extract relevant pieces of data from natural language content. Since the model doesn’t solve the problem with 100% accuracy, it is also necessary to retrain it at regular intervals using human-labelled training sets. This ensures a high level of model accuracy when making label predictions on new articles and reports.
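To make the shape of the task concrete, the sketch below performs entity extraction with a simple dictionary lookup and then compares the result against a human-labelled set. This is a deliberately simplified stand-in, not the deep learning model described above; the gazetteer contents, the names `ExampleBear` and `DemoLoader`, and the function names are all invented for illustration.

```python
import re

# Hypothetical gazetteer standing in for what a trained model would learn.
GAZETTEER = {
    "THREAT_ACTOR": ["ExampleBear"],
    "MALWARE": ["DemoLoader"],
    "TECHNIQUE": ["spear phishing", "credential dumping"],
}


def extract_entities(text):
    """Return (start, end, label, matched_text) tuples ordered by position."""
    found = []
    for label, terms in GAZETTEER.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                found.append((match.start(), match.end(), label, match.group(0)))
    return sorted(found)


def precision_recall(predicted, gold):
    """Compare predicted (label, text) pairs with human-labelled ones."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


sample = ("ExampleBear used spear phishing to deliver DemoLoader, "
          "then moved laterally.")
entities = extract_entities(sample)
predicted_pairs = [(label, text) for _, _, label, text in entities]

# Human-labelled entities for the same sentence (illustrative only).
gold_pairs = [
    ("THREAT_ACTOR", "ExampleBear"),
    ("TECHNIQUE", "spear phishing"),
    ("MALWARE", "DemoLoader"),
    ("TECHNIQUE", "moved laterally"),
]
precision, recall = precision_recall(predicted_pairs, gold_pairs)
```

Here the lookup misses "moved laterally" because the phrase is not in its dictionary, so recall falls below 1. Gaps like this, surfaced by comparing predictions with human labels, are the kind of signal that could feed a retraining cycle.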
Next, these labelled entities are piped into an additional language processing tool that establishes the relationships between the cyber entities, for example that an organisation was targeted by an adversary using a series of techniques. Finally, the entity relationship results are ingested into a central platform in a machine-readable format, to be output as a range of visualisations and consolidated reports.
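One simple way to picture this step is a rule table that links entities appearing in the same sentence, with the result serialised as JSON nodes and edges. The sketch below assumes exactly that; the rule table, the `build_graph` function, the JSON layout, and the example entities are hypothetical and do not describe the tool we actually use.

```python
import json
from itertools import product

# Hypothetical rules: (source label, target label) -> relationship name.
RELATION_RULES = {
    ("THREAT_ACTOR", "ORGANISATION"): "targets",
    ("THREAT_ACTOR", "TECHNIQUE"): "uses",
    ("THREAT_ACTOR", "MALWARE"): "uses",
}


def build_graph(sentences):
    """Build a node/edge graph from per-sentence lists of (label, text)."""
    nodes, edges = {}, set()
    for entities in sentences:
        for label, text in entities:
            nodes[text] = label
        # Link every ordered pair in the sentence that matches a rule.
        for (src_label, src), (dst_label, dst) in product(entities, repeat=2):
            relation = RELATION_RULES.get((src_label, dst_label))
            if relation:
                edges.add((src, relation, dst))
    return {
        "nodes": [{"id": name, "type": kind}
                  for name, kind in sorted(nodes.items())],
        "edges": [{"source": s, "relation": r, "target": t}
                  for s, r, t in sorted(edges)],
    }


labelled_sentences = [
    [("THREAT_ACTOR", "ExampleBear"), ("ORGANISATION", "Example Corp"),
     ("TECHNIQUE", "spear phishing")],
    [("THREAT_ACTOR", "ExampleBear"), ("MALWARE", "DemoLoader")],
]
graph = build_graph(labelled_sentences)
graph_json = json.dumps(graph, indent=2)  # machine-readable output
```

Sentence-level co-occurrence is a crude heuristic and will produce false links in longer sentences, but a node and edge structure of this kind is straightforward to load into a graph visualisation or reporting layer.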
Streamlining human effort
The final piece is using automation to act on operational intelligence, so that analysts can apply their time where it is most effective.
One common issue we have encountered when assessing security monitoring solutions is the ‘chaining’ of events needed to properly diagnose signs of malicious activity. When related events are not linked together, opportunities to detect an attack can be missed.
To address this, we sought to:
- Create a machine learning model to automatically classify and categorise events based on situational awareness of where the event occurred on the network, relative to the client environment.
- Visualise the client network using 3D modelling to enable analysts to more accurately analyse and respond to events.
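To illustrate the idea of chaining events and weighting them by where they occurred, here is a small rule-based sketch. It is not the machine learning model described above: the zone weights, the ten-minute window, the event fields (`host`, `zone`, `ts`, `type`), and the function names are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical weights reflecting how sensitive each network zone is.
ZONE_WEIGHT = {"dmz": 1, "workstation": 2, "server": 3, "domain_controller": 5}
WINDOW_SECONDS = 600  # assumed gap allowed between chained events


def chain_events(events):
    """Group each host's events into chains of closely spaced events."""
    by_host = defaultdict(list)
    for event in sorted(events, key=lambda e: e["ts"]):
        by_host[event["host"]].append(event)

    chains = []
    for host_events in by_host.values():
        current = [host_events[0]]
        for event in host_events[1:]:
            if event["ts"] - current[-1]["ts"] <= WINDOW_SECONDS:
                current.append(event)
            else:
                chains.append(current)
                current = [event]
        chains.append(current)
    return chains


def priority(chain):
    """Longer chains in more sensitive zones receive a higher priority."""
    return len(chain) * ZONE_WEIGHT.get(chain[0]["zone"], 1)


events = [
    {"host": "dc01", "zone": "domain_controller", "ts": 0, "type": "logon_failure"},
    {"host": "ws07", "zone": "workstation", "ts": 50, "type": "new_process"},
    {"host": "dc01", "zone": "domain_controller", "ts": 120, "type": "logon_success"},
    {"host": "dc01", "zone": "domain_controller", "ts": 300, "type": "new_service"},
    {"host": "dc01", "zone": "domain_controller", "ts": 5000, "type": "logon_success"},
]
chains = chain_events(events)
ranked = sorted(chains, key=priority, reverse=True)
priorities = [priority(chain) for chain in ranked]
```

In this example the three closely spaced events on the domain controller are ranked above the isolated events. A trained classifier could replace the fixed weights while keeping the same inputs: the chain of events and its position on the network.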
The purpose of this sub-project is not to eliminate human effort entirely; we recognise that there is no substitute for intelligent human analysis. However, a solely human-driven model is not scalable or sustainable. Augmenting manual efforts with predictive modelling can help to reduce the likelihood of false positives and drive down investigation times, enabling more efficient human operations to take place.
By combining the identification of good data sources with the ability to summarise content into actionable pieces of information, we can streamline much of the process by which our analysts can leverage the latest threat intelligence available over the internet.
Sometimes there is no substitute for reading a good research paper as it was written, so this method of summarising the data is not always the best approach. However, the same engine can be used to analyse the content of the attachments and destination websites, to better identify the information worth investing time into.
Keep an eye out for more posts on our Labs site related to this ongoing R&D project, and how we went about solving different challenges along the way.
Head of Enablement
As Head of Enablement at JUMPSEC, Dan is responsible for shaping the solutions that JUMPSEC offer, working with our clients to deliver the outcomes they need.