Security professionals are recognising the intelligence value of publicly and commercially available information, which can now be accessed more effectively from typically hard-to-reach regions. At the same time, the technological capabilities have matured in our age of artificial intelligence, machine learning, and data science.
Intelligence has historically been based on classified data. However, today’s unclassified data, including open-source intelligence (OSINT), is increasingly being used to provide context and cueing for other types of intelligence.
Advanced identity intelligence
Babel Street is a technology company providing advanced identity intelligence and risk operations using an AI-enabled data-to-knowledge platform to unlock insights from a flood of data. The company provides advanced data analytics and intelligence for the world’s most trusted government and commercial organisations.
Experts have predicted that by 2025 over 463 exabytes of data will be generated each day globally
The sheer volume of data is growing exponentially. Experts have predicted that by 2025 over 463 exabytes of data will be generated each day globally. Not only are we seeing exponential growth in the volume of data, but there is also disparity in the veracity and the variety of data. This is being compounded by the ‘app economy’ in which data is created in a new format for every app added around the globe.
Human language technology
“The problem is that the data ‘junk’ and the ‘crown jewels’ are in the same bucket, and government and commercial entities need better and faster ways to extract intelligence from these torrents of data,” says Farid Moussa, VP, Strategy & Public Sector, Babel Street. Prior to joining Babel Street, Farid retired from the National Security Agency (NSA). He has guided video, image, speech, and text analytics (VISTA) and developed an appreciation for human language technology.
An elusive source of data is the Dark Web, where every user, by design, attempts to obfuscate their identity, and where bad actors hide better still. “This presents a cat and mouse game – the cat must be smarter than the mouse, but the mouse is continually getting smarter,” says Moussa.
Intelligence tools for data analysis
SIGINT and HUMINT – while both vital – are also the most expensive forms of intelligence
There are several intelligence tools for analysing data. One of them is signals intelligence (SIGINT), which refers to electronic transmissions collected by ships, planes, ground sites, or satellites. Another is human intelligence (HUMINT), which is collected in a human-to-human fashion. Open-source intelligence (OSINT) is obtained by searching on topics or entities of interest that are publicly available on the Internet at large. Today, these various categories are often done in ‘silos of excellence.’
However, best practice is to use all forms together in a holistic fashion. SIGINT and HUMINT – while both vital – are also the most expensive forms of intelligence, whereas OSINT, which is growing in importance, is the most cost-effective. OSINT complements the other disciplines and is crucial to holistic intelligence practices.
Holistic intelligence practices
When it comes to physical security of people and places, OSINT has become a critical source of actionable information. Security directors leverage Publicly Available Information (PAI) to safeguard against threats to individuals, property, travel routes, and event sites. By monitoring PAI, security teams can detect and respond to potential dangers, including during and after events where thorough preparation is vital.
Online information can contain warning signs of impending threats. It aids security professionals in uncovering digital traces, confirming intentions, and addressing risks across language barriers, ensuring proactive risk management for the protection of people and property.
Role of Natural Language Processing (NLP)
The Internet and social media were mostly English language by default, but that has changed exponentially
Natural Language Processing (NLP) is a crucial capability that has evolved to recognise the richness and variety of words and names in multiple languages and scripts, and their use across cultures. Using machine learning and linguistics algorithms, the technology simultaneously considers numerous types of name variations. At one time, the Internet and social media were mostly English language by default, but that has changed exponentially.
Babel Street’s world-class entity matching technology measures over 100 features to calculate the similarity of entities across multiple languages. Despite advances in data management and the cloud, there are still multiple challenges and complexities with integration of these data elements. Challenges include spelling variances/phonetics, language translation issues, criminal evasion, human error upon input, typos, etc.
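The idea of scoring similarity across several features can be illustrated with a minimal sketch – a toy example, not Babel Street’s actual technology. It blends an edit-distance score with a classic Soundex phonetic comparison, so that spelling variances such as ‘Smith’ and ‘Smyth’ still register as likely matches:

```python
import difflib
import unicodedata


def strip_diacritics(name: str) -> str:
    """Normalise accented characters (e.g. 'José' -> 'Jose')."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def soundex(name: str) -> str:
    """Classic Soundex phonetic code, so 'Smith' and 'Smyth' encode identically."""
    codes = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3", "l": "4", "mn": "5", "r": "6"}
    name = strip_diacritics(name).upper()
    name = "".join(c for c in name if c.isalpha())
    if not name:
        return ""
    encoded = name[0]
    prev = next((d for k, d in codes.items() if name[0].lower() in k), "")
    for ch in name[1:]:
        digit = next((d for k, d in codes.items() if ch.lower() in k), "")
        if digit and digit != prev:
            encoded += digit
        if ch.lower() not in "hw":  # H and W do not reset the previous code
            prev = digit
    return (encoded + "000")[:4]


def name_similarity(a: str, b: str) -> float:
    """Blend edit-distance and phonetic evidence into one score in [0, 1]."""
    a_n, b_n = strip_diacritics(a).lower(), strip_diacritics(b).lower()
    edit = difflib.SequenceMatcher(None, a_n, b_n).ratio()
    phonetic = 1.0 if soundex(a) and soundex(a) == soundex(b) else 0.0
    return 0.7 * edit + 0.3 * phonetic  # illustrative weights, not tuned
```

Production systems measure far more features (over 100, in Babel Street’s case) and span scripts and languages, but the principle is the same: many weak signals combined into one confidence score.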
Accessing data from a scattered landscape
While there have been advancements in cloud technologies, agencies utilising open-source data are typically working within a highly scattered data landscape and must use a wide array of tools to get at the relevant pieces. This fragmentation makes it difficult to run analytics and apply AI and machine learning at scale in order to derive actionable insights.
Unstructured and relationship data are visualised through advanced link analysis
As with many disciplines, artificial intelligence (AI) is changing the game when it comes to intelligence. NLP and AI algorithms are employed to enhance datasets for greater quality, usability, and completeness. Unstructured and relationship data are visualised through advanced link analysis, geographic heat maps, influential entity carousels, topic clouds, and patterns by time and day.
Geographic heat maps
The advanced algorithms accurately score and prioritise critical entities within the relationship network while providing the citations from which an AI/ML-based decision was made.
“With the democratisation of AI, the world is becoming flat,” says Moussa. “Just like the most prosperous countries, even the poorest countries have the most advanced capabilities to do damage. Third-world economies often present a scenario where the financial gain of nefarious schemes and low-to-no regulation combine to incentivise bad actors.”
The challenges of name matching
Identity has been an ongoing challenge for intelligence analysis due to the vast complexity of linguistics, spelling and cultural variances, human error, as well as human evasion. Technology and data science approaches are maturing; however, machine translation can still struggle with meaning. Best-of-breed natural language processing capabilities run against the data while it is still in its native language. This minimises the occurrence of analytic errors caused by inaccurate machine translations.
This minimises the occurrence of analytic errors caused by inaccurate machine translations
It’s tempting to think that name matching is like doing a keyword search. The complexity of language makes it more challenging. New names are constantly created, with multiple spellings and no set of rules to encompass how names are formed. They are variable across languages, scripts, cultures, and ethnicities. Culturally specific nicknames and aliases add to the complexity.
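Culturally specific nicknames are one reason a keyword search falls short. A small, hand-curated alias table sketches the idea – the entries below are hypothetical, and real systems maintain far larger, per-culture resources:

```python
# Hypothetical nickname table; real systems curate these per culture and language.
NICKNAMES = {
    "william": {"bill", "billy", "will", "liam"},
    "aleksandr": {"alexander", "alex", "sasha"},
    "margaret": {"maggie", "peggy", "meg"},
}


def same_given_name(a: str, b: str) -> bool:
    """True if two given names are identical or linked by a known nickname."""
    a, b = a.lower(), b.lower()
    if a == b:
        return True
    for canonical, variants in NICKNAMES.items():
        pool = variants | {canonical}
        if a in pool and b in pool:
            return True
    return False
```

A keyword search would never connect ‘Sasha’ to ‘Aleksandr’, yet culturally they are the same given name – exactly the gap such lookup resources are built to close.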
Replacing human involvement
The investigation of the 2013 Boston Marathon bombing spotlighted the significance of intelligence analysis. Even though the FBI had issued a detain alert for Tamerlan Tsarnaev back in 2011, Tsarnaev managed to travel to Russia in January 2012 and return to Boston in July 2012. He was not detained on either occasion because there were too many names on the lists, and his last name had been spelled differently from the way it appeared on his travel documents, enabling him to get through security.
With the Internet, social media, and the dark web, there’s been an exponential increase in public communications in various languages, adding significantly to the amount of analysis required to keep societies safe. Name matching, using AI, analyses multiple contextual data points across languages to arrive at matches.
Name matching, using AI, analyses multiple contextual data points across languages to arrive at matches
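Matching on contextual data points, rather than the name alone, can be sketched as a simple record-linkage score – an illustrative toy with assumed field names and weights, not the product’s method – that weighs name, date-of-birth, and location evidence together:

```python
import difflib


def field_sim(a, b):
    """Similarity of one field; missing data counts as neutral evidence."""
    if not a or not b:
        return 0.5
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_match_score(rec_a, rec_b, weights=None):
    """Weighted blend of name, date-of-birth, and location evidence."""
    weights = weights or {"name": 0.5, "dob": 0.3, "location": 0.2}
    return sum(w * field_sim(rec_a.get(f), rec_b.get(f)) for f, w in weights.items())


a = {"name": "Yusuf Khan", "dob": "1990-01-01", "location": "London"}
b = {"name": "Yousef Khan", "dob": "1990-01-01", "location": "London"}
c = {"name": "Maria Silva", "dob": "1975-06-30", "location": "Lisbon"}
```

Here the transliteration variants ‘Yusuf’ and ‘Yousef’ still yield a high combined score because the date of birth and location agree – the contextual corroboration a name-only lookup would miss.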
A common misconception is that this technology will replace human intelligence. “It’s more accurate to recognise its role as a force-multiplier, allowing humans to focus on the harder problems and/or vetting the results of AI,” says Moussa. “The technology can efficiently analyse massive volumes of data and distill it into actionable information in a timely manner. It augments human capabilities, enabling analysis at speed and scale beyond human capacity, without replacing human involvement.”
Commercial technology to the rescue
“When it comes to threat and identity intelligence, we face a risk-confidence gap, underscored by the challenge of integrating traditional tactics with the modern digital landscape,” adds Moussa. “We cannot ‘hire’ our way out of this problem. Instead, it is imperative that we adopt technology to scale our efforts and free humans to solve the harder problems that machines cannot solve yet.”
The public sector loves to build things, but there are time-to-value and return-on-investment considerations to the ‘build or buy’ decision. When commercial technology can be leveraged by government, it frees up resources to work on problems that the commercial world hasn’t yet figured out, says Moussa. “The public and private sectors need to come together – one team, one nation, working together with mutual trust and collaboration,” he says.