The term Open Source Intelligence (OSINT) originally refers to a specific source of intelligence. In general, intelligence sources produce raw data that can be further processed during the six steps of the intelligence cycle to gain insights (Office of the Director of National Intelligence 2011). Open Source Intelligence is defined as intelligence produced from publicly available sources that is collected, exploited, and disseminated in a timely manner to an appropriate audience for the purpose of addressing a specific intelligence requirement (Office of the Director of National Intelligence 2011).
The resulting mass of information is made available by the Internet to a broad audience, not necessarily limited to the intelligence community. Motivated by different reasons, a variety of parties develop and use best practices, tools, and techniques to collect, exploit, and disseminate this publicly available information to address their specific requirements.
Bellingcat is a collective of researchers, investigators, and citizen journalists using open source and social media investigations to probe a variety of subjects with impressive results. These include the identification of Russian intelligence officers as the key suspects in the Malaysia Airlines Flight 17 investigation as well as in the Skripal family poisoning. Moreover, they provided analysis of the chemical attack in Douma, Syria and of drone usage by non-state actors in Syria and Iraq. They exposed a fake persona who had been widely cited in Ukrainian and anti-Putin Russian media as a Pentagon official and revealed the illegal shipping of precursors of the nerve agent sarin to Syria by Belgian companies (The Bellingcat Collective 0000).
This chapter presents the theoretical framework of an Open Source Intelligence operation. It outlines the process of an OSINT investigation, clarifies the terms data, information, and intelligence, and presents selected tools and techniques.
Different models exist to formalize the process of an OSINT investigation. In order to transform raw data into actionable intelligence, the intelligence community derived a model called the intelligence cycle (Office of the Director of National Intelligence 2011). It is applied to all sources of intelligence and, in particular, to OSINT. This model has been adopted by Gibson (Gibson 2016) and, with some adjustments, also by Hassan (Hassan and Hijazi 2018). Bazzell presents a practical interpretation which is used as a mandatory training manual by U.S. government agencies (Bazzell 2021). Other works emphasize information gathering and analysis and, therefore, introduce models focusing on these tasks. This applies to the comprehensive three-step model derived by Pastor et al. (Pastor-Galindo et al. 2016), as well as to the model of Tabatabaei et al. (Tabatabaei and Wells 2016).
This phase focuses on the collection of data and is described as gathering data. The idea is to systematically search public data using the known identifiers and to link the findings to produce results. Bazzell gives the most detailed account of this phase by structuring it in three steps (Bazzell 2021). First, he suggests invoking specialized search engines, websites, and services, which might require a payment or a fee. If the responsible investigator is associated with a law enforcement agency, this includes their respective closed source information systems. Second, the investigation continues with an initial web search of the identifiers, followed by the use of selected OSINT applications, tools, and techniques depending on the target of the investigation. To structure this step, he derives a workflow for each of the identifiers email address, user name, real name, telephone number, domain name, and location. Each workflow starts with a given identifier and proposes different paths including specific tools. This results in new pieces of information which can be further exploited to gain more insights. For example, a given user name is potentially helpful to identify a real name, an email address, or a social network profile. The workflow includes several approaches on how to proceed. One approach is a manual check of all social networks for the given user name, thereby potentially identifying the real name. Another path described in the workflow is guessing the email address based on the provided information. A third path takes the user name as input to a set of tools provided by Bazzell. In addition, the input is processed by standard and specialized web search engines and further enriched with information from compromised databases. In most cases, however, these workflows are tailored to searches located in the USA. In the third and final step of the collection phase, all findings are captured.
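The email-guessing path of the user name workflow can be illustrated with a minimal sketch. It is not taken from Bazzell's tools: the function name `candidate_emails`, the chosen address patterns, and the example domain are assumptions made purely for illustration, and the sketch only generates candidate strings without contacting any service.

```python
from itertools import product


def candidate_emails(user_name, first, last, domains):
    """Return plausible email addresses derived from known identifiers.

    Hypothetical helper: the local-part patterns below are common naming
    conventions chosen for illustration, not an authoritative list.
    Assumes `first` and `last` are non-empty strings.
    """
    user_name = user_name.strip().lower()
    first, last = first.strip().lower(), last.strip().lower()
    local_parts = [
        user_name,               # the user name reused as mailbox name
        f"{first}.{last}",       # jane.doe
        f"{first}{last}",        # janedoe
        f"{first[0]}{last}",     # jdoe
        f"{last}.{first}",       # doe.jane
    ]
    seen = set()
    candidates = []
    # Combine every local part with every domain, keeping the first
    # occurrence of each address so the order of patterns is preserved.
    for local, domain in product(local_parts, domains):
        address = f"{local}@{domain}"
        if address not in seen:
            seen.add(address)
            candidates.append(address)
    return candidates


if __name__ == "__main__":
    for address in candidate_emails("jdoe42", "Jane", "Doe", ["example.com"]):
        print(address)
```

Each candidate is only a hypothesis; it becomes a finding once another source in the workflow confirms it.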
This phase converts information into intelligence as described by Gibson (Gibson 2016). This includes the integration, evaluation, and analysis of the gained information to produce a result meeting the requirements (Office of the Director of National Intelligence 2011). Bazzell points out that this step aims to understand how information is connected and how to represent these connections. Therefore, he advises using a link analysis tool to visualize the results of the investigation (Bazzell 2021). Pastor et al. split this phase up to emphasize that the analyzed information can be subjected to additional data mining or artificial intelligence techniques in a dedicated knowledge extraction phase (Pastor-Galindo et al. 2016).
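The idea behind link analysis can be sketched in a few lines. The following example is an assumption-laden illustration rather than a description of any particular link analysis tool: findings are modeled as pairs of connected identifiers, and a breadth-first search returns the shortest chain of findings between two of them. The function names and the sample identifiers are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(findings):
    """Build an undirected graph from (identifier, identifier) pairs."""
    graph = defaultdict(set)
    for a, b in findings:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connection_path(graph, start, goal):
    """Return the shortest chain of linked identifiers, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sorting makes the traversal order deterministic.
        for neighbor in sorted(graph[node]):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


if __name__ == "__main__":
    # Hypothetical findings; each pair records that two identifiers
    # were observed together in some source.
    findings = [
        ("user:jdoe42", "email:jdoe42@example.com"),
        ("email:jdoe42@example.com", "name:Jane Doe"),
        ("name:Jane Doe", "phone:+1-555-0100"),
    ]
    graph = build_graph(findings)
    print(connection_path(graph, "user:jdoe42", "phone:+1-555-0100"))
```

A dedicated tool adds visualization on top of such a structure; the sketch only shows how connections between pieces of information can be represented and queried.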
Depending on the underlying model, information is the output of either the collection or the processing phase in Fig. 1. It is produced by processing the collected data. Depending on the nature of the data, processing includes translation, decryption, or format conversion, as well as filtering, correlating, classifying, clustering, and interpreting the given data. Fig. 2 refers to this output as open source information.
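Two of the processing tasks named above, filtering and correlating, can be sketched as follows. This is a simplified illustration under stated assumptions: collected data is assumed to arrive as flat dictionaries of strings, and the field names (`source`, `email`, `user_name`) as well as the helper names are hypothetical.

```python
from collections import defaultdict


def normalize(record):
    """Filter out empty fields and normalize the remaining values.

    Lowercasing every value is a crude simplification for this sketch;
    real data would need field-specific normalization.
    """
    return {
        key: value.strip().lower()
        for key, value in record.items()
        if value and value.strip()
    }


def correlate(records, key):
    """Group normalized records that share the same value for `key`."""
    groups = defaultdict(list)
    for record in map(normalize, records):
        if key in record:
            groups[record[key]].append(record)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical raw data collected from three different sources.
    raw = [
        {"source": "forum", "email": " JDoe@Example.com ", "user_name": "jdoe42"},
        {"source": "paste", "email": "jdoe@example.com", "user_name": ""},
        {"source": "blog", "email": "", "user_name": "someone"},
    ]
    for email, matches in correlate(raw, "email").items():
        print(email, [m["source"] for m in matches])
```

Records from different sources that resolve to the same identifier end up in one group, which is the kind of output the text refers to as information.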
Compiling information to address a specific query results in intelligence (Gibson 2016). It is the result of the integration, evaluation, and analysis of information during the analysis phase in Fig. 1. Fig. 2 denotes it as open source intelligence.
Even if the intelligence cycle is initiated with a precise query as input, the response to this query relies on collecting, processing, and analyzing massive amounts of public data. This requires at least partial automation of certain tasks. In particular, the collection phase can be facilitated by the use of different tools and techniques.
A complete overview of tools is difficult to provide because many specialized tools are in use. In addition, the landscape of external tools is extremely active and subject to change. One reason for changes is the withdrawal of tools by their developers, as observed in June 2019 when Bazzell withdrew his set of popular interactive online tools (Bazzell 2021), or the disappearance of the meta-crawler website searx.me. Another aspect is related to the dynamic nature of social networks. For example, Facebook and Instagram are known to actively undermine the usage of OSINT-related tools and techniques. To this end, they block respective web services, regularly change their source code, and restrict capacities exploited by the OSINT community (Bazzell 2021). For example, Instagram includes special character encoding in the source code of their website to make it difficult to directly extract URLs.
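The effect of such character encoding can be illustrated on a stored snippet of page source. The sketch below assumes, purely for illustration, that URLs are embedded as JSON string literals in which `/` is written as `\/` and `&` as `\u0026`; the source does not specify the exact encoding, and the sample string and function name are hypothetical. No network access is involved.

```python
import json
import re

# Matches a double-quoted string literal that starts with http: or https:
# and may contain backslash escapes such as \/ or \u0026.
URL_LITERAL = re.compile(r'"(https?:(?:\\.|[^"\\])*)"')


def extract_urls(page_source):
    """Return the decoded URLs found in JSON-style string literals."""
    urls = []
    for match in URL_LITERAL.finditer(page_source):
        literal = '"' + match.group(1) + '"'
        try:
            # json.loads resolves escapes like \/ and \u0026.
            urls.append(json.loads(literal))
        except json.JSONDecodeError:
            # Skip literals that are not valid JSON strings.
            continue
    return urls


if __name__ == "__main__":
    # Hypothetical page fragment with an escaped URL.
    sample = r'{"display_url":"https:\/\/example.com\/img.jpg?a=1\u0026b=2"}'
    print(extract_urls(sample))
```

A naive search for `https://` would miss the escaped form entirely, which is why such encoding hinders direct extraction until the escapes are resolved.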
The contribution of this chapter is twofold. On the one hand, it recommends countermeasures against OSINT to preserve privacy in the face of such intrusion; on the other hand, it discusses the legal environment in Germany regarding the use of open source intelligence.
Another interesting aspect arises from the question of whether the data collected is really public. In the course of this work, public data was described as the opposite of classified data or closed sources. This point of view regards almost all data accessible on the Internet as public. However, this description deserves a closer analysis. In particular, Sect. 1.1 implied that data acquired from social networks can be considered public. However, access to the most common social networks like Facebook or Instagram is restricted by a registration process. This means that only registered members of the social network are eligible to access it. Although this registration is free of charge, it requires entering into a contract with the social network, whereby the user has to agree to its terms of service. In some cases, the application of OSINT tools and techniques violates this agreement. For example, according to their terms of service, it is forbidden to access Facebook or Instagram in an automated manner or to create a Facebook account using a fake name. Moreover, it is known that Facebook actively blocks OSINT-related services (Bazzell 2021).