Paperaccepted ICACDS2020

The document proposes two open source intelligence (OSINT) solutions: 1) OSINTEI, an investigation platform that extracts relevant public information about a target individual to assist investigations in an efficient, timely manner. 2) OSINTSF, a social media search tool to help businesses find customer details and track individuals for various commercial purposes like insurance. Both solutions require minimal input and resources to provide rapid results.


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/343033121

Open Source Intelligence Initiating Efficient Investigation and Reliable Web Searching

Conference Paper · July 2020

DOI: 10.1007/978-981-15-6634-9_15


Open Source Intelligence Initiating Efficient
Investigation and Reliable Web Searching

Shiva Tiwari1, Ravi Verma1, Janvi Jaiswal1, Bipin Kumar Rai2

1 Department of IT, ABES Institute of Technology, Ghaziabad 201009, Uttar Pradesh, India
{shivatiwari757, ravipcm2000, janvijaiswal104}@gmail.com
2 Department of IT, ABES Institute of Technology, Ghaziabad 201009, Uttar Pradesh, India
{[email protected]}

Abstract. Open Source Intelligence (OSINT) is the collection and processing of
information gathered from publicly available, open-source web portals and
sites. OSINT has been around for hundreds of years, under one name or
another. With the emergence of instantaneous communication and rapid
knowledge transfer, a great deal of actionable and analytical data can now be
collected from unclassified, public sources. Using OSINT as the base concept,
we propose solutions for two use cases. The first is an investigation platform
that replaces manual information gathering, saving the time and resources of
investigators. It requires minimal input data and presents only the relevant
data in an understandable template format rather than as a graph. The second
is a business intelligence solution that allows users to find details about an
individual or themselves for business growth, brand establishment, and client
tracking, as elaborated in the paper.

Keywords: Open Source Intelligence (OSINT), Machine Learning, AI,
Investigation, Security, Automation, Web Crawling.

1 Introduction

Security researcher Mark M. Lowenthal defines OSINT as “any and all information
that can be obtained from the overt collection: all media types, government reports,
and other files, scientific research and reports, business information providers, the
Internet, etc.” [1]
The main strength of Open Source Intelligence is the wide variety of information
it offers. It is not restricted to a single format such as text or images: data in
any available format, including audio and video, can be extracted from publicly
accessible domains.
In this paper, we aim to provide digital solutions that collect information about
a targeted entity through a single platform and, most importantly, save time.
For the first use case, we propose a solution named OSINTEI (Open Source
Intelligence for Efficient Investigation) that helps in the investigation of, and
data extraction about, a target, particularly administrative officers. Government
officials are responsible for administrative duties and for the development of the
nation and its citizens. But what if the officials entrusted with these
responsibilities own illegal assets, show abnormal expenses, or display
conspicuous work behavior? For such officials, information gathering starts by
looking into the various records and files available on governmental and other
web portals. The process takes a lot of time, and that time is wasted when
nothing suspicious is found; it could have been used for other tasks.
Investigation is the most crucial, time-consuming, effort-intensive, and costly
phase. It is done manually and has been done the same way for ages. This
traditional approach costs more than just money [2]; it can even put the lives of
investigation team members at risk. This has created the need for an automated
investigation platform, built on a cohesive set of technologies, that reduces all
of the above-mentioned costs and yields a digital-age information-gathering
protocol for efficient investigation.
OSINT has already been used in various other tools for searching, data extraction,
and compilation, which have earned fame and wide user demand due to the
compatibility, efficiency, and ease of use they provide [3].
For the second use case, we propose a solution named OSINTSF (Open Source
Intelligence as Social Finder), a tool that makes searching for details about an
entity easy and accessible. It is also meant to benefit businesses by allowing
content on social networking sites to be searched in real time and by providing
in-depth analytical data.

2 Related Work

J. Pastor-Galindo, P. Nespoli, F. Gómez Mármol and G. Martínez Pérez [4], in "The
Not Yet Exploited Goldmine of OSINT: Opportunities, Open Challenges and Future
Trends" (2020), described the existing state of OSINT and gave a detailed review of
the field, concentrating on the methods and strategies that strengthen
cybersecurity. They shared their views on the problems that need to be solved in
the future and examined the role of OSINT in the governmental public domain,
which is an ideal place to exploit open data. The paper did not, however, develop
a concrete solution that could validate its ideas about exploiting open-source
data. In contrast, our paper concentrates on the development of platforms that
address the applicable use cases described below.
In “The Evolution of Open Source Intelligence” (2010), Florian Schaurer and Jan
Störger [5] clearly discuss the reasons for, history of, and challenges behind the
evolution of OSINT. They also discuss the adoption of Open Source Intelligence by
the private sector and by agencies such as the CIA, and the partnerships formed
for development in the field. The article covers the theory of this evolution but
says little about possible practical applications.

3 Proposed Solution

Investigation [6] and policing methodologies have evolved with time, but neither
has fully embraced technology.
The first proposed solution aims to provide a single platform for the entire
information-extraction stage of an investigation, focusing on the targeted
individual and delivering both miscellaneous and sorted data to the investigator.
Whether the data is decades-old news coverage or the current financial status,
the product would search the available sources across the web and find the
information about the target that is relevant to the investigation. It would
gather the information needed to find a direction for the investigation, saving
the time that manual information gathering consumes. Instead of manually
searching different online sources, writing a report, and taking days for a single
task, investigators can use our solution to find relevant data about the target in
a few seconds and receive a data-filled report in template form.
Data is extracted from public domains that can be legally used, such as government
websites like Supremo, where data is updated each year and is accurate. Only such
rich data would be extracted, making the primary stage of investigation an easy
task with efficient and reliable results.
The second solution aims to help businesses that keep records of their clients and
seek new prospects. It would also help individuals keep track of their social and
web presence. Social media is becoming a crucial part of digital communication
strategies. It is now an effective tool not only to improve brand loyalty and win
new clients but also to strengthen customer service, since it allows businesses to
establish relationships and expand the span of their interactions.
It would also help businesses in insurance, banking, or other investment
industries to build peer-to-peer networks to reach non-contactable customers
whose renewal, incentive, or maturity benefits remain unclaimed. They can use
this solution to find details about the customers and contact them.
Both solutions have minimal resource requirements and need minimal input data
about the target. A single laptop or other device with an Internet connection is
enough to get results quickly.
OSINT [7] is not yet commonly used as a technology and has much untapped
potential that can be drawn on to create a software product with greater
usability [8], [9]. The solution, being a software product, would use machine
learning and artificial intelligence, with classification and regression
algorithms including the Naïve Bayes classifier. Selenium [10], an efficient and
portable framework for automating and testing web applications, would be used
for crawling. The solution uses JWT (JSON Web Tokens) for session management and
security, and uses microservices to improve the scalability of the product. A
detailed description of the technologies used and their roles is given below.

Both solutions use a similar technology stack but differ in their functionalities
and use cases. Facial or pictorial data (an image) can also be used as an
alternative input for investigation, but only if the name (the primary input) is
unavailable. This feature increases the ease of use and broadens the functionality
of the solution.

Fig. 1. Technological design of the solution.

The proposed solution follows an algorithm called Global Search (GS). GS
maintains a global data structure that contains the various details of a
particular person, a foreign agent, or a group.

3.1. Global Search (GS)

The GS algorithm is divided into four sub-algorithms:
1. GS-Crawl
2. GS-Extraction
3. GS-Reinforce
4. GS-Template

3.1.1. OSIGS-Crawl

In this part, a single keyword is taken as the input parameter, generally the name
of the individual or group to be searched. On the basis of that input, the crawler
goes to the web and finds digital footprints of the targeted individual or group.
The algorithm returns a JSON of the global data structure, which can be empty
(null) if the target is unavailable on public domains.

• GS-Crawl(Pi)

driver ← unit.
Searcht ← Pi.
If (!Searcht)
    cw ← driver instance
    for each i in S:
        temp ← S[i] ∀ i ∈ S
        cw ← ∀ temp.
    lst ← cw
    call insertIntoGS.
    db_init(JSONF).

• insertIntoGS(cwi):

retrieve cwi.
Push into Global Stack (GS).
if (GS_count < 0)
    return null
else
    return GS(r1, r2, r3, …, rn)
G_Stack ← GS(r1, r2, …, rn)
If G_Stack is null:
    goto 1.
else
    itemi ← pop(G_Stack).
    serialize(itemi).
    JSONs ← serialized(itemi)
    goto 4.
return JSONs.
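
The crawl and insertion steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used by the platform: the function names, the injected `fetch` callable (a stand-in for a Selenium-driven page visit), and the sample source names are all hypothetical.

```python
import json
from typing import Callable, Iterable, List, Optional


def gs_crawl(person: str,
             sources: Iterable[str],
             fetch: Callable[[str, str], Optional[dict]]) -> List[dict]:
    """Visit every source in S and keep the non-empty results (lst)."""
    results = []
    for source in sources:
        record = fetch(source, person)  # hypothetical crawler call
        if record:                      # skip sources with no footprint
            results.append(record)
    return results


def insert_into_gs(results: List[dict]) -> Optional[str]:
    """Push results on the global stack, then pop and serialize them."""
    g_stack = list(results)             # G_Stack <- GS(r1, ..., rn)
    if not g_stack:
        return None                     # nothing found for this target
    items = []
    while g_stack:
        items.append(g_stack.pop())     # most recent result comes out first
    return json.dumps(items, sort_keys=True)


def fake_fetch(source: str, person: str) -> Optional[dict]:
    # Stand-in for a real page visit; returns canned data for one source.
    canned = {"news": {"source": "news", "name": person, "hits": 2}}
    return canned.get(source)


json_s = insert_into_gs(gs_crawl("Jane Doe", ["news", "registry"], fake_fetch))
print(json_s)
```

Passing the fetcher in as a parameter keeps the sketch runnable without a browser; in a real deployment it would wrap the crawler instance.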

3.1.2. OSIGS-Extraction

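The OSIGS-Extraction algorithm takes the JSONs as input and writes the keywords
of the JSONs file to the JSONF file, scoring each term with the following formulas:

\[ \mathrm{GS\text{-}TF}(t, d) = \frac{f_{t,d}}{\text{number of words in } d} \]

\[ \mathrm{GS\text{-}IDF}(t, D) = \log \frac{N}{|\{ d \in D : t \in d \}|} \]

\[ \mathrm{GS\text{-}E}(t, d) = \mathrm{GS\text{-}TF}(t, d) \cdot \mathrm{GS\text{-}IDF}(t, D) \]

where f_{t,d} = frequency of the term t in the document d,
d = a JSONF document,
D = the set of JSONF documents,
N = total number of documents in D,
|{d ∈ D : t ∈ d}| = number of documents d in which t occurs.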
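
As a worked illustration of the scoring, the sketch below computes the three quantities for a tiny hand-made corpus. It assumes the standard TF-IDF reading, in which the numerator of the IDF ratio is the total number of documents and the denominator is the number of documents containing the term. The function names and the sample tokens are ours, not part of the platform.

```python
import math
from typing import List


def gs_tf(term: str, doc: List[str]) -> float:
    """Term frequency: occurrences of term divided by words in the document."""
    return doc.count(term) / len(doc) if doc else 0.0


def gs_idf(term: str, docs: List[List[str]]) -> float:
    """Inverse document frequency: log(total docs / docs containing term)."""
    containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / containing) if containing else 0.0


def gs_e(term: str, doc: List[str], docs: List[List[str]]) -> float:
    """Extraction score: TF multiplied by IDF."""
    return gs_tf(term, doc) * gs_idf(term, docs)


def top_keywords(doc: List[str], docs: List[List[str]], k: int = 3) -> List[str]:
    """Return the k highest-scoring distinct terms (ties broken by name)."""
    scores = {t: gs_e(t, doc, docs) for t in set(doc)}
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical tokenized documents for one target.
docs = [
    ["officer", "land", "purchase", "land"],
    ["officer", "transfer", "order"],
    ["officer", "award"],
]
print(top_keywords(docs[0], docs, k=2))
```

A term such as "officer" that occurs in every document scores zero, which is why it would not be selected as a keyword.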

3.1.3. OSIGS-Reinforce

The OSIGS-Reinforce algorithm takes the output of the extraction maintenance
service as JSON and builds a data set after deserializing the response.
OSIGS-Reinforce (OSIGSR) behaves as a REST endpoint consumer that consumes JSONF
in order to train the model.
The results gathered from the public domains about the target are used as
parameters to train the learning model.

• OSIGS-Reinforce(JSONF)
init ɸ(JSONF, t).
Ji ∈ JSONF, ∀ Ji ∈ JSONF:
    temp = Ji
    for each (i in J):
        select t from JSONF
        do trigger t,
        watch output O and next state Ji+1
        ɸ(Ji, t) ← ɸ(Ji, t) + ß [O + ρ · max over t' of ɸ(Ji', t') − ɸ(Ji, t)].
        Ji ← Ji'
Push JSON[Ji1', Ji2', …, Jin'] in db.

Explanation of this algorithm:

Initialize the ɸ values, i.e., ɸ(JSON, trigger). Then observe the current state Ji
and choose a trigger t belonging to Ji. Apply the trigger, observe the output O
and the next state Ji+1, and update the ɸ value. Repeat until all values of the
JSON are exhausted. After this, the new and approved results
JSON[Ji1', Ji2', Ji3', Ji4', …, Jin'] are produced by the reinforcement learning
service and passed to the DB service.
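
The update step can be illustrated with a small tabular sketch. We assume here that the rule is the standard Q-learning update, with ß as the learning rate (as listed in the Appendix) and ρ read as a discount factor, which the paper does not define explicitly. The state labels, trigger names, and numeric values are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Tuple

PhiTable = DefaultDict[Tuple[Hashable, Hashable], float]


def update_phi(phi: PhiTable,
               state: Hashable,
               trigger: Hashable,
               output: float,
               next_state: Hashable,
               triggers: Iterable[Hashable],
               beta: float = 0.5,
               rho: float = 0.9) -> float:
    """Apply one update: phi += beta * (O + rho * max phi(next) - phi)."""
    best_next = max((phi[(next_state, t)] for t in triggers), default=0.0)
    current = phi[(state, trigger)]
    phi[(state, trigger)] = current + beta * (output + rho * best_next - current)
    return phi[(state, trigger)]


phi: PhiTable = defaultdict(float)   # every phi value starts at 0.0
triggers = ["keep", "drop"]          # hypothetical trigger names

# First observation: trigger "keep" on J1 yields output 1.0 and leads to J2.
first = update_phi(phi, "J1", "keep", 1.0, "J2", triggers)
# Second observation: J0 leads to J1, so it inherits part of J1's value.
second = update_phi(phi, "J0", "keep", 0.0, "J1", triggers)
print(first, second)
```

The second call shows how a state with no direct output still gains value when it leads to a state that has already been rated as useful.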

3.2. Concepts and Technologies Used

3.2.1. High Level Design

The high-level design is the architecture used for developing the software
application. The architecture diagram provides an overview of the system as a
whole, defining the key components that would be developed for the product and
their interfaces.

Fig. 2. Design for handling scalability.

Fig. 3. Crawler flow and Extractor Flow.

3.2.2. Kafka
Kafka is used here because it provides high throughput, speed, scalability,
reliability, and replication for real-time streaming data architectures and big
data collection, and can support real-time analytics [11].

Fig. 4. Kafka architecture

Kafka is made up of records, topics, consumers, producers, brokers, logs, and
clusters. Records can have a key, a value, and a timestamp. Kafka records are
immutable. A Kafka topic is a stream of records ("/orders", "/user-signups"). A
Kafka cluster consists of many Kafka brokers running on many servers. The term
broker sometimes refers to more of a logical system, or to Kafka as a whole.
Kafka uses ZooKeeper to manage the cluster. ZooKeeper coordinates the topology of
the brokers and the cluster. ZooKeeper is a consistent file system for
configuration information. It is used for leadership election among the brokers
for topic partitions [12].
Kafka also uses ZooKeeper for service discovery of the Kafka brokers that form
the cluster. ZooKeeper sends topology changes to Kafka, so every node in the
cluster learns when a new broker joins, a broker dies, a topic is removed, or a
topic is added. ZooKeeper provides an in-sync view of the Kafka cluster
configuration [13].
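
To make the record, topic, and log vocabulary concrete, the following is a small in-memory model written for illustration only. It is not the Kafka client API and involves no broker or ZooKeeper. The class names, the topic name `crawl-results`, and the CRC32-based partition choice are assumptions; Kafka's own default partitioner uses a different hash.

```python
import time
import zlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)          # frozen mirrors the immutability of records
class Record:
    key: str
    value: str
    timestamp: float


class Topic:
    """A topic modeled as a fixed number of append-only partition logs."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[Record]] = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: str) -> Tuple[int, int]:
        """Append a record; return (partition, offset) where it was stored."""
        # Records with the same key always land in the same partition.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append(Record(key, value, time.time()))
        return partition, len(log) - 1

    def read(self, partition: int, offset: int) -> List[Record]:
        """Return all records of a partition starting at the given offset."""
        return self.partitions[partition][offset:]


topic = Topic("crawl-results", num_partitions=3)
p1, o1 = topic.append("jane-doe", "news hit")
p2, o2 = topic.append("jane-doe", "registry hit")
print(p1, o1, p2, o2, [r.value for r in topic.read(p1, 0)])
```

Because both records share a key, a consumer reading that one partition sees them in the order they were produced.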

3.2.3. Consistent Hashing

Consistent Hashing is a distributed hashing scheme that operates in a distributed
hash table independently of the number of servers or objects by assigning them a
position on an abstract circle, or hash ring. This allows servers and objects to
scale without affecting the overall system. [14]
Consistent hashing is based on mapping each object to a point on a circle (or,
equivalently, mapping each object to a real angle). The system maps each available
machine (or other storage bucket) to many pseudo-randomly distributed points on
the same circle. Each object is then stored on the machine whose point follows the
object's point on the circle.
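
The scheme can be sketched in a few lines of Python. This is an illustrative hash ring, not the platform's implementation: the class name, the node names, the use of MD5 to place points, and the choice of 100 points per machine are assumptions made for the example.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def _point(label: str) -> int:
    """Map a label to a position on the ring (a large integer)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes: Iterable[str] = (), replicas: int = 100) -> None:
        self.replicas = replicas             # points placed per machine
        self._points: List[int] = []         # sorted ring positions
        self._owner: Dict[int, str] = {}     # ring position -> machine
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.replicas):
            p = _point(f"{node}#{i}")
            self._owner[p] = node
            bisect.insort(self._points, p)

    def remove(self, node: str) -> None:
        for i in range(self.replicas):
            p = _point(f"{node}#{i}")
            del self._owner[p]
            self._points.remove(p)

    def lookup(self, key: str) -> str:
        """Return the machine at the first point after the key, wrapping."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect(self._points, _point(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["crawler-a", "crawler-b", "crawler-c"])
keys = [f"target-{i}" for i in range(200)]
before = {k: ring.lookup(k) for k in keys}
ring.add("crawler-d")
after = {k: ring.lookup(k) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys moved to the new machine")
```

When a machine is added, a key either stays where it was or moves to the new machine; keys never shuffle between the existing machines, which is the property that makes the scheme attractive for scaling.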

3.2.4. Data Set and Training of the model

The data extracted from public and authentic government sources would be
converted into data sets used to train our model, which would in turn help
classify new links or pieces of information by relevance and select the most
relevant ones to add to the template.

3.2.5. Relevancy Factor and data classification via Naïve Bayes Theorem

Naive Bayes is a simple technique for building classifiers: models assigning class
labels to problem instances defined as vectors of feature values, where the class
labels are taken from some finite set. [15] There is no single algorithm for
training such classifiers, but a family of algorithms based on a common principle:
all naive Bayes classifiers assume that, given the class variable, the value of a
particular feature is independent of the value of any other feature.
To decide which link should be given priority in the template, the Naïve Bayes
classifier comes in handy. The crawler extracts various links whose relevancy is
yet to be checked. Each link is compared with the trained data to obtain its
probability of relevancy.
The link with the highest probability is then chosen to be shown in the template.

Fig. 5. Relevancy factor identification (In the above graph, Query 2 has a higher
probability of relevancy than Query 1 or Query 3, and so Query 2 will be added to
the template.)

After all the links are compared, each graph shows one link with a higher
relevancy probability than the others. These links are used in the template,
which ensures higher efficiency and information relevancy.
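
The ranking step described in this section can be sketched with a small multinomial Naïve Bayes classifier using Laplace smoothing. The sketch is only an illustration of the technique: the training examples, the token lists standing in for links, and the function names are hypothetical, and a production model would be trained on the data sets described in Section 3.2.4.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

Model = Tuple[Counter, Dict[str, Counter], Set[str]]


def train(examples: List[Tuple[List[str], str]]) -> Model:
    """Count documents per class and word occurrences per class."""
    class_docs: Counter = Counter()
    word_counts: Dict[str, Counter] = {}
    vocab: Set[str] = set()
    for tokens, label in examples:
        class_docs[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def p_relevant(tokens: List[str], model: Model, positive: str = "relevant") -> float:
    """Posterior probability of the positive class for one link."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    log_scores = {}
    for label in class_docs:
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)      # Laplace smoothing
        score = math.log(class_docs[label] / total_docs)
        for tok in tokens:
            if tok in vocab:                           # ignore unseen words
                score += math.log((counts[tok] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())                     # for numeric stability
    norm = sum(math.exp(s - top) for s in log_scores.values())
    return math.exp(log_scores[positive] - top) / norm


model = train([
    (["officer", "asset", "declaration"], "relevant"),
    (["officer", "transfer", "order"], "relevant"),
    (["cricket", "score", "update"], "irrelevant"),
    (["movie", "review", "update"], "irrelevant"),
])
links = {
    "query1": ["cricket", "officer"],
    "query2": ["officer", "asset", "order"],
    "query3": ["movie", "score"],
}
best = max(links, key=lambda q: p_relevant(links[q], model))
print(best, {q: round(p_relevant(t, model), 3) for q, t in links.items()})
```

The link whose tokens give the highest posterior probability is the one that would be placed in the template.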

3.2.6. Using Spring Boot to create Micro Service

Spring Boot, a Java-based open-source framework, helps create microservices
[16]. It would make our solution more scalable.
It offers a flexible way to configure Java beans, XML configurations, and database
transactions. It provides powerful batch processing and manages REST endpoints.
In Spring Boot, everything is auto-configured; no manual configuration is needed.
It offers annotation-based Spring applications, eases dependency management, and
includes an embedded servlet container [17].
Microservices is an architecture that allows developers to develop and deploy
services independently. Each running service has its own process, which enables a
lightweight model for supporting business applications.
Spring Boot provides Java developers with a good platform for developing
stand-alone, production-grade Spring applications that they can just run. With
minimal configuration, one can get started without having to set up an entire
Spring configuration [18].

3.2.7. Selenium for Crawling

Selenium is an open-source tool designed to automate web browsers. It provides a
single interface that lets you write test scripts in various programming languages
such as Ruby, Java, NodeJS, PHP, Perl, Python, and C#.
This versatility is part of the reason why Selenium is so popular. Anyone who
codes for the web can use Selenium to test their code or app, from individual
freelance developers running a short series of debugging tests to UI engineers
conducting visual regression tests after a new integration [19].

3.2.8. Spring Reactive

Reactive programming is a programming model that advocates an asynchronous,
non-blocking, event-driven approach to data processing. It involves modeling data
and events as observable data streams and implementing data-processing routines
that react to changes in those streams [20].
In the reactive style, we make a request for a resource and start performing
other things. When the data is available, we are notified, along with the data,
through a callback function. In the callback, we handle the response as the
application or user requires.
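
The request-then-callback flow can be illustrated with Python's `asyncio`. This is an analogue of the reactive style rather than Spring's reactive API: the coroutine name, the resource name, and the simulated delay are assumptions made for the example.

```python
import asyncio
from typing import List, Tuple


async def fetch_resource(name: str, delay: float) -> str:
    """Simulate a non-blocking request that completes after a delay."""
    await asyncio.sleep(delay)          # stands in for network I/O
    return f"data for {name}"


async def main() -> List[Tuple[str, str]]:
    events: List[Tuple[str, str]] = []

    def on_data(task: "asyncio.Task[str]") -> None:
        # Callback invoked by the event loop once the data is available.
        events.append(("callback", task.result()))

    task = asyncio.create_task(fetch_resource("profile", 0.05))
    task.add_done_callback(on_data)

    # The caller is free to do other work while the request is in flight.
    events.append(("other work", "done while waiting"))

    await task                          # wait for the request to finish
    await asyncio.sleep(0)              # yield once so pending callbacks run
    return events


events = asyncio.run(main())
print(events)
```

The "other work" entry is recorded before the callback fires, which is the behavior the paragraph above describes.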

3.2.9. Load Balancer

Load balancing [21] improves the distribution of workloads across multiple
computing resources, such as computers, a computer cluster, network links, central
processing units, or disk drives. Load balancing aims to optimize resource use,
maximize throughput, minimize response time, and avoid overloading any single
resource. Using multiple components with load balancing instead of a single
component can increase reliability and availability through redundancy. Load
balancing [22] typically involves dedicated software or hardware, such as a
multilayer switch or a Domain Name System server process.
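
Two common selection policies, round robin and least connections, are sketched below to show how requests could be spread across service instances. The paper does not state which policy the platform uses, so both classes and the backend names are illustrative assumptions.

```python
import itertools
from typing import Dict, Iterable


class RoundRobinBalancer:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends: Iterable[str]) -> None:
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Send each request to the backend with the fewest active requests."""

    def __init__(self, backends: Iterable[str]) -> None:
        self.active: Dict[str, int] = {b: 0 for b in backends}
        if not self.active:
            raise ValueError("at least one backend is required")

    def acquire(self) -> str:
        # Ties are resolved in favor of the backend listed first.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend: str) -> None:
        self.active[backend] -= 1


rr = RoundRobinBalancer(["svc-1", "svc-2", "svc-3"])
rr_order = [rr.pick() for _ in range(5)]

lc = LeastConnectionsBalancer(["svc-1", "svc-2"])
first = lc.acquire()     # both idle, first listed backend is chosen
second = lc.acquire()    # svc-1 is busy, so svc-2 is chosen
lc.release(first)        # svc-1 finishes its request
third = lc.acquire()     # svc-1 is idle again and is chosen
print(rr_order, first, second, third)
```

Round robin is the simpler choice when requests cost about the same; least connections adapts better when some crawls take much longer than others.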

3.2.10. JWT

JWT (JSON Web Token) is an Internet standard for creating JSON access tokens
which assert a number of claims. A server could, for example, generate a token that
has the claim "logged in as an admin" and provide it to a client. The client could
then use the token to prove that it is logged in as admin. The tokens are signed
with the private key of one party (usually the server's), so that both parties (the
other already being in possession of the corresponding public key by some
appropriate and trustworthy means) can check that the token is valid. The tokens
are designed to be lightweight, URL-safe, and especially usable in a
single-sign-on (SSO) web browser setting.
JWT [23] claims can typically be used to pass the identity of authenticated users
between an identity provider and a service provider, or any other type of claim
that business processes require.
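
The token structure and the signature check can be illustrated with the Python standard library. Note that this sketch swaps in the shared-secret HS256 algorithm (HMAC with SHA-256) instead of the private/public key signing described above, because it needs no third-party cryptography package. The function names, the secret, and the claim values are assumptions for the example.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """URL-safe Base64 without padding, as used by JWT."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    """Build header.payload.signature for the given claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims if the signature and expiry are valid, else raise."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims


secret = b"demo-secret"                       # hypothetical shared secret
token = sign_token({"sub": "investigator-1", "role": "admin", "exp": 2000}, secret)
print(verify_token(token, secret, now=1000))
```

Pinning the accepted algorithm and comparing signatures in constant time are small details, but they close two common classes of token-forgery mistakes.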

3.2.11. Face Recognition using Amazon’s ReKognition boto3 API

Amazon Rekognition offers fast and accurate face recognition, enabling us to use
our private repository of face images to identify individuals in a photo or video.
We can also verify identity by comparing a face image against photographs we have
stored. [24]
With Amazon Rekognition we can detect when faces appear in images and videos and
get attributes such as gender, age range, eyes open, glasses, and facial hair for
each. In video, we can also measure how these facial attributes change over time,
for example to build a timeline of the emotions expressed by an actor.
Thus, even when the available image is old, the service can help recognize the
individual so that our software can search for data related to that particular
entity.
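
A face comparison of this kind could be wrapped as shown below. The helper takes the Rekognition client as a parameter, which in practice would be created with `boto3.client("rekognition")` and valid AWS credentials; the helper name, the threshold value, and the stub client used to make the sketch runnable offline are assumptions.

```python
from typing import Any, Optional


def best_face_match(client: Any,
                    source_bytes: bytes,
                    target_bytes: bytes,
                    threshold: float = 80.0) -> Optional[float]:
    """Return the highest similarity among matched faces, or None."""
    response = client.compare_faces(
        SourceImage={"Bytes": source_bytes},
        TargetImage={"Bytes": target_bytes},
        SimilarityThreshold=threshold,
    )
    matches = response.get("FaceMatches", [])
    return max((m["Similarity"] for m in matches), default=None)


class _StubRekognition:
    """Offline stand-in that mimics the shape of a compare_faces response."""

    def compare_faces(self, SourceImage, TargetImage, SimilarityThreshold):
        candidates = [{"Similarity": 97.5}, {"Similarity": 62.0}]
        return {"FaceMatches": [c for c in candidates
                                if c["Similarity"] >= SimilarityThreshold]}


stub = _StubRekognition()
print(best_face_match(stub, b"query-image", b"stored-image"))
print(best_face_match(stub, b"query-image", b"stored-image", threshold=99.0))
```

Injecting the client keeps the matching logic testable without network access and leaves credential handling outside the helper.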

4 Future Work

Based on the proposed solution, we can create a platform that makes investigation
a stress-free and hassle-free process and provides a business-friendly solution
for crucial business use cases. With the adoption of such a digital platform,
national security and efficient investigation would be more attainable. The
success rate of investigations could be maintained by making efficient use of time
and resources. Tracking and staying updated on one's own or a client's web and
social presence would be much easier, and estimating the buzz around a brand
would be convenient.
OSINT can prove to be an excellent choice for data extraction and processing. It
helps obtain the best data from the abundant stocks available on legal and
accessible public domains. When ready, this solution could bring a new wave to
the field of investigation and real-time searching, with significant benefits for
the nation.

5 Conclusion

To establish efficient and reliable solutions to real-world problems related to
the use of data, records, and information, OSINT can be considered a better
option that saves time, money, and resources. In the digital age, investigating
manually and traditionally by visiting record rooms and searching through document
files is inefficient, and a software product can shorten these efforts and make
investigation much easier and broader in perspective. Applying Open Source
Intelligence in the investigation process can help obtain better, optimized,
accurate, and reliable results on a single platform with only minimal input
information and little physical effort. It can also help track the online presence
and fame of an individual, which opens up many ideas for commercial projects
using the same technology.

References

[1] Roger Z George, Robert D Kline, Mark M. Lowenthal. “Intelligence and the national
security strategist: enduring issues and challenges”, Rowman and
Littlefield, ISBN 9780742540392, vol. 58, pp.273-284, 2005.
[2] James Byrne, Gary Marx. “Technological Innovations in Crime Prevention and Policing.
A Review of the Research on Implementation and Impact”. Maklu-Uitgevers. ISBN 978-
90-466-0412-0, pp.17-40, 2011.
[3] Ricardo Andrés Pinto Rico, Martin José Hernández Medina, Cristian Camilo Pinzón
Hernández, Daniel Orlando Díaz López, Juan Carlos Camilo García Ruíz. Open source
intelligence (OSINT) as support of cybersecurity operations. “Use of OSINT in a
colombian context and sentiment Analysis.” Revista Vínculos: Ciencia, Tecnología y
Sociedad. Vol 15, pp.195-214, 2018.
[4] J. Pastor-Galindo, P. Nespoli, F. Gómez Mármol and G. Martínez Pérez, "The Not Yet
Exploited Goldmine of OSINT: Opportunities, Open Challenges and Future Trends," in
IEEE Access, vol. 8, pp. 10282-10304, 2020.
[5] Florian Schaurer, Jan Störger. “Guide to the Study of Intelligence. The Evolution of Open
Source Intelligence (OSINT)”. Intelligencer: Journal of U.S. Intelligence Studies. Vol 19
No 3, pp.53-56, 2010.
[6] Richard Adderley & Peter Musgrove. “Police crime recording and investigation systems –
A user’s view. Policing: An International Journal of Police Strategies & Management”.
Emerald. 24(1), pp.100-114, 2001.
[7] Clive Best, "Web Mining for Open Source Intelligence,". IEEE. 12th International
Conference Information Visualisation, London, 2008, pp.321-325.
[8] Giovanni Nacci. “The General Theory for Open Source Intelligence in brief. A proposal”.
Intelli|sfèra. pp.1-3, 2019.
[9] Nihad A. Hassan, Rami Hijazi. “Open Source Intelligence Methods and Tools”. Apress
Media LLC. ISBN-13 (pbk): 978-1-4842-3212-5 ISBN-13 (electronic): 978-1-4842-3213-
2. 15-18, 2018.
[10] Arjun Satheesh, Monisha Singh. “Comparative Study of Open Source Automated
Web Testing Tools: Selenium and Sahi”. Vol 10(13), ISSN (Print): 0974-6846. ISSN
(Online): 0974-5645, 2017.
[11] Philippe Dobbelaere, Kyumars Sheykh Esmaili. “Kafka versus RabbitMQ: A comparative
study of two industry reference publish/subscribe implementations: Industry Paper”,
pp.227-238, 2017.
[12] Jason Bell. “Machine Learning Streaming with Kafka”. O’Reilly, ch12, pp.239-303,
2020.
[13] R. Shree, T. Choudhury, S. C. Gupta and P. Kumar, "KAFKA: The modern platform for
data management and analysis in big data domain,". 2nd International Conference on
Telecommunication and Networks (TEL-NET), 2017, pp. 1-5.
[14] X. Wang and D. Loguinov, "Load-Balancing Performance of Consistent Hashing:
Asymptotic Analysis of Random Node Join," in IEEE/ACM Transactions on Networking,
vol. 15, no. 4, pp. 892-905, Aug. 2007.
[15] Z. Zi-qiong, Y. Qiang and Li Yi-jun, "Using Naïve Bayes Classifier to Distinguish
Reviews from Non-review Documents in Chinese," 2007 International Conference on
Management Science and Engineering, Harbin, 2007, pp. 115-121.
[16] P. D. Francesco, I. Malavolta and P. Lago, "Research on Architecting Microservices:
Trends, Focus, and Potential for Industrial Adoption," 2017 IEEE International
Conference on Software Architecture (ICSA), Gothenburg, 2017, pp. 21-30.
[17] B. Christudas, “Spring Boot, Practical Microservices Architectural Patterns”, pp.147-182,
2019.
[18] K. Reddy, “Web Applications with Spring Boot - Beginning Spring Boot 2: Applications
and Microservices with the Spring Framework”, pp.107-132, 2017.
[19] R. Chen and H. Miao, "A Selenium based approach to automatic test script generation for
refactoring JavaScript code," 2013 IEEE/ACIS 12th International Conference on
Computer and Information Science (ICIS), Niigata, 2013, pp. 341-346.
[20] Iuliana Cosmina, “Building Reactive Applications Using Spring”. Pivotal Certified
Professional Core Spring 5 Developer Exam, 2020, pp.903-955.
[21] S. S. Abdhullah, K. Jyoti, S. Sharma and U. S. Pandey, "Review of recent load balancing
techniques in cloud computing and BAT algorithm variants," 2016 3rd International
Conference on Computing for Sustainable Global Development (INDIACom), New Delhi,
2016, pp. 2428-2431.
[22] S. W. Prakash and P. Deepalakshmi, "Server-based Dynamic Load Balancing," 2017
International Conference on Networks & Advances in Computational Technologies
(NetACT), Thiruvananthapuram, 2017, pp. 25-28.
[23] P. Wehner, C. Piberger and D. Göhringer, "Using JSON to manage communication
between services in the Internet of Things," 2014 9th International Symposium on
Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC), Montpellier,
2014, pp. 1-4.
[24] Abhishek Mishra, “Amazon Rekognition- Machine Learning in the AWS Cloud”, John
Wiley & Sons, ch18, 2019, pp.421-444.

Appendix
f = frequency of the term t in the document d.
d = JSONF document.
D = set of JSONF documents.
N = total number of documents in D
pi = ith person
cw = crawler
S = set of links to be searched
lst = local storage of each crawl result
G_Stack = Global stack
JSONF = final JSON
t = triggers
ß = learning rate

No ratings yet
Makers of the Environment : Building Resilience Into Our World, One Model at a Time.
From Everand
Makers of the Environment : Building Resilience Into Our World, One Model at a Time.
Finith Jernigan
No ratings yet
The Concept of Open Source Intelligence
No ratings yet
The Concept of Open Source Intelligence
3 pages
Data Mining: Concepts, Fundamentals And Applications
From Everand
Data Mining: Concepts, Fundamentals And Applications
Enrico Guardelli
No ratings yet
The Internet of Things: System and Applications
From Everand
The Internet of Things: System and Applications
Ajit Singh
No ratings yet
Reality Mining: Using Big Data to Engineer a Better World
From Everand
Reality Mining: Using Big Data to Engineer a Better World
Nathan Eagle
4/5 (2)
Data Science Fundamentals and Practical Approaches: Understand Why Data Science Is the Next (English Edition)
From Everand
Data Science Fundamentals and Practical Approaches: Understand Why Data Science Is the Next (English Edition)
Dr. Gypsy Nandi
No ratings yet
Image Retrieval: Unlocking the Power of Visual Data
From Everand
Image Retrieval: Unlocking the Power of Visual Data
Fouad Sabry
No ratings yet
Data Science, AI, and Blockchain: Integrated Approaches
From Everand
Data Science, AI, and Blockchain: Integrated Approaches
Ekaaksh Deshpande
No ratings yet
Comprehensive Guide to Implementing Data Science and Analytics: Tips, Recommendations, and Strategies for Success
From Everand
Comprehensive Guide to Implementing Data Science and Analytics: Tips, Recommendations, and Strategies for Success
Rick Spair
No ratings yet
The Data Whisperer - Making Sense of Big Data
From Everand
The Data Whisperer - Making Sense of Big Data
Keaton Rivers
No ratings yet
Open Source AI
From Everand
Open Source AI
Adam Smith
No ratings yet
Mastering Data Science with Python: The Ultimate Guide: Unlock the Power of Data Analysis and Visualization with Python's Cutting-Edge Tools and Techniques
From Everand
Mastering Data Science with Python: The Ultimate Guide: Unlock the Power of Data Analysis and Visualization with Python's Cutting-Edge Tools and Techniques
daniel Huston
No ratings yet
Image Retrieval: Fundamentals and Applications
From Everand
Image Retrieval: Fundamentals and Applications
Fouad Sabry
No ratings yet
Megaproject Organization and Performance: The Myth and Political Reality
From Everand
Megaproject Organization and Performance: The Myth and Political Reality
Nuno Gil
No ratings yet
The Practical Guide for OSINT Reasearch
100% (1)
The Practical Guide for OSINT Reasearch
67 pages
Data Mining for Beginners: A Programmer’s Guide
From Everand
Data Mining for Beginners: A Programmer’s Guide
Agasti Khatri
No ratings yet
IoT Data Analytics using Python: Learn how to use Python to collect, analyze, and visualize IoT data (English Edition)
From Everand
IoT Data Analytics using Python: Learn how to use Python to collect, analyze, and visualize IoT data (English Edition)
M S Hariharan
No ratings yet
Data-Driven Business Strategies: Understanding and Harnessing the Power of Big Data
From Everand
Data-Driven Business Strategies: Understanding and Harnessing the Power of Big Data
Steven Vollmer
No ratings yet
Big Data: Revolutionizing the Future
From Everand
Big Data: Revolutionizing the Future
Parvati Mishra
No ratings yet
Workflow Management: Models, Methods, and Systems
From Everand
Workflow Management: Models, Methods, and Systems
Kees Van Hee
3.5/5 (11)
Unveiling Insights: Mastering Data Mining and Knowledge Discovery in the Digital Age: O6.0 TRANSFORM DATA
From Everand
Unveiling Insights: Mastering Data Mining and Knowledge Discovery in the Digital Age: O6.0 TRANSFORM DATA
Elizabeth Mogopodi
No ratings yet
From Data to Decisions: A Practical Guide to Implementing Modern Decision Intelligence
From Everand
From Data to Decisions: A Practical Guide to Implementing Modern Decision Intelligence
Raissa Gomez
No ratings yet
Mastering Data Mining Techniques
From Everand
Mastering Data Mining Techniques
Dhaanyalakshmi Ahuja
No ratings yet
Business Models in Emerging Technologies: Data Science, AI, and Blockchain
From Everand
Business Models in Emerging Technologies: Data Science, AI, and Blockchain
Stylianos Kampakis
No ratings yet
Capitalizing Data Science: A Guide to Unlocking the Power of Data for Your Business and Products (English Edition)
From Everand
Capitalizing Data Science: A Guide to Unlocking the Power of Data for Your Business and Products (English Edition)
Mathangi Sri Ramachandran
No ratings yet
Crash Course Big Data
From Everand
Crash Course Big Data
IntroBooks Team
No ratings yet
Enterprise Data Science: Smarter Decisions with Big Data
From Everand
Enterprise Data Science: Smarter Decisions with Big Data
Vidhur Gupta
No ratings yet
Graph Data Science with Python and Neo4j
From Everand
Graph Data Science with Python and Neo4j
Timothy Eastridge
No ratings yet
Graph Data Science with Python and Neo4j: Hands-on Projects on Python and Neo4j Integration for Data Visualization and Analysis Using Graph Data Science for Building Enterprise Strategies (English Edition)
From Everand
Graph Data Science with Python and Neo4j: Hands-on Projects on Python and Neo4j Integration for Data Visualization and Analysis Using Graph Data Science for Building Enterprise Strategies (English Edition)
Timothy Eastridge
No ratings yet
Applied Sciences: Open-Source Intelligence Educational Resources: A Visual Perspective Analysis
No ratings yet
Applied Sciences: Open-Source Intelligence Educational Resources: A Visual Perspective Analysis
25 pages
Open Source Intelligence
No ratings yet
Open Source Intelligence
17 pages
World Intellectual Property Organization
No ratings yet
World Intellectual Property Organization
11 pages
Branson 1965 PDF
No ratings yet
Branson 1965 PDF
16 pages
Hubei Heqiang Machinery HQ09B-300A
No ratings yet
Hubei Heqiang Machinery HQ09B-300A
2 pages
Introduction To Micro Economics
100% (1)
Introduction To Micro Economics
32 pages
Susan Farrell v. Planters Lifesavers Company Nabisco, Inc, 206 F.3d 271, 3rd Cir. (2000)
No ratings yet
Susan Farrell v. Planters Lifesavers Company Nabisco, Inc, 206 F.3d 271, 3rd Cir. (2000)
22 pages
Workshop Practice
No ratings yet
Workshop Practice
2 pages
BAE ACV
No ratings yet
BAE ACV
2 pages
What Is A Resort History of Resorts
No ratings yet
What Is A Resort History of Resorts
10 pages
Ruban Resume
No ratings yet
Ruban Resume
3 pages
Ibep Project 3 Sourabh Kumar
No ratings yet
Ibep Project 3 Sourabh Kumar
6 pages
Power Dynamics in Shakespeare's The Tempest
No ratings yet
Power Dynamics in Shakespeare's The Tempest
5 pages
HAEDJG 121 Faithful - Friends PDF
No ratings yet
HAEDJG 121 Faithful - Friends PDF
44 pages
Trip To Bromo Mountain
No ratings yet
Trip To Bromo Mountain
2 pages
Monthly Leads
No ratings yet
Monthly Leads
15 pages
8041 Product Life Cycle Powerpoint
No ratings yet
8041 Product Life Cycle Powerpoint
20 pages
CAT 2002 Question Paper by Cracku
No ratings yet
CAT 2002 Question Paper by Cracku
84 pages
A Leaders Guide To Manufacturing 4 0 - Ar - Ddi PDF
No ratings yet
A Leaders Guide To Manufacturing 4 0 - Ar - Ddi PDF
44 pages
Session 1
No ratings yet
Session 1
10 pages
Cadbury World Case Study
100% (1)
Cadbury World Case Study
88 pages
Damodaram Sanjivayya National Law University Visakhapatnam, Andhra Pradesh, India
No ratings yet
Damodaram Sanjivayya National Law University Visakhapatnam, Andhra Pradesh, India
6 pages
Florendo vs. Philam Plans
No ratings yet
Florendo vs. Philam Plans
1 page
Method of Statement For Pile Foundation
No ratings yet
Method of Statement For Pile Foundation
8 pages
Din 1683
No ratings yet
Din 1683
2 pages
Development of The Process of Electric Steel Production and Methods For Improving The Technicalandeconomic Indices of Electric Arc Furnaces
No ratings yet
Development of The Process of Electric Steel Production and Methods For Improving The Technicalandeconomic Indices of Electric Arc Furnaces
5 pages
98-2000 Flash Point
No ratings yet
98-2000 Flash Point
44 pages
Food and Beverage Services - Final
No ratings yet
Food and Beverage Services - Final
35 pages
Bid Application For Olympic Games
No ratings yet
Bid Application For Olympic Games
133 pages


