
Seek AI

SY Mini Project Report


Submitted in partial fulfillment of the requirements

of the Subject Project Based Learning: Mini Project Lab-I (Sem III) by

Tanay Desai
Jay Dhake
Pravir Dighe
Mitansh Gala

Supervisor
Dr. Nilesh Yadav

Department of Computer Engineering

K J Somaiya Institute of Technology


Ayurvihar, Sion, Mumbai-400022
2024-25
CERTIFICATE

This is to certify that the project entitled “Seek AI” is the bonafide work of Tanay Desai, Jay
Dhake, Pravir Dighe, and Mitansh Gala, submitted as an SY Sem III Mini Project in the subject
of Project Based Learning: Mini Project Lab-I, Computer Engineering, for the academic
year 2024-25.

Dr. Nilesh Yadav
Project Guide
Department of Computer Engineering

Dr. Sarita Ambadekar
Head of Department
Dept. of Computer Engineering

Dr. Vivek Sunnapwar
Principal, KJSIT

Place : Sion, Mumbai 400022


Date :
PROJECT APPROVAL FOR S.Y.

This project report entitled “Seek AI”, submitted by

Tanay Desai – A/13
Jay Dhake – A/16
Pravir Dighe – A/18
Mitansh Gala – A/21

is an approved Second Year Mini Project (Semester III) in Computer Engineering.

EXAMINER:

1.
External Examiner Name and Sign

2.
Internal Examiner Name and Sign

DECLARATION

We declare that this written submission represents our ideas in our own words and where
others' ideas or words have been included, we have adequately cited and referenced the original
sources. We also declare that we have adhered to all principles of academic honesty and integrity
and have not misrepresented or fabricated or falsified any idea/data/fact/source in our submission.
We understand that any violation of the above will be cause for disciplinary action by the Institute
and can also evoke penal action from the sources which have thus not been properly cited or from
whom proper permission has not been taken when needed.

Tanay Desai

Jay Dhake

Pravir Dighe

Mitansh Gala

Date :

ACKNOWLEDGEMENT

Before presenting our SY mini project work entitled “Seek AI”, we would like to convey our
sincere thanks to the people who guided us throughout the course of this project work.

First, we would like to express our immense gratitude towards our Project Guide, Dr. Nilesh
Yadav, for the constant encouragement, support, guidance, and mentoring at all the ongoing
stages of the project and report.

We would like to express our sincere thanks to our H.O.D., Dr. Sarita Ambadekar, for the
encouragement, co-operation, and suggestions during the progressing stages of the report.

We would like to express our sincere thanks to our beloved Principal Dr. Vivek Sunnapwar
for providing various facilities to carry out this project.

Finally, we would like to thank all the teaching and non-teaching staff of the college, and our
friends, for their moral support rendered during the course of the reported work, and for their
direct and indirect involvement in the completion of our report work, which made our endeavor
fruitful.

Place : Sion, Mumbai-400022

Date :

ABSTRACT

In this project, we develop “Seek AI”, an AI-powered tool for summarizing academic
research papers. Researchers, students, and professionals routinely have to read large
volumes of long, complex papers, which makes extracting key insights time-consuming and
prone to bias and error. Seek AI addresses this by generating concise, accurate, and
unbiased summaries using Natural Language Processing (NLP) models, including OpenAI’s
GPT API.

The system combines a semantic search engine, which matches user queries against
precomputed document embeddings using cosine similarity, with a retrieval-augmented
generation pipeline that passes the retrieved context and the query to a large language
model. Research papers are sourced from platforms such as arXiv, and a web-based
interface built with Next.js allows users to input queries and navigate the resulting
summaries.

By simplifying specialized academic language, Seek AI makes research findings accessible
to non-specialists, fosters interdisciplinary collaboration, and helps users stay current
across fields, ultimately improving research productivity and supporting faster, better-
informed decision-making.


CONTENTS

LIST OF FIGURES
LIST OF TABLES

1  INTRODUCTION
   1.1  Problem Definition
   1.2  Aim and Objective
   1.3  Organization of the Report
2  REVIEW OF LITERATURE
   2.1  Literature Survey
   2.2  Summarized Findings
3  REQUIREMENT SPECIFICATION
   3.1  Introduction
   3.2  Hardware Requirements
   3.3  Software Requirements
   3.4  Feasibility Study
   3.5  Cost Estimation
4  PROJECT ANALYSIS & DESIGN
   4.1  Introduction
   4.2  Architecture of Project
   4.3  Timeline Chart
5  METHODOLOGY
   5.1  Introduction
   5.2  Key Components
6  IMPLEMENTATION DETAILS & RESULTS
   6.1  Introduction
   6.2  System Implementation (Screenshots with Detailed Descriptions)
7  CONCLUSION & FUTURE SCOPE

REFERENCES
PLAGIARISM REPORT

LIST OF FIGURES

Figure No.  Title

1     Project-timeline chart
2     System Architecture
3.1   Chat-Interaction UI
3.2   Similar Papers, Citations
3.3   Seek AI UI

CHAPTER 1
INTRODUCTION

1.1 Problem Definition :

Academic research often involves reading, reviewing, and analyzing a large number of
research papers. These papers are typically long, detailed, and complex, making it time-
consuming for researchers, students, and professionals to extract key insights quickly. The
sheer volume of literature can overwhelm users, causing delays in the research process
and potential missed insights. This inefficiency hampers productivity, slows decision-
making, and hinders the ability to stay updated with the latest developments across
multiple disciplines.

Moreover, the technical and specialized language used in academic papers creates barriers
for non-specialists, making it difficult for interdisciplinary collaboration and broader
dissemination of knowledge. This complexity limits access to valuable information for
those outside of a specific field, further slowing innovation and cross-disciplinary
research.

Current manual methods of reviewing papers are prone to human bias and inconsistency,
leading to potential errors in extracting the most relevant data. There is a growing need
for a tool that can streamline the review process, improve the accuracy of information
extraction, and ensure unbiased summarization.

1.2 Aim and Objective :

The aim of this project is to develop an AI-powered summarization tool that enhances the
efficiency and productivity of academic research by providing concise, accurate, and
unbiased summaries of research papers. This tool addresses the challenges researchers,
students, and professionals face when managing large volumes of complex literature. By
quickly extracting critical insights, users can focus on deeper analysis and decision-
making without spending excessive time reviewing lengthy papers. Additionally, the tool
will simplify complex academic language, making research more accessible to non-
specialists and promoting interdisciplinary collaboration.

Another key objective is to ensure the consistency and accuracy of summaries, reducing
the risk of human error. The tool will generate unbiased, standardized summaries across
multiple papers, helping researchers conduct efficient literature reviews. By improving
knowledge management, it will prevent users from missing important insights amidst the
overwhelming amount of literature. The tool also aims to support broader knowledge
dissemination by simplifying research content for diverse users, fostering collaboration,
and promoting faster innovation. Ultimately, it will enable timely access to critical
insights, supporting informed decision-making and advancing research productivity.
1.3 Organization of the Report :

The report for Seek AI, an AI-powered research paper summarization tool, is organized as
follows:

The report begins with the Project Title page, stating the title “Seek AI”, submitted in
partial fulfillment of the requirements for the subject Project Based Learning: Mini Project
Lab-I (Sem III), by Tanay Desai, Jay Dhake, Pravir Dighe, and Mitansh Gala, under the
supervision of the project guide Dr. Nilesh Yadav, from the Department of Computer
Engineering, K J Somaiya Institute of Technology, for the academic year 2024-25.

Following this, the Certificate section certifies that the project is the original work of the
students and has been submitted in fulfillment of the requirements of the Mini Project
course for the academic year. Next, the Declaration section is where the students affirm
that the work is their original contribution, and all references are duly cited.

The Acknowledgement section comes next, expressing gratitude to the project guide,
Head of Department, principal, and other individuals who provided support throughout
the project. This is followed by the Abstract, a concise summary of the project,
highlighting the problem it addresses, its objectives, methodology, and key results,
typically around 250-300 words.

The Table of Contents lists all the chapters and subsections with page numbers, along
with separate lists of figures and tables. The report then moves into Chapter 1:
Introduction, which covers the Problem Definition, explaining the challenges in
summarizing vast amounts of research literature, followed by the Aim and Objectives of
the project, which focus on enhancing efficiency, accessibility, and interdisciplinary
collaboration. The Organization of the Report briefly outlines the structure of the report.

Chapter 2: Literature Review provides a survey of existing AI-based summarization tools
and Natural Language Processing (NLP) models, followed by Summarized Findings that
highlight gaps in current tools and technologies, supporting the need for this project.
Chapter 3: Requirement Specification outlines the project’s hardware and software
requirements, discusses the feasibility study, and provides an estimate of the costs
involved.

In Chapter 4: Project Analysis and Design, the project’s architecture is explained using
system diagrams, and a Timeline Chart details the key milestones and phases of the
project. Chapter 5: Methodology dives into the technical development of the AI tool,
including steps such as the implementation of NLP models and the construction of a
semantic search engine based on cosine similarity for matching user queries with relevant
research content.
Chapter 6: Implementation and Results presents screenshots of the working system,
explaining how the tool functions, along with a discussion of the results in terms of time
savings, accuracy, and the consistency of the summaries generated by the tool. Chapter 7:
Conclusion and Future Scope wraps up the report, summarizing the project’s impact on
improving research efficiency and proposing possible future improvements, such as
expanding the tool’s capabilities or refining the AI models.

The report concludes with a References section, listing all cited sources, followed by an
Appendix that includes additional materials like flowcharts, code snippets, or test data
used in the project. This structured approach ensures a comprehensive presentation of the
AI-powered summarization tool project.

CHAPTER 2

REVIEW OF LITERATURE

2.1 Literature Survey :

1. Introduction to AI in Research
The growth of academic literature has presented a challenge for researchers, with
increasing volumes of papers being published across diverse fields. Traditional methods
of manually reviewing research papers are not only time-consuming but also prone to bias
and human error. As a result, the need for automated solutions that help streamline the
literature review process has emerged, leading to the development of AI-based
summarization tools. These tools are powered by advances in Natural Language
Processing (NLP), enabling them to generate concise, accurate summaries that improve
accessibility and productivity.

2. NLP and Transformer Models


AI-driven summarization tools leverage transformer models like BERT and GPT to
understand the context and extract key points from large texts. Transformer architectures
allow models to process long-range dependencies, capturing both syntactic and semantic
aspects of academic language. Notable works such as Sentence-BERT (SBERT) by
Reimers and Gurevych (2019) introduced sentence embeddings for better contextual
similarity, which form the foundation for AI-based semantic search and summarization.
OpenAI’s GPT models have also gained prominence by providing coherent summaries
and contextual responses, further refining the process of automated academic
summarization.

3. Semantic Search and Retrieval-Augmented Generation


AI tools do not merely generate summaries; they enhance them through semantic search
and Retrieval-Augmented Generation (RAG), a framework that combines NLP with
search algorithms to deliver context-aware summaries. IBM Research (2023) highlights
RAG's ability to bridge traditional search with advanced language models, enabling more
precise answers by retrieving relevant documents based on user queries. This integration
ensures that summaries generated align with the user’s specific needs and queries, a key
feature in academic research tools.

4. Accessibility and Interdisciplinary Collaboration


One of the challenges in research is that findings are often written in highly specialized
language, limiting accessibility for non-experts. AI-powered tools address this by
simplifying complex academic language, making research findings more understandable
to a broader audience. Google’s NotebookLM, for example, provides note-taking tools
integrated with LLMs, enabling users to explore summaries with greater ease and
facilitating interdisciplinary collaboration. By lowering the barrier to access, AI
summarization tools foster knowledge exchange across fields, enhancing the impact of
research.

5. Applications of AI Summarization in Academia and Beyond


The use of summarization tools is becoming widespread in academic and professional
domains. Students benefit from quick summaries for literature reviews, while researchers
use them to stay updated on developments across disciplines. In professional settings,
industries utilize these tools to extract insights from reports, saving time and improving
decision-making. Additionally, semantic search capabilities integrated with these tools
enable users to retrieve relevant literature efficiently, which is crucial in time-sensitive
scenarios such as patent searches and systematic reviews.

6. Summary Tools and Platform Integration


Most AI summarization solutions feature web-based platforms for ease of access. The
proposed AI summarization tool in this study utilizes Next.js and CSS for a user-friendly
interface, with backend integration relying on OpenAI’s API for summary generation. A
cosine similarity-based search engine ensures that user queries yield relevant research
sections, enhancing accuracy and usability. Additionally, platforms like arXiv provide rich
data sources for these tools, helping build robust knowledge bases that improve
summarization quality.

7. Challenges and Future Directions


Although AI-based summarization tools offer numerous advantages, certain limitations
exist. Models trained on general corpora may struggle with domain-specific terminology,
requiring further fine-tuning. Moreover, ethical concerns arise regarding the reliability of
automatically generated content, especially when used in high-stakes decision-making
scenarios. Future research could focus on developing domain-specific models that
improve accuracy in specialized fields, such as medical or legal research. Another
promising direction is enhancing interpretability in summarization models to ensure that
the generated summaries align with user expectations and eliminate potential bias.

This survey highlights how AI-powered summarization tools address critical pain points
in academic research by improving time efficiency, accessibility, and collaboration.
Through advanced NLP models, semantic search, and RAG frameworks, these tools
simplify complex information and make it more accessible to researchers, students, and
professionals. Integration with platforms like arXiv and seamless web-based designs
ensure these tools are practical and easy to use, paving the way for future innovations in
automated summarization.

2.2 Summarized Findings :

1. Need for AI in Academic Research


- Managing the large volume of research papers is challenging.
- Traditional manual reviews are time-consuming, error-prone, and may introduce bias.

2. AI and NLP Models for Summarization


- Tools use advanced NLP models like BERT, SBERT, and GPT to extract key insights.
- Sentence-BERT (SBERT) improves semantic search accuracy using sentence-level
embeddings.

3. Semantic Search and Retrieval-Augmented Generation (RAG)


- RAG frameworks bridge search algorithms with language models to provide more
context-aware summaries.
- AI tools use cosine similarity to match user queries with relevant sections of research
papers.

4. Improving Accessibility and Collaboration


- AI tools simplify complex academic language, making research more understandable
for non-specialists.
- Enhanced accessibility fosters interdisciplinary collaboration by lowering barriers to
entry across fields.

5. Applications and Platform Integration


- AI summarization tools benefit students, researchers, and professionals by saving time
and providing relevant insights.
- Web-based tools (e.g., Next.js) offer user-friendly interfaces with integrated APIs like
OpenAI’s GPT.

6. Challenges and Future Directions


- Current tools may struggle with domain-specific terminology and require further
fine-tuning.
- Ethical concerns about bias and the reliability of automated summaries need to be
addressed.
- Future efforts should focus on specialized models and improving model
interpretability to ensure alignment with user expectations.

CHAPTER 3

REQUIREMENT SPECIFICATION

3.1 Introduction :
Managing the growing volume of academic literature is time-consuming and often leads
to missed insights. This project aims to develop an AI-powered summarization tool to
generate concise, accurate summaries of research papers, enhancing productivity and
accessibility. By leveraging NLP models like OpenAI’s GPT and SBERT-based semantic
search, the tool will extract key points efficiently and ensure unbiased results.

The system will feature a web-based interface using Next.js for seamless interaction,
allowing users to input queries and retrieve relevant summaries quickly. It will simplify
complex academic language, making research more accessible to non-specialists and
promoting interdisciplinary collaboration. This solution will help researchers and students
stay updated across fields, focus on deeper analysis, and accelerate innovation by
streamlining the literature review process.

3.2 Hardware Requirements :

• Server Requirements

Processor: Intel Core i7 or AMD Ryzen 7 (or higher)
RAM: Minimum 16 GB (32 GB recommended for high performance)
Storage: 512 GB SSD or higher for fast data retrieval and caching
GPU (Optional): NVIDIA RTX 3060 or higher for accelerated NLP model inference
Network: High-speed internet connection (1 Gbps or above) for API integration and data fetching

• Client System (User Interface)

Processor: Intel Core i5 or equivalent
RAM: 8 GB or higher
Storage: 256 GB SSD (for local storage and caching)
Display: 1080p resolution or higher for optimal UI experience
Network: Stable internet connection for accessing the web-based interface

• Cloud Resources

Cloud GPU Instances: For scaling NLP model inference (e.g., AWS, Azure)
Storage: Cloud database or vector database (e.g., Pinecone) for semantic search indexing

3.3 Software Requirements :

• Frontend (User Interface)

Next.js: For building the web-based interface and server-side rendering
CSS: For styling the user interface
React.js: Core library for developing the frontend components

• Backend and API Integration

OpenAI API: For generating summaries using NLP models like GPT
Node.js: Backend runtime environment to handle API requests
Express.js: Web framework for managing routes and API calls

• Database and Search Engine

Vector Database: For semantic search indexing
Cosine Similarity Algorithms: To match user queries with relevant research content

• Development Tools and Platform

Visual Studio Code: Code editor for development
Git/GitHub: Version control and collaboration
Postman: For testing API endpoints

• Hosting and Deployment

Vercel/Netlify: Hosting for frontend applications
AWS/Azure: Optional for backend processing and scalable infrastructure

3.4 Feasibility Study :

1. Technical Feasibility
a. Technology Availability: The project utilizes proven technologies such as
Next.js, OpenAI API, and vector search engines (e.g., Pinecone).
b. Infrastructure Requirements: The solution requires moderate server resources
(CPU, GPU) and cloud integration for hosting and API access, which are
readily available.
c. Development Complexity: Implementing the frontend with React and the
backend with Node.js is feasible for developers familiar with web
technologies and NLP APIs.

2. Economic Feasibility
a. Development Costs: The use of free tools (like Visual Studio Code) and open-
source frameworks (Next.js) minimizes expenses.
b. API and Hosting Costs: There may be subscription fees for OpenAI API and
cloud hosting services (e.g., AWS, Vercel). These costs are manageable within
a moderate project budget.

3. Operational Feasibility
a. Ease of Use: The web-based interface ensures that users can easily interact
with the tool.
b. Maintenance and Updates: Regular maintenance is required for updating APIs
and improving summarization accuracy, but it is manageable with a small
team.

4. Legal Feasibility
a. Data Privacy: The tool must comply with data privacy regulations, ensuring
that no personal data is stored or mishandled.
b. API Usage Policies: The project must follow the usage policies of APIs (like
OpenAI) to avoid misuse or limitations.

3.5 Cost Estimation :

The estimated cost for developing and maintaining the AI-powered summarization tool
covers the following items:

- OpenAI API usage: approximately $100 to $500 per month, based on usage.
- Cloud hosting (Vercel/Netlify): from free tiers to $20–$50 per month.
- Optional backend hosting (AWS/Azure): an additional $50–$100 per month.
- Semantic search (Pinecone): free tier available; advanced usage could cost $100 or more per month.
- Development tools (Visual Studio Code, GitHub): free, though some premium tools may incur minor expenses.
- Annual maintenance (updates and optimizations): an estimated $500 to $1,000.

In total, the project is expected to cost between $1,500 and $5,000 annually, depending on
resource usage and scaling requirements.

CHAPTER 4
PROJECT ANALYSIS & DESIGN

4.1 Introduction :

The development of an AI-powered summarization tool requires careful analysis and design
to ensure that the solution meets user needs effectively and functions seamlessly.
Project analysis involves identifying key functional and non-functional requirements,
such as fast and accurate summarization, user-friendly interfaces, and reliable backend
operations. This phase also addresses potential challenges, such as handling diverse
academic content, maintaining data privacy, and ensuring scalability for high-demand
scenarios.

The design phase focuses on creating a structured architecture that integrates the
frontend, backend, and API services. Key components include the Next.js frontend for
smooth user interactions, Node.js backend for managing API requests, and the OpenAI
API for NLP-based summarization. Additionally, the system design incorporates
semantic search capabilities to enhance query relevance, ensuring that the tool delivers
efficient and meaningful results to users.
4.2 Architecture of Project :

The project Seek AI revolves around building an AI-powered summarization tool for
research papers, leveraging Natural Language Processing (NLP) techniques and large
language models like OpenAI's GPT API. Here’s an overview of the architecture as
described:
1. Frontend (User Interface):
o Web-based Application: Built using Next.js and CSS, providing a user-
friendly interface for inputting queries and receiving summaries.
2. Backend:
o NLP Integration: The core functionality involves OpenAI’s API, which is
responsible for summarizing research papers by extracting essential points and
methodologies.
o Semantic Search Engine: This uses cosine similarity for matching user
queries with relevant sections of the research papers. It ensures that the most
contextually accurate information is retrieved based on the user’s request.
3. Data Processing Flow:
o Query Input and Embedding: User inputs a query, which is then converted
into a numerical vector (embedding) using an embedding model.
o Context Retrieval: The system searches through a vector database
containing precomputed document embeddings. This database is built from
academic sources like arXiv, and relevant context is retrieved using similarity
metrics.
o Response Generation: The retrieved context is combined with the user query
into a prompt. This prompt is processed by a large language model (LLM) to
generate a precise and context-aware response (see the code sketch below).
4. Data Sources:
o The system sources research papers by scraping platforms such as arXiv.
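To make the data processing flow above concrete, the following is a minimal sketch in TypeScript, assuming the official openai Node SDK. The model names, the Retriever abstraction standing in for the vector database (e.g., Pinecone), and the prompt wording are illustrative assumptions, not the project's exact implementation.

```typescript
// Minimal sketch of the query -> embed -> retrieve -> generate flow.
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Abstraction over the vector database: given a query embedding,
// return the topK most similar paper sections (illustrative stand-in).
type Retriever = (queryEmbedding: number[], topK: number) => Promise<string[]>;

async function answerQuery(query: string, retrieve: Retriever): Promise<string> {
  // 1. Query input and embedding: convert the query to a numerical vector.
  const emb = await openai.embeddings.create({
    model: "text-embedding-3-small", // illustrative embedding model
    input: query,
  });
  const queryEmbedding = emb.data[0].embedding;

  // 2. Context retrieval: fetch the most similar precomputed sections.
  const sections = await retrieve(queryEmbedding, 5);
  const context = sections.join("\n---\n");

  // 3. Response generation: combine context and query into a prompt
  //    and pass it to the LLM.
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini", // illustrative chat model
    messages: [
      {
        role: "system",
        content:
          "You summarize academic research. Answer using only the provided context.",
      },
      { role: "user", content: `Context:\n${context}\n\nQuestion: ${query}` },
    ],
  });
  return completion.choices[0].message.content ?? "";
}
```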
4.3 Timeline Chart :

The project timeline for Seek AI is organized into several key phases:

First, the Planning & Design phase takes 3 weeks. It includes project kickoff,
architecture design, and technology stack setup. Once the foundation is set, the Backend
Development begins, spanning 5 weeks. This phase involves integrating the GPT API for
summarization, setting up the embedding model, and building a vector database for
context retrieval.

Next, the Frontend Development takes 3 weeks, during which the UI/UX is designed,
and the frontend is implemented using Next.js to allow user interaction with the system.
In parallel, the Data Sourcing & Integration phase runs for 4 weeks. This includes
scraping academic papers from platforms like arXiv and connecting the data with the
backend.
The Testing & Optimization phase lasts 4 weeks, ensuring the system’s reliability
through unit and functional tests. During this time, the NLP models are fine-tuned for
better summarization accuracy. After testing, the Deployment & Documentation phase
takes 2 weeks. The application is deployed to the cloud, and technical/user documentation
is prepared.

Fig.1 : Project-timeline chart

CHAPTER 5

METHODOLOGY

5.1 Introduction :

The Seek AI project aims to streamline the process of understanding complex academic
research papers through AI-powered summarization. It uses advanced Natural Language
Processing (NLP) models to generate concise and accurate summaries, allowing users to
quickly grasp key insights, methodologies, and outcomes. The tool is designed for
researchers, students, and professionals who need to manage large volumes of academic
literature efficiently. By leveraging AI to eliminate human bias and simplify technical
language, Seek AI enhances productivity, improves accessibility for non-specialists, and
fosters interdisciplinary collaboration, ultimately accelerating the research process and
innovation.

5.2 Key Components :


1. Query Input and Embedding: The system begins with user input in the form of a
query. This query is transformed into a numerical vector or embedding, which
captures its semantic meaning. The embedding process involves using an NLP model
that represents the user’s input in a way that retains its contextual relevance, making it
suitable for comparison with the database of research papers.
2. Context Retrieval: Once the query is embedded, the system searches a vector
database containing precomputed document embeddings. These embeddings
represent the content of research papers, which have been processed and stored in
advance. By comparing the user query’s embedding with the stored document
embeddings, the system identifies the most contextually relevant sections of papers,
using similarity metrics such as cosine similarity to ensure accuracy in matching the
query with the corresponding text (a code sketch of this ranking step follows this list).
3. Response Generation: After retrieving the relevant sections, the system combines
this context with the user query to generate a comprehensive response. This process
involves creating a prompt that incorporates both the query and the retrieved content,
which is then passed to a large language model (LLM), such as OpenAI’s GPT API.
The LLM processes this combined input to produce a coherent and accurate response,
summarizing the essential points from the relevant research papers.
4. AI Summarization: The core feature of Seek AI is its ability to generate concise
summaries of research papers using the integrated LLM. The model identifies key
insights, methodologies, and outcomes, presenting them in a simplified manner that is
easy to understand for both specialists and non-specialists. This approach not only
saves time but also ensures consistency and eliminates human bias.
5. User Interface: The tool is built with a web-based application using Next.js and
CSS, offering a user-friendly interface. Users can input queries, view results, and
navigate through summaries seamlessly, enhancing the overall experience.
6. Data Sourcing: Research papers are sourced from platforms such as arXiv, where
relevant academic literature is scraped and fed into the system’s database for
processing.
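To make the context retrieval step (item 2 above) concrete, the following is a minimal TypeScript sketch that ranks precomputed document embeddings against a query embedding by cosine similarity. The in-memory corpus is an illustrative stand-in; in the deployed system this role is played by a vector database such as Pinecone.

```typescript
// Cosine similarity between two vectors of equal length:
// dot(a, b) / (||a|| * ||b||).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface DocChunk {
  text: string;        // a section of a research paper
  embedding: number[]; // precomputed when the paper was ingested
}

// Return the topK chunks most similar to the query embedding.
function retrieveContext(
  queryEmbedding: number[],
  corpus: DocChunk[],
  topK = 5
): DocChunk[] {
  return corpus
    .map((chunk) => ({
      chunk,
      score: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((scored) => scored.chunk);
}
```

Ranking by cosine similarity in this way is what lets the system surface the most contextually relevant sections even when the query shares few exact keywords with the paper text.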
Fig.2 : System Architecture

CHAPTER 6

IMPLEMENTATION DETAILS & RESULTS

6.1 Introduction :

The implementation of Seek AI focuses on integrating state-of-the-art technologies to
build an efficient and user-friendly summarization tool. The project leverages advanced
Natural Language Processing (NLP) models, including OpenAI’s GPT API, alongside
a robust backend for processing academic literature. The frontend is designed using
Next.js to provide an intuitive user interface, enabling easy query input and summary
retrieval. Additionally, a vector database and cosine similarity algorithms ensure
accurate context matching between user queries and research paper content. The
implementation results demonstrate the system’s ability to generate consistent, accurate,
and unbiased summaries, significantly enhancing research productivity.

6.2 System Implementation :

Frontend Development: The user interface is built using Next.js and CSS, offering a
seamless experience for users to input queries and receive summaries. The frontend is
designed to be intuitive, allowing researchers, students, and professionals to interact with
the system easily. This web-based application provides users with fields to enter queries
and displays the generated summaries in a readable format.
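As an illustration of this interaction, the sketch below shows a minimal Next.js client component that submits a query and renders the returned summary. The /api/summarize route and the { summary } response shape are assumptions for the example, not necessarily Seek AI's actual route names.

```tsx
// Minimal sketch of the query form as a Next.js client component.
"use client";
import { useState, type FormEvent } from "react";

export default function QueryForm() {
  const [query, setQuery] = useState("");
  const [summary, setSummary] = useState("");

  async function handleSubmit(e: FormEvent<HTMLFormElement>) {
    e.preventDefault();
    // Send the query to the backend; route name is illustrative.
    const res = await fetch("/api/summarize", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ query }),
    });
    const data = await res.json();
    setSummary(data.summary); // assumes a { summary: string } response
  }

  return (
    <form onSubmit={handleSubmit}>
      <input
        value={query}
        onChange={(e) => setQuery(e.target.value)}
        placeholder="Ask about a research paper..."
      />
      <button type="submit">Summarize</button>
      {summary && <p>{summary}</p>}
    </form>
  );
}
```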

Backend Development: The backend of the system is responsible for handling requests,
processing research papers, and managing summarization tasks. It integrates with
OpenAI’s GPT API, which powers the core summarization functionality. The backend
receives user queries, processes them through the NLP model, and sends back the
summarized output. A key aspect of the backend is the integration of a vector database
that stores precomputed embeddings of research papers for efficient retrieval.
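A minimal sketch of how the backend might expose this functionality through Express.js (named in the software requirements) is shown below. The route path, port, and the summarize() placeholder for the embedding-search-and-GPT pipeline are illustrative assumptions.

```typescript
// Minimal sketch of the backend route, assuming Express.js.
import express from "express";

const app = express();
app.use(express.json());

// Placeholder for the real pipeline: embed the query, search the vector
// database, and call the GPT API, as outlined in Chapter 4.
async function summarize(query: string): Promise<string> {
  return `Summary for: ${query}`;
}

app.post("/api/summarize", async (req, res) => {
  const { query } = req.body;
  if (!query) {
    return res.status(400).json({ error: "query is required" });
  }
  const summary = await summarize(query);
  res.json({ summary });
});

app.listen(3001, () => console.log("Seek AI backend listening on :3001"));
```

A design point worth noting in a setup like this: the frontend never calls OpenAI directly, so the API key stays on the server.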

Embedding and Semantic Search: When a user submits a query, the system converts
the query into an embedding using an NLP model. This embedding captures the semantic
meaning of the query, allowing for accurate context matching. The system then uses
cosine similarity to search the vector database and retrieve relevant sections of research
papers based on their similarity to the query. This step ensures that the most contextually
appropriate information is selected.

Summarization Process: After retrieving the relevant content, the system passes this
data along with the user’s query to a large language model (LLM), such as GPT. The
LLM then generates a concise and accurate summary that includes the key insights,
methodologies, and outcomes from the papers.
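To illustrate how the retrieved content and the user query might be combined before being sent to the LLM, the sketch below builds a single prompt string. The instruction wording and the source labeling are illustrative assumptions, not the project's actual prompt.

```typescript
// Minimal sketch of prompt construction from retrieved paper sections.
function buildPrompt(query: string, retrievedSections: string[]): string {
  // Label each retrieved section so the summary can be traced back to it.
  const context = retrievedSections
    .map((section, i) => `[Source ${i + 1}]\n${section}`)
    .join("\n\n");

  return [
    "You are an assistant that summarizes academic research papers.",
    "Using only the context below, summarize the key insights,",
    "methodologies, and outcomes relevant to the question.",
    "",
    `Context:\n${context}`,
    "",
    `Question: ${query}`,
  ].join("\n");
}
```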

Fig.3.1 : Chat-Interaction UI
Fig.3.2 : Similar Papers, Citations

Fig.3.3 : Seek AI UI
CHAPTER 7

CONCLUSION & FUTURE SCOPE

7.1 Conclusion :
The Seek AI project presents an innovative solution to the challenges faced by researchers,
students, and professionals in managing and comprehending vast amounts of academic
literature. By leveraging advanced Natural Language Processing (NLP) models and
integrating state-of-the-art technologies, Seek AI effectively simplifies and accelerates the
process of extracting key insights, methodologies, and conclusions from complex research
papers.

The core of the system is its ability to generate concise, unbiased, and accurate summaries
using OpenAI’s GPT API, enabling users to quickly grasp essential information without
manually sifting through lengthy papers. The integration of a semantic search engine
powered by cosine similarity ensures that user queries are matched with the most relevant
and contextually appropriate sections of the academic content. This not only enhances the
precision of the summarization process but also improves the overall user experience by
providing more targeted and relevant summaries.

The web-based interface built with Next.js offers a seamless user experience, making the
tool accessible and easy to use. Users can input queries, retrieve results, and navigate through
summaries efficiently. The backend’s robust architecture, which includes a vector database
for storing document embeddings, ensures the system can handle large volumes of data while
maintaining performance and accuracy.

Overall, Seek AI addresses the critical need for efficiency and accessibility in academic
research by automating the summarization process. It enables users to focus on deeper
analysis and innovation, rather than being bogged down by the time-consuming task of
reading entire papers. By streamlining knowledge extraction and eliminating human bias,
Seek AI promotes interdisciplinary collaboration and fosters broader dissemination of
research findings, ultimately contributing to faster decision-making and advancements in
various fields of study.

REFERENCES :

• Google. (n.d.). NotebookLM: AI-powered note-taking tool. Retrieved October 16,
2024, from https://notebooklm.google
• Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using
Siamese BERT-Networks. arXiv preprint arXiv:1908.10084. Retrieved from
https://arxiv.org/abs/1908.10084
• IBM Research. (2023, March 1). Retrieval-Augmented Generation (RAG): Bridging
the Gap between Search and Language Models. Retrieved from
https://research.ibm.com/blog/retrieval-augmented-generation-RAG
• DataCamp. (2024). What is Retrieval-Augmented Generation (RAG)? Retrieved from
https://www.datacamp.com/blog/what-is-retrieval-augmented-generation-rag
• SBERT Semantic Search Tutorial. (n.d.). Applications of Semantic Search using
Sentence Transformers. Retrieved from
https://www.sbert.net/examples/applications/semantic-search/README.html
