Mini Combined Report
of the Subject Project Based Learning: Mini Project Lab-I (Sem III) by
Tanay Desai
Jay Dhake
Pravir Dighe
Mitansh Gala
Supervisor
Dr. Nilesh Yadav
This is to certify that the project entitled “Seek AI” is the bonafide work of Tanay Desai, Jay
Dhake, Pravir Dighe, and Mitansh Gala, submitted as a SY Sem III Mini Project in the subject
of Project Based Learning: Mini Project Lab-I, Computer Engineering, for the academic
year 2024-25.
Dr. Nilesh Yadav
Project Guide
Department of Computer Engineering
EXAMINER:
1.
External Examiner Name and Sign
2.
Internal Examiner Name and Sign
DECLARATION
We declare that this written submission represents our ideas in our own words and where
others' ideas or words have been included, we have adequately cited and referenced the original
sources. We also declare that we have adhered to all principles of academic honesty and integrity
and have not misrepresented or fabricated or falsified any idea/data/fact/source in our submission.
We understand that any violation of the above will be cause for disciplinary action by the Institute
and can also evoke penal action from the sources which have thus not been properly cited or from
whom proper permission has not been taken when needed.
Tanay Desai
Jay Dhake
Pravir Dighe
Mitansh Gala
Date :
ACKNOWLEDGEMENT
Before presenting our SY mini project work entitled “Seek AI”, we would like to convey our
sincere thanks to the people who guided us throughout the course of this project work.
First, we would like to express our immense gratitude towards our Project Guide, Vaishali Patil,
for the constant encouragement, support, guidance, and mentoring at every stage of the
project and report.
We would like to express our sincere thanks to our H.O.D., Dr. Sarita Ambadekar, for the
encouragement, co-operation, and suggestions during the progressing stages of the report.
We would like to express our sincere thanks to our beloved Principal Dr. Vivek Sunnapwar
for providing various facilities to carry out this project.
Finally, we would like to thank all the teaching and non-teaching staff of the college, and our
friends, for their moral support rendered during the course of the reported work, and for their
direct and indirect involvement in the completion of our report work, which made our endeavor
fruitful.
Date :
ABSTRACT
In this project, we develop “Seek AI”, an AI-powered summarization tool that helps
researchers, students, and professionals manage the growing volume of academic literature.
Research papers are typically long, detailed, and technically complex, making it
time-consuming to extract key insights and difficult for non-specialists to follow. This
report presents an overview of the project, its objectives, methodology, key results, and
conclusions.
Seek AI generates concise, accurate, and unbiased summaries of research papers using
Natural Language Processing (NLP) models such as OpenAI’s GPT, combined with a semantic
search engine that uses cosine similarity to match user queries with the most relevant
sections of academic content. The system comprises a web-based frontend built with Next.js,
a Node.js backend, and a vector database of precomputed document embeddings sourced from
platforms such as arXiv. By simplifying complex academic language and standardizing
summaries across papers, the tool improves research productivity, promotes interdisciplinary
collaboration, and supports faster, better-informed decision-making.
CONTENTS
Chapter No.    TITLE    Page No.
LIST OF FIGURES viii
LIST OF TABLES viii
1 INTRODUCTION 1
1.1 Problem Definition 1
1.2 Aim and Objective 1
1.3 Organization of the Report 2
2 REVIEW OF LITERATURE 3
2.1 Literature Survey 3
2.2 Summarized Findings 4
3 REQUIREMENT SPECIFICATION 5
3.1 Introduction 5
3.2 Hardware requirements 5
3.3 Software requirements 6
3.4 Feasibility Study 6
3.5 Cost Estimation 7
4 PROJECT ANALYSIS & DESIGN 8
4.1 Introduction 8
4.2 Architecture of Project 9
4.3 Timeline Chart 12
5 METHODOLOGY 14
5.1 Introduction 14
5.2 14
6 IMPLEMENTATION DETAILS & RESULTS 16
6.1 Introduction 16
6.2 System Implementation (screenshots with detailed descriptions) 17
7 CONCLUSION & FUTURE SCOPE 20
REFERENCES 21
PLAGIARISM REPORT 22
LIST OF FIGURES
Fig. 3.1 Chat-Interaction UI
Fig. 3.2 Similar Papers, Citations
Fig. 3.3 Seek AI UI
CHAPTER 1
INTRODUCTION
1.1 Problem Definition :
Academic research often involves reading, reviewing, and analyzing a large number of
research papers. These papers are typically long, detailed, and complex, making it time-
consuming for researchers, students, and professionals to extract key insights quickly. The
sheer volume of literature can overwhelm users, causing delays in the research process
and potential missed insights. This inefficiency hampers productivity, slows decision-
making, and hinders the ability to stay updated with the latest developments across
multiple disciplines.
Moreover, the technical and specialized language used in academic papers creates barriers
for non-specialists, making it difficult for interdisciplinary collaboration and broader
dissemination of knowledge. This complexity limits access to valuable information for
those outside of a specific field, further slowing innovation and cross-disciplinary
research.
Current manual methods of reviewing papers are prone to human bias and inconsistency,
leading to potential errors in extracting the most relevant data. There is a growing need
for a tool that can streamline the review process, improve the accuracy of information
extraction, and ensure unbiased summarization.
1.2 Aim and Objective :
The aim of this project is to develop an AI-powered summarization tool that enhances the
efficiency and productivity of academic research by providing concise, accurate, and
unbiased summaries of research papers. This tool addresses the challenges researchers,
students, and professionals face when managing large volumes of complex literature. By
quickly extracting critical insights, users can focus on deeper analysis and decision-
making without spending excessive time reviewing lengthy papers. Additionally, the tool
will simplify complex academic language, making research more accessible to non-
specialists and promoting interdisciplinary collaboration.
Another key objective is to ensure the consistency and accuracy of summaries, reducing
the risk of human error. The tool will generate unbiased, standardized summaries across
multiple papers, helping researchers conduct efficient literature reviews. By improving
knowledge management, it will prevent users from missing important insights amidst the
overwhelming amount of literature. The tool also aims to support broader knowledge
dissemination by simplifying research content for diverse users, fostering collaboration,
and promoting faster innovation. Ultimately, it will enable timely access to critical
insights, supporting informed decision-making and advancing research productivity.
1.3 Organization of the Report :
The organization of the report for the AI-Powered Research Paper Summarization Tool
project is structured as follows:
The report begins with the Project Title page, stating the title "AI-Powered Research
Paper Summarization Tool," submitted in partial fulfillment of the requirements for the
subject Project Based Learning: Mini Project Lab-I (Sem III), by (Student Names), under
the supervision of the project guide (Guide Name), from the Department of Computer
Engineering, (Institution Name), for the academic year 2024-25.
Following this, the Certificate section certifies that the project is the original work of the
students and has been submitted in fulfillment of the requirements of the Mini Project
course for the academic year. Next, the Declaration section is where the students affirm
that the work is their original contribution, and all references are duly cited.
The Acknowledgement section comes next, expressing gratitude to the project guide,
Head of Department, principal, and other individuals who provided support throughout
the project. This is followed by the Abstract, a concise summary of the project,
highlighting the problem it addresses, its objectives, methodology, and key results,
typically around 250-300 words.
The Table of Contents lists all the chapters and subsections with page numbers, along
with separate lists of figures and tables. The report then moves into Chapter 1:
Introduction, which covers the Problem Definition, explaining the challenges in
summarizing vast amounts of research literature, followed by the Aim and Objectives of
the project, which focus on enhancing efficiency, accessibility, and interdisciplinary
collaboration. The Organization of the Report briefly outlines the structure of the report.
Chapter 2: Review of Literature surveys existing work on AI-based summarization and
presents the summarized findings, while Chapter 3: Requirement Specification details the
hardware and software requirements, the feasibility study, and the cost estimation.
In Chapter 4: Project Analysis and Design, the project’s architecture is explained using
system diagrams, and a Timeline Chart details the key milestones and phases of the
project. Chapter 5: Methodology dives into the technical development of the AI tool,
including steps such as the implementation of NLP models and the construction of a
semantic search engine based on cosine similarity for matching user queries with relevant
research content.
Chapter 6: Implementation and Results presents screenshots of the working system,
explaining how the tool functions, along with a discussion of the results in terms of time
savings, accuracy, and the consistency of the summaries generated by the tool. Chapter 7:
Conclusion and Future Scope wraps up the report, summarizing the project’s impact on
improving research efficiency and proposing possible future improvements, such as
expanding the tool’s capabilities or refining the AI models.
The report concludes with a References section, listing all cited sources, followed by an
Appendix that includes additional materials like flowcharts, code snippets, or test data
used in the project. This structured approach ensures a comprehensive presentation of the
AI-powered summarization tool project.
CHAPTER 2
REVIEW OF LITERATURE
1. Introduction to AI in Research
The growth of academic literature has presented a challenge for researchers, with
increasing volumes of papers being published across diverse fields. Traditional methods
of manually reviewing research papers are not only time-consuming but also prone to bias
and human error. As a result, the need for automated solutions that help streamline the
literature review process has emerged, leading to the development of AI-based
summarization tools. These tools are powered by advances in Natural Language
Processing (NLP), enabling them to generate concise, accurate summaries that improve
accessibility and productivity.
This survey highlights how AI-powered summarization tools address critical pain points
in academic research by improving time efficiency, accessibility, and collaboration.
Through advanced NLP models, semantic search, and RAG frameworks, these tools
simplify complex information and make it more accessible to researchers, students, and
professionals. Integration with platforms like arXiv and seamless web-based designs
ensure these tools are practical and easy to use, paving the way for future innovations in
automated summarization.
CHAPTER 3
REQUIREMENT SPECIFICATION
3.1 Introduction :
Managing the growing volume of academic literature is time-consuming and often leads
to missed insights. This project aims to develop an AI-powered summarization tool to
generate concise, accurate summaries of research papers, enhancing productivity and
accessibility. By leveraging NLP models like OpenAI’s GPT and SBERT-based semantic
search, the tool will extract key points efficiently and ensure unbiased results.
The system will feature a web-based interface using Next.js for seamless interaction,
allowing users to input queries and retrieve relevant summaries quickly. It will simplify
complex academic language, making research more accessible to non-specialists and
promoting interdisciplinary collaboration. This solution will help researchers and students
stay updated across fields, focus on deeper analysis, and accelerate innovation by
streamlining the literature review process.
3.2 Hardware Requirements :
Server requirements and cloud resources:
Cloud GPU instances: for scaling NLP model inference (e.g., AWS, Azure).
Storage: a cloud database or vector database (e.g., Pinecone) for semantic search
indexing.
3.3 Software Requirements :
OpenAI API: for generating summaries using NLP models like GPT.
Node.js: backend runtime environment to handle API requests.
Express.js: web framework for managing routes and API calls.
3.4 Feasibility Study :
1. Technical Feasibility
a. Technology Availability: The project utilizes proven technologies such as
Next.js, OpenAI API, and vector search engines (e.g., Pinecone).
b. Infrastructure Requirements: The solution requires moderate server resources
(CPU, GPU) and cloud integration for hosting and API access, which are
readily available.
c. Development Complexity: Implementing the frontend with React and the
backend with Node.js is feasible for developers familiar with web
technologies and NLP APIs.
2. Economic Feasibility
a. Development Costs: The use of free tools (like Visual Studio Code) and open-
source frameworks (Next.js) minimizes expenses.
b. API and Hosting Costs: There may be subscription fees for OpenAI API and
cloud hosting services (e.g., AWS, Vercel). These costs are manageable within
a moderate project budget.
3. Operational Feasibility
a. Ease of Use: The web-based interface ensures that users can easily interact
with the tool.
b. Maintenance and Updates: Regular maintenance is required for updating APIs
and improving summarization accuracy, but it is manageable with a small
team.
4. Legal Feasibility
a. Data Privacy: The tool must comply with data privacy regulations, ensuring
that no personal data is stored or mishandled.
b. API Usage Policies: The project must follow the usage policies of APIs (like
OpenAI) to avoid misuse or limitations.
3.5 Cost Estimation :
The estimated cost for developing and maintaining the AI-powered summarization tool
includes API usage fees, with OpenAI API costing approximately $100 to $500 per month
based on usage. Cloud hosting through platforms like Vercel or Netlify may range from
free tiers to $20–$50 per month, with optional backend hosting on AWS or Azure adding
$50–$100 monthly. For semantic search, Pinecone offers a free tier, but advanced usage
could cost $100 or more per month. Development tools such as Visual Studio Code and
GitHub are free, though some premium tools may incur minor expenses. Annual
maintenance, including updates and optimizations, is estimated at $500 to $1,000. In
total, the project is expected to cost between $1,500 and $5,000 annually, depending on
resource usage and scaling requirements.
CHAPTER 4
PROJECT ANALYSIS & DESIGN
4.1 Introduction :
The design phase focuses on creating a structured architecture that integrates the
frontend, backend, and API services. Key components include the Next.js frontend for
smooth user interactions, Node.js backend for managing API requests, and the OpenAI
API for NLP-based summarization. Additionally, the system design incorporates
semantic search capabilities to enhance query relevance, ensuring that the tool delivers
efficient and meaningful results to users.
4.2 Architecture of Project :
The project Seek AI revolves around building an AI-powered summarization tool for
research papers, leveraging Natural Language Processing (NLP) techniques and large
language models like OpenAI's GPT API. Here’s an overview of the architecture as
described:
1. Frontend (User Interface):
o Web-based Application: Built using Next.js and CSS, providing a user-
friendly interface for inputting queries and receiving summaries.
2. Backend:
o NLP Integration: The core functionality involves OpenAI’s API, which is
responsible for summarizing research papers by extracting essential points and
methodologies.
o Semantic Search Engine: This uses cosine similarity for matching user
queries with relevant sections of the research papers. It ensures that the most
contextually accurate information is retrieved based on the user’s request.
3. Data Processing Flow:
o Query Input and Embedding: User inputs a query, which is then converted
into a numerical vector (embedding) using an embedding model.
o Context Retrieval: The system searches through a vector database
containing precomputed document embeddings. This database is built from
academic sources like arXiv, and relevant context is retrieved using similarity
metrics.
o Response Generation: The retrieved context is combined with the user query
into a prompt. This prompt is processed by a large language model (LLM) to
generate a precise and context-aware response.
4. Data Sources:
o The system sources research papers through web scraping platforms such as
arXiv.
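The data-processing flow above can be sketched end to end. This is a minimal illustration, not the project's actual code: the embedding model and the LLM are replaced by stand-in functions (`embed` and the `llm` parameter are placeholders for real API calls), and the vector database is a plain in-memory array.

```typescript
// Sketch of the flow: embed the query, retrieve the closest stored
// chunk by cosine similarity, build a prompt, and hand it to an LLM.
// `embed` is a toy stand-in for a real embedding model.

type Embedding = number[];
interface Chunk { text: string; embedding: Embedding; }

// Stand-in embedder: a real system would call an embedding API here.
const embed = (text: string): Embedding =>
  [text.length % 7, text.split(" ").length];

function cosine(a: Embedding, b: Embedding): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: Embedding) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Steps 1-3 of the flow: query embedding, context retrieval, response generation.
function answer(query: string, db: Chunk[], llm: (p: string) => string): string {
  const q = embed(query);
  const best = db.reduce((a, b) =>
    cosine(q, a.embedding) >= cosine(q, b.embedding) ? a : b
  );
  const prompt = `Context: ${best.text}\nQuestion: ${query}`;
  return llm(prompt);
}
```

In the real architecture, `db` corresponds to the vector database of precomputed arXiv embeddings, and `llm` to the GPT API call that generates the final response.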
4.3 Timeline Chart :
The project timeline for Seek AI is organized into several key phases:
First, the Planning & Design phase takes 3 weeks. It includes project kickoff,
architecture design, and technology stack setup. Once the foundation is set, the Backend
Development begins, spanning 5 weeks. This phase involves integrating the GPT API for
summarization, setting up the embedding model, and building a vector database for
context retrieval.
Next, the Frontend Development takes 3 weeks, during which the UI/UX is designed,
and the frontend is implemented using Next.js to allow user interaction with the system.
In parallel, the Data Sourcing & Integration phase runs for 4 weeks. This includes
scraping academic papers from platforms like arXiv and connecting the data with the
backend.
The Testing & Optimization phase lasts 4 weeks, ensuring the system’s reliability
through unit and functional tests. During this time, the NLP models are fine-tuned for
better summarization accuracy. After testing, the Deployment & Documentation phase
takes 2 weeks. The application is deployed to the cloud, and technical/user documentation
is prepared.
CHAPTER 5
METHODOLOGY
5.1 Introduction :
The Seek AI project aims to streamline the process of understanding complex academic
research papers through AI-powered summarization. It uses advanced Natural Language
Processing (NLP) models to generate concise and accurate summaries, allowing users to
quickly grasp key insights, methodologies, and outcomes. The tool is designed for
researchers, students, and professionals who need to manage large volumes of academic
literature efficiently. By leveraging AI to eliminate human bias and simplify technical
language, Seek AI enhances productivity, improves accessibility for non-specialists, and
fosters interdisciplinary collaboration, ultimately accelerating the research process and
innovation.
CHAPTER 6
IMPLEMENTATION DETAILS & RESULTS
6.1 Introduction :
Frontend Development: The user interface is built using Next.js and CSS, offering a
seamless experience for users to input queries and receive summaries. The frontend is
designed to be intuitive, allowing researchers, students, and professionals to interact with
the system easily. This web-based application provides users with fields to enter queries
and displays the generated summaries in a readable format.
Backend Development: The backend of the system is responsible for handling requests,
processing research papers, and managing summarization tasks. It integrates with
OpenAI’s GPT API, which powers the core summarization functionality. The backend
receives user queries, processes them through the NLP model, and sends back the
summarized output. A key aspect of the backend is the integration of a vector database
that stores precomputed embeddings of research papers for efficient retrieval.
Embedding and Semantic Search: When a user submits a query, the system converts
the query into an embedding using an NLP model. This embedding captures the semantic
meaning of the query, allowing for accurate context matching. The system then uses
cosine similarity to search the vector database and retrieve relevant sections of research
papers based on their similarity to the query. This step ensures that the most contextually
appropriate information is selected.
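The ranking step described above can be sketched as follows. The types and function names are illustrative, not the project's actual code; the embeddings are plain numeric arrays standing in for vectors produced by an NLP embedding model.

```typescript
// Cosine-similarity ranking over stored paper chunks: score every
// chunk against the query embedding and keep the k highest-scoring.

interface PaperChunk {
  id: string;
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored chunks by similarity to the query embedding, keep the top k.
function retrieveTopK(query: number[], chunks: PaperChunk[], k: number): PaperChunk[] {
  return [...chunks]
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((r) => r.chunk);
}
```

A production system would delegate this scan to the vector database (e.g., Pinecone), which performs the same similarity ranking at scale rather than in application code.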
Summarization Process: After retrieving the relevant content, the system passes this
data along with the user’s query to a large language model (LLM), such as GPT. The
LLM then generates a concise and accurate summary that includes the key insights,
methodologies, and outcomes from the papers.
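One way the retrieved content and the user's query might be combined before being sent to the LLM is sketched below. The template is an assumption for illustration only, not the project's actual prompt.

```typescript
// Assemble retrieved excerpts and the user query into one LLM prompt.
// The exact wording of the template is a hypothetical example.

function buildPrompt(query: string, retrievedChunks: string[]): string {
  // Number each retrieved excerpt so the summary can refer back to it.
  const context = retrievedChunks
    .map((chunk, i) => `[${i + 1}] ${chunk}`)
    .join("\n");
  return (
    "Using only the excerpts below, summarize the key insights, " +
    "methodologies, and outcomes relevant to the question.\n\n" +
    `Excerpts:\n${context}\n\n` +
    `Question: ${query}\n` +
    "Summary:"
  );
}
```

The resulting string is what the backend would pass to the GPT API as the generation prompt.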
Fig. 3.1: Chat-Interaction UI
Fig. 3.2: Similar Papers, Citations
Fig. 3.3: Seek AI UI
CHAPTER 7
CONCLUSION & FUTURE SCOPE
7.1 Conclusion :
The Seek AI project presents an innovative solution to the challenges faced by researchers,
students, and professionals in managing and comprehending vast amounts of academic
literature. By leveraging advanced Natural Language Processing (NLP) models and
integrating state-of-the-art technologies, Seek AI effectively simplifies and accelerates the
process of extracting key insights, methodologies, and conclusions from complex research
papers.
The core of the system is its ability to generate concise, unbiased, and accurate summaries
using OpenAI’s GPT API, enabling users to quickly grasp essential information without
manually sifting through lengthy papers. The integration of a semantic search engine
powered by cosine similarity ensures that user queries are matched with the most relevant
and contextually appropriate sections of the academic content. This not only enhances the
precision of the summarization process but also improves the overall user experience by
providing more targeted and relevant summaries.
The web-based interface built with Next.js offers a seamless user experience, making the
tool accessible and easy to use. Users can input queries, retrieve results, and navigate through
summaries efficiently. The backend’s robust architecture, which includes a vector database
for storing document embeddings, ensures the system can handle large volumes of data while
maintaining performance and accuracy.
Overall, Seek AI addresses the critical need for efficiency and accessibility in academic
research by automating the summarization process. It enables users to focus on deeper
analysis and innovation, rather than being bogged down by the time-consuming task of
reading entire papers. By streamlining knowledge extraction and eliminating human bias,
Seek AI promotes interdisciplinary collaboration and fosters broader dissemination of
research findings, ultimately contributing to faster decision-making and advancements in
various fields of study.
REFERENCES :