Stack Overflow
A PROJECT REPORT
Submitted By:
Shashank Singh(20BCS9109)
Nomula Bala Sai Vighnesh Guptha(21BCS8054)
Certified that this project report “Stack Overflow Clone Website” is the bona fide
work of “Shashank Singh and Nomula Bala Sai Vighnesh Guptha”, who carried out
the project work under my/our supervision.
SIGNATURE SIGNATURE
DISHA SHARMA
SUPERVISOR
We take this opportunity to express our gratitude to all those who have, directly or indirectly, supported
us during the completion of this project. We offer special thanks to our guide, Disha Sharma, who gave us
guidance and direction during this training. We acknowledge our debt to those who contributed
significantly to one or more steps. We take full responsibility for any remaining sins of omission and
commission.
TABLE OF CONTENTS
1. List of Figures
2. Abstract
3. Chapter 1- Introduction
4. Chapter 2- Literature Review/Background Study
5. Chapter 3- Design Flow/Process
6. Chapter 4- Results Analysis and Validation
7. Chapter 5- Conclusion and Future Work
8. References
List of Figures
ABSTRACT
When programmers look for ways to accomplish programming tasks, Stack Overflow is a popular
destination in search engine results. Over the years, Stack Overflow has accumulated an impressive knowledge
base of amply documented code snippets. We are interested in studying how programmers use these
snippets in their projects. Can we find Stack Overflow snippets in real projects? When snippets are
used, are they copied literally or adapted? And are these adaptations specializations required by
the idiosyncrasies of the target artifact, or are they motivated by specific requirements of the programmer?
The large-scale study presented in this paper analyzes 909k non-fork Python projects hosted on GitHub, which
contain 290M function definitions, and 1.9M Python snippets captured from Stack Overflow. Results are
presented as a quantitative analysis of block-level code cloning within and between Stack Overflow and GitHub, and
as an analysis of programming behaviors through the qualitative analysis of our findings.
CHAPTER 1- INTRODUCTION:
This work presents a systematic mapping study and quality evaluation of publications that are written about
Stack Overflow. Stack Overflow (available via stackoverflow.com) is the leading Community Question and
Answer (CQA) platform for programmers, with more than 10 million users, contributing some 16 million
questions as of 2019. This platform’s success is evident through the fact that 92% of questions posted are
answered, with a median answering time of 11 minutes. Parnin and Treude found that 84.4% of general web
searches for jQuery API on Google resulted in at least one Stack Overflow outcome returned on the first page,
showing that programmers can turn to Stack Overflow for questions already asked. Chen and Xing also found
data from Stack Overflow tags to be useful for creating an overall view of technology trends. Software
engineering has a particular reliance on crowdsourced knowledge (and CQAs), given the community’s general
drive and emphasis on knowledge reuse. CQA can benefit software practitioners seeking information, as it is
likely that other practitioners have faced a similar problem, and so a relevant question may already have been
asked and received a suitable answer. On the other hand, new questions may be created, and experts have
the opportunity to lend their particular experience, which allows them to solve a problem and gain the respect
of their peers in the community. Beyond Stack Overflow, Yahoo! Answers, TopCoder, and Bountify all
provide crowd-based support for software practitioners’ challenges. However, Stack Overflow tends to be the
preferred choice for developers. While few would doubt the utility of Stack Overflow to software
practitioners, questions abound regarding the quality of the answers it produces. For instance, Jin
et al. studied how the need to ‘win’ reputation rewards can influence answer quality and encourage
contributors to underrate the answers provided by others. Anand and Ravichandran identified a
need for separating answer quality and popularity, as the reward system (contributors on Stack Overflow are
awarded points for their contributions) provided by Stack Overflow can sometimes conflict with quality
answers. These works suggest that there is a need for caution when approaching Stack Overflow for
recommended solutions to software engineering challenges. Even so, Stack Overflow code reuse is
ubiquitous. For instance, Abdalkareem, et al. investigated code reused from Stack Overflow in mobile apps
and found that 1.3% of the apps they sampled were constructed from Stack Overflow posts. They also
discovered that mid-aged and older apps contained Stack Overflow code introduced later in their lifetime. An,
et al. also investigated Android apps and found that 62 out of 399 (15.5%) apps contained exact code clones;
and of the 62 apps, 60 had potential license violations. In terms of Stack Overflow, they discovered that 1,226
posts contained code found in 68 apps. Furthermore, 126 Stack Overflow snippets were involved in code
migration, where 12 cases of migration involved apps published under different licenses. Yang, et al. noted
that, in terms of Python projects, over 1% of code blocks in their token form exist in both GitHub and Stack
Overflow. At an 80% similarity threshold, over 1.1% of code blocks in GitHub were similar to those in Stack
Overflow, and 2% of Stack Overflow code blocks were similar to those in GitHub.
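To make the idea of comparing code blocks "in their token form" against a similarity threshold more concrete, the following is a minimal sketch in Python. It is not the tooling used by Yang, et al.; it is a simplified bag-of-tokens overlap that we assume here purely for illustration. The function names (token_bag, similarity, is_clone) are hypothetical, and only the 0.8 threshold is taken from the text.

```python
import io
import tokenize
from collections import Counter

# Token types that carry no code content and are ignored in the comparison.
SKIPPED = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def token_bag(source: str) -> Counter:
    """Return the multiset of lexical tokens in a Python code block."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return Counter(tok.string for tok in tokens if tok.type not in SKIPPED)


def similarity(block_a: str, block_b: str) -> float:
    """Shared tokens divided by the token count of the larger block."""
    bag_a, bag_b = token_bag(block_a), token_bag(block_b)
    larger = max(sum(bag_a.values()), sum(bag_b.values()))
    if larger == 0:
        return 0.0
    shared = sum((bag_a & bag_b).values())  # multiset intersection
    return shared / larger


def is_clone(block_a: str, block_b: str, threshold: float = 0.8) -> bool:
    """Treat two blocks as similar when they reach the threshold (assumed 80%)."""
    return similarity(block_a, block_b) >= threshold


snippet = "def add(a, b):\n    return a + b\n"
commented = snippet + "# copied from an answer\n"
renamed = "def add(x, y):\n    return x + y\n"

print(is_clone(snippet, commented))  # comments are ignored by this measure
print(is_clone(snippet, renamed))    # renaming identifiers lowers the overlap
```

In this sketch a copy that only adds comments still counts as a clone, while renaming both parameters of a very short function is enough to fall below the threshold, which hints at why adapted snippets are harder to trace than literal copies.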
CHAPTER 2- LITERATURE REVIEW/BACKGROUND STUDY
Platforms such as Stack Overflow are available for software practitioners to solicit solutions to their challenges
and knowledge needs. However, this community’s practices have recently triggered quality-related
concerns. This is a noteworthy issue when considering that the Stack Overflow platform is used by numerous
software developers. Academic research tends to provide validation for the practices and processes employed
by Stack Overflow and other such forums. However, previous work did not review the scale of scientific
attention that is given to this cause. Evidence resulting from such an analysis could be useful for understanding
the focus of academic studies when considering Stack Overflow and how research is conducted on this forum.
Furthermore, pertinent research quality evaluations may direct replication studies.
PROBLEM DEFINITION:
Stack Overflow (SO) is an online, collaborative platform where developers post their programming questions,
answer existing questions, and find solutions to the difficulties they face while programming. A
developer needs to add tags when posting a question to help other users find out what the question is about.
If an answer solves the problem faced by the questioner, the questioner can select it; this answer is
then called the accepted answer to that question. Members of the site
can vote on questions and answers. Positive votes are called upvotes and negative votes are called
downvotes, and together they show how helpful a question or answer was for other users. The score of a
question or answer is the number of upvotes minus the number of downvotes. Users earn reputation based on
their activities on Stack Overflow, such as posting questions or answers, voting on them, and posting comments.
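The voting and acceptance rules described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function names (post_score, accept_answer) and the dictionary layout of a question are hypothetical and are not taken from Stack Overflow itself.

```python
from collections import Counter


def post_score(votes):
    """Return upvotes minus downvotes for an iterable of "up"/"down" strings."""
    counts = Counter(votes)
    return counts["up"] - counts["down"]


def accept_answer(question, answer_id, user):
    """Mark one answer as accepted; only the user who asked may do so."""
    if user != question["asker"]:
        raise PermissionError("only the questioner can accept an answer")
    if answer_id not in question["answers"]:
        raise KeyError(answer_id)
    question["accepted_answer"] = answer_id


# Hypothetical question record with two answers keyed by id.
question = {
    "asker": "alice",
    "answers": {1: "Use a set.", 2: "Use a dict."},
    "accepted_answer": None,
}

accept_answer(question, 2, "alice")
print(post_score(["up", "up", "down"]), question["accepted_answer"])
```

Keeping the score as a derived value, rather than storing it, means it can never drift out of step with the recorded votes.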
OBJECTIVE:
This website serves as a platform for users to ask and answer questions and, through membership and active
participation, to vote questions and answers up and down, similar to Reddit, and to edit questions and answers in
a fashion similar to a wiki.
CHAPTER 3- DESIGN FLOW/PROCESS
Stack Overflow is a question and answer website for professional and enthusiast programmers. It is the
flagship site of the Stack Exchange Network. It was created in 2008 by Jeff Atwood and Joel Spolsky. It
features questions and answers on a wide range of topics in computer programming. It was created to be a
more open alternative to earlier question and answer websites such as Experts-Exchange. Stack Overflow was
sold to Prosus, a Netherlands-based consumer internet conglomerate, on 2 June 2021 for $1.8 billion.
The website serves as a platform for users to ask and answer questions, and, through membership and active
participation, to vote questions and answers up or down similar to Reddit and edit questions and answers in a
fashion similar to a wiki. Users of Stack Overflow can earn reputation points and "badges"; for example, a
person is awarded 10 reputation points for receiving an "up" vote on a question or an answer to a question, and
can receive badges for their valued contributions, which represents a gamification of the traditional Q&A
website. As their reputation increases, users unlock new privileges, such as the ability to vote, comment, and even
edit other people's posts.
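The reputation mechanism can be illustrated with a short Python sketch. The reward of 10 points per upvote comes from the description above; the threshold values in PRIVILEGE_THRESHOLDS and the function names are assumptions chosen for illustration and may differ from the values a real deployment uses.

```python
UPVOTE_REWARD = 10  # points per upvote, as stated in the text

# Assumed thresholds, for illustration only (ordered from lowest to highest).
PRIVILEGE_THRESHOLDS = {
    "vote": 15,
    "comment": 50,
    "edit": 2000,
}


def reputation_from_upvotes(upvotes: int) -> int:
    """Reputation earned from a number of upvotes on questions or answers."""
    return upvotes * UPVOTE_REWARD


def unlocked_privileges(reputation: int) -> list:
    """Return the privileges whose threshold the reputation has reached."""
    return [name for name, needed in PRIVILEGE_THRESHOLDS.items()
            if reputation >= needed]


reputation = reputation_from_upvotes(5)
print(reputation, unlocked_privileges(reputation))
```

Storing the thresholds in one table keeps the gamification rules in a single place, so a clone can tune them without touching the permission checks.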
As of March 2021, Stack Overflow had over 14 million registered users and had received over 21 million questions and
31 million answers. The site and similar programming question and answer sites largely replaced
programming books worldwide for day-to-day programming reference in the 2000s, and today are an important part of computer
programming. Based on the type of tags assigned to questions, the top eight most discussed topics on the site
are: JavaScript, Java, C#, PHP, Android, Python, jQuery, and HTML.
Use Case Diagram
The main use cases of the system are:
1. Search questions.
2. Create a new question with bounty and tags.
3. Add/modify answers to questions.
4. Add comments to questions or answers.
5. Moderators can close, delete, and un-delete any question.
Fig 1: Use Case Diagram for Stack Overflow
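Two of the use cases listed above, searching questions and moderator actions, can be sketched as follows. This is a minimal in-memory illustration, not the project's implementation. The names Status, search, and moderate are hypothetical, and two behaviours are our own assumptions: deleted questions are hidden from search, and un-deleting a question returns it to the open state.

```python
from enum import Enum


class Status(Enum):
    OPEN = "open"
    CLOSED = "closed"
    DELETED = "deleted"


def search(questions, keyword="", tag=None):
    """Return visible questions whose title contains the keyword and,
    if a tag is given, that carry that tag."""
    keyword = keyword.lower()
    return [
        q for q in questions
        if q["status"] is not Status.DELETED      # assumption: hide deleted
        and keyword in q["title"].lower()
        and (tag is None or tag in q["tags"])
    ]


def moderate(question, action, role):
    """Apply a moderator action: "close", "delete", or "undelete"."""
    if role != "moderator":
        raise PermissionError("only moderators can close, delete, or undelete")
    transitions = {
        "close": Status.CLOSED,
        "delete": Status.DELETED,
        "undelete": Status.OPEN,                  # assumption: restore as open
    }
    question["status"] = transitions[action]


questions = [
    {"title": "How to sort a list in Python?", "tags": {"python"}, "status": Status.OPEN},
    {"title": "Sort an array in Java", "tags": {"java"}, "status": Status.OPEN},
]

moderate(questions[1], "delete", "moderator")
print([q["title"] for q in search(questions, keyword="sort")])
```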
Class Diagram
Here are the main classes of Stack Overflow System:
• Question: This class is the central part of our system. It has attributes like Title and Description to
define the question. In addition to this, we will track the number of times a question has been viewed
or voted on. We should also track the status of a question, as well as closing remarks if the question
is closed.
• Answer: The most important attributes of any answer will be the text and the view count. In addition
to that, we will also track the number of times an answer is voted on or flagged. We should also track
if the question owner has accepted an answer.
• Comment: Similar to answers, comments will have text, and view, vote, and flag counts. Members
can add comments to questions and answers.
• Tag: Tags will be identified by their names and will have a field for a description to define them. We
will also track daily and weekly frequencies at which tags are associated with questions.
• Badge: Similar to tags, badges will have a name and description.
• Photo: Questions or answers can have photos.
• Bounty: Each member, while asking a question, can place a bounty to draw attention. Bounties will
have a total reputation and an expiry date.
• Account: We will have four types of accounts in the system: guest, member, admin, and moderator.
Guests can search and view questions. Members can ask questions and earn reputation by answering
questions and from bounties.
• Notification: This class will be responsible for sending notifications to members and assigning
badges to members based on their reputations.
Fig 2: Class Diagram for Stack Overflow
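The core classes described above could be expressed in Python roughly as follows. This is a sketch under our own assumptions rather than the final design: the field names are hypothetical, only a subset of the classes is shown, and we assume that at most one answer per question is accepted at a time.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional


class QuestionStatus(Enum):
    OPEN = "open"
    CLOSED = "closed"
    DELETED = "deleted"


@dataclass
class Tag:
    name: str
    description: str = ""


@dataclass
class Comment:
    text: str
    vote_count: int = 0
    flag_count: int = 0


@dataclass
class Bounty:
    reputation: int          # total reputation offered
    expiry: datetime         # date after which the bounty lapses


@dataclass
class Answer:
    text: str
    accepted: bool = False
    vote_count: int = 0
    flag_count: int = 0
    comments: List[Comment] = field(default_factory=list)


@dataclass
class Question:
    title: str
    description: str
    view_count: int = 0
    vote_count: int = 0
    status: QuestionStatus = QuestionStatus.OPEN
    closing_remark: Optional[str] = None
    tags: List[Tag] = field(default_factory=list)
    answers: List[Answer] = field(default_factory=list)
    comments: List[Comment] = field(default_factory=list)
    bounty: Optional[Bounty] = None

    def close(self, remark: str) -> None:
        """Close the question and record the closing remark."""
        self.status = QuestionStatus.CLOSED
        self.closing_remark = remark

    def accept(self, answer: Answer) -> None:
        """Accept one answer and clear the flag on all others (assumption)."""
        for candidate in self.answers:
            candidate.accepted = candidate is answer


question = Question("How do I reverse a list?", "Looking for the idiomatic way.",
                    tags=[Tag("python")])
first, second = Answer("Use reversed()."), Answer("Use slicing.")
question.answers += [first, second]
question.accept(second)
print([a.accepted for a in question.answers], question.status.value)
```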
Fig 3: UML for Stack Overflow
Activity Diagram
Post a new question: Any member or moderator can perform this activity. Here are the steps to post a
question:
Fig 4: Activity Diagram for Stack Overflow
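Since the individual steps of the activity diagram are not reproduced in the text, the following Python sketch only illustrates the two constraints that are stated in this report: that members and moderators may post, and that a question must carry tags. The function name post_question, the NewQuestion record, and the validation order are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Roles allowed to post, as stated in the text above.
ALLOWED_ROLES = {"member", "moderator"}


@dataclass
class NewQuestion:
    author: str
    title: str
    description: str
    tags: List[str] = field(default_factory=list)


def post_question(role, author, title, description, tags):
    """Validate the request and return the question to be stored."""
    if role not in ALLOWED_ROLES:
        raise PermissionError("only members and moderators can post questions")
    if not title.strip() or not description.strip():
        raise ValueError("title and description are required")
    if not tags:
        raise ValueError("at least one tag is required")
    # Duplicate tags are dropped and the rest sorted for a stable result.
    return NewQuestion(author, title.strip(), description.strip(), sorted(set(tags)))


posted = post_question("member", "alice", "  How to parse JSON? ",
                       "I need to read a JSON file.", ["python", "json", "python"])
print(posted.title, posted.tags)
```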
Sequence Diagram
Following is the sequence diagram for creating a new question: