DA3 SQL Portfolio Project
Dataset:
https://www.kaggle.com/datasets/stackoverflow/stackoverflow/data?select=post_history
Project Objective
The objective is to write SQL queries that analyze the history of Stack Overflow posts, including
edits, comments, and other changes, to gain insight into user activity and content evolution while
practicing SQL. Because the original dataset is very large, with millions of rows per table, you
have been given 10 rows of data per table. Run your project_table_load.sql file, which
creates your schema and tables and inserts the data. Write your queries against these tables.
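After running the load script, it can help to confirm that every table exists and holds the expected number of rows before writing any queries. The sketch below is only an illustration: it uses Python's built-in sqlite3 module and a small hypothetical stand-in script (LOAD_SCRIPT), because the contents and SQL dialect of the real project_table_load.sql are not shown in this brief. The helper name row_counts is also an assumption.

```python
import sqlite3

# Hypothetical stand-in for project_table_load.sql. The real script, its
# tables, and its SQL dialect are not reproduced here.
LOAD_SCRIPT = """
CREATE TABLE tags (id INTEGER PRIMARY KEY, tag_name TEXT);
INSERT INTO tags (id, tag_name) VALUES (1, 'sql'), (2, 'python'), (3, 'joins');
"""


def row_counts(conn):
    """Return {table_name: row_count} for every table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    # Table names come from the catalog itself, so quoting them is sufficient.
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript(LOAD_SCRIPT)     # runs every statement in the script
counts = row_counts(conn)
print(counts)
```

With the real project data, each table would be expected to report 10 rows.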
1. badges
○ Tracks badges earned by users.
○ Key Fields:
■ id, user_id, name (badge name), date (earned date).
2. comments
○ Contains comments on posts.
○ Key Fields:
■ id, post_id, user_id, creation_date, text.
3. post_history
○ Tracks the history of edits, comments, and other changes made to posts.
○ Key Fields:
■ id, post_history_type_id, post_id, user_id, text,
creation_date.
4. post_links
○ Links between related posts.
○ Key Fields:
■ id, post_id, related_post_id, link_type_id.
5. posts_answers
○ Contains questions and answers.
○ Key Fields:
■ id, post_type_id (question or answer), creation_date, score,
view_count, owner_user_id.
6. tags
○ Information about tags associated with posts.
○ Key Fields:
■ id, tag_name.
7. users
○ Details about Stack Overflow users.
○ Key Fields:
■ id, display_name, reputation, creation_date.
8. votes
○ Tracks voting activity on posts.
○ Key Fields:
■ id, post_id, vote_type_id, creation_date.
9. posts
○ Information about posts.
○ Key Fields:
■ id, title, post_type_id, creation_date, score,
view_count, owner_user_id.
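The key fields above suggest how the tables connect: for example, comments.user_id appears to reference users.id. The brief does not state the foreign keys explicitly, so treat that relationship as an assumption and check it against the load script. A minimal sketch of a join built on it, using Python's sqlite3 module with made-up rows (not the project data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Columns follow the key fields listed in the brief; the rows are invented.
CREATE TABLE users (
    id INTEGER PRIMARY KEY, display_name TEXT,
    reputation INTEGER, creation_date TEXT
);
CREATE TABLE comments (
    id INTEGER PRIMARY KEY, post_id INTEGER, user_id INTEGER,
    creation_date TEXT, text TEXT
);
INSERT INTO users VALUES
    (1, 'alice', 150, '2020-01-01'),
    (2, 'bob',    90, '2020-02-01');
INSERT INTO comments VALUES
    (10, 100, 1, '2021-01-01', 'first'),
    (11, 100, 1, '2021-01-02', 'second'),
    (12, 101, 2, '2021-01-03', 'third');
""")

# LEFT JOIN keeps users with zero comments; COUNT(c.id) ignores the NULLs
# produced for them, so such users would show a count of 0.
comment_counts = conn.execute("""
SELECT u.display_name, COUNT(c.id) AS comment_count
FROM users AS u
LEFT JOIN comments AS c ON c.user_id = u.id
GROUP BY u.id, u.display_name
ORDER BY comment_count DESC, u.display_name
""").fetchall()
print(comment_counts)
```

The same pattern applies to any pair of tables that share a user_id or post_id column.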
Part 1: Basics
Part 3: Subqueries
● Which users have contributed the most in terms of comments, edits, and votes?
● What types of badges are most commonly earned, and which users are the top
earners?
● Which tags are associated with the highest-scoring posts?
● How often are related questions linked, and what does this say about knowledge
sharing?
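As one illustration of how a subquery can approach these questions, the sketch below takes the badges question. It is not a model answer: the rows are made up, it runs on Python's sqlite3 module rather than the project database, and "top earner" is assumed to mean the user or users with the highest badge count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Columns follow the key fields listed for badges; the rows are invented.
CREATE TABLE badges (id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT, date TEXT);
INSERT INTO badges VALUES
    (1, 1, 'Teacher', '2021-01-01'),
    (2, 1, 'Editor',  '2021-01-02'),
    (3, 1, 'Student', '2021-01-03'),
    (4, 2, 'Teacher', '2021-01-04'),
    (5, 3, 'Teacher', '2021-01-05'),
    (6, 3, 'Editor',  '2021-01-06');
""")

# How often each badge type was earned, most common first.
badge_frequency = conn.execute("""
SELECT name, COUNT(*) AS times_earned
FROM badges
GROUP BY name
ORDER BY times_earned DESC, name
""").fetchall()

# Top earners: a derived table counts badges per user, and a scalar
# subquery supplies the maximum count to compare against.
top_earners = conn.execute("""
SELECT user_id, badge_count
FROM (
    SELECT user_id, COUNT(*) AS badge_count
    FROM badges
    GROUP BY user_id
) AS per_user
WHERE badge_count = (
    SELECT MAX(cnt)
    FROM (SELECT COUNT(*) AS cnt FROM badges GROUP BY user_id) AS per_user_counts
)
ORDER BY user_id
""").fetchall()

print(badge_frequency)
print(top_earners)
```

Comparing against the maximum, rather than using LIMIT 1, returns every user tied for first place.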
Deliverables
● A report containing:
○ SQL scripts for each task. (70 marks)
○ Key insights derived from the queries. (30 marks)
○ Visualizations, if using tools like Tableau or Power BI. (Optional, +10 bonus marks)