I've completed a project on NYC Taxi Trip Duration Prediction using various regression models, including Linear Regression, Random Forest, and Gradient Boosting. The goal was to analyze and predict the duration of taxi trips based on factors like pick-up and drop-off locations, timestamps, and passenger details.

Key Highlights:
- Data Analysis: Conducted thorough exploratory data analysis (EDA) to understand the dataset and visualize key trends.
- Model Development: Implemented multiple regression models and optimized them for better accuracy, achieving an R² score of 0.70 with Gradient Boosting.
- Performance Evaluation: Evaluated model performance using various metrics, ensuring reliable predictions.

This project enhanced my skills in data analysis, machine learning, and model optimization. I'm looking forward to applying these learnings in future endeavors!

You can check out the project on my GitHub: https://lnkd.in/g8b_awJU

#DataScience #MachineLearning #Python #NYCTaxi #Project
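A minimal sketch of the gradient-boosting setup described above, assuming the Kaggle NYC taxi layout (the file name and column names such as pickup_datetime and trip_duration are assumptions; the linked repo may differ):

```python
# Sketch: gradient-boosting regressor for trip duration, assuming a CSV with
# pickup/dropoff coordinates, a pickup timestamp, and a trip_duration target.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("train.csv")  # hypothetical file name
df["pickup_datetime"] = pd.to_datetime(df["pickup_datetime"])
df["hour"] = df["pickup_datetime"].dt.hour        # simple time-of-day feature
df["weekday"] = df["pickup_datetime"].dt.weekday  # day-of-week feature

features = ["pickup_longitude", "pickup_latitude",
            "dropoff_longitude", "dropoff_latitude",
            "passenger_count", "hour", "weekday"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["trip_duration"], test_size=0.2, random_state=42)

model = GradientBoostingRegressor(n_estimators=200, max_depth=5, random_state=42)
model.fit(X_train, y_train)
print("R^2:", r2_score(y_test, model.predict(X_test)))
```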
More Relevant Posts
-
📢 Hello Everyone! Recently, I've been practicing with the Matplotlib library for better data-analysis insights and visualization. Have you ever tried it?

I used a dataset from Kaggle, performed data manipulation by summing the total revenue, and filtered it by region. Then, I used Matplotlib to create a clear and insightful bar chart. Check it out!

🔹 Key Insights:
- North America is leading with the highest total revenue.
- Asia and Europe follow, with Asia slightly ahead of Europe.

🔎 Have you tried Matplotlib for your projects? What's your favorite feature?

#DataAnalysis #DataViz #Matplotlib #Python #SalesData #DataScience #Analytics
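A short sketch of the groupby-and-plot workflow described above (the CSV name and the "Region"/"Revenue" column names are assumptions about the dataset):

```python
# Sketch: total revenue by region with pandas + Matplotlib.
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("sales.csv")  # hypothetical file name
revenue_by_region = (df.groupby("Region")["Revenue"]
                       .sum()
                       .sort_values(ascending=False))

fig, ax = plt.subplots(figsize=(8, 5))
revenue_by_region.plot.bar(ax=ax, color="steelblue")  # bar chart of totals
ax.set_title("Total Revenue by Region")
ax.set_ylabel("Revenue")
plt.tight_layout()
plt.show()
```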
-
🌱 Excited to share my latest project: Mushroom Classification using Machine Learning! 🍄

In this project, I built a classification model to distinguish between different mushroom types based on features like cap diameter, gill attachment, stem color, and more. Here's a quick summary of my approach:

🔹 Exploratory Data Analysis (EDA): Gained insights on patterns and correlations in mushroom features.
🔹 Preprocessing: Used Label Encoding for categorical variables, scaled the data for consistency, and engineered some useful features.
🔹 Modeling: Employed the Random Forest algorithm for robust results.
🔹 Deployment: Built an interactive app with Streamlit where users can input mushroom features to get instant classification predictions.
🔹 Packages used: Sklearn, Pandas, Numpy, Matplotlib, Seaborn, Imblearn, Pickle, Streamlit

🎯 Results: With our model, we achieved 99% accuracy. This project was a fun mix of data science, domain knowledge, and practical application!

👉 Link to dataset: https://lnkd.in/eNzYcE9a
👉 GitHub repo: https://lnkd.in/e9TXNAc7
👉 Feel free to check out the report. Feedback is welcome, and I'd love to connect with others working in machine learning, data science, or similar areas!

#MachineLearning #DataScience #Classification #MushroomProject #Python #Streamlit #RandomForest
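A minimal sketch of the encode-scale-train pipeline described above (the file name and the "class" target column are assumptions; the actual repo may structure this differently):

```python
# Sketch: label-encode categoricals, scale features, fit a Random Forest.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

df = pd.read_csv("mushrooms.csv")  # hypothetical file name
for col in df.select_dtypes("object"):          # encode every categorical column
    df[col] = LabelEncoder().fit_transform(df[col])

X = df.drop(columns=["class"])                  # "class" = edible/poisonous label
y = df["class"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

scaler = StandardScaler()                       # scale for consistency
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```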
-
🌟 Exploring Heart Health Through Data Visualization 🌟

🚀 Excited to share my latest project: Data Visualization on Heart Disease Dataset! This project leverages Python libraries like Pandas, Seaborn, and Matplotlib to dive deep into a heart disease dataset. Through visualization, I uncovered patterns, trends, and relationships that could be critical in understanding heart health.

🔍 Key Highlights:
1️⃣ Analyzed age distribution with distplots.
2️⃣ Visualized categorical data using pie charts.
3️⃣ Examined correlations through heatmaps and pairplots.
4️⃣ Explored bivariate distributions with jointplots.
5️⃣ Communicated insights effectively with clear and intuitive visuals.

💡 Why This Matters: Data visualization is not just about creating pretty charts; it's about telling stories that drive impactful decisions, especially in healthcare.

🎯 Next Steps: Continuously refining the project and exploring advanced techniques to make these insights even more actionable.

📂 Check it out here: https://lnkd.in/gjYsBPex

#DataVisualization #HeartDisease #EDA #Python #Pandas #Seaborn #Matplotlib #DataScience
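A minimal sketch of the chart types listed above. Column names (age, sex, chol, target) follow the common UCI heart-disease layout and are assumptions here; Seaborn's older distplot is deprecated, so histplot stands in for the age distribution:

```python
# Sketch: the four main chart types from the post, on a heart-disease CSV.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv("heart.csv")  # hypothetical file name

sns.histplot(df["age"], kde=True)               # age distribution
plt.show()

df["sex"].value_counts().plot.pie(autopct="%1.1f%%")  # categorical share
plt.show()

sns.heatmap(df.corr(numeric_only=True), annot=True, fmt=".2f")  # correlations
plt.show()

sns.jointplot(data=df, x="age", y="chol", hue="target")  # bivariate distribution
plt.show()
```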
GitHub - SaketJ3003/Project-Heart-Disease-EDA: Data Visualization on Heart Disease (Exploratory Data Analysis) (github.com)
-
DS Task 1: Analysis of World Population Dataset (2001-2022) and creating a bar chart or histogram for data visualisation

Dataset Overview: For this task, I analyzed the world_population_dataset, which captures population records from 2001 to 2022, providing a valuable perspective on global demographic changes.

Tools & Libraries:
- Google Colab: For code development and documentation
- Pandas: Data manipulation and analysis
- Numpy: Numerical computations
- Matplotlib & Seaborn: Data visualization

Key Steps in Exploratory Data Analysis (EDA):
1. Data Cleaning: Handled missing data, duplicates, and outliers to ensure dataset accuracy and reliability.
2. Data Visualization: Created bar charts and stacked charts to effectively illustrate the distribution of variables, highlighting trends and patterns.

Conclusion: This EDA process uncovered valuable insights into global population trends over the past two decades, setting a solid foundation for future exploration and predictive modeling in the data science workflow.

#DataScience #EDA #Python #DataVisualization #ProdigyInfotech #PopulationAnalysis
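A small sketch of the cleaning-plus-bar-chart step described above (the file name and the "Country"/"2022 Population" column names are assumptions about the dataset's layout):

```python
# Sketch: basic cleaning, then a bar chart of the most populous countries.
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("world_population.csv")  # hypothetical file name
df = df.drop_duplicates().dropna(subset=["2022 Population"])  # cleaning step

top10 = df.nlargest(10, "2022 Population")
plt.bar(top10["Country"], top10["2022 Population"])
plt.xticks(rotation=45, ha="right")
plt.title("Top 10 Countries by Population (2022)")
plt.tight_layout()
plt.show()
```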
-
Day 12: Another productive day with data.

I learnt about different types of #memories (iconic, short-term, long-term) and how to use #preattentive #attributes to focus my #audience's attention in data #design. Preattentive attributes signal where to look. We can use elements like colours, size, text attributes (bold, italics, underline), and space to get the audience to understand our #data #story faster.

I tried implementing this with the #churn analysis #dashboard I built last week. I used colour to focus the audience's attention on the #metrics that are important and should be looked at first. I have attached the #pdf below.

I also did a couple of #assessments in Python from DataCamp to reinforce my knowledge, and I learned about #seasonality in #time #series from Kaggle.

Today, I checked off ALL the tasks I set out to do for my #data #learning. I am excited to be going on (and going strong 💪🏽) with this challenge and with how much I am learning through it. We go again, tomorrow!

#Day12 #25DaysofDatawithData #ReinforcingConsistency #ForGrowth #PuttingMyselfOutThere
-
🚢 Exploring Titanic Survival Analysis with Data Science 🌊

I recently worked on a Data Science project analyzing the Titanic dataset to uncover insights into survival rates, and here's the exciting part: I implemented this as a Flask-based web application that dynamically handles data cleaning, generates statistics, and visualizes the results!

📊 Key Features of the Project:
- Automatic handling of missing values for features like Age and Embarked.
- Dropping unnecessary columns like Cabin to simplify the analysis.
- Interactive visualizations, including class distribution, gender-based survival, and age distribution, using Seaborn and Matplotlib.
- Live web app setup with Flask to display real-time data summaries and visualizations.

🌟 Special thanks to KODI PRAKASH SENAPATI sir 😊 for inspiring me to dive deeper into the Titanic dataset and improve my data visualization skills. Your guidance and support mean a lot! 🙌

🔗 If you're curious about how to build something similar or have suggestions to enhance this project, feel free to connect with me or share your feedback! Let's continue to explore the fascinating world of data science together! 🚀

#DataScience #MachineLearning #Python #TitanicAnalysis #Flask #Visualization #Seaborn #Matplotlib
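A minimal sketch of how such a Flask setup can work: clean the data once at startup, then serve a summary page with an embedded chart. The file name, route, and inline template are assumptions, not the author's actual code:

```python
# Sketch: Titanic cleaning + a Flask route that renders a Seaborn chart.
import base64
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so charts render on a server
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from flask import Flask, render_template_string

app = Flask(__name__)

df = pd.read_csv("titanic.csv")                            # hypothetical path
df["Age"] = df["Age"].fillna(df["Age"].median())           # impute missing ages
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])
df = df.drop(columns=["Cabin"])                            # mostly missing, so drop

@app.route("/")
def summary():
    fig, ax = plt.subplots()
    sns.countplot(data=df, x="Pclass", hue="Survived", ax=ax)  # class vs. survival
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    img = base64.b64encode(buf.getvalue()).decode()
    return render_template_string(
        '<h1>Titanic Survival</h1><img src="data:image/png;base64,{{ img }}">',
        img=img)

if __name__ == "__main__":
    app.run(debug=True)
```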
-
Hello connections 👋

I recently completed an exciting data science project where I built a model to predict house prices based on key features like location, size, and amenities. Here's a glimpse into my journey:

💡 What I Did:
- Data Exploration & Cleaning: Analyzed over 1,400 rows of data with 79 features to handle missing values, outliers, and categorical variables.
- Feature Engineering: Identified crucial predictors like neighborhood quality, square footage, and the year built.
- Modeling: Tried several approaches, including Linear Regression, Random Forest, and XGBoost, to find the best fit.
- Evaluation: Achieved an RMSE of 50,000, significantly improving prediction accuracy through hyperparameter tuning.

🔧 Tools & Technologies: Python, Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn, XGBoost.

🌟 What I Learned:
1️⃣ Location matters more than you'd think! Proximity to amenities and neighborhood quality are game-changers.
2️⃣ Balancing simplicity and complexity in models can significantly impact performance.
3️⃣ Data preprocessing is as critical as model selection for accurate predictions.

💻 I've also created visualizations to showcase the feature importance and deployed the model for easy interaction! Check it out on GitHub: https://lnkd.in/gtuF7yWM

What do you think about leveraging machine learning in real estate? Have you worked on similar projects? Let's discuss! 🚀

#DataScience #MachineLearning #RealEstate #HousePricePrediction #Python #DataAnalytics
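A minimal sketch of the three-model comparison described above. The "SalePrice" target and train.csv name follow the common Kaggle Ames housing layout (1,460 rows, 79 features) and are assumptions here:

```python
# Sketch: compare Linear Regression, Random Forest, and XGBoost by RMSE.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

df = pd.read_csv("train.csv")  # hypothetical file name
X = pd.get_dummies(df.drop(columns=["SalePrice"])).fillna(0)  # quick encoding
y = df["SalePrice"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

for name, model in [("Linear", LinearRegression()),
                    ("RandomForest", RandomForestRegressor(random_state=42)),
                    ("XGBoost", XGBRegressor(random_state=42))]:
    model.fit(X_tr, y_tr)
    rmse = np.sqrt(mean_squared_error(y_te, model.predict(X_te)))
    print(f"{name}: RMSE = {rmse:,.0f}")
```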
-
Journaling My Data Science Journey! Part 10

On Monday, November 18, 2024, we continued our journey in Data Visualization. In the previous class, we visualized our data using Seaborn. However, on this particular day, we used Matplotlib. The processes are similar; as I mentioned in previous journals, Seaborn is built on Matplotlib. The major difference lies in the syntax we use with Matplotlib.

Similar to what we did with Seaborn, we also created the following charts using Matplotlib:
- Scatter Plot
- Line Chart
- Bar Chart
- Histogram

By examining these different chart outputs, we were able to determine which one best represented our data and would be easiest for a layman to understand. Always remember: we can only visualize clean data. This means that before we think about data visualization, we must first clean our data.

I can confidently say that I've progressed from being a novice to becoming more proficient in Data Science, but I'm not there yet. I will keep moving forward until my good is better, and my better becomes the best.

This journey wouldn't have been possible without these individuals and organizations:
- Bolatito Sarumi, my Instructor
- Brainbench Technology, the host
- The World Bank, the pillar behind the IDEAS PROJECT

Thank you all.

#Zionomotusi #BrainBenchTechnology #WorldBank #SustainableDevelopment #DataScience #Python
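For readers curious about the syntax difference mentioned above, here is a minimal side-by-side sketch of the same scatter plot in both libraries (using the sample "tips" dataset that ships with Seaborn):

```python
# Sketch: Seaborn vs. plain Matplotlib for the same scatter plot.
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")  # bundled sample data (downloads on first use)

sns.scatterplot(data=tips, x="total_bill", y="tip")  # Seaborn: pass column names
plt.show()

plt.scatter(tips["total_bill"], tips["tip"])         # Matplotlib: pass the arrays
plt.xlabel("total_bill")
plt.ylabel("tip")
plt.show()
```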
-
Hey Connections 💫

Excited to share my latest project where I implemented a 𝐋𝐢𝐧𝐞𝐚𝐫 𝐑𝐞𝐠𝐫𝐞𝐬𝐬𝐢𝐨𝐧 model to predict housing prices using the 𝐔𝐒𝐀 𝐇𝐨𝐮𝐬𝐢𝐧𝐠 𝐃𝐚𝐭𝐚𝐒𝐞𝐭 📊🏠

Housing prices are influenced by numerous factors, making accurate predictions crucial for buyers, sellers, and investors. Leveraging linear regression, I aimed to establish a relationship between key variables such as average area income, house age, number of rooms, and population with housing prices. The process involved several steps:

- 𝐃𝐚𝐭𝐚 𝐂𝐥𝐞𝐚𝐧𝐢𝐧𝐠 𝐚𝐧𝐝 𝐏𝐫𝐞𝐩𝐫𝐨𝐜𝐞𝐬𝐬𝐢𝐧𝐠: Ensured the dataset was free of null values and outliers.
- 𝐄𝐱𝐩𝐥𝐨𝐫𝐚𝐭𝐨𝐫𝐲 𝐃𝐚𝐭𝐚 𝐀𝐧𝐚𝐥𝐲𝐬𝐢𝐬 (𝐄𝐃𝐀): Visualized data trends and relationships using Seaborn and Matplotlib.
- 𝐅𝐞𝐚𝐭𝐮𝐫𝐞 𝐒𝐞𝐥𝐞𝐜𝐭𝐢𝐨𝐧: Identified the most significant predictors to enhance model accuracy.
- 𝐌𝐨𝐝𝐞𝐥 𝐈𝐦𝐩𝐥𝐞𝐦𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧: Applied linear regression using scikit-learn to train the model on 80% of the dataset and tested it on the remaining 20%.
- 𝐄𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧: Achieved a promising R-squared value, indicating the model's strong predictive power.

Check out the detailed analysis and code on GitHub! https://lnkd.in/dxgnSFSz

IDE used: Google Colab
Dataset from Kaggle: https://lnkd.in/dcFbvFF6

#MachineLearning #DataScience #LinearRegression #DataAnalysis #Python #PredictiveModeling
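A minimal sketch of the 80/20 linear-regression workflow described above. The column names follow the common "USA Housing" Kaggle dataset and are assumptions here:

```python
# Sketch: scikit-learn linear regression with an 80/20 train/test split.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("USA_Housing.csv")  # hypothetical file name
features = ["Avg. Area Income", "Avg. Area House Age",
            "Avg. Area Number of Rooms", "Area Population"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["Price"], test_size=0.2, random_state=42)  # 80/20 split

model = LinearRegression().fit(X_train, y_train)
print("R^2 on test set:", r2_score(y_test, model.predict(X_test)))
```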
-
Whether you're working with complex datasets or performing advanced data analysis, Pandas MultiIndex objects, MultiIndex Series, and MultiIndex DataFrames are essential tools in your data science toolkit. They enable you to manage multi-level indexing and bring unparalleled flexibility to your data manipulation tasks. Perfect for handling hierarchical data, slicing and dicing across multiple levels, and simplifying your workflows. https://lnkd.in/gvQ6BNcA 👉 Master MultiIndex in Pandas to take your #DataScience, #MachineLearning, and #BigData projects to the next level! #Python #DataAnalysis #DataEngineering #DataManipulation #Pandas #MultiIndex #DataScienceTips #AI #TechSkills #Programming
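A small self-contained illustration of the MultiIndex ideas described above (the data here is made up for demonstration):

```python
# Sketch: build a two-level index, then slice across its levels.
import pandas as pd

idx = pd.MultiIndex.from_product(
    [["2023", "2024"], ["Q1", "Q2"]], names=["year", "quarter"])
sales = pd.Series([100, 120, 130, 150], index=idx, name="sales")

print(sales.loc["2024"])                # outer-level slice: all 2024 quarters
print(sales.loc[("2024", "Q2")])        # exact cell via a full key tuple
print(sales.xs("Q1", level="quarter"))  # cross-section on the inner level
```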
Pandas Multi-Index Object (kaggle.com)
Comment from an Associate Research Analyst Intern (MBA, Business Analytics; MSc. Data Science; BSc. Economics, Mathematics, Statistics), 4 months ago: Great work Bhavik