
Optimizing Machine Learning Pipeline

Muhammad Omer - i220572
Shariq Usman - i220447
Azeem Chaudary - i220479
Introduction to Problems
• Data Imbalance:
  o Class 0 (60.2%) vs. Class 1 (39.8%)
• Missing Values:
  o 18.5% of rows contain missing values
• Distribution Issues:
  o Skewed features (e.g., feature_4)
  o Outliers (e.g., feature_7)
• Correlation Issues:
  o Features correlate only weakly with the target
• Dataset too small
• GPU acceleration slowed things down
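The issues above can be verified with a quick diagnostic pass. A minimal sketch, assuming the data sits in a pandas DataFrame with a `target` column; the synthetic data and column names here are illustrative stand-ins, not the project's actual dataset:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data mimicking the reported issues (illustrative only).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature_4": rng.lognormal(size=200),                        # skewed feature
    "feature_7": np.append(rng.normal(size=198), [9.0, 11.0]),   # with outliers
    "target": rng.choice([0, 1], size=200, p=[0.6, 0.4]),
})
# Simulate 18.5% of rows having a missing value.
df.loc[df.sample(frac=0.185, random_state=0).index, "feature_4"] = np.nan

print(df["target"].value_counts(normalize=True))  # class balance
print(df.isna().any(axis=1).mean())               # fraction of rows with missing values
print(df["feature_4"].skew())                     # skewness check
print(df.corr(numeric_only=True)["target"])       # correlation with target
```

The same four checks (class proportions, row-wise missingness, skew, target correlation) reproduce each diagnosis on the real data.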
Proposed Solution

1. Class Imbalance:
   Applied undersampling
2. Missing Data:
   Filled missing values with the median
3. Feature Engineering:
   • Log-scaled feature_4
   • Squared feature_4
   • Sine & cosine transforms of feature_6
   • Clustered feature_4 & feature_7 into a new feature

[Slide diagram: ML preprocessing pipeline — undersampling, median filling, periodic transforms, normalization, clustering to a new feature]
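The steps above can be sketched as one preprocessing function. This is a minimal illustration under assumed column names (`feature_4`, `feature_6`, `feature_7`, `target`) and an assumed cluster count of 3, not the project's actual code:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # 1. Class imbalance: undersample the majority class to the minority size.
    counts = df["target"].value_counts()
    minority = counts.idxmin()
    balanced = pd.concat([
        df[df["target"] == minority],
        df[df["target"] != minority].sample(n=counts.min(), random_state=0),
    ])
    # 2. Missing data: fill numeric columns with the median.
    num_cols = balanced.select_dtypes("number").columns
    balanced[num_cols] = balanced[num_cols].fillna(balanced[num_cols].median())
    # 3. Feature engineering: log, square, periodic, and cluster features.
    balanced["feature_4_log"] = np.log1p(balanced["feature_4"])
    balanced["feature_4_sq"] = balanced["feature_4"] ** 2
    balanced["feature_6_sin"] = np.sin(balanced["feature_6"])
    balanced["feature_6_cos"] = np.cos(balanced["feature_6"])
    balanced["cluster_4_7"] = KMeans(n_clusters=3, n_init=10, random_state=0) \
        .fit_predict(balanced[["feature_4", "feature_7"]])
    return balanced
```

Undersampling before imputation means the medians are computed on the balanced sample; the reverse order is equally defensible and which one the project used is not stated.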
Parallelization Approach

• Multi-threading: ran tasks across all CPU cores for preprocessing, training, and testing wherever parallel processing was possible.

• Split data into chunks: processed the pieces at the same time.

• Added safety: switched to single-threaded mode if parallel execution failed.

[Slide diagram: parallelize — split data into chunks, multi-threading, added safety]
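The chunk-and-fallback pattern described above can be sketched with the standard library. The chunk count and the per-chunk worker are illustrative placeholders, not the project's actual functions:

```python
from concurrent.futures import ThreadPoolExecutor

def transform(chunk):
    # Placeholder for per-chunk preprocessing work.
    return [x * 2 for x in chunk]

def process(data, n_chunks=4):
    # Split the data into roughly equal chunks.
    size = max(1, len(data) // n_chunks)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    try:
        # Run chunks concurrently across threads.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(transform, chunks))
    except Exception:
        # Safety fallback: single-threaded mode if parallel execution fails.
        results = [transform(c) for c in chunks]
    # Reassemble the chunks in their original order.
    return [x for chunk in results for x in chunk]
```

`pool.map` preserves chunk order, so reassembly is a simple flatten; for CPU-bound NumPy-free work a `ProcessPoolExecutor` would be the heavier-weight alternative.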
Model Performance Comparison

• Models:
  o Random Forest: works well with small data, handles class imbalance reasonably
  o XGBoost: uses L1/L2 regularization and can capture nonlinear patterns
• Performance:
  o Random Forest: accuracy 59.17%, F1 53.90%
  o XGBoost: accuracy 59.38%, F1 47.71%
• Best:
  o XGBoost: highest accuracy (59.38%)
  o Random Forest: largest speedup (88.69%)
• Resource Use:
  o Memory: 486.58 MB
  o CPU: 0.00%
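A comparison of this shape can be sketched with scikit-learn. The data here is synthetic with the same 60/40 class weights, and `GradientBoostingClassifier` stands in for XGBoost (both are boosted-tree models, though XGBoost adds explicit L1/L2 penalties); none of this is the project's actual code or data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in dataset with the reported 60/40 class imbalance.
X, y = make_classification(n_samples=500, weights=[0.6, 0.4], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "RandomForest": RandomForestClassifier(random_state=0),
    # Stand-in for XGBoost; swap in xgboost.XGBClassifier if it is installed.
    "Boosting": GradientBoostingClassifier(random_state=0),
}
scores = {}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    scores[name] = (accuracy_score(y_te, pred), f1_score(y_te, pred))
    print(name, scores[name])
```

Reporting both accuracy and F1, as the slide does, matters here: with 60/40 imbalance a model can trade F1 for accuracy, which is exactly the pattern in the XGBoost numbers above.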
Conclusion
• Achieved 80% faster processing; best accuracy of 59.38%.
• Worked well on older systems with low resource use.

Future Work:
• Test with larger datasets for better results.
• Explore more advanced models for higher accuracy.