回归算法预测房价-Kaggle

This post walks through predicting house prices with regression: inspecting the target variable, handling outliers, normalizing the distribution, imputing missing values, feature engineering, and modeling. The author applies a Box Cox transform and a log transform to the right-skewed target, and preprocesses features by removing outlier points, label encoding, and one-hot encoding. In the modeling stage, several base models and ensemble models are combined, with cross-validation and performance metrics guiding model selection.



https://www.kaggle.com/serigne/stacked-regressions-top-4-on-leaderboard

https://segmentfault.com/a/1190000018717280?utm_source=tag-newest

The feature engineering is rather parsimonious (at least compared to some other great scripts). It is pretty much:

- Imputing missing values by proceeding sequentially through the data
- Transforming some numerical variables that are really categorical
- Label-encoding some categorical variables that may carry information in their ordering
- **Box Cox transformation of skewed features (instead of a log transformation)**: this gave a slightly better result both on the leaderboard and in cross-validation
- Getting dummy variables for categorical features
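The preprocessing steps above can be sketched on a toy DataFrame. The column names below are illustrative stand-ins, not the actual Kaggle columns, and lambda=0.15 for the Box Cox step is one common choice:

```python
import numpy as np
import pandas as pd
from scipy.special import boxcox1p
from scipy.stats import skew
from sklearn.preprocessing import LabelEncoder

# Toy data: one numeric column with a gap, one numeric-but-categorical
# column, one ordered categorical, one nominal categorical.
df = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],
    "MSSubClass": [20, 60, 20, 70],
    "ExterQual": ["Gd", "TA", "Ex", "TA"],
    "Neighborhood": ["A", "B", "A", "C"],
})

# 1) Impute the missing value (median here; the kernel imputes per column)
df["LotFrontage"] = df["LotFrontage"].fillna(df["LotFrontage"].median())

# 2) A numeric variable that is really categorical becomes a string
df["MSSubClass"] = df["MSSubClass"].astype(str)

# 3) Label-encode the ordered categorical
df["ExterQual"] = LabelEncoder().fit_transform(df["ExterQual"])

# 4) Box Cox transform the skewed numeric features (lambda = 0.15)
numeric_cols = df.select_dtypes(include=[np.number]).columns
skewed = [c for c in numeric_cols if abs(skew(df[c])) > 0.75]
for c in skewed:
    df[c] = boxcox1p(df[c], 0.15)

# 5) One-hot encode the remaining nominal categoricals
df = pd.get_dummies(df)
```

After `get_dummies`, every column is numeric and model-ready.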

Then we choose many base models (mostly sklearn-based models, plus the sklearn APIs of DMLC's XGBoost and Microsoft's LightGBM) and cross-validate them on the data before stacking/ensembling them. The key here is to make the (linear) models robust to outliers; this improved the result both on the leaderboard and in cross-validation.
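A minimal sketch of that workflow, using synthetic data in place of the prepared train set: wrap each linear model in `RobustScaler` to blunt outliers, cross-validate with a fixed `KFold`, then average the base-model predictions (the simplest form of ensembling; the kernel goes further with stacking):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

# Synthetic regression data stands in for the engineered features
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

# RobustScaler makes the linear models less sensitive to remaining outliers
models = {
    "lasso": make_pipeline(RobustScaler(), Lasso(alpha=0.1, random_state=1)),
    "enet": make_pipeline(
        RobustScaler(), ElasticNet(alpha=0.1, l1_ratio=0.9, random_state=3)
    ),
}

# Cross-validate each base model with the same folds (RMSE, lower is better)
kf = KFold(n_splits=5, shuffle=True, random_state=42)
cv_rmse = {}
for name, model in models.items():
    neg_mse = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=kf)
    cv_rmse[name] = np.sqrt(-neg_mse).mean()

# Simplest ensemble: the mean of the base-model predictions
preds = np.column_stack([m.fit(X, y).predict(X) for m in models.values()])
ensemble_pred = preds.mean(axis=1)
```

Using the same `KFold` object for every model keeps the scores comparable before deciding what to stack.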

To my surprise, this does well on the leaderboard (0.11420, top 4% the last time I tested it: July 2, 2017).

1. Examining the target variable

The first step is to examine the target variable.
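Examining the target can be sketched as follows. A synthetic, right-skewed "SalePrice" stands in for the real column; the point is that a log1p transform pulls the long right tail toward a normal shape, which is what the Box Cox / log step later exploits:

```python
import numpy as np
from scipy import stats

# Synthetic right-skewed prices (lognormal, like house prices)
rng = np.random.default_rng(0)
sale_price = rng.lognormal(mean=12, sigma=0.4, size=1000)

# Skewness of the raw target: clearly positive (long right tail)
skew_before = stats.skew(sale_price)

# log1p compresses the tail toward a roughly normal distribution
log_price = np.log1p(sale_price)
skew_after = stats.skew(log_price)
```

In the actual notebook this step is accompanied by a histogram with a fitted normal curve and a Q-Q plot to confirm the skew visually.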
