[Machine Learning] Logistic Regression
Purpose: classification or regression? Logistic regression is a classic binary classification algorithm!
Choosing a machine learning algorithm: try logistic regression first before reaching for something more complex; when a simple model does the job, prefer the simple one.
The decision boundary of logistic regression can be non-linear (for example, when polynomial features are added), as the sketch below shows.
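A minimal sketch of how polynomial features produce a non-linear boundary; the parameter values here are made up for illustration and are not from the article:

import numpy as np

# Made-up parameters on squared features. The decision boundary sits where
# the score crosses 0 (i.e. where the sigmoid would output 0.5):
#   -1 + x1**2 + x2**2 = 0  <=>  x1**2 + x2**2 = 1, the unit circle.
theta = np.array([-1.0, 1.0, 1.0])

def score(x1, x2):
    # Linear in theta, but non-linear in the raw inputs x1, x2
    return theta[0] + theta[1] * x1 ** 2 + theta[2] * x2 ** 2

print(score(0.0, 0.0) > 0)  # False: inside the circle -> negative class
print(score(2.0, 0.0) > 0)  # True:  outside the circle -> positive class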
The Sigmoid Function
Formula: $g(z) = \frac{1}{1 + e^{-z}}$
The input $z$ can be any real number, and the output falls in the interval $(0, 1)$.
Explanation: the sigmoid maps any input into $(0, 1)$. Linear regression gives us a real-valued prediction; passing that value through the sigmoid converts it into a probability, which is exactly the value-to-probability conversion a classification task needs.
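A minimal sketch of that value-to-probability conversion; the weights and feature values below are made up for illustration:

import numpy as np

def sigmoid(z):
    # Map any real-valued score into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([0.5, -1.2, 0.8])  # hypothetical weights, one per feature
x = np.array([1.0, 0.3, 2.0])       # hypothetical feature vector

score = x @ theta       # linear-regression-style prediction: any real number
prob = sigmoid(score)   # probability of the positive class
print(score, prob)      # 1.74 -> roughly 0.85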
Multi-class Logistic Regression
Example:
Dataset: $100 \times 3$, i.e. 100 sample points with 3 features.
$\theta$: $3 \times 3$, i.e. 3 classes with 3 parameters (one per feature) each.
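To see how those shapes interact, here is a small numpy sketch (random data, zero-initialized parameters) of the one-vs-all scoring that the class below implements:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # 100 samples, 3 features
theta = np.zeros((3, 3))       # one row of 3 parameters per class

# One sigmoid score per class for every sample: (100, 3) @ (3, 3).T -> (100, 3)
scores = 1.0 / (1.0 + np.exp(-(X @ theta.T)))
predictions = scores.argmax(axis=1)  # pick the most probable class per sample
print(scores.shape, predictions.shape)  # (100, 3) (100,)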
Code Implementation of Multi-class Logistic Regression
# -*- coding: utf-8 -*-
"""
Created on Wed Apr 17 15:59:36 2024
@author: Tom
"""
import numpy as np
from scipy.optimize import minimize
from utils.features import prepare_for_training
from utils.hypothesis import sigmoid
class LogisticRegression:
    def __init__(self, data, labels, polynomial_degree=0, sinusoid_degree=0, normalize_data=False):
        """
        1. Preprocess the data
        2. Get the number of features
        3. Initialize the parameter matrix
        """
        (data_processed,
         features_mean,
         features_deviation) = prepare_for_training(data, polynomial_degree, sinusoid_degree,
                                                    normalize_data=normalize_data)
        self.data = data_processed
        self.labels = labels
        # Collect the distinct class labels
        self.unique_labels = np.unique(labels)
        self.features_mean = features_mean
        self.features_deviation = features_deviation
        self.polynomial_degree = polynomial_degree
        self.sinusoid_degree = sinusoid_degree
        self.normalize_data = normalize_data
        # Number of features (columns of the processed data)
        num_features = self.data.shape[1]
        # Number of classes you want to separate
        num_unique_labels = self.unique_labels.shape[0]
        # One row of parameters per class: shape (num_classes, num_features)
        self.theta = np.zeros((num_unique_labels, num_features))
    def train(self, max_iterations=1000):
        # Loss history of each binary sub-model
        cost_histories = []
        # Number of features
        num_features = self.data.shape[1]
        # One-vs-all: train one binary classifier per class
        for label_index, unique_label in enumerate(self.unique_labels):
            current_initial_theta = np.copy(self.theta[label_index].reshape(num_features, 1))
            # 1.0 for samples of the current class, 0.0 for all others
            current_labels = (self.labels == unique_label).astype(float)
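The last line builds the 0/1 targets for the current sub-model: samples of the current class become 1 and everything else becomes 0. A tiny standalone example (made-up labels) of what that expression produces:

import numpy as np

labels = np.array(['cat', 'dog', 'cat', 'bird'])  # made-up labels
current_labels = (labels == 'cat').astype(float)  # targets for the 'cat' model
print(current_labels)  # [1. 0. 1. 0.]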