# The MM Algorithm

The MM algorithm replaces one difficult optimization problem with a sequence of simpler ones. "Simpler" can mean: (1) avoiding the inversion of large matrices, (2) linearizing the problem, (3) separating the parameters of the problem, (4) handling equality and inequality constraints more gracefully, or (5) turning a non-differentiable problem into a smooth one. The price we pay is iterative computation.

The idea of the MM algorithm: let $\theta^{(m)}$ be a fixed value of the parameter $\theta$, and let $g(\theta|\theta^{(m)})$ be a real-valued function of $\theta$ whose form depends on $\theta^{(m)}$. We say that $g(\theta|\theta^{(m)})$ majorizes $f(\theta)$ at the point $\theta^{(m)}$ if
$g(\theta|\theta^{(m)})\ge f(\theta),\ \forall\,\theta,$
$g(\theta^{(m)}|\theta^{(m)})=f(\theta^{(m)}).\quad(*)$

Choosing $\theta^{(m+1)}$ to minimize (or merely decrease) $g(\theta|\theta^{(m)})$ yields the descent property: by $(*)$, $g(\theta^{(m+1)}|\theta^{(m)})\le g(\theta^{(m)}|\theta^{(m)})$ and $f(\theta)-g(\theta|\theta^{(m)})$ attains its maximum value $0$ at $\theta=\theta^{(m)}$, so

\begin{align*}
f(\theta^{(m+1)})&=g(\theta^{(m+1)}|\theta^{(m)})+f(\theta^{(m+1)})-g(\theta^{(m+1)}|\theta^{(m)})\\
&\le g(\theta^{(m)}|\theta^{(m)})+f(\theta^{(m)})-g(\theta^{(m)}|\theta^{(m)})\\
&=f(\theta^{(m)}).\quad(**)
\end{align*}
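The descent property $(**)$ can be observed numerically. The following is a minimal sketch, not from the original text: to minimize $f(x)=\sum_i|x-a_i|$ (whose minimizer is the median), majorize each $|x-a_i|$ at $x^{(m)}$ by the quadratic $\frac{(x-a_i)^2}{2|x^{(m)}-a_i|}+\frac{|x^{(m)}-a_i|}{2}$, which touches $|x-a_i|$ at $x=x^{(m)}$ and lies above it elsewhere; the data and the small weight floor are illustrative choices.

```python
import numpy as np

# Illustrative data: minimize f(x) = sum_i |x - a_i|, minimized at the median.
a = np.array([0.0, 1.0, 10.0])
f = lambda x: np.sum(np.abs(x - a))

x = 5.0
values = [f(x)]
for _ in range(30):
    # Weights 1/|x^(m) - a_i| of the quadratic majorizer; the floor 1e-12
    # guards against division by zero when an iterate hits a data point.
    w = 1.0 / np.maximum(np.abs(x - a), 1e-12)
    # Minimizer of the quadratic surrogate: a weighted mean of the data.
    x = np.sum(w * a) / np.sum(w)
    values.append(f(x))

# The descent property (**): f never increases along the iterates.
assert all(v2 <= v1 + 1e-12 for v1, v2 in zip(values, values[1:]))
assert abs(x - 1.0) < 1e-4  # converges to the median of a
```

Each iteration solves a trivial quadratic instead of the original non-differentiable problem, illustrating point (5) above.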

As an example, consider logistic regression with responses $y_i\in\{0,1\}$, covariates $x_i$, and
$\pi_i(\theta)=\exp\{x_i^T\theta\}/(1+\exp\{x_i^T\theta\}).$

The likelihood of the sample is
\begin{align*}
f(\theta)&=\prod\limits_{i=1}^n\left(\frac{e^{x_i^T\theta}}{1+e^{x_i^T\theta}}\right)^{y_i}\left(\frac{1}{1+e^{x_i^T\theta}}\right)^{1-y_i}\\
&=\prod\limits_{i=1}^n\frac{e^{y_ix_i^T\theta}}{1+e^{x_i^T\theta}},
\end{align*}

and the log-likelihood is
$l(\theta)=\sum\limits_{i=1}^n\left[y_ix_i^T\theta-\log(1+e^{x_i^T\theta})\right].$

Newton's method maximizes $l(\theta)$ with the iteration
$\theta^{(m+1)}=\theta^{(m)}-(\nabla^2l(\theta^{(m)}))^{-1}\nabla{l(\theta^{(m)})},$
which requires forming and inverting the Hessian at every step,

where
$\nabla{l(\theta)}=\sum\limits_{i=1}^n[y_i-\pi_i(\theta)]x_i,$
$\nabla^2l(\theta)=-\sum\limits_{i=1}^n\pi_i(\theta)[1-\pi_i(\theta)]x_ix_i^T.$
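The gradient formula above can be spot-checked against finite differences of $l$. A minimal sketch, with illustrative toy data:

```python
import numpy as np

def loglik(theta, X, y):
    """Logistic log-likelihood l(theta) = sum_i [y_i x_i'theta - log(1 + e^{x_i'theta})]."""
    z = X @ theta
    return float(y @ z - np.sum(np.log1p(np.exp(z))))

def grad(theta, X, y):
    """Gradient sum_i (y_i - pi_i(theta)) x_i, written in matrix form."""
    pi = 1.0 / (1.0 + np.exp(-X @ theta))
    return X.T @ (y - pi)

# Toy data (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = (rng.random(20) < 0.5).astype(float)
theta = rng.normal(size=3)

# Central finite differences of l agree with the gradient formula.
eps = 1e-6
fd = np.array([(loglik(theta + eps * e, X, y) - loglik(theta - eps * e, X, y)) / (2 * eps)
               for e in np.eye(3)])
assert np.allclose(fd, grad(theta, X, y), atol=1e-5)
```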

Since $\pi_i(\theta)[1-\pi_i(\theta)]\le\frac{1}{4}$, the Hessian satisfies $\nabla^2l(\theta)\succeq B$ with $B=-\frac{1}{4}X^TX$, where $X$ is the matrix with rows $x_i^T$. Hence the quadratic
$g(\theta|\theta^{(m)})=l(\theta^{(m)})+\nabla{l(\theta^{(m)})}^T(\theta-\theta^{(m)})+\frac{1}{2}(\theta-\theta^{(m)})^TB(\theta-\theta^{(m)})$
minorizes $l(\theta)$ at $\theta^{(m)}$; since we are maximizing here, the inequality in $(*)$ is reversed and the MM step ascends rather than descends.

Maximizing $g(\theta|\theta^{(m)})$ over $\theta$ gives the update
\begin{align*}
\theta^{(m+1)}&=\theta^{(m)}-B^{-1}\nabla{l(\theta^{(m)})}\\
&=\theta^{(m)}-4(X^TX)^{-1}X^T[\pi(\theta^{(m)})-y],
\end{align*}
where $\pi(\theta)=(\pi_1(\theta),\dots,\pi_n(\theta))^T$ and $y=(y_1,\dots,y_n)^T$. In contrast to Newton's method, $(X^TX)^{-1}X^T$ does not depend on $\theta$ and can be computed once before iterating.
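This update can be sketched as follows (the data, function names, and convergence check are illustrative, not from the original):

```python
import numpy as np

def mm_logistic(X, y, iters=200):
    """MM (minorize-maximize) fit of logistic regression using the
    fixed curvature bound B = -(1/4) X'X."""
    n, p = X.shape
    theta = np.zeros(p)
    # 4 (X'X)^{-1} X' is computed once; Newton would refactor the Hessian each step.
    A = 4.0 * np.linalg.solve(X.T @ X, X.T)
    loglik = []
    for _ in range(iters):
        z = X @ theta
        pi = 1.0 / (1.0 + np.exp(-z))
        loglik.append(float(y @ z - np.sum(np.log1p(np.exp(z)))))
        # theta^(m+1) = theta^(m) - 4 (X'X)^{-1} X' [pi(theta^(m)) - y]
        theta = theta - A @ (pi - y)
    return theta, loglik

# Illustrative toy data.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
true_theta = np.array([-0.5, 2.0])
y = (rng.random(100) < 1.0 / (1.0 + np.exp(-X @ true_theta))).astype(float)

theta_hat, loglik = mm_logistic(X, y)
# The minorize-maximize property guarantees l(theta) never decreases.
assert all(b >= a - 1e-10 for a, b in zip(loglik, loglik[1:]))
assert loglik[-1] > loglik[0]
```

The trade-off named at the start of the section is visible here: each MM step costs only a matrix-vector product, but more iterations are needed than with Newton's method.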