# Locally Weighted Linear Regression

It is a non-parametric ML algorithm that, unlike **linear regression**, does not learn a fixed set of parameters. \
So, here comes a question: what is *linear regression*? \
**Linear regression** is a supervised learning algorithm used for computing linear relationships between input (X) and output (Y).

### Terminology Involved

number_of_features(i) = Number of features involved. \
number_of_training_examples(m) = Number of training examples. \
output_sequence(y) = Output Sequence. \
$\theta^T x$ = predicted output for point x. \
J($\theta$) = Cost function.

The steps involved in ordinary linear regression are:

Training phase: Compute $\theta$ to minimize the cost. \
J($\theta$) = $\sum_{i=1}^m$ ($\theta^T x^i$ - $y^i$)$^2$

Predict output: for given query point x, \
return: $\theta^T x$
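The two phases of ordinary linear regression described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function names `fit_linear_regression` and `predict` and the toy data are assumptions made for this example, and an explicit column of ones is assumed as the intercept feature.

```python
import numpy as np


def fit_linear_regression(x_train: np.ndarray, y_train: np.ndarray) -> np.ndarray:
    """Training phase: return theta minimizing sum_i (theta^T x^i - y^i)^2.

    x_train has shape (m, n), one row per training example.
    y_train has shape (m,).
    """
    # lstsq solves the least-squares problem without forming an explicit inverse
    theta, *_ = np.linalg.lstsq(x_train, y_train, rcond=None)
    return theta


def predict(theta: np.ndarray, x_query: np.ndarray) -> float:
    """Prediction phase: return theta^T x for one query point of shape (n,)."""
    return float(theta @ x_query)


# Hypothetical toy data generated from y = 1 + 2 * x.
# The first column of ones is the intercept feature.
x_train = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y_train = np.array([1.0, 3.0, 5.0, 7.0])

theta = fit_linear_regression(x_train, y_train)
prediction = predict(theta, np.array([1.0, 4.0]))
print(theta)       # approximately [1. 2.]
print(prediction)  # approximately 9.0
```

Note that $\theta$ is computed once from all the training data and then reused for every query point.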

This training phase works well when the relationship between the data points is linear. But that raises another question: can we also predict a non-linear relationship between x and y, like the one shown below?

Figure: Non-linear Data
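To make the limitation concrete, here is a small sketch that fits a single straight line to non-linear data and measures the remaining error. The sine-shaped data set is purely hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical non-linear data: y = sin(x) on [0, 2*pi]
x = np.linspace(0.0, 2.0 * np.pi, 50)
y = np.sin(x)

# Design matrix with an intercept column, then one global least-squares fit
design = np.column_stack((np.ones_like(x), x))
theta, *_ = np.linalg.lstsq(design, y, rcond=None)

# Error left over after the best possible straight-line fit
residuals = design @ theta - y
mse = float(np.mean(residuals**2))
print(f"mean squared error of the straight-line fit: {mse:.3f}")
```

Even the best single line leaves a large error here, because one fixed $\theta$ cannot follow the curve.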

So, here comes the role of a non-parametric algorithm, which doesn't compute predictions based on a fixed set of params. Rather, the parameters $\theta$ are computed individually for each query point/data point x.

While computing $\theta$, a higher "preference" is given to points in the vicinity of x than to points farther from x.

Cost Function J($\theta$) = $\sum_{i=1}^m$ $w^i$ ($\theta^T x^i$ - $y^i$)$^2$

$w^i$ is the non-negative weight associated with training point $x^i$. \
$w^i$ is large for $x^i$'s lying closer to the query point x. \
$w^i$ is small for $x^i$'s lying farther from the query point x.

A typical weight can be computed using \
$w^i$ = $\exp$(-$\frac{(x^i-x)^T(x^i-x)}{2\tau^2}$)

Where $\tau$ is the bandwidth parameter that controls how quickly $w^i$ falls off with the distance of $x^i$ from x.

Let's look at an example:

Suppose we had a query point x=5.0 and training points $x^1$=4.9 and $x^2$=3.0. Then, with $\tau$=0.5, we can calculate the weights as:

$w^1$ = $\exp$(-$\frac{(4.9-5)^2}{2(0.5)^2}$) = 0.9802 \
$w^2$ = $\exp$(-$\frac{(3-5)^2}{2(0.5)^2}$) = 0.000335

So, J($\theta$) = 0.9802*($\theta^T x^1$ - $y^1$)$^2$ + 0.000335*($\theta^T x^2$ - $y^2$)$^2$

So, we can conclude that the weight falls off exponentially with the squared distance between x & $x^i$, and so does the contribution of the prediction error for $x^i$ to the cost.

Steps involved in LWL are: \
Compute $\theta$ to minimize the cost. \
J($\theta$) = $\sum_{i=1}^m$ $w^i$ ($\theta^T x^i$ - $y^i$)$^2$ \
Predict Output: for given query point x, \
return: $\theta^T x$

Figure: LWL
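The weight formula and the two LWL steps can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not a definitive implementation: the function names `gaussian_weights` and `predict_lwr`, the sine-shaped demo data, and the choice $\tau$=0.3 for the demo are all hypothetical. The first part reproduces the worked example above, using 3.0 for the second training point (the value that appears in the weight calculation).

```python
import numpy as np


def gaussian_weights(x_train: np.ndarray, x_query: np.ndarray, tau: float) -> np.ndarray:
    """Return w^i = exp(-||x^i - x||^2 / (2 * tau^2)) for every training point."""
    diff = x_train - x_query                # shape (m, n)
    sq_dist = np.sum(diff * diff, axis=1)   # squared Euclidean distances, shape (m,)
    return np.exp(-sq_dist / (2.0 * tau**2))


def predict_lwr(
    x_train: np.ndarray, y_train: np.ndarray, x_query: np.ndarray, tau: float
) -> float:
    """Compute theta for this single query point and return theta^T x."""
    weights = gaussian_weights(x_train, x_query, tau)
    # Minimizing sum_i w^i (theta^T x^i - y^i)^2 is an ordinary least-squares
    # problem once every row and target is scaled by sqrt(w^i).
    sqrt_w = np.sqrt(weights)
    theta, *_ = np.linalg.lstsq(
        x_train * sqrt_w[:, None], y_train * sqrt_w, rcond=None
    )
    return float(theta @ x_query)


# Worked example: query x = 5.0, training points 4.9 and 3.0, tau = 0.5
example_weights = gaussian_weights(np.array([[4.9], [3.0]]), np.array([5.0]), tau=0.5)
print(example_weights)  # approximately [0.9802, 0.000335]

# Hypothetical non-linear data: y = sin(x), with an intercept column of ones
x = np.linspace(0.0, 2.0 * np.pi, 50)
features = np.column_stack((np.ones_like(x), x))
targets = np.sin(x)

# A fresh theta is fitted for this one query point (x = pi / 2)
query = np.array([1.0, np.pi / 2.0])
prediction = predict_lwr(features, targets, query, tau=0.3)
print(prediction)  # close to sin(pi / 2) = 1
```

Because $\theta$ is recomputed for every query point, each prediction costs one weighted least-squares solve over the training set. A smaller $\tau$ makes the fit more local, while a larger $\tau$ moves it toward ordinary linear regression.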