Machine Learning - Andrew Ng Exercise 1(Optional) in Python

During week 2 of Andrew Ng's Machine Learning course, we are given an exercise to implement linear regression. It is divided into two parts. I covered the first part in my last post; that solution was not efficient, but I posted what I actually came up with on my first try. This post implements the optional part of the exercise, where I have looked back at my mistakes and tried to keep the code as efficient as I can.

The problem statement reads: "Suppose you are selling your house and you want to know what a good market price would be. One way to do this is to first collect information on recent houses sold and make a model of housing prices. The ex1data2.txt contains a training set of housing prices in Portland, Oregon. The first column is the size of the house (in square feet), the second column is the number of bedrooms, and the third column is the price of the house."

First, we import all the necessary libraries:

#Importing necessary libraries 
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Now we will load the data and describe it to get a better understanding of it:

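#Importing Data
#Assumption: ex1data2.txt has no header row (as in the course files), so the column names are supplied here
df = pd.read_csv("ex1data2.txt", header=None, names=["size","bedrooms","price"])
df.describe() #Gives a statistical summary of each column
df.head()  #Shows the first 5 entries of the data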

[Figures: output of df.head() and df.describe()]

We will divide our data into two parts, features (X) and target (Y), and visualize it.

#Dividing the data into features and target 
#A visualization of the data to understand it better
x = df[["size","bedrooms"]]
y = df["price"]

plt.scatter(x["size"],y,c = "red",s = 12)
plt.xlabel("Size of the houses")
plt.ylabel("Price of the house")
plt.title("Size V/S Price")

[Figure: scatter plot of house size against price]

This problem has multiple features on very different scales (size is measured in square feet and runs into the thousands, while the number of bedrooms is a single digit), so it calls for feature normalization, which we do in this step.

#Feature Scaling is really important for this problem

def NormalizeData(data):
    mean=np.mean(data,axis=0)
    std=np.std(data,axis=0)

    X = (data - mean)/std

    return X , mean , std


x,mean_x,std_x = NormalizeData(x)
Y,mean_y,std_y = NormalizeData(y)


#Add a column of ones for the intercept term and reshape Y into a column vector
X = np.column_stack((np.ones((x.shape[0],1)),x))
Y = Y.to_numpy().reshape(-1,1)
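As a quick sanity check on the normalization step, here is a minimal sketch on made-up numbers (the `toy` array is hypothetical, not taken from ex1data2.txt). After scaling, every column should have a mean of about 0 and a standard deviation of about 1.

```python
import numpy as np

# Hypothetical toy data: columns are size (sq ft) and number of bedrooms
toy = np.array([[1000.0, 2.0],
                [1500.0, 3.0],
                [2000.0, 3.0],
                [2500.0, 4.0]])

toy_mean = toy.mean(axis=0)   # per-column mean
toy_std = toy.std(axis=0)     # per-column (population) standard deviation
toy_scaled = (toy - toy_mean) / toy_std

# After scaling, each column should have mean ~0 and std ~1
col_means = toy_scaled.mean(axis=0)
col_stds = toy_scaled.std(axis=0)
print(col_means, col_stds)
```

Both columns end up on the same scale, which is what lets a single learning rate work for every parameter.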

Now we will initialize our parameters and write down the cost function.

#Initializing Parameters
alpha = 0.03
iterations = 2000
theta  = np.zeros((3,1))

def cost_function(X,Y,theta):
    """
    Take in a numpy array X,y, theta and generate the cost of using 
    theta as parameter in the model
    """
    m = X.shape[0]
    avg = 1/(2*m)

    estimate = np.square(np.dot(X,theta)-Y)
    sqr_estimate = np.sum(estimate)
    cost = avg * sqr_estimate
    return cost
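To see the cost formula at work, here is a small worked example on hand-made data. The function `toy_cost` and the arrays are illustrative names of my own; the formula is the same half mean squared error used above.

```python
import numpy as np

def toy_cost(X, Y, theta):
    """Half mean squared error: sum((X @ theta - Y)^2) / (2m)."""
    m = X.shape[0]
    residuals = X @ theta - Y
    return float(np.sum(residuals ** 2) / (2 * m))

# Three points lying exactly on the line y = x (first column is the intercept term)
X_toy = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
Y_toy = np.array([[1.0], [2.0], [3.0]])

# With theta = 0 every prediction is 0, so the cost is (1 + 4 + 9) / (2 * 3)
cost_at_zero = toy_cost(X_toy, Y_toy, np.zeros((2, 1)))

# With theta = [0, 1] the line fits the points exactly, so the cost is 0
cost_at_fit = toy_cost(X_toy, Y_toy, np.array([[0.0], [1.0]]))
print(cost_at_zero, cost_at_fit)
```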

Now we will write down our gradient descent algorithm:

def gradient_descent(X,Y,theta,alpha,iterations):
    """
    Take in numpy arrays X, Y and theta and update theta in place by taking
    `iterations` gradient steps with a learning rate of alpha

    return the history of the cost function
    """
    m = X.shape[0]
    J_history = []

    for i in range(iterations):
        hyp = np.dot(X,theta)
        gradient = np.dot(X.transpose(),(hyp - Y))   #X^T (h - y), summed over all examples
        descent = alpha * (1/m) * gradient
        theta -= descent   #in-place update, so the caller's theta changes too
        J_history.append(cost_function(X,Y,theta))

    return J_history

J_history = gradient_descent(X,Y,theta,alpha,iterations)
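One way to gain confidence in the update rule is to compare it with a closed-form least-squares fit. The sketch below is my own check on synthetic data (all names and numbers are made up for illustration); it runs the same update with the same alpha and iteration count and compares the result against `np.linalg.lstsq`.

```python
import numpy as np

# Synthetic data (hypothetical): two roughly standardized features plus noise
rng = np.random.default_rng(0)
m = 50
features = rng.normal(size=(m, 2))
X_syn = np.column_stack((np.ones(m), features))   # intercept column first
true_theta = np.array([[0.5], [2.0], [-1.0]])
Y_syn = X_syn @ true_theta + 0.1 * rng.normal(size=(m, 1))

# Batch gradient descent with the same update rule as above
theta_gd = np.zeros((3, 1))
for _ in range(2000):
    gradient = X_syn.T @ (X_syn @ theta_gd - Y_syn) / m
    theta_gd -= 0.03 * gradient

# Closed-form least-squares solution for comparison
theta_ls, *_ = np.linalg.lstsq(X_syn, Y_syn, rcond=None)

max_gap = float(np.max(np.abs(theta_gd - theta_ls)))
print(max_gap)
```

If the two solutions differ noticeably, the usual suspects are a learning rate that is too large or too few iterations.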

To verify that gradient descent worked properly, we will plot the cost against the number of iterations and print out our hypothesis.

#Visualizing if cost is decreasing or not with iteration to verify gradient descent
plt.plot(J_history)
plt.title("Cost against Iterations")
plt.xlabel("Iterations")
plt.ylabel("Cost")

#Print the hypothesis equation

print("h(x) = {} + {}x1 + {}x2".format((round(theta[0,0],2)),(round(theta[1,0],2)),(round(theta[2,0],2))))

The print statement prints out: h(x) = -0.0 + 0.88x1 + -0.05x2

[Figure: cost against iterations]

Finally, we will predict the price of a house whose size = 1650 and number of bedrooms = 3.

def predict(x,theta):
    predictions= np.dot(x,theta)
    return predictions[0]

x_sample = np.array([1650,3])
#Scale the sample with the training-set mean and std, not with its own statistics
x_sample = (x_sample - mean_x.to_numpy())/std_x.to_numpy()

x_sample=np.append(np.ones(1),x_sample)

prediction = predict(x_sample,theta)

#Undo the target normalization to get the price back in dollars
prediction = ((prediction*std_y) + mean_y)
print("For size of house = 1650, Number of bedroom = 3, the predicted value of price is ${}".format(round(prediction,0)))

With the sample scaled by the training statistics, the print statement should report a predicted price of roughly $293,081.
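One detail worth double-checking in the prediction step: a new sample has to be scaled with the mean and standard deviation computed on the training set. The sketch below shows why. The training statistics in it are made-up placeholder values, not the real ones from ex1data2.txt.

```python
import numpy as np

sample = np.array([1650.0, 3.0])

# Scaling the sample by its OWN mean and std: any two-element vector
# collapses to [1, -1] (or [-1, 1]), so the actual values are lost.
own_scaled = (sample - sample.mean()) / sample.std()

# Scaling with training-set statistics (hypothetical values, for illustration)
train_mean = np.array([2000.0, 3.2])
train_std = np.array([800.0, 0.75])
train_scaled = (sample - train_mean) / train_std

print(own_scaled)
print(train_scaled)
```

Only the second version puts the sample in the same coordinate system the model was trained in.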

With that, we wrap up the first exercise completely. If you want the solution to the first part, you can check it out here -> (qusaikader.hashnode.dev/machine-learning-an..). I hope you enjoyed reading it. Feel free to leave a comment on any ways I can improve my code. If you want to access the notebook for this assignment, I have uploaded the code on Github (github.com/qusaikader/Machine-Learning-Andr..).

Thank You for reading.
