In week 2 of Andrew Ng's Machine Learning course, we are given an exercise to implement linear regression. The exercise is divided into two parts. I covered the first part in my last post; that solution was not efficient, but I posted it as it was because it is what I actually came up with on my first try. This post implements the optional part of the exercise, where I have gone back over my mistakes and tried to keep the code as efficient as I can.
The problem was, "Suppose you are selling your house and you want to know what a good market price would be. One way to do this is to first collect information on recent houses sold and make a model of housing prices. The ex1data2.txt contains a training set of housing prices in Portland, Oregon. The first column is the size of the house (in square feet), the second column is the number of bedrooms, and the third column is the price of the house."
First, we import all the necessary libraries:
#Importing necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Next, we load the data and summarize it to get a better understanding of it:
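#Importing Data
#This assumes ex1data2.txt starts with a header row "size,bedrooms,price".
#The file as distributed with the course may not have one; in that case use
#pd.read_csv("ex1data2.txt", header=None, names=["size","bedrooms","price"])
df = pd.read_csv("ex1data2.txt")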
df.describe() #Gives you a summary of the data so you can understand the data in a better way
df.head() #Show the first 5 entries of the data
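The rest of this post indexes the columns by name, so it matters whether the file has a header row. Here is a minimal sketch of loading a headerless file, assuming plain comma-separated numbers; the three inline rows are only stand-ins in the same format, and `df_demo` is a name I made up for this illustration:

```python
import io
import pandas as pd

# Stand-in for ex1data2.txt: rows in "size,bedrooms,price" order, no header line.
raw = io.StringIO("2104,3,399900\n1600,3,329900\n2400,3,369000\n")

# header=None stops pandas from treating the first data row as column names;
# names= supplies the labels that the later code indexes by.
df_demo = pd.read_csv(raw, header=None, names=["size", "bedrooms", "price"])

print(df_demo.shape)
print(list(df_demo.columns))
```

If the first row were silently used as a header, one training example would be lost and `df["size"]` would raise a KeyError, so it is worth checking `df.head()` before going further.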
We will divide our data into two parts, features (X) and target (Y), and then visualize it.
#Dividing the data into features and target
x = df[["size","bedrooms"]]
y = df["price"]
#A visualization of the data to understand it better
plt.scatter(x["size"],y,c = "red",s = 12)
plt.xlabel("Size of the houses")
plt.ylabel("Price of the house")
plt.title("Size V/S Price")
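This problem has multiple features on very different scales: house sizes are in the thousands of square feet, while bedroom counts are single digits. Gradient descent converges much more slowly on unscaled features like these, so in this step we apply feature normalization.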
#Feature Scaling is really important for this problem
def NormalizeData(data):
    mean=np.mean(data,axis=0)
    std=np.std(data,axis=0)
    X = (data - mean)/std
    return X , mean , std
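#Keep the mean and std: they are needed later to scale new samples and to undo the target scaling
x,mean_x,std_x = NormalizeData(x)
Y,mean_y,std_y = NormalizeData(y)
#Add a column of ones for the intercept term
X = np.column_stack((np.ones((x.shape[0],1)),x))
#Turn the target into a column vector of shape (m,1)
Y = np.resize(Y.to_numpy(), (x.shape[0],1))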
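To see what the normalization does, here is a small self-contained sketch on made-up numbers (the array `demo` and the helper name `normalize` are mine, not part of the exercise). It checks that each scaled column ends up with mean 0 and standard deviation 1, and that a new sample is scaled with the statistics of the training data:

```python
import numpy as np

def normalize(data):
    """Column-wise z-score; returns the scaled data plus the statistics used."""
    mean = np.mean(data, axis=0)
    std = np.std(data, axis=0)
    return (data - mean) / std, mean, std

# Made-up values on very different scales (square feet vs. bedroom count).
demo = np.array([[1000.0, 2.0], [2000.0, 3.0], [3000.0, 4.0]])
scaled, mu, sigma = normalize(demo)

print(scaled.mean(axis=0))  # each column is centred on 0
print(scaled.std(axis=0))   # each column has unit standard deviation

# A new sample reuses mu and sigma from the training data.
new_house = (np.array([2000.0, 3.0]) - mu) / sigma
print(new_house)
```

The sample here equals the column means, so it maps to zeros. Normalizing a single sample on its own would instead compare its size to its own bedroom count, which is meaningless.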
Now we will initialize our parameters and write down the cost function.
#Initializing Parameters
alpha = 0.03
iterations = 2000
theta = np.zeros((3,1))
def cost_function(X,Y,theta):
    """
    Take in a numpy array X,y, theta and generate the cost of using
    theta as parameter in the model
    """
    m = X.shape[0]
    avg = 1/(2*m)
    estimate = np.square(np.dot(X,theta)-Y)
    sqr_estimate = np.sum(estimate)
    cost = avg * sqr_estimate
    return cost
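The cost being computed is the usual squared-error cost, J(theta) = 1/(2m) * sum((X.theta - Y)^2). As a quick sanity check, here is a sketch on three hand-made points where the answer can be worked out on paper; `cost`, `X_demo` and `Y_demo` are names used only for this illustration:

```python
import numpy as np

def cost(X, Y, theta):
    """Squared-error cost: sum of squared residuals divided by 2m."""
    m = X.shape[0]
    residual = X @ theta - Y
    return float(np.sum(residual ** 2) / (2 * m))

# Three points on the line y = x; the first column of X is the intercept term.
X_demo = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
Y_demo = np.array([[1.0], [2.0], [3.0]])

# With theta = 0 every prediction is 0, so the cost is (1 + 4 + 9) / (2 * 3).
cost_at_zero = cost(X_demo, Y_demo, np.zeros((2, 1)))

# With theta = [0, 1] the line fits exactly, so the cost is 0.
cost_at_fit = cost(X_demo, Y_demo, np.array([[0.0], [1.0]]))

print(cost_at_zero, cost_at_fit)
```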
Now we will write down our gradient descent algorithm:
def gradient_descent(X,Y,theta,alpha,iterations):
    """
    Take in numpy arrays X, Y and theta and update theta in place by taking
    `iterations` gradient steps with a learning rate of alpha.
    Return the history of the cost function.
    """
    m = X.shape[0]
    J_history = []
    for i in range(iterations):
        hyp = np.dot(X,theta)
        #Gradient of the cost with respect to theta (before scaling by 1/m)
        prediction = np.dot(X.transpose(),(hyp - Y))
        descent =alpha * 1/m * prediction
        #In-place update, so the theta array passed in by the caller is modified
        theta-=descent
        J_history.append(cost_function(X,Y,theta))
    return J_history
J_history = gradient_descent(X,Y,theta,alpha,iterations)
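One way to gain confidence in the update rule is to run it on synthetic data where the true parameters are known and compare against a least-squares solver. The following is only a sketch under my own assumptions: the data is randomly generated and noise-free, and `run_gradient_descent` is a hypothetical variant that returns a new theta instead of updating the caller's array in place.

```python
import numpy as np

def run_gradient_descent(X, Y, alpha, iterations):
    """Batch gradient descent for linear regression, starting from theta = 0."""
    m, n = X.shape
    theta = np.zeros((n, 1))
    for _ in range(iterations):
        gradient = X.T @ (X @ theta - Y) / m  # gradient of the squared-error cost
        theta = theta - alpha * gradient
    return theta

# Synthetic, already well-scaled features with known parameters.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 2))
X_demo = np.column_stack((np.ones(50), features))
true_theta = np.array([[0.5], [2.0], [-1.0]])
Y_demo = X_demo @ true_theta  # noise-free targets

theta_gd = run_gradient_descent(X_demo, Y_demo, alpha=0.03, iterations=2000)
theta_ls, *_ = np.linalg.lstsq(X_demo, Y_demo, rcond=None)

print(theta_gd.ravel())
print(theta_ls.ravel())
```

Both vectors should agree closely with `true_theta`. Returning theta rather than mutating it is a design choice that makes the function easier to reuse; the version in this post relies on the in-place update instead.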
To verify that gradient descent worked properly, we will plot the cost against the number of iterations and print out our hypothesis.
#Visualizing if cost is decreasing or not with iteration to verify gradient descent
plt.plot(J_history)
plt.title("Cost against Iterations")
plt.xlabel("Iterations")
plt.ylabel("Cost")
#Print the hypothesis equation
print("h(x) = {} + {}x1 + {}x2".format((round(theta[0,0],2)),(round(theta[1,0],2)),(round(theta[2,0],2))))
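The print statement prints out: h(x) = -0.0 + 0.88x1 + -0.05x2. Note that x1, x2 and h(x) are all in normalized units here, because both the features and the price were scaled.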
Finally, we will predict the price of a house whose size = 1650 and number of bedrooms = 3.
def predict(x,theta):
    predictions= np.dot(x,theta)
    return predictions[0]
x_sample = np.array([1650,3])
#Scale the sample with the training mean and std, not with its own statistics
x_sample = (x_sample - np.asarray(mean_x))/np.asarray(std_x)
x_sample=np.append(np.ones(1),x_sample)
prediction = predict(x_sample,theta)
#Undo the target normalization to get the price back in dollars
prediction = ((prediction*std_y) + mean_y)
print("For size of house = 1650, Number of bedroom = 3, the predicted value of price is ${}".format(round(prediction,0)))
With the sample scaled by the training statistics, the print statement should report a predicted price of roughly $293081.
With that, we wrap up the first exercise completely. If you want the solution to the first part, you can check it out here -> (qusaikader.hashnode.dev/machine-learning-an..). I hope you enjoyed reading it. Feel free to leave a comment on any ways I can improve my code. If you want to access the notebook for this assignment, I have uploaded the code on Github(github.com/qusaikader/Machine-Learning-Andr..).
Thank you for reading.