Python Linear Regression Tutorial: From Scratch to scikit-learn Implementation

Linear regression is used for predicting a continuous dependent variable based on one or more independent variables. It’s one of the simplest and most widely used algorithms for predictive analysis.
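
In symbols, the idea can be sketched as follows. The notation is an assumption added here for illustration and does not come from the original text: \beta_0 is the intercept, \beta_1 through \beta_p are the coefficients of the p independent variables, and \hat{y} is the prediction. An ordinary least squares fit, which is what scikit-learn's LinearRegression performs, picks the coefficients that minimize the sum of squared differences between the observed and predicted values:

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p,
\qquad
\min_{\beta_0, \dots, \beta_p} \; \sum_{i=1}^{n} \bigl( y_i - \hat{y}_i \bigr)^2
```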

The example below implements linear regression in Python with the popular machine learning library scikit-learn. It creates a simple synthetic dataset, trains a LinearRegression model, makes predictions on held-out data, evaluates them, and plots the result.

# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

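# Generate a simple synthetic dataset with a linear relationship:
# y = 4 + 3x plus standard normal noise
np.random.seed(0)  # Fix the seed so the numbers are reproducible
X = 2 * np.random.rand(100, 1)  # Feature: 100 values drawn uniformly from [0, 2)
y = 4 + 3 * X + np.random.randn(100, 1)  # Target: true line plus noise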

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Linear Regression model
model = LinearRegression()

# Train the model using the training data
model.fit(X_train, y_train)

# Make predictions using the testing data
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error (MSE):", mse)
print("R-squared (R2) Score:", r2)

# Plot the results
plt.scatter(X, y, color='blue', label='Data Points')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Regression Line')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title('Linear Regression Example')
plt.show()

Output of the above linear regression program:

Mean Squared Error (MSE): 0.9177532469714291
R-squared (R2) Score: 0.6521157503858556
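
The script prints the two metrics but not the fitted line itself. As a minimal sketch, the intercept and slope can be read from the model's `intercept_` and `coef_` attributes after fitting. The data below is illustrative and noise-free (it is not the tutorial's dataset), and the names `X_demo`, `y_demo`, and `demo_model` are made up for this example:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free points on y = 4 + 3x (illustrative data, not the tutorial's dataset).
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = 4 + 3 * X_demo.ravel()  # 1-D target, shape (4,)

demo_model = LinearRegression().fit(X_demo, y_demo)

# With a 1-D target, intercept_ is a scalar and coef_ holds one value per feature.
intercept = float(demo_model.intercept_)
slope = float(demo_model.coef_[0])
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

In the tutorial's script, y has shape (100, 1), so model.intercept_ has shape (1,) and model.coef_ has shape (1, 1). The fitted values there should land near the generating values 4 and 3, but not exactly on them because of the added noise.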

Explanation:

  1. Import necessary libraries:
    • numpy for numerical operations.
    • matplotlib.pyplot for plotting.
    • sklearn.model_selection.train_test_split for splitting the dataset into training and testing sets.
    • sklearn.linear_model.LinearRegression for creating the Linear Regression model.
    • sklearn.metrics for evaluating the model’s performance.
  2. Generate a simple dataset:
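    • Create a dataset with a linear relationship using numpy: 100 values of X drawn uniformly from [0, 2), and y = 4 + 3X plus standard normal noise.
    • X represents the feature and y represents the target variable. The seed is fixed so the same data is generated on every run.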
  3. Split the dataset:
    • Use train_test_split to split the dataset into training and testing sets. With test_size=0.2, 80% of the data is used for training and 20% for testing; random_state=42 fixes the shuffle so the split is reproducible.
  4. Create and train the model:
    • Instantiate the LinearRegression model.
    • Fit the model using the training data.
  5. Make predictions:
    • Use the trained model to make predictions on the testing data.
  6. Evaluate the model:
    • Calculate the Mean Squared Error (MSE) and the R-squared (R2) score to evaluate the model’s performance.
  7. Plot the results:
    • Plot the original data points and the regression line to visualize the relationship.
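
For a single feature, the fit in step 4 can also be done from scratch. The sketch below uses the closed-form least squares solution (slope = covariance of x and y divided by the variance of x) with only the standard library. The helper name `fit_simple_linear_regression` is hypothetical and is not part of scikit-learn, and the sample points are noise-free so the result is easy to check by hand:

```python
# From-scratch simple linear regression (one feature), standard library only.
# fit_simple_linear_regression is a hypothetical helper, not a scikit-learn API.

def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Noise-free points on the line y = 4 + 3x, the same relationship
# the tutorial's dataset is built around.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [4 + 3 * x for x in xs]
intercept, slope = fit_simple_linear_regression(xs, ys)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

On noisy data such as the tutorial's, this formula should give the same line as LinearRegression for one feature, since both solve the same least squares problem. With several features, a matrix-based solver is needed instead.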

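Running this code prints the MSE and R2 score and plots the regression line along with the original data points. MSE is the average squared prediction error on the test set, so lower is better. R2 is the fraction of the variance in y that the model explains, with 1.0 meaning a perfect fit. Because the noise added to y has unit variance, an MSE close to 1 is about as low as any model can be expected to reach on this data.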
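
To make the two metrics concrete, here is a minimal sketch that computes them by hand with the standard library. The helper names `mean_squared_error_manual` and `r2_score_manual` are hypothetical, and the targets and predictions are made up so the arithmetic is easy to follow:

```python
# MSE and R-squared computed by hand, standard library only.
# Both helpers are hypothetical names used for illustration.

def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score_manual(y_true, y_pred):
    """1 - SS_res / SS_tot: share of the target's variance the model explains."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up targets and predictions; only the last prediction is off (by 2).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 2.0]

mse = mean_squared_error_manual(y_true, y_pred)  # (0 + 0 + 0 + 4) / 4 = 1.0
r2 = r2_score_manual(y_true, y_pred)             # 1 - 4 / 5, about 0.2
print("MSE:", mse)
print("R2:", round(r2, 3))
```

A single large miss dominates both numbers here, which is worth keeping in mind when reading MSE and R2 on a small test set such as the 20 points used above.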

Article Contributors

  • Dr. Errorstein
    (Author)
    Director - Research & Innovation, QABash

    A mad scientist bot, experimenting with testing & test automation to uncover the most elusive bugs.

  • Ishan Dev Shukl
    (Reviewer)
    SDET Manager, Nykaa

    With 13+ years in SDET leadership, I drive quality and innovation through Test Strategies and Automation. I lead Testing Center of Excellence, ensuring high-quality products across Frontend, Backend, and App Testing. "Quality is in the details" defines my approach—creating seamless, impactful user experiences. I embrace challenges, learn from failure, and take risks to drive success.
