How to Build a Linear Model in R: Simple Linear Regression | by Trina M | Oct, 2024


Definition: Simple linear regression models a dependent (response) variable as a linear function of a single independent (explanatory) variable.
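To make the definition concrete, here is a minimal sketch of where the steps below lead. It assumes R's built-in `cars` dataset as a stand-in for your own data, with `speed` as the explanatory variable and `dist` as the response; neither is part of the article's dataset.

```r
## Minimal sketch: simple linear regression on the built-in `cars` dataset
## (assumption: `cars` stands in for your own data)

## dist is the response (Y), speed is the single explanatory variable (X)
model <- lm(dist ~ speed, data = cars)

## coef() returns a named vector: the intercept and the slope for speed
coefs <- coef(model)
print(coefs)
```

The formula `dist ~ speed` reads as "dist explained by speed"; with your own data you would swap in your column names and data frame.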

Step 1: Import your dataset

For a CSV file:

## Import csv file using read.csv()
data_name <- read.csv("dataset.csv")

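For a text file:

## Import text file using read.table()
## header = TRUE assumes the file's first row holds the column names
data_name <- read.table("dataset.txt", header = TRUE)

## OR

## Import text file using read_table() from the readr package
library(readr)
data_name <- read_table("dataset.txt")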
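One detail worth checking with read.table() is how the header row is handled. The sketch below writes a tiny, made-up whitespace-delimited file to a temporary location (the file contents and the names `tmp`, `no_header`, and `with_header` are illustrative assumptions) and reads it both ways.

```r
## Write a tiny whitespace-delimited file to a temporary location (made-up data)
tmp <- tempfile(fileext = ".txt")
writeLines(c("X Y", "1 2.1", "2 3.9", "3 6.2"), tmp)

## Here the header row has as many fields as the data rows, so read.table()
## does not detect it automatically and treats it as data
no_header <- read.table(tmp)

## header = TRUE tells read.table() to use the first row as column names
with_header <- read.table(tmp, header = TRUE)

names(no_header)    ## default names assigned by R
names(with_header)  ## names taken from the file
str(with_header)    ## columns are now read as numbers
```

If str() shows your numeric columns as character after importing, a header row read as data is a likely cause.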

If your dataset is hosted on GitHub, you can import it directly from its raw-file URL.

Importing from a GitHub URL:

## store the dataset's URL in a variable
url <- "https://raw.githubusercontent.com/username/main/dataset.csv"

## use that variable to import data
new_data <- read.csv(url)

Make sure the file path or URL has both an opening and a closing quotation mark before running the cell; otherwise R will give you an error.

Extra Step: This can be done before or after importing the dataset

Install the packages your analysis needs. If you are working on a new server or in Google Colab, installed packages are not saved between sessions, so you have to install them again.

## This is an example of how to install packages in R

## Install ggplot2 for visualization
install.packages("ggplot2")

## Install dplyr for data manipulation
install.packages("dplyr")

## Don't forget to load those packages with the library() function
library(ggplot2)
library(dplyr)

ggplot2 and dplyr are not required for simple linear regression, assuming you have a clean dataset and plan to use base R's plot() for simple visualization.
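Since reinstalling on every run can be slow, one option is to check which packages are missing first. The sketch below is one way to do that; `missing_packages()` is a hypothetical helper written for this example, not a built-in R function.

```r
## Hypothetical helper: return the names of packages that are not installed
missing_packages <- function(pkgs) {
  ## requireNamespace() returns TRUE if the package can be loaded;
  ## it loads the namespace but does not attach it like library() does
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("ggplot2", "dplyr"))

if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  ## install.packages(to_install)  ## uncomment to install the missing ones
}
```

The install call is left commented out so the snippet can run without a network connection; you still need library() afterwards to load each package.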

Step 2: Check your dataset to ensure all variables have the correct data types.

## This opens the dataset in a separate window/panel
View(data_name)

## Or view the first 6 rows of the dataset using the head() function
head(data_name)

## Likewise, if you want to see the last 6 rows of your data, use tail()
tail(data_name)

## Another way is str(), which shows the structure of your dataset, including each column's data type
str(data_name)
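If one of those checks shows a column with the wrong type, it can be converted before modeling. The sketch below uses a small made-up data frame (the values are illustrative assumptions) in which a numeric column was imported as text.

```r
## Made-up data frame where the numeric column X was imported as text
data_name <- data.frame(
  X = c("1.5", "2.0", "3.2"),
  Y = c(10, 14, 21),
  stringsAsFactors = FALSE
)

## Check the class of every column: X is character, Y is numeric
sapply(data_name, class)

## Convert X to numeric so it can be used in plots and models
data_name$X <- as.numeric(data_name$X)

## Confirm the conversion and look at basic summary statistics
sapply(data_name, class)
summary(data_name)
```

as.numeric() turns any value it cannot parse into NA with a warning, so it is worth checking for NA values after converting.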

Before you start predicting, identify which variable is the target and which is the feature for this analysis.

  • Y is your target variable (also called the dependent or response variable); it is what you are trying to predict from the independent variable (X).
  • X is your independent variable, also known as the explanatory variable; it is used to predict Y.

Step 3: Plot your X and Y variables to see whether they have a linear relationship

## Plot a scatterplot
plot(data_name$X, data_name$Y, ## these are the x and y value in dataset
main = "X vs Y", ## title
xlab = "X", ## x-axis label
ylab = "Y") ## y-axis label

## Another way of plotting a scatterplot using ggplot2
## (inside aes(), refer to columns by name; data_name$ is not needed)
ggplot(data_name, aes(x = X, y = Y)) +
  geom_point() ## this creates the scatterplot
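Alongside the scatterplot, a correlation coefficient gives a quick numeric check of how linear the relationship looks. The sketch below assumes R's built-in `cars` dataset in place of your own data and adds a smoothed trend line with base R's lowess(); both choices are illustrative.

```r
## Assumption: the built-in `cars` dataset stands in for your own data
x <- cars$speed
y <- cars$dist

## Pearson correlation: values near 1 or -1 suggest a strong linear relationship
r <- cor(x, y)
print(r)

## Scatterplot with a smoothed trend line to eyeball linearity
plot(x, y, main = "speed vs dist", xlab = "speed", ylab = "dist")
lines(lowess(x, y), col = "red")
```

If the smoothed line is roughly straight, a simple linear model is a reasonable starting point; a clear curve suggests the relationship is not linear.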
