Definition: Simple linear regression models the relationship between one independent (explanatory) variable and one dependent (response) variable, using the independent variable to predict the dependent one.
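In equation form, the definition is usually written as below. The symbols are standard textbook notation rather than anything defined in this post: \beta_0 is the intercept, \beta_1 is the slope, and \varepsilon is the random error term.

```latex
Y = \beta_0 + \beta_1 X + \varepsilon
```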
Step 1: Import your dataset
For a CSV file:
## Import csv file using read.csv()
data_name <- read.csv("dataset.csv")
For a text file:
## Import text file using read.table()
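data_name <- read.table("dataset.txt")
## OR
## Import text file using read_table() from the readr package
library(readr) ## read_table() is not in base R, so readr must be installed and loaded
data_name <- read_table("dataset.txt")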
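One thing to watch for: read.table() does not assume a header row by default, so column names can end up as the first row of data. Below is a minimal sketch of reading a tab-separated file with a header. The file contents and the temporary file are made up for illustration; with a real file you would pass its path instead.

```r
# Create a small tab-separated text file to read back (hypothetical example data)
tmp_file <- tempfile(fileext = ".txt")
writeLines(c("X\tY", "1\t2.1", "2\t3.9", "3\t6.2"), tmp_file)

# header = TRUE treats the first line as column names;
# sep = "\t" says the columns are separated by tabs
data_name <- read.table(tmp_file, header = TRUE, sep = "\t")

# Check that the columns were named and typed as expected
str(data_name)
```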
If your dataset is hosted on GitHub, you can import it directly from its raw file URL.
Importing from a GitHub URL:
## store your url dataset in a variable
url <- "https://raw.githubusercontent.com/username/main/dataset.csv"
## use that variable to import data
new_data <- read.csv(url)
Make sure the file name or URL is wrapped in both an opening and a closing quotation mark before running the cell; otherwise, R will give you an error.
Extra Step: This can be done before or after importing the dataset
If you are working on a new server or in Google Colab, installed packages are not saved between sessions, so you must install the packages your analysis needs each time you start a new one.
## This is an example of how to install packages in R
## Install ggplot2 for visualization
install.packages("ggplot2")
## Install dplyr for data manipulation
install.packages("dplyr")
## Don't forget to LOAD those packages with the library() function
library(ggplot2)
library(dplyr)
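Because reinstalling on every run can be slow, one option is to install only the packages that are missing. The helper below is a hypothetical sketch, not a base R function: install_if_missing is a name invented here, and it relies on requireNamespace() to test whether each package can be loaded.

```r
# Hypothetical helper: install only the packages that are not already available
install_if_missing <- function(pkgs) {
  # requireNamespace() returns TRUE when the package's namespace can be loaded
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!installed]
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)
  }
  # Return the names of the packages that had to be installed, invisibly
  invisible(missing_pkgs)
}
```

You could then call install_if_missing(c("ggplot2", "dplyr")) once at the top of a notebook and follow it with the usual library() calls.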
ggplot2 and dplyr are not required for simple linear regression if your dataset is already clean and you plan to use simple visualizations with plot().
Step 2: Check your dataset to ensure all variables have the correct data types.
## This opens the dataset in a separate window/panel
View(data_name)
## Or view the first 6 rows of the dataset using the head() function (6 is the default)
head(data_name)
## likewise, to see the last 6 rows of your data, use tail()
tail(data_name)
## another way is to use str(), which shows the structure of your dataset (column names and data types)
str(data_name)
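If str() shows that a numeric column was read in as text, it needs converting before it can be used in a regression. Here is a minimal sketch of checking and fixing column types. The data frame is made up for illustration, with Y deliberately stored as character.

```r
# Hypothetical data frame where Y was read in as text instead of numbers
data_name <- data.frame(X = c(1, 2, 3), Y = c("2.1", "3.9", "6.2"),
                        stringsAsFactors = FALSE)

# Show the class of every column
sapply(data_name, class)

# Convert Y from character to numeric so it can be used in a regression
data_name$Y <- as.numeric(data_name$Y)

# Confirm that both columns are now numeric
sapply(data_name, is.numeric)
```

Note that as.numeric() turns any value it cannot parse into NA with a warning, so it is worth checking for NA values after converting.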
Before you start your prediction, identify which variable is the target and which is the feature for this analysis.
- Y is your target (dependent) variable: it is what you are trying to predict from the independent variable (X).
- X is your independent variable, also known as the explanatory variable: it is used to predict Y.
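As a concrete illustration, the sketch below uses the cars dataset that ships with R and renames its columns to X and Y so they match the placeholder names used in this post. Treating speed as X and dist as Y is an assumption made for this example only.

```r
# cars is a built-in dataset with two columns: speed and dist (stopping distance)
# Assumption for illustration: speed is the explanatory variable (X)
# and dist is the response variable (Y)
data_name <- data.frame(X = cars$speed, Y = cars$dist)

# Quick check of the first few rows
head(data_name)
```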
Step 3: Plot your X and Y variables to check whether their relationship looks linear
## Plot a scatterplot
plot(data_name$X, data_name$Y, ## these are the x and y value in dataset
main = "X vs Y", ## title
xlab = "X", ## x-axis label
ylab = "Y") ## y-axis label
## Another way of plotting a scatterplot using ggplot2
ggplot(data_name, aes(x = X, y = Y)) + ## refer to columns by name inside aes()
geom_point() ## this creates the scatterplot
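A scatterplot is a visual check. If you also want a number to go with it, one common companion is the Pearson correlation from cor(). This is a small sketch using the built-in cars dataset, not a step from the original walkthrough, and the choice of columns is an assumption for illustration.

```r
# Built-in example data: speed and stopping distance for 50 cars
x <- cars$speed
y <- cars$dist

# Pearson correlation: values near +1 or -1 suggest a strong linear relationship,
# values near 0 suggest a weak one
r <- cor(x, y)
print(r)
```

Correlation only measures linear association, so it complements the plot rather than replacing it; a curved pattern can still produce a fairly high value.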