Definition: Simple linear regression models the relationship between one independent (explanatory) variable and one dependent (response) variable, using the former to predict the latter.
Step 1: Import your dataset
For a CSV file:
## Import csv file using read.csv()
data_name <- read.csv("dataset.csv")
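For a text file:
## Import text file using read.table()
## (add header = TRUE if the first line holds the column names)
data_name <- read.table("dataset.txt")
## OR
## Import text file using read_table() from the readr package
library(readr)
data_name <- read_table("dataset.txt")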
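If you want to try the import step before you have a file of your own, here is a minimal sketch that writes a small made-up data frame to a temporary CSV and reads it back with read.csv(). The names toy, tmp_path, and toy_data are hypothetical and only used for illustration.

```r
## Made-up data, only for illustration
toy <- data.frame(X = c(1, 2, 3, 4), Y = c(2.1, 3.9, 6.2, 8.1))

## Write it to a temporary CSV file
## (row.names = FALSE avoids writing an extra index column)
tmp_path <- tempfile(fileext = ".csv")
write.csv(toy, tmp_path, row.names = FALSE)

## Read it back the same way you would read "dataset.csv"
toy_data <- read.csv(tmp_path)
head(toy_data)
```

The same call works for your real file: replace tmp_path with the path to your dataset.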
If your dataset is hosted on GitHub, you can import it directly from a URL.
Importing from a GitHub URL:
## Store the dataset URL in a variable
url <- ""
## Use that variable to import the data
new_data <- read.csv(url)
Remember to wrap the URL in quotation marks and to close them before running the cell; otherwise R will give you an error.
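A common pitfall with GitHub is pasting the address of the file's web page rather than the address of the raw file. The sketch below shows one possible way to convert a page URL into a raw URL. Note that to_raw_url is a hypothetical helper written for this example, and the address is a placeholder, not a real dataset.

```r
## Hypothetical helper: turn a GitHub page URL into a raw-file URL
to_raw_url <- function(page_url) {
  raw_url <- sub("https://github.com/", "https://raw.githubusercontent.com/",
                 page_url, fixed = TRUE)
  ## The raw address has no "/blob/" segment
  sub("/blob/", "/", raw_url, fixed = TRUE)
}

## Placeholder address, assumed only for illustration
example_page <- "https://github.com/some-user/some-repo/blob/main/dataset.csv"
raw_url <- to_raw_url(example_page)
raw_url

## With a real address you would then run:
## new_data <- read.csv(raw_url)
```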
Extra Step: This can be done before or after importing the dataset
You must install the packages your analysis needs. If you are working on a new server or in Google Colab, installed packages are not saved between sessions, so you have to install them again.
## This is an example of how to install packages in R
## Install ggplot2 for visualization and dplyr for data manipulation
install.packages(c("ggplot2", "dplyr"))
## Don't forget to load those packages with the library() function
library(ggplot2)
library(dplyr)
ggplot2 and dplyr are not required for simple linear regression if you have a clean dataset and plan to use basic visualization with plot().
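Since packages may or may not already be present depending on your environment, one option is to install only what is missing. This is a minimal sketch, not the only way to do it. install_if_missing is a hypothetical helper name; requireNamespace() and install.packages() are standard R functions.

```r
## Hypothetical helper: install only the packages that are not yet available
install_if_missing <- function(pkgs) {
  ## TRUE for each package whose namespace can be loaded
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!available]
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)
  }
  ## Return the names that had to be installed, without printing them
  invisible(missing_pkgs)
}

## Example call (uncomment to run; it may download packages):
## install_if_missing(c("ggplot2", "dplyr"))
```

You still need to call library() afterwards, because installing a package does not load it.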
Step 2: Check your dataset to ensure all variables have the correct data types.
## This opens the dataset in a separate window/panel
View(data_name)
## Or view the first 6 rows (the default) of the dataset using head()
head(data_name)
## Likewise, to see the last 6 rows of your data, use tail()
tail(data_name)
## Another way is str(), which shows the structure of your dataset
str(data_name)
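If a column turns out to have the wrong type, you can convert it before modelling. Here is a small sketch on made-up data, where X was read as text instead of numbers; toy_data and its values are assumptions for illustration only.

```r
## Made-up data where X was read as text instead of numbers
toy_data <- data.frame(X = c("1.5", "2.0", "3.5"),
                       Y = c(10, 12, 17),
                       stringsAsFactors = FALSE)

## Show the class of every column
sapply(toy_data, class)

## Convert X to numeric so it can be used in a regression
toy_data$X <- as.numeric(toy_data$X)

## Check the structure again to confirm the change
str(toy_data)
```

Keep in mind that as.numeric() turns any value it cannot parse into NA (with a warning), so it is worth checking for missing values after converting.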
Before you start predicting, identify which variable is the target and which is the feature for this analysis.
- Y, the dependent variable, is your target: it is what you are trying to predict from the independent variable (X).
- X, the independent variable (also known as the explanatory variable), is used to predict Y.
Step 3: Plot your X and Y variables to see whether the relationship looks linear
## Plot a scatterplot
plot(data_name$X, data_name$Y, ## these are the x and y value in dataset
main = "X vs Y", ## title
xlab = "X", ## x-axis label
ylab = "Y") ## y-axis label
## Another way of plotting a scatterplot, using ggplot2
ggplot(data_name, aes(x = X, y = Y)) +
  geom_point() ## this creates the scatterplot
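To see what a roughly linear scatterplot looks like, you can try the plotting step on simulated data. The sketch below uses made-up values (toy_data is a hypothetical name) and also computes the Pearson correlation with cor() as a rough numeric check to go with the plot. The correlation is an extra suggestion, not a required part of this step.

```r
set.seed(42) ## make the simulated data reproducible

## Simulated data with a roughly linear relationship (made up for illustration)
toy_data <- data.frame(X = 1:50)
toy_data$Y <- 2 + 3 * toy_data$X + rnorm(50, mean = 0, sd = 5)

## Scatterplot with base R
plot(toy_data$X, toy_data$Y,
     main = "X vs Y",
     xlab = "X",
     ylab = "Y")

## Pearson correlation: values near 1 or -1 suggest a strong linear relationship
r <- cor(toy_data$X, toy_data$Y)
r
```

A correlation close to 0 does not always mean there is no relationship, only that there is no clear linear one, so always look at the plot as well.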