Here is a step-by-step guide to downloading Kaggle datasets with the Kaggle API in Python.
Step 1: Sign up to Kaggle
Sign up for a free account at https://www.kaggle.com if you don’t already have one.
Step 2: Create a Token
To use Kaggle’s public API, you need to authenticate with an API token. Open the ‘Account’ tab of your user profile (on the current site it lives under Settings), go to the API section and select ‘Create New Token’. Your browser then downloads kaggle.json, a file containing your API credentials.
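The token file is a small JSON object holding your username and key. As a minimal sketch, assuming exactly that layout, you can sanity-check the downloaded file before moving it. read_kaggle_credentials is a hypothetical helper, not part of the kaggle package.

```python
import json
from pathlib import Path


def read_kaggle_credentials(path):
    """Return (username, key) from a kaggle.json token file.

    Hypothetical helper: assumes the file is a JSON object with
    "username" and "key" entries, as the token download contains.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Collect any required field that is absent or empty.
    missing = [field for field in ("username", "key") if not data.get(field)]
    if missing:
        raise ValueError("kaggle.json is missing: " + ", ".join(missing))
    return data["username"], data["key"]
```

For example, read_kaggle_credentials(Path.home() / "Downloads" / "kaggle.json") should return your username and key. Treat the key like a password: don’t print it or commit it to a repository.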
Step 3: Install the Kaggle Library
Make sure Python and pip are installed on your computer, then install the Kaggle library by running:
pip install kaggle
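To confirm the install from Python, a small sketch can read the package metadata instead of importing it. The reason for this choice: in the versions I have used, importing the kaggle package authenticates immediately and raises an error if kaggle.json is not in place yet, which you haven’t done at this point. installed_version is a hypothetical helper built on the standard library’s importlib.metadata.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="kaggle"):
    """Return the installed version of `package`, or None if it is absent.

    Reads package metadata only, so it never runs the package's own code.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("kaggle version:", installed_version() or "not installed")
```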
Step 4: Move the kaggle.json file to the correct location
The library looks for kaggle.json in a specific directory, so move it there:
Linux/MacOS: ~/.kaggle/kaggle.json
Windows: C:\Users\<YourUsername>\.kaggle\kaggle.json
Replace <YourUsername> with your Windows username. For example, mine is user, so the path is C:\Users\user\.kaggle\kaggle.json.
If the .kaggle folder doesn’t exist yet, create it manually and then move kaggle.json into it.
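If you’d rather script this step, here is a minimal sketch using only the standard library. kaggle_config_dir and install_kaggle_json are hypothetical helpers. KAGGLE_CONFIG_DIR is an environment variable the kaggle library reads to override the default folder; recent versions may also fall back to ~/.config/kaggle on Linux when ~/.kaggle does not exist, which this sketch ignores.

```python
import os
import shutil
import stat
from pathlib import Path


def kaggle_config_dir():
    """Directory the kaggle library is expected to read kaggle.json from.

    Simplified: honours KAGGLE_CONFIG_DIR if set, otherwise uses the
    .kaggle folder in your home directory (Linux, macOS and Windows).
    """
    override = os.environ.get("KAGGLE_CONFIG_DIR")
    return Path(override) if override else Path.home() / ".kaggle"


def install_kaggle_json(src, config_dir=None):
    """Move a downloaded kaggle.json into place (hypothetical helper)."""
    target_dir = Path(config_dir) if config_dir else kaggle_config_dir()
    target_dir.mkdir(parents=True, exist_ok=True)  # creates .kaggle if missing
    target = target_dir / "kaggle.json"
    shutil.move(str(src), str(target))
    # Owner read/write only, the same as `chmod 600`. The kaggle CLI warns
    # when the key is readable by other users; Windows mostly ignores this.
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target
```

Typical usage would be install_kaggle_json(Path.home() / "Downloads" / "kaggle.json"), assuming that is where your browser saved the token.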
Step 5: Authenticate using Kaggle API
In your Python script, authenticate with the API using the following code:
from kaggle.api.kaggle_api_extended import KaggleApi
# Initialize the Kaggle API and load the credentials from kaggle.json
api = KaggleApi()
api.authenticate()
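As an alternative to the kaggle.json file, the library can also read credentials from the KAGGLE_USERNAME and KAGGLE_KEY environment variables, which is handy on hosted notebooks where you can’t easily create ~/.kaggle. A minimal sketch, assuming those variable names; set_kaggle_env is a hypothetical helper, and it has to run before anything imports kaggle, since (in the versions I have used) importing the package triggers authentication.

```python
import os


def set_kaggle_env(username, key):
    """Expose Kaggle credentials through environment variables.

    Call this before importing kaggle, because the import authenticates.
    """
    os.environ["KAGGLE_USERNAME"] = username
    os.environ["KAGGLE_KEY"] = key


# Placeholder values: substitute your own, and avoid hard-coding real
# keys in scripts you share.
set_kaggle_env("your-username", "your-api-key")

# With the variables set, authenticate as in Step 5:
# from kaggle.api.kaggle_api_extended import KaggleApi
# api = KaggleApi()
# api.authenticate()
```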
Step 6: Download the Dataset
For example, to download the Sales Transactions Dataset:
https://www.kaggle.com/datasets/srinivasav22/sales-transactions-dataset
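The API identifies a dataset by its handle, the owner/dataset-slug part of its URL. If you copy URLs from the browser, a small sketch like this can extract the handle for you. dataset_handle_from_url is a hypothetical helper that assumes URLs of the form https://www.kaggle.com/datasets/<owner>/<slug>, optionally followed by more path segments or a query string.

```python
from urllib.parse import urlparse


def dataset_handle_from_url(url):
    """Turn a Kaggle dataset URL into its 'owner/dataset-slug' handle."""
    # urlparse drops the query string (e.g. "?select=Test.xlsx") from .path.
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"Not a Kaggle dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"


print(dataset_handle_from_url(
    "https://www.kaggle.com/datasets/srinivasav22/sales-transactions-dataset"))
# prints: srinivasav22/sales-transactions-dataset
```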
To download the entire dataset, use api.dataset_download_files(), passing the dataset’s handle (the owner/dataset-slug part of its URL):
handle = 'srinivasav22/sales-transactions-dataset'
api.dataset_download_files(handle, path='./', unzip=True)
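If you download datasets often, you might wrap this call in a small function that puts each dataset in its own folder and reports what arrived. This is a sketch: download_dataset is a hypothetical helper, and it takes an already-authenticated KaggleApi instance as a parameter rather than creating one, which keeps it reusable and lets you test it with a stand-in object.

```python
from pathlib import Path


def download_dataset(api, handle, dest="data"):
    """Download and unzip a whole dataset; return the files found in `dest`.

    `api` is an authenticated KaggleApi instance (or anything with a
    compatible dataset_download_files method).
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    api.dataset_download_files(handle, path=str(dest), unzip=True)
    # List everything now in dest, including files inside sub-folders.
    return sorted(p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file())


# Usage, after authenticating as in Step 5:
# files = download_dataset(api, "srinivasav22/sales-transactions-dataset")
# print(files)
```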
To download a single file (say, Test.xlsx), use api.dataset_download_file():
handle = 'srinivasav22/sales-transactions-dataset'
file = 'Test.xlsx'
api.dataset_download_file(handle, file_name=file)
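Unlike dataset_download_files(), this method has no unzip option. As far as I know, Kaggle can deliver larger individual files compressed as <name>.zip, so a defensive wrapper can unpack the archive if one shows up. This is a sketch under that assumption; download_single_file is a hypothetical helper that takes an authenticated KaggleApi instance.

```python
import zipfile
from pathlib import Path


def download_single_file(api, handle, file_name, dest="."):
    """Download one file from a dataset and return its local path.

    Assumption: Kaggle may deliver a larger file as '<name>.zip' instead of
    the raw file; if such an archive appears, it is extracted and deleted.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    api.dataset_download_file(handle, file_name=file_name, path=str(dest))
    name = Path(file_name).name  # assume the download is saved under its base name
    archive = dest / (name + ".zip")
    if archive.exists():
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
        archive.unlink()
    return dest / name


# Usage, after authenticating as in Step 5:
# path = download_single_file(api, "srinivasav22/sales-transactions-dataset", "Test.xlsx")
```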
The full code:
from kaggle.api.kaggle_api_extended import KaggleApi
# Initialize the Kaggle API and load the credentials from kaggle.json
api = KaggleApi()
api.authenticate()
handle = 'srinivasav22/sales-transactions-dataset'
file = 'Test.xlsx'
# Download the entire dataset and unzip it into the current directory
api.dataset_download_files(handle, path='./', unzip=True)
# Download a single file
api.dataset_download_file(handle, file_name=file)
That’s it: you’ve set up the Kaggle API and downloaded a dataset from Python straight into your working directory. No more manual downloads.
Happy Coding, Data Scientists! 🚀