Get Kaggle Datasets via API in Python | by Opeyemi Gbenga | Jan, 2025


Here is a guide to getting Kaggle datasets using the Kaggle API in Python:

Step 1: Sign up to Kaggle

Sign up for a free account on Kaggle.

Step 2: Create a Token

You must authenticate using an API token to use Kaggle’s public API. Go to your user profile’s ‘Account’ tab and select ‘Create New Token’. This will trigger the download of kaggle.json, a file containing your API credentials.
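To make "a file containing your API credentials" a bit more concrete, here is a minimal sketch that reads the downloaded file and checks it looks usable. It assumes the token file is a small JSON object with "username" and "key" fields, which is how the classic kaggle.json is laid out as far as I know. The helper name load_token is made up for this example and is not part of the Kaggle library.

```python
import json
from pathlib import Path


def load_token(path):
    """Read a kaggle.json-style file and return its credentials as a dict.

    Hypothetical helper for illustration, not part of the kaggle package.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Assumption: the token file carries a "username" and a "key" field.
    missing = [field for field in ("username", "key") if not data.get(field)]
    if missing:
        raise ValueError(f"{path} is missing fields: {', '.join(missing)}")
    return data
```

Calling load_token on the file you just downloaded should hand back a dict with both fields. Treat the key like a password and keep it out of shared notebooks and repositories.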

Step 3: Install the Kaggle Library

Make sure you have Python and pip installed on your computer. Then install the Kaggle library by running:

pip install kaggle

Step 4: Move the kaggle.json file to the correct location

For the API to find your credentials, kaggle.json needs to be in a specific directory. Move it to:

Linux/macOS: ~/.kaggle/kaggle.json
Windows: C:\Users\<YourUsername>\.kaggle\kaggle.json

Replace <YourUsername> with your actual Windows username. Mine is user, so the path is C:\Users\user\.kaggle\kaggle.json

If the .kaggle folder does not exist yet, create it manually and then move the kaggle.json file into it.
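If you would rather script this step than do it by hand, the move can be sketched in Python. This is only one way to do it: install_token is a hypothetical helper name, and the home argument exists just so the function can be tried against a throwaway folder. By default it uses Path.home(), which resolves to your user folder on Linux, macOS and Windows.

```python
import shutil
from pathlib import Path


def install_token(downloaded, home=None):
    """Move a downloaded kaggle.json into <home>/.kaggle/ and return the new path.

    Hypothetical helper for illustration, not part of the kaggle package.
    """
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)  # create .kaggle if it is absent
    target = target_dir / "kaggle.json"
    shutil.move(str(downloaded), str(target))
    # Restrict the file to the current user. On Windows, chmod only
    # affects the read-only flag, so this line has little effect there.
    target.chmod(0o600)
    return target
```

For example, install_token(Path.home() / "Downloads" / "kaggle.json") would cover the common case where the browser saved the token to your Downloads folder. That location is an assumption, so adjust it to wherever your file landed.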

Step 5: Authenticate using Kaggle API

In your Python script, authenticate with the API using the following code:

import kaggle
from kaggle.api.kaggle_api_extended import KaggleApi

# Initialize the Kaggle API
api = KaggleApi()
api.authenticate()
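If authentication fails, the usual cause is that the credentials are not where the client looks for them. Besides the kaggle.json file, the Kaggle client also documents reading credentials from the KAGGLE_USERNAME and KAGGLE_KEY environment variables. The sketch below is a small pre-flight check you could run before api.authenticate(). The function name credentials_source is made up for this example, it only looks at the default ~/.kaggle location, and checking the environment first is a choice made for this sketch, not a statement about the client's internals.

```python
import os
from pathlib import Path


def credentials_source(env=None, home=None):
    """Return 'env', 'file', or None depending on where credentials are found.

    Hypothetical helper for illustration, not part of the kaggle package.
    """
    env = os.environ if env is None else env
    home = Path(home) if home is not None else Path.home()
    # Both environment variables must be set and non-empty.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return "env"
    # Otherwise look for the token file in the default location.
    if (home / ".kaggle" / "kaggle.json").is_file():
        return "file"
    return None
```

If credentials_source() returns None, go back to Step 4 before calling api.authenticate().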

Step 6: Download the Dataset

For example, to download the Sales Transaction Dataset:

https://www.kaggle.com/datasets/srinivasav22/sales-transactions-dataset


To download the entire dataset, use api.dataset_download_files():

handle = 'srinivasav22/sales-transactions-dataset'
api.dataset_download_files(handle, path='./', unzip=True)

To download a specific file (let’s say Test.xlsx), use api.dataset_download_file():

handle = 'srinivasav22/sales-transactions-dataset'
file = 'Test.xlsx'
api.dataset_download_file(handle, file_name=file)
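Re-running a script downloads the dataset again each time. One way around that is a thin wrapper that skips the download when the target folder already has files in it. This is a sketch under assumptions: download_dataset is a hypothetical helper, the "folder is not empty" test is a deliberately crude stand-in for a real freshness check, and api is expected to be the authenticated KaggleApi object from Step 5.

```python
from pathlib import Path


def download_dataset(api, handle, dest):
    """Download and unzip `handle` into `dest` unless files are already there.

    `api` is expected to be an authenticated KaggleApi instance.
    Returns True if a download was triggered, False if it was skipped.
    Hypothetical helper for illustration, not part of the kaggle package.
    """
    dest = Path(dest)
    if dest.is_dir() and any(dest.iterdir()):
        return False  # something is already in the folder, so skip
    dest.mkdir(parents=True, exist_ok=True)
    # Same call as in the text above, pointed at the chosen folder.
    api.dataset_download_files(handle, path=str(dest), unzip=True)
    return True
```

With the objects defined earlier, download_dataset(api, handle, './data') would fetch the dataset on the first run and do nothing on later runs. The './data' folder name is just an example.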

The full code:

import kaggle
from kaggle.api.kaggle_api_extended import KaggleApi

# Initialize the Kaggle API
api = KaggleApi()
api.authenticate()

handle = 'srinivasav22/sales-transactions-dataset'
file = 'Test.xlsx'
# Download the entire dataset
api.dataset_download_files(handle, path='./', unzip=True)
# Download a specific file
api.dataset_download_file(handle, file_name=file)

You’ve successfully set up the Kaggle API and downloaded a dataset directly into your Python environment. No more manual downloads.

Happy Coding, Data Scientists! 🚀
