How to Import Data into BigQuery



Image by Editor | Ideogram

 

Data come from everywhere, and the sheer variety of origins, sources, and formats in which valuable data appear underscores the need for database management tools capable of loading data from multiple sources. This tutorial illustrates how to load datasets from different formats and sources into Google BigQuery. The only prerequisites are a Google Cloud Platform (GCP) registration with a Google account, a project created in Google Cloud, and the BigQuery API enabled for that project.

Let us assume we have a newly created Google Cloud project titled “BigQuery Project”. If we click on this project in our projects list, a screen like this will appear.

 
Google Cloud project details
 

From here, there are two gateways to the core interface of BigQuery. You can either click on the “Run a query in BigQuery” button, or scroll down to the “BigQuery” quick access and click on it. Both paths will lead to the same destination: the BigQuery Studio user interface.

 
BigQuery Studio GUI
 

Time to explore four different ways to load data into BigQuery:

  1. Upload a dataset from a CSV file.
  2. Upload a dataset from a JSON file.
  3. Load a dataset from Google Cloud Storage.
  4. Ingest a dataset from Google Sheets.

The tutorial will revolve around an example in which we will load data describing dishes from different Asian cuisines. Be warned: you may end up hungry by the end of this read.

 

Load Data from CSV Files

 

Creating a table in BigQuery by loading a CSV file from a local file system involves a sequence of steps.

  1. On the left-hand side pane of BigQuery Studio, click on “+ Add”, and select “Local file”.

    Upload data from a local file

  2. A form will appear where you must select “Upload” from the dropdown list at the top, and select the file to upload using your file browser. If this is the first data imported into your project, you’ll first need to create a BigQuery dataset. Similar to traditional database schemas, a BigQuery dataset is a set of one or more tables grouped logically or semantically. For simplicity, we will only specify the name of the dataset, “asian_cuisines”, and leave the rest of the options at their defaults before confirming its creation.

    Creating a BigQuery dataset

  3. Back in the file upload dialog box, let’s give a name to the table that will be created upon the CSV file upload: “chinese_cuisine”. Last, make sure to enable the “Auto detect” checkbox so that the schema (column names and types) is recognized automatically when reading the CSV file. Click on “CREATE TABLE” and, if everything went well, a table containing five columns of data about Chinese cuisine dishes will be created in a matter of seconds.
  4. By clicking on the newly created table name that appears in the central panel of BigQuery, you’ll see its schema:

    Viewing a table schema

    Notice that the resources navigation pane on the left-hand side now incorporates the newly created dataset, “asian_cuisines”, and the imported table inside it.
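For readers who prefer scripting to clicking, the same steps can be sketched with the google-cloud-bigquery Python client. This is a minimal sketch rather than the method used in this tutorial: it assumes the client library is installed, that application default credentials are configured, and that the first row of the CSV file holds the column names. The project ID and file path passed in are placeholders you would supply yourself.

```python
from pathlib import Path


def full_table_id(project: str, dataset: str, table: str) -> str:
    """Build the 'project.dataset.table' identifier that BigQuery expects."""
    return f"{project}.{dataset}.{table}"


def load_csv(csv_path: Path, project: str, dataset: str, table: str) -> int:
    """Create the dataset if needed, load the CSV file, and return the row count."""
    # Imported here so the helper above can be used without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    # Counterpart of creating the dataset with default options in the console.
    client.create_dataset(f"{project}.{dataset}", exists_ok=True)

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumption: the first row is a header
        autodetect=True,  # plays the role of the "Auto detect" checkbox
    )
    table_id = full_table_id(project, dataset, table)
    with csv_path.open("rb") as source_file:
        job = client.load_table_from_file(source_file, table_id, job_config=job_config)
    job.result()  # wait for the load job; raises if the load failed
    return client.get_table(table_id).num_rows
```

A call such as `load_csv(Path("chinese_cuisine.csv"), "my-project", "asian_cuisines", "chinese_cuisine")` would mirror the console steps above, where `my-project` and the file name are hypothetical.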

 

Load Data from JSON Files

 

Now we will load another data file into a table, this time in JSON format. The process is very similar to loading a CSV file from your local machine; the only difference is that you can now click on the dataset in the left-hand side pane, and then click on “Create Table”.

 
Create a table in an existing dataset
 

We will name the table after the JSON file to upload, “japanese_cuisine”. Make sure again that the schema is auto detected, and click on “CREATE TABLE”. Once the table is created, this time we will look at the actual data loaded. The simplest way to do this without venturing into writing and executing queries yet (be patient, we will get there very soon!) is through the “PREVIEW” tab:

 
Preview of the table data uploaded from a JSON file
 

Important: when uploading your own JSON file, make sure each JSON object is on its own line and not wrapped in an array, as shown in the sample below. Otherwise, BigQuery may throw an error when trying to upload the file.

{"name": "Sushi", "description": "Vinegared rice served with a variety of ingredients such as raw fish, vegetables, and seaweed.", "type": "Main Course", "preparation_time_mins": 60, "difficulty": "High"}
{"name": "Tempura", "description": "Lightly battered and deep-fried vegetables, seafood, or meat, often served with a dipping sauce.", "type": "Main Course", "preparation_time_mins": 45, "difficulty": "Medium"}
{"name": "Miso Soup", "description": "A traditional Japanese soup made from a stock called dashi, miso paste, tofu, and seaweed.", "type": "Soup", "preparation_time_mins": 15, "difficulty": "Low"}

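If your file stores its records inside a single JSON array, a small script can rewrite it in the one-object-per-line layout described above. The following is a minimal sketch using only the Python standard library; the function names and file paths are illustrative and not part of any BigQuery tooling.

```python
import json


def to_ndjson(records: list[dict]) -> str:
    """Serialize each record as one JSON object per line, with no enclosing array."""
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


def convert_file(array_path: str, ndjson_path: str) -> int:
    """Rewrite a file holding a top-level JSON array as newline-delimited JSON."""
    with open(array_path, encoding="utf-8") as src:
        records = json.load(src)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    with open(ndjson_path, "w", encoding="utf-8") as dst:
        dst.write(to_ndjson(records))
    return len(records)  # number of objects written
```

For instance, `convert_file("japanese_cuisine_array.json", "japanese_cuisine.json")` (both file names hypothetical) would produce a file ready for upload through the dialog described in this section.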
 

Load Data from Google Cloud Storage

 

Google Cloud Storage is a secure and scalable data storage service offered by Google Cloud. It can store a variety of unstructured data, including text and media. Data in Google Cloud Storage are organized into buckets, so when loading data into BigQuery from Cloud Storage you’ll need to specify which bucket to read from. One advantage of loading data into BigQuery from Cloud Storage is the support for recurring batch loads and process scheduling. This is particularly attractive when the data stored in buckets keep evolving or growing.

To load a data file in Cloud Storage into BigQuery, select “Google Cloud Storage” in the “Create table from:” dropdown list at the top of the table creation dialog box. You’ll need at least one bucket created in Cloud Storage, with your desired data file(s) already in it, so that you can select them. This in turn requires having billing information activated and set up in your Google Cloud account.
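A Cloud Storage load can also be scripted, which is one way to build the recurring batch loads mentioned above. The sketch below assumes the google-cloud-bigquery Python client, configured credentials, and a CSV file with a header row; the bucket and object names are hypothetical, since this tutorial does not create a bucket.

```python
def gcs_uri(bucket: str, object_name: str) -> str:
    """Build the gs:// URI that identifies an object inside a bucket."""
    return f"gs://{bucket}/{object_name.lstrip('/')}"


def load_csv_from_gcs(uri: str, table_id: str) -> int:
    """Load a CSV object from Cloud Storage into a table and return the rows loaded."""
    # Imported here so the URI helper works without the client library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumption: the first row is a header
        autodetect=True,  # let BigQuery infer column names and types
    )
    job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    job.result()  # wait for completion; raises if the load failed
    return job.output_rows
```

Here `table_id` is expected in the fully qualified `project.dataset.table` form, and `gcs_uri("my-bucket", "cuisines/dishes.csv")` yields the URI to pass in.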

 

Data Ingestion from Google Sheets

 

Last, one more option is to load data from your Google Drive file system in the cloud, specifically from spreadsheets created with Google Sheets.
Take for instance this Google Sheets spreadsheet containing data about Thai cuisine dishes.

 
Thai cuisine data in a Google Sheets file
 

To help you replicate the example data in your own Google spreadsheet, here is its actual content:

name	description	type	preparation_time_mins	difficulty
Pad Thai	Stir-fried rice noodles with shrimp, tofu, peanuts, scrambled eggs, and bean sprouts, flavored with tamarind	Main Course	30	Medium
Tom Yum Goong	A hot and sour soup with shrimp, lemongrass, kaffir lime leaves, galangal, and chili peppers	Soup	25	Medium
Green Curry	A spicy curry with green chilies, coconut milk, chicken, eggplant, and Thai basil	Main Course	40	High
Som Tum	A spicy green papaya salad with peanuts, dried shrimp, tomatoes, and lime	Appetizer	20	Medium
Mango Sticky Rice	A dessert made from sticky rice, fresh mango slices, and coconut milk	Dessert	20	Low
Massaman Curry	A rich and mild curry with beef, potatoes, onions, peanuts, and spices like cinnamon and cardamom	Main Course	60	High
Thai Iced Tea	A sweet and creamy tea made with brewed black tea, spices, sugar, and condensed milk	Beverage	10	Low
Chicken Satay	Grilled skewers of marinated chicken served with a peanut sauce	Appetizer	35	Medium
Panang Curry	A thick and creamy curry with beef, peanuts, coconut milk, and Panang curry paste	Main Course	50	High
Khao Soi	A Northern Thai coconut curry noodle soup with chicken, pickled mustard greens, and crispy noodles	Main Course	45	High

 

How do we import these data into another BigQuery table? There are a couple of aspects to consider here. After selecting “Drive” as the source in the table creation dialog, copy and paste the spreadsheet’s base URL (without suffixes, if any) into the URI field, then specify “Google Sheet” as the file format. You may also want to specify the table range if you do not want to load the entire spreadsheet, or if it has more than one sheet. Name the table “thai_cuisine”, enable auto schema detection as usual, and you’re done! The figure below illustrates the process.

 
Creating a table from Google Sheets
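The two details called out above, trimming the URL and optionally restricting the range, can be sketched in Python as follows. This assumes the google-cloud-bigquery and google-auth libraries, credentials that include a Google Drive scope, and a header row in the sheet; the helper names are illustrative, and the table is defined as an external table that points at the spreadsheet.

```python
import re
from typing import Optional

# The spreadsheet ID is the path segment that follows /spreadsheets/d/.
_SHEET_ID = re.compile(r"/spreadsheets/d/([A-Za-z0-9_-]+)")


def sheet_base_url(url: str) -> str:
    """Drop suffixes such as /edit#gid=0 and keep only the base spreadsheet URL."""
    match = _SHEET_ID.search(url)
    if match is None:
        raise ValueError(f"not a Google Sheets URL: {url}")
    return f"https://docs.google.com/spreadsheets/d/{match.group(1)}"


def create_sheet_table(sheet_url: str, table_id: str, cell_range: Optional[str] = None) -> None:
    """Define a table (project.dataset.table) backed by a Google Sheets file."""
    # Imported here so the URL helper works without the client libraries.
    import google.auth
    from google.cloud import bigquery

    # Reading a Drive file requires a Drive scope on top of the BigQuery one.
    credentials, project = google.auth.default(
        scopes=[
            "https://www.googleapis.com/auth/drive",
            "https://www.googleapis.com/auth/bigquery",
        ]
    )
    client = bigquery.Client(credentials=credentials, project=project)

    external_config = bigquery.ExternalConfig("GOOGLE_SHEETS")
    external_config.source_uris = [sheet_base_url(sheet_url)]
    external_config.autodetect = True  # auto schema detection
    external_config.options.skip_leading_rows = 1  # assumption: header row
    if cell_range:
        external_config.options.range = cell_range  # e.g. "Sheet1!A1:E11"

    table = bigquery.Table(table_id)
    table.external_data_configuration = external_config
    client.create_table(table)
```

The range string in the comment is only an example of the sheet-and-cells notation; adapt it to the sheet name and cells of your own file.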
 

There is one little caveat in this case. By default, BigQuery does not offer data previewing for tables created from Google spreadsheets. Don’t panic: the data are still there, and you only need to query them, as we will cover in the next tutorial of this series.

 
Querying delicious Thai dishes imported from Google Sheets
 

Now that we have our three tables about Asian cuisines imported into BigQuery in our project, in the next tutorials we’ll start performing queries, learning the syntax and capabilities of BigQuery’s SQL along the way.

 
 

Iván Palomares Carrascosa is a leader, writer, speaker, and adviser in AI, machine learning, deep learning & LLMs. He trains and guides others in harnessing AI in the real world.
