Google Exam Professional Data Engineer Topic 5 Question 103 Discussion

Actual exam question for Google's Professional Data Engineer exam
Question #: 103
Topic #: 5

You store and analyze your relational data in BigQuery on Google Cloud, with all data residing in US regions. You also have a variety of object stores across Microsoft Azure and Amazon Web Services (AWS), also in US regions. You want to query all your data in BigQuery daily with as little movement of data as possible. What should you do?

Suggested Answer: C

To query your BigQuery data together with the object stores in Azure and AWS while moving as little data as possible, integrating BigQuery Omni with BigLake tables is the most effective approach. Here's why option C is the best choice:

BigQuery Omni:

BigQuery Omni extends BigQuery's query engine to run in AWS and Azure regions, so you can analyze data in Amazon S3 and Azure Blob Storage with standard GoogleSQL without first copying it into Google Cloud.

BigLake Tables:

BigLake tables provide a BigQuery table interface over files in external object stores. Once a connection to the external cloud exists, you can query a BigLake table from BigQuery like any other table, with fine-grained access control.

Minimal Data Movement:

Because the data stays in place in each cloud's object store and only query results are returned, this directly addresses the requirement to query all your data daily with as little movement of data as possible.
Steps to Implement:

Create Connections:

Create a BigQuery connection for each external cloud (one in an AWS region, one in an Azure region) and grant each connection's service identity read access to the relevant buckets and containers.

Create BigLake Tables:

Define a BigLake external table over each object store so BigQuery can read the files in place, as sketched below.
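A minimal sketch, assuming connections named my_aws_connection (in aws-us-east-1) and my_azure_connection (in azure-eastus2) already exist, and using illustrative project, dataset, bucket, and container names:

-- BigLake table over Parquet files in an S3 bucket (all names are illustrative)
CREATE EXTERNAL TABLE `my_project.aws_dataset.s3_events`
WITH CONNECTION `aws-us-east-1.my_aws_connection`
OPTIONS (
  format = 'PARQUET',
  uris = ['s3://my-bucket/events/*']
);

-- BigLake table over Parquet files in an Azure Blob Storage container
CREATE EXTERNAL TABLE `my_project.azure_dataset.blob_events`
WITH CONNECTION `azure-eastus2.my_azure_connection`
OPTIONS (
  format = 'PARQUET',
  uris = ['azure://myaccount.blob.core.windows.net/my-container/events/*']
);

Note that each dataset holding a BigLake table must be created in the matching BigQuery Omni location (aws-us-east-1 and azure-eastus2 here).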

Query the Data:

Run your daily queries against the BigLake tables from BigQuery; the files stay in S3 and Blob Storage, and only the query results move.
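For the daily analysis itself, a query such as the following (again with illustrative names) scans the S3 files in place and returns only the aggregated result:

SELECT event_date, COUNT(*) AS event_count
FROM `my_project.aws_dataset.s3_events`
GROUP BY event_date
ORDER BY event_date;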


BigQuery Omni Documentation

BigLake Tables Documentation

Contribute your Thoughts:

Maryln
2 days ago
Haha, I'm glad they're not asking us to physically carry the data between cloud providers. That would be a real workout!
upvoted 0 times
...
Melynda
13 days ago
I'm not sure, but option D seems like a good option too. Using BigQuery Data Transfer Service can load files from Azure and AWS into BigQuery efficiently.
upvoted 0 times
...
Gilma
14 days ago
I disagree, I believe option C is the way to go. BigQuery Omni can query files in Azure and AWS without moving the data.
upvoted 0 times
...
Elroy
16 days ago
I think option B is the best choice because Dataflow can ingest files from Azure and AWS directly into BigQuery.
upvoted 0 times
...
Luis
17 days ago
Hmm, I'm not sure about using Cloud Shell and rsync to load files. Seems a bit manual and error-prone. I'd prefer an automated solution like Dataflow or Data Transfer Service.
upvoted 0 times
...
Nana
18 days ago
Option C seems like the way to go. Integrating BigQuery Omni and BigLake tables allows me to query all my data from different cloud providers without having to move it around. Efficiency FTW!
upvoted 0 times
...
