Google Exam Professional Data Engineer Topic 5 Question 103 Discussion

Actual exam question for Google's Professional Data Engineer exam
Question #: 103
Topic #: 5

You store and analyze your relational data in BigQuery on Google Cloud, with all data residing in US regions. You also have a variety of object stores across Microsoft Azure and Amazon Web Services (AWS), also in US regions. You want to query all your data in BigQuery daily with as little movement of data as possible. What should you do?

Suggested Answer: C

To query your BigQuery data together with the object stores in Azure and AWS while moving as little data as possible, integrating BigQuery Omni with BigLake tables is the most effective approach. Here's why option C is the best choice:

BigQuery Omni:

BigQuery Omni extends BigQuery's query engine to run in AWS and Azure regions, so you can analyze data in Amazon S3 and Azure Blob Storage with standard GoogleSQL without first copying it into Google Cloud.

BigLake Tables:

BigLake tables provide a BigQuery table interface over files in external object stores. Once a connection to the external cloud exists, you can query a BigLake table from BigQuery like any other table, with fine-grained access control.

Minimal Data Movement:

Because the data stays in place in each cloud's object store and only query results are returned, this directly addresses the requirement to query all your data daily with as little movement of data as possible.
Steps to Implement:

Create Connections:

Create a BigQuery connection for each external cloud (one in an AWS region, one in an Azure region) and grant each connection's service identity read access to the relevant buckets and containers.

Create BigLake Tables:

Define a BigLake external table over each object store so BigQuery can read the files in place, as sketched below.
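A minimal sketch, assuming connections named my_aws_connection (in aws-us-east-1) and my_azure_connection (in azure-eastus2) already exist, and using illustrative project, dataset, bucket, and container names:

-- BigLake table over Parquet files in an S3 bucket (all names are illustrative)
CREATE EXTERNAL TABLE `my_project.aws_dataset.s3_events`
WITH CONNECTION `aws-us-east-1.my_aws_connection`
OPTIONS (
  format = 'PARQUET',
  uris = ['s3://my-bucket/events/*']
);

-- BigLake table over Parquet files in an Azure Blob Storage container
CREATE EXTERNAL TABLE `my_project.azure_dataset.blob_events`
WITH CONNECTION `azure-eastus2.my_azure_connection`
OPTIONS (
  format = 'PARQUET',
  uris = ['azure://myaccount.blob.core.windows.net/my-container/events/*']
);

Note that each dataset holding a BigLake table must be created in the matching BigQuery Omni location (aws-us-east-1 and azure-eastus2 here).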

Query the Data:

Run your daily queries against the BigLake tables from BigQuery; the files stay in S3 and Blob Storage, and only the query results move.
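For the daily analysis itself, a query such as the following (again with illustrative names) scans the S3 files in place and returns only the aggregated result:

SELECT event_date, COUNT(*) AS event_count
FROM `my_project.aws_dataset.s3_events`
GROUP BY event_date
ORDER BY event_date;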


BigQuery Omni Documentation

BigLake Tables Documentation

Contribute your Thoughts:

Maryln
2 days ago
Haha, I'm glad they're not asking us to physically carry the data between cloud providers. That would be a real workout!
upvoted 0 times
...
Melynda
13 days ago
I'm not sure, but option D seems like a good option too. Using BigQuery Data Transfer Service can load files from Azure and AWS into BigQuery efficiently.
upvoted 0 times
...
Gilma
14 days ago
I disagree, I believe option C is the way to go. BigQuery Omni can query files in Azure and AWS without moving the data.
upvoted 0 times
...
Elroy
16 days ago
I think option B is the best choice because Dataflow can ingest files from Azure and AWS directly into BigQuery.
upvoted 0 times
...
Luis
17 days ago
Hmm, I'm not sure about using Cloud Shell and rsync to load files. Seems a bit manual and error-prone. I'd prefer an automated solution like Dataflow or Data Transfer Service.
upvoted 0 times
...
Nana
18 days ago
Option C seems like the way to go. Integrating BigQuery Omni and BigLake tables allows me to query all my data from different cloud providers without having to move it around. Efficiency FTW!
upvoted 0 times
...
