Google Exam Professional Machine Learning Engineer Topic 1 Question 98 Discussion

Actual exam question for Google's Professional Machine Learning Engineer exam

Question #: 98
Topic #: 1


You trained a model on data stored in a Cloud Storage bucket. The model needs to be retrained frequently in Vertex AI Training using the latest data in the bucket. Data preprocessing is required prior to retraining. You want to build a simple and efficient near-real-time ML pipeline in Vertex AI that will preprocess the data when new data arrives in the bucket. What should you do?

A. Create a pipeline using the Vertex AI SDK. Schedule the pipeline with Cloud Scheduler to preprocess the new data in the bucket. Store the processed features in Vertex AI Feature Store.

B. Create a Cloud Run function that is triggered when new data arrives in the bucket. The function initiates a Vertex AI Pipeline to preprocess the new data and store the processed features in Vertex AI Feature Store.

C. Build a Dataflow pipeline to preprocess the new data in the bucket and store the processed features in BigQuery. Configure a cron job to trigger the pipeline execution.

D. Use the Vertex AI SDK to preprocess the new data in the bucket prior to each model retraining. Store the processed features in BigQuery.


Suggested Answer: B

A Cloud Run function can be triggered by the event of a new object arriving in the bucket, which makes it a good fit for near-real-time processing. The function then starts the Vertex AI Pipeline that preprocesses the new data and stores the features in Vertex AI Feature Store, ready for retraining. Cloud Scheduler (Option A) runs jobs on a fixed schedule and does not react to events, so new data waits until the next scheduled run. Option C has the same limitation because its Dataflow pipeline is started by a cron job, and it also moves preprocessing outside Vertex AI. Option D preprocesses only before each retraining run, not when the data arrives.
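The event-driven flow in Option B can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the project, region, template path, pipeline root, and the `input_uri` pipeline parameter are all hypothetical placeholders, and the compiled pipeline template is assumed to exist already. The `google-cloud-aiplatform` import is deferred so that the payload-parsing helper can be exercised without the SDK installed.

```python
"""Sketch: a Cloud Storage "object finalized" event starts a Vertex AI pipeline run."""
import os
import re

# Hypothetical configuration; replace with real values in a deployment.
PROJECT = os.environ.get("GCP_PROJECT", "my-project")
REGION = os.environ.get("GCP_REGION", "us-central1")
TEMPLATE_PATH = os.environ.get(
    "PIPELINE_TEMPLATE", "gs://my-bucket/pipelines/preprocess.json"
)
PIPELINE_ROOT = os.environ.get("PIPELINE_ROOT", "gs://my-bucket/pipeline-root")


def build_job_spec(event_data: dict) -> dict:
    """Turn a Cloud Storage event payload into PipelineJob keyword arguments."""
    bucket = event_data["bucket"]
    name = event_data["name"]
    # Keep the display name readable: lowercase, with other characters dashed.
    safe_name = re.sub(r"[^a-z0-9-]", "-", name.lower())[:40]
    return {
        "display_name": f"preprocess-{safe_name}",
        "template_path": TEMPLATE_PATH,
        "pipeline_root": PIPELINE_ROOT,
        # "input_uri" is an assumed parameter name of the hypothetical pipeline.
        "parameter_values": {"input_uri": f"gs://{bucket}/{name}"},
    }


def on_new_object(cloud_event):
    """Entry point; in a deployed function this would be registered with
    the Functions Framework decorator @functions_framework.cloud_event."""
    # Deferred import so the helper above has no third-party dependency.
    from google.cloud import aiplatform

    spec = build_job_spec(cloud_event.data)
    aiplatform.init(project=PROJECT, location=REGION)
    # submit() returns without waiting for the pipeline run to finish.
    aiplatform.PipelineJob(**spec).submit()
```

For example, an event for `2025/03/new.csv` in a bucket named `raw-data` would produce a job whose `input_uri` is `gs://raw-data/2025/03/new.csv`. Keeping the function limited to submitting the job leaves the preprocessing and Feature Store writes inside the pipeline itself.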

by Hillary at Mar 05, 2025, 08:33 PM



Mike

3 months ago

I see the benefits of option C as well. Building a Dataflow pipeline to preprocess data and store features in BigQuery could be a solid choice.

upvoted 0 times


Doug

3 months ago

Option F: Hire a team of psychic interns to monitor the bucket and trigger the pipeline whenever they sense new data. It's the future of MLOps!

upvoted 0 times


Ernest

3 months ago

Option E: Write a Python script that uses a Ouija board to divine the latest data and automatically retrain the model. It's foolproof!

upvoted 0 times

Carri

2 months ago

Option E: Write a Python script that uses a Ouija board to divine the latest data and automatically retrain the model. It's foolproof!

upvoted 0 times


Wynell

2 months ago

B) Create a Cloud Run function that is triggered when new data arrives in the bucket. The function initiates a Vertex AI Pipeline to preprocess the new data and store the processed features in Vertex AI Feature Store.

upvoted 0 times


Dalene

2 months ago

A) Create a pipeline using the Vertex AI SDK. Schedule the pipeline with Cloud Scheduler to preprocess the new data in the bucket. Store the processed features in Vertex AI Feature Store.

upvoted 0 times


3 months ago

I'm not sure about Option C. Configuring a cron job to trigger a Dataflow pipeline seems a bit overkill for this use case. Why not just use Vertex AI's built-in capabilities?

upvoted 0 times

Eve

3 months ago

D) Use the Vertex AI SDK to preprocess the new data in the bucket prior to each model retraining. Store the processed features in BigQuery.

upvoted 0 times


4 months ago

Option B looks like the most efficient solution. Triggering a pipeline when new data arrives in the bucket is a great way to keep the model up-to-date in near-real-time.

upvoted 0 times

Javier

3 months ago

A) Create a pipeline using the Vertex AI SDK. Schedule the pipeline with Cloud Scheduler to preprocess the new data in the bucket. Store the processed features in Vertex AI Feature Store.

upvoted 0 times


Dallas

3 months ago

upvoted 0 times


Jackie

4 months ago

I think option A is the best choice. It allows us to create a pipeline using Vertex AI SDK and schedule it with Cloud Scheduler.

upvoted 0 times
