
Google Associate Data Practitioner Exam Topic 3 Question 9 Discussion

Actual exam question for Google's Associate Data Practitioner exam
Question #: 9
Topic #: 3

Your organization has a petabyte of application logs stored as Parquet files in Cloud Storage. You need to quickly perform a one-time SQL-based analysis of the files and join them to data that already resides in BigQuery. What should you do?

Suggested Answer: C

Creating BigQuery external tables over the Parquet files in Cloud Storage lets you run SQL-based analysis and join the files to data that already resides in BigQuery without first loading them. For a one-time analysis this is the efficient choice: it avoids the time and cost of ingesting a petabyte of logs, because BigQuery queries the Parquet files in place in Cloud Storage, keeping the analysis quick and cost-effective.

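As a rough illustration of that approach, the sketch below creates an external table over the Parquet files and joins it to a native BigQuery table using the google-cloud-bigquery Python client. The project, dataset, table, bucket, and column names are assumptions invented for the example, not details from the question.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project ID

# Define an external table over the Parquet logs in Cloud Storage
# (bucket path and dataset/table names are assumptions).
ddl = """
CREATE OR REPLACE EXTERNAL TABLE `my-project.analysis.app_logs`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-log-bucket/logs/*.parquet']
)
"""
client.query(ddl).result()

# One-time join against a table that already lives in BigQuery.
sql = """
SELECT u.user_id, COUNT(*) AS error_count
FROM `my-project.analysis.app_logs` AS l
JOIN `my-project.warehouse.users` AS u
  ON l.user_id = u.user_id
WHERE l.severity = 'ERROR'
GROUP BY u.user_id
ORDER BY error_count DESC
LIMIT 100
"""
for row in client.query(sql).result():
    print(row.user_id, row.error_count)
```

Because the table is external, there is no load job to wait for and nothing to clean up afterwards beyond dropping the external table definition.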

Contribute your Thoughts:

Rory
5 days ago
Option B seems like the way to go. Cloud Data Fusion makes it easy to connect multiple data sources and perform complex analyses.
upvoted 0 times
...
Monroe
6 days ago
Ha! Looks like we've got some 'Big Data' on our hands. I'd go with option D - keep it simple, stupid!
upvoted 0 times
...
Theresia
9 days ago
PySpark is overkill for a one-time analysis. Option C looks like the most straightforward approach here.
upvoted 0 times
...
Ardella
11 days ago
I'm not a fan of external tables - they can be a bit of a pain to manage. I'd go with option D and just load the Parquet files directly into BigQuery.
upvoted 0 times
...
Irma
17 days ago
I think option D could work too, loading the files into BigQuery.
upvoted 0 times
...
Filiberto
18 days ago
I prefer option C, creating external tables over the files in Cloud Storage.
upvoted 0 times
...
Fairy
22 days ago
I agree, using a Dataproc cluster with PySpark seems efficient.
upvoted 0 times
...
Daniel
26 days ago
Cloud Data Fusion seems like the easiest way to get this done. No need to write any code!
upvoted 0 times
Sarah
8 days ago
A) Create a Dataproc cluster, and write a PySpark job to join the data from BigQuery to the files in Cloud Storage.
upvoted 0 times
...
...
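For comparison with option A as quoted by Sarah above, a minimal PySpark sketch of that approach might look like the following. It assumes the spark-bigquery connector is available on the Dataproc cluster, and the project, dataset, bucket, and column names are invented for the example.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("join-logs-with-bigquery").getOrCreate()

# Parquet application logs in Cloud Storage (bucket path is an assumption).
logs = spark.read.parquet("gs://my-log-bucket/logs/")

# Table that already resides in BigQuery, read via the spark-bigquery connector.
users = (
    spark.read.format("bigquery")
    .option("table", "my-project.warehouse.users")
    .load()
)

# Join the two sources and run a simple aggregation.
result = (
    logs.join(users, on="user_id", how="inner")
    .where(logs.severity == "ERROR")
    .groupBy("user_id")
    .count()
)
result.show(100)
```

This works, but it means provisioning and tearing down a cluster for a single query, which is why the suggested answer favours the external-table route for a one-time analysis.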
Sommer
1 month ago
I think option A sounds like a good idea.
upvoted 0 times
...
