You have an existing weekly Storage Transfer Service transfer job from Amazon S3 to a Nearline Cloud Storage bucket in Google Cloud. Each week, the job moves a large number of relatively small files. As the number of files to be transferred each week has grown over time, you are at risk of no longer completing the transfer in the allocated time frame. You need to decrease the total transfer time by replacing the process. Your solution should minimize costs where possible. What should you do?
Why B is correct: Creating parallel transfer jobs with include and exclude prefixes lets you split the dataset into smaller, non-overlapping slices and transfer them concurrently. This significantly increases aggregate throughput and shortens the overall transfer time without adding new infrastructure, which keeps costs low (see the sketch after the option analysis below).
Why the other options are incorrect:
A: Changing the destination storage class to Standard affects storage pricing, not transfer throughput, so it would not shorten the transfer window.
C: Rebuilding the process in Dataflow is an unnecessarily complex (and more costly) solution for a straightforward file-copy task.
D: Agent-based transfers are intended for self-managed or bandwidth-constrained sources and very large files; they add operational overhead and do not address the bottleneck of moving a large number of small objects.
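As a minimal sketch of option B, the snippet below creates one Storage Transfer Service job per prefix so several jobs can run in parallel over disjoint slices of the S3 namespace. It assumes the google-cloud-storage-transfer Python client; the project ID, bucket names, and date-based prefixes are hypothetical, and the AWS credentials and weekly schedule are omitted for brevity.

```python
# Sketch: split one weekly S3 -> Nearline transfer into parallel prefix-scoped jobs.
from google.cloud import storage_transfer


def create_prefix_job(project_id: str, prefix: str) -> None:
    """Create one transfer job that copies only objects under `prefix`."""
    client = storage_transfer.StorageTransferServiceClient()
    job = {
        "project_id": project_id,
        "description": f"Weekly S3 -> Nearline transfer for prefix {prefix}",
        "status": storage_transfer.TransferJob.Status.ENABLED,
        "transfer_spec": {
            # Hypothetical bucket names; AWS credentials / role configuration omitted.
            "aws_s3_data_source": {"bucket_name": "example-s3-bucket"},
            "gcs_data_sink": {"bucket_name": "example-nearline-bucket"},
            # Restrict this job to one slice of the namespace so that several
            # jobs can run in parallel over disjoint prefixes.
            "object_conditions": {"include_prefixes": [prefix]},
        },
        # Weekly schedule omitted for brevity.
    }
    client.create_transfer_job({"transfer_job": job})


if __name__ == "__main__":
    # One job per prefix slice; the jobs run independently and in parallel.
    for p in ["2024/01/", "2024/02/", "2024/03/"]:
        create_prefix_job("my-project", p)
```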
You are a database administrator managing sales transaction data by region stored in a BigQuery table. You need to ensure that each sales representative can only see the transactions in their region. What should you do?
Creating a row-level access policy in BigQuery ensures that each sales representative can see only the transactions relevant to their region. Row-level access policies allow you to define fine-grained access control by filtering rows based on specific conditions, such as matching the sales representative's region. This approach enforces security while providing tailored data access, aligning with the principle of least privilege.
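A minimal sketch of such a policy is shown below, assuming a hypothetical table `my-project.sales.transactions` with a `region` column and a per-region Google group; the DDL is issued through the google-cloud-bigquery client.

```python
# Sketch: restrict a regional sales group to rows for their own region.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project

ddl = """
CREATE ROW ACCESS POLICY west_region_reps
ON `my-project.sales.transactions`
GRANT TO ('group:sales-west@example.com')
FILTER USING (region = 'WEST');
"""

# After this runs, members of sales-west@example.com only see rows
# where region = 'WEST' when they query the table.
client.query(ddl).result()
```

In practice you would create one policy per region (or a single policy that filters against a mapping table keyed on SESSION_USER()); once any row access policy exists on the table, users not covered by a policy see no rows at all.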
Your organization has a petabyte of application logs stored as Parquet files in Cloud Storage. You need to quickly perform a one-time SQL-based analysis of the files and join them to data that already resides in BigQuery. What should you do?
Creating external tables over the Parquet files in Cloud Storage allows you to perform SQL-based analysis and joins with data already in BigQuery without needing to load the files into BigQuery. This approach is efficient for a one-time analysis as it avoids the time and cost associated with loading large volumes of data into BigQuery. External tables provide seamless integration with Cloud Storage, enabling quick and cost-effective analysis of data stored in Parquet format.
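The sketch below illustrates this pattern with hypothetical bucket paths, dataset, table, and column names, using the google-cloud-bigquery client: an external table is defined over the Parquet files and then joined directly to a native BigQuery table.

```python
# Sketch: query Parquet files in Cloud Storage in place and join to BigQuery data.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project

# Define an external table over the Parquet files without loading them.
client.query("""
CREATE OR REPLACE EXTERNAL TABLE analytics.app_logs_ext
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://example-logs-bucket/app-logs/*.parquet']
);
""").result()

# Join the external table against a native BigQuery table in a single query.
rows = client.query("""
SELECT u.customer_id, COUNT(*) AS error_events
FROM analytics.app_logs_ext AS l
JOIN analytics.users AS u
  ON l.user_id = u.user_id
WHERE l.severity = 'ERROR'
GROUP BY u.customer_id
""").result()

for row in rows:
    print(row.customer_id, row.error_events)
```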
You need to create a data pipeline that streams event information from applications in multiple Google Cloud regions into BigQuery for near real-time analysis. The data requires transformation before loading. You want to create the pipeline using a visual interface. What should you do?
Pushing event information to a Pub/Sub topic and then creating a Dataflow job using the Dataflow job builder is the most suitable solution. The Dataflow job builder provides a visual interface to design pipelines, allowing you to define transformations and load data into BigQuery. This approach is ideal for streaming data pipelines that require near real-time transformations and analysis. It ensures scalability across multiple regions and integrates seamlessly with Pub/Sub for event ingestion and BigQuery for analysis.
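The Dataflow job builder is a visual tool, but the job it produces is an ordinary Dataflow streaming pipeline. For orientation, the sketch below shows an equivalent pipeline written with the Apache Beam Python SDK; the topic, table, and field names are hypothetical.

```python
# Sketch: Pub/Sub -> transform -> BigQuery streaming pipeline (Apache Beam).
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def to_bq_row(message: bytes) -> dict:
    """Parse a Pub/Sub message and reshape it into a BigQuery row."""
    event = json.loads(message.decode("utf-8"))
    return {
        "event_id": event["id"],
        "region": event.get("region", "unknown"),
        "event_time": event["timestamp"],
    }


options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadEvents" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/app-events")
        | "Transform" >> beam.Map(to_bq_row)
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:analytics.events",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```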
Your organization uses scheduled queries to perform transformations on data stored in BigQuery. You discover that one of your scheduled queries has failed. You need to troubleshoot the issue as quickly as possible. What should you do?