Name: Databricks Databricks Certified Data Engineer Professional Exam

Free Databricks Databricks Certified Data Engineer Professional Exam Actual Questions

Note: Premium Questions for Databricks Certified Data Engineer Professional were last updated on May 10, 2024.

Question #1

Spill occurs as a result of executing various wide transformations. However, diagnosing spill requires one to proactively look for key indicators.

Where in the Spark UI are two of the primary indicators that a partition is spilling to disk?

A. Stage's detail screen and Executor's files

B. Stage's detail screen and Query's detail screen

C. Driver's and Executor's log files

D. Executor's detail screen and Executor's log files

Correct Answer: B

In the Spark UI, two of the primary indicators of spill appear on the Stage's detail screen and the Query's detail screen. Both screens report spill metrics alongside the other memory statistics for the work being executed. A task spills when the data it is processing does not fit in the memory available to it, so Spark writes part of that data to disk to free memory. Spill is worth monitoring because the extra disk I/O can slow processing significantly.

Apache Spark Monitoring and Instrumentation: Spark Monitoring Guide

Spark UI Explained: Spark UI Documentation
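The same spill check can also be scripted rather than read off the UI. The sketch below is a minimal illustration, assuming stage records shaped like those returned by Spark's monitoring REST API, with memoryBytesSpilled and diskBytesSpilled fields. The function name and the sample records are hypothetical and exist only for this example.

```python
from typing import Dict, List


def stages_with_spill(stages: List[Dict]) -> List[Dict]:
    """Return a short summary for every stage that reports spilled bytes."""
    flagged = []
    for stage in stages:
        # Field names are assumed to follow the Spark REST API stage payload.
        memory_spill = stage.get("memoryBytesSpilled", 0)
        disk_spill = stage.get("diskBytesSpilled", 0)
        if memory_spill > 0 or disk_spill > 0:
            flagged.append(
                {
                    "stageId": stage.get("stageId"),
                    "memoryBytesSpilled": memory_spill,
                    "diskBytesSpilled": disk_spill,
                }
            )
    return flagged


# Hypothetical sample data: stage 1 spilled, stage 0 did not.
sample_stages = [
    {"stageId": 0, "memoryBytesSpilled": 0, "diskBytesSpilled": 0},
    {"stageId": 1, "memoryBytesSpilled": 524288000, "diskBytesSpilled": 73400320},
]

spilling = stages_with_spill(sample_stages)
for entry in spilling:
    print(entry)
```

A non-zero value in either field is the scripted counterpart of seeing spill reported on the Stage's detail screen.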

Question #2

The DevOps team has configured a production workload as a collection of notebooks scheduled to run daily using the Jobs UI. A new data engineering hire is onboarding to the team and has requested access to one of these notebooks to review the production logic.

What are the maximum notebook permissions that can be granted to the user without allowing accidental changes to production code or data?

A. Can Manage

B. Can Edit

C. Can Run

D. Can Read

Correct Answer: D

Granting a user 'Can Read' permissions on a notebook within Databricks allows them to view the notebook's content without the ability to execute or edit it. This level of permission ensures that the new team member can review the production logic for learning or auditing purposes without the risk of altering the notebook's code or affecting production data and workflows. This approach aligns with best practices for maintaining security and integrity in production environments, where strict access controls are essential to prevent unintended modifications. Reference: Databricks documentation on access control and permissions for notebooks within the workspace (https://docs.databricks.com/security/access-control/workspace-acl.html).

Question #3

The view updates represents an incremental batch of all newly ingested data to be inserted or updated in the customers table.

The following logic is used to process these records.

MERGE INTO customers
USING (
  SELECT updates.customer_id AS merge_key, updates.*
  FROM updates
  UNION ALL
  SELECT NULL AS merge_key, updates.*
  FROM updates JOIN customers
  ON updates.customer_id = customers.customer_id
  WHERE customers.current = true AND updates.address <> customers.address
) staged_updates
ON customers.customer_id = merge_key
WHEN MATCHED AND customers.current = true AND customers.address <> staged_updates.address THEN
  UPDATE SET current = false, end_date = staged_updates.effective_date
WHEN NOT MATCHED THEN
  INSERT (customer_id, address, current, effective_date, end_date)
  VALUES (staged_updates.customer_id, staged_updates.address, true, staged_updates.effective_date, null)

Which statement describes this implementation?

A. The customers table is implemented as a Type 2 table; old values are overwritten and new customers are appended.

B. The customers table is implemented as a Type 1 table; old values are overwritten by new values and no history is maintained.

C. The customers table is implemented as a Type 2 table; old values are maintained but marked as no longer current and new values are inserted.

D. The customers table is implemented as a Type 0 table; all writes are append only with no changes to existing values.

Correct Answer: C

The provided MERGE statement is a classic implementation of a Type 2 slowly changing dimension (SCD). In this approach, history is preserved by keeping old records, marked as not current, and adding new records for changes. The staged source contains each update twice when the address has changed: once with merge_key set to the customer_id and once with merge_key set to NULL. The keyed row matches the existing current record, which is updated to mark it as no longer current (current = false) and given an end date (end_date = staged_updates.effective_date). The NULL-keyed row can never match a customer_id, so it falls through to WHEN NOT MATCHED and is inserted as the new current record. Updates for customers not yet in the table are also unmatched and are inserted the same way. The table therefore keeps the full history of address changes, which allows time-based analysis of customer data. Reference: Databricks documentation on implementing SCDs using Delta Lake and the MERGE statement (https://docs.databricks.com/delta/delta-update.html#upsert-into-a-table-using-merge).
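To make the row-level effect of the statement concrete, the following is a minimal pure-Python simulation of its logic on in-memory rows. It is a sketch for illustration only, not how Delta Lake executes MERGE; the function name apply_scd2_merge and the sample customers and addresses are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def apply_scd2_merge(customers: List[Row], updates: List[Row]) -> List[Row]:
    """Simulate the MERGE: close out changed current rows, insert new versions."""
    result = [dict(row) for row in customers]

    # staged_updates: every update keyed by its customer_id ...
    staged = [dict(u, merge_key=u["customer_id"]) for u in updates]
    # ... plus a NULL-keyed copy for each update that changes a current address.
    for u in updates:
        for c in customers:
            if (c["customer_id"] == u["customer_id"]
                    and c["current"]
                    and c["address"] != u["address"]):
                staged.append(dict(u, merge_key=None))

    inserts = []
    for s in staged:
        # A None merge_key never matches, mirroring SQL NULL comparison.
        matched = [i for i, c in enumerate(customers)
                   if s["merge_key"] is not None
                   and c["customer_id"] == s["merge_key"]]
        if matched:
            for i in matched:
                c = customers[i]
                # WHEN MATCHED AND current AND address differs: expire the row.
                if c["current"] and c["address"] != s["address"]:
                    result[i]["current"] = False
                    result[i]["end_date"] = s["effective_date"]
        else:
            # WHEN NOT MATCHED: insert a new current row.
            inserts.append({
                "customer_id": s["customer_id"],
                "address": s["address"],
                "current": True,
                "effective_date": s["effective_date"],
                "end_date": None,
            })
    return result + inserts


customers = [
    {"customer_id": 1, "address": "12 Oak St", "current": True,
     "effective_date": "2023-01-01", "end_date": None},
]
updates = [
    {"customer_id": 1, "address": "99 Elm Ave", "effective_date": "2024-05-01"},
    {"customer_id": 2, "address": "5 Pine Rd", "effective_date": "2024-05-01"},
]

merged = apply_scd2_merge(customers, updates)
for row in merged:
    print(row)
```

In this sample, customer 1 ends up with two rows (the expired old address and the new current one) and customer 2 is inserted once, which is the Type 2 behavior described in answer C.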

Question #4

A data engineer is testing a collection of mathematical functions, one of which calculates the area under a curve as described by another function.

Which kind of test does the above scenario exemplify?

A. Integration

B. Unit

C. Manual

D. Functional

Correct Answer: B

A unit test is designed to verify the correctness of a small, isolated piece of code, typically a single function. Testing a mathematical function that calculates the area under a curve is an example of a unit test because it is testing a specific, individual function to ensure it operates as expected.

Software Testing Fundamentals: Unit Testing
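As an illustration of what such a unit test might look like, the sketch below defines a hypothetical area_under_curve function using the trapezoidal rule and checks it in isolation against integrals with known values. The function name, its signature, and the choice of the trapezoidal rule are assumptions made for this example; the question does not specify how the area is computed.

```python
import math
from typing import Callable


def area_under_curve(f: Callable[[float], float], a: float, b: float,
                     steps: int = 1000) -> float:
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    if steps <= 0:
        raise ValueError("steps must be positive")
    width = (b - a) / steps
    # Endpoints carry half weight; interior points carry full weight.
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * width)
    return total * width


def test_area_under_line() -> None:
    # The area under y = 2x on [0, 3] is 9; the rule is exact for straight lines.
    assert math.isclose(area_under_curve(lambda x: 2 * x, 0.0, 3.0), 9.0,
                        rel_tol=1e-9)


def test_area_under_parabola() -> None:
    # The area under y = x^2 on [0, 1] is 1/3; allow a small approximation error.
    assert math.isclose(area_under_curve(lambda x: x * x, 0.0, 1.0), 1 / 3,
                        abs_tol=1e-6)


test_area_under_line()
test_area_under_parabola()
```

Each test exercises one function with fixed inputs and no external dependencies, which is what distinguishes a unit test from an integration or functional test.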
