Welcome to Pass4Success

Databricks Exam Databricks-Machine-Learning-Associate Topic 3 Question 22 Discussion

Actual exam question for Databricks's Databricks-Machine-Learning-Associate exam
Question #: 22
Topic #: 3

A machine learning engineer is trying to scale a machine learning pipeline by distributing its feature engineering process.

Which of the following feature engineering tasks will be the least efficient to distribute?

Suggested Answer: E

To display visual histograms and summaries of the numeric features in a Spark DataFrame, the Databricks utility function dbutils.data.summarize can be used. This function provides a comprehensive summary, including visual histograms.

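Correct code:

# Run in a Databricks notebook, where dbutils is available by default; spark_df is an existing Spark DataFrame
dbutils.data.summarize(spark_df)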

Other options like spark_df.describe() and spark_df.summary() provide textual statistical summaries but do not include visual histograms.

Reference: Databricks Utilities (dbutils) documentation

Contribute your Thoughts:

Eveline
6 days ago
Target encoding? Really? That's going to be a nightmare to distribute. Imagine trying to coordinate all those little target values across the cluster. I'd rather just impute the missing values with the true median and call it a day.
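
Eveline's fallback, imputing with the true median, is arguably the harder one to distribute. A mean splits into per-partition partial results (a sum and a count) that merge cheaply, whereas an exact median depends on the global ordering of all values, so no fixed-size per-partition summary is enough. The sketch below is plain Python, not Spark code: it assumes each inner list stands in for one partition of a single numeric column, with None marking a missing value, and the helper names are hypothetical. In PySpark, `DataFrame.approxQuantile` trades exactness for a computation that distributes well.

```python
from typing import List, Optional, Tuple

# Assumption: each inner list simulates one partition of a numeric column;
# None marks a missing value. This is an illustration, not Spark code.
Partition = List[Optional[float]]


def partial_sum_count(part: Partition) -> Tuple[float, int]:
    """Per-partition summary for the mean: computed locally, only two numbers."""
    values = [v for v in part if v is not None]
    return sum(values), len(values)


def distributed_mean(partitions: List[Partition]) -> float:
    # Map step: every partition is reduced independently to (sum, count).
    partials = [partial_sum_count(p) for p in partitions]
    # Reduce step: merging needs data proportional to the number of partitions.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


def exact_median(partitions: List[Partition]) -> float:
    # No fixed-size per-partition summary suffices for an exact median:
    # all non-missing values are gathered and ordered globally.
    values = sorted(v for p in partitions for v in p if v is not None)
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def impute(partitions: List[Partition], fill: float) -> List[Partition]:
    # Once the fill value is known, imputation itself is row-local again.
    return [[fill if v is None else v for v in p] for p in partitions]


if __name__ == "__main__":
    parts = [[1.0, None, 3.0], [10.0, 2.0], [None, 4.0]]
    print(distributed_mean(parts))  # 4.0
    print(exact_median(parts))      # 3.0
```

Filling the gaps is cheap in both cases; the difference lies in how much data has to be brought together to compute the fill value.
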
Michell
12 days ago
I disagree. I believe creating binary indicator features for missing values would be the least efficient task to distribute because it involves checking for missing values in each feature separately.
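
Checking every feature does touch every column, but each indicator depends only on the row it comes from, so the work splits across partitions with no data moved between them. A minimal plain-Python sketch, assuming partitions are lists of dict rows; the `_missing` column suffix and the helper names are hypothetical:

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def add_missing_indicators(rows: List[Row], columns: List[str]) -> List[Row]:
    """Map-only step: each output row is computed from its own input row alone."""
    out: List[Row] = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            # Hypothetical naming: <column>_missing is 1 when the value is None or absent.
            new_row[f"{col}_missing"] = int(row.get(col) is None)
        out.append(new_row)
    return out


def run_per_partition(partitions: List[List[Row]], columns: List[str]) -> List[List[Row]]:
    # Plain-Python stand-in for partitions: each one is processed independently,
    # so no values need to cross partition boundaries.
    return [add_missing_indicators(p, columns) for p in partitions]


if __name__ == "__main__":
    parts = [[{"age": 30, "income": None}], [{"age": None, "income": 5.0}]]
    print(run_per_partition(parts, ["age", "income"]))
```

In PySpark the same per-column check could be written as `pyspark.sql.functions.col(c).isNull().cast("int")`, which Spark evaluates as a narrow transformation without a shuffle.
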
Wai
18 days ago
Hmm, one-hot encoding seems like the obvious choice here. I mean, how hard can it be to distribute that process? It's not like we're training a neural network or anything. Just slap it on a few more servers and voila!
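
One-hot encoding does distribute well, though not with zero coordination: the category vocabulary has to be collected once across all partitions (cheap, since per-partition sets of distinct values can simply be unioned), after which each row is encoded locally. A plain-Python sketch of that two-pass shape, assuming partitions are lists of category strings and using hypothetical helper names:

```python
from typing import List


def collect_vocabulary(partitions: List[List[str]]) -> List[str]:
    # Pass 1: each partition yields its distinct categories; the sets are unioned.
    per_partition = [set(p) for p in partitions]
    vocab = set().union(*per_partition)
    # Sorting fixes the position of each category in the output vectors.
    return sorted(vocab)


def one_hot(value: str, vocab: List[str]) -> List[int]:
    # Pass 2: row-local encoding against the shared vocabulary.
    return [1 if value == category else 0 for category in vocab]


def encode_partitions(partitions: List[List[str]]) -> List[List[List[int]]]:
    vocab = collect_vocabulary(partitions)
    return [[one_hot(v, vocab) for v in p] for p in partitions]
```

Spark ML's `StringIndexer` and `OneHotEncoder` broadly follow the same fit-then-transform pattern, where fitting gathers category information before the rows are transformed.
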
Dortha
22 days ago
I agree with Lizbeth. Target encoding involves calculating statistics based on the target variable, which can be tricky to distribute efficiently.
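
Target encoding does need a cross-partition step, because each category's statistic depends on rows spread over the whole cluster. With a plain per-category mean of the target, though, the statistic is still decomposable: every partition can emit per-category (sum, count) pairs that are merged by key, much like a group-by aggregation. The sketch below assumes that unsmoothed mean encoding and partitions given as lists of (category, target) pairs; the helper names are hypothetical, and the smoothing or out-of-fold schemes often used to limit target leakage are left out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, float]              # (category, target value)
Stats = Dict[str, Tuple[float, int]]  # category -> (target sum, row count)


def partial_stats(part: List[Pair]) -> Stats:
    # Map-side combine: per-category (sum, count) within one partition.
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for category, target in part:
        sums[category] += target
        counts[category] += 1
    return {c: (sums[c], counts[c]) for c in sums}


def merge_stats(partials: List[Stats]) -> Dict[str, float]:
    # Reduce by key: only the small per-category partials cross partitions.
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for stats in partials:
        for category, (s, n) in stats.items():
            sums[category] += s
            counts[category] += n
    return {c: sums[c] / counts[c] for c in sums}


def target_encode(partitions: List[List[Pair]]) -> List[List[float]]:
    encoding = merge_stats([partial_stats(p) for p in partitions])
    # Lookup step: each row is replaced by its category's mean target.
    return [[encoding[c] for c, _ in p] for p in partitions]
```

The shuffle here moves one (sum, count) pair per category per partition rather than every row, which is what keeps this kind of encoding tractable at scale.
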
Lizbeth
24 days ago
I think target encoding categorical features will be the least efficient to distribute.