Databricks Exam Databricks-Machine-Learning-Associate Topic 2 Question 20 Discussion

Actual exam question for Databricks's Databricks-Machine-Learning-Associate exam

Question #: 20
Topic #: 2

An organization is developing a feature repository and is electing to one-hot encode all categorical feature variables. A data scientist suggests that the categorical feature variables should not be one-hot encoded within the feature repository.

Which of the following explanations justifies this suggestion?

A. One-hot encoding is a potentially problematic categorical variable strategy for some machine learning algorithms.

B. One-hot encoding is dependent on the target variable's values, which differ for each application.

C. One-hot encoding is computationally intensive and should only be performed on small samples of training sets for individual machine learning problems.

D. One-hot encoding is not a common strategy for representing categorical feature variables numerically.

Suggested Answer: A

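One-hot encoding does not suit every algorithm. Tree-based models, for example, can perform worse on the sparse, high-dimensional columns it produces, so the choice of encoding is better left to each model's own pipeline than fixed in the feature repository. In Spark ML, that per-model encoding is handled by transformers. A transformer is an algorithm that takes a DataFrame as input and produces a new DataFrame as output, typically by appending one or more columns that hold the transformed features. VectorAssembler is a transformer; StringIndexer and StandardScaler are estimators, and the models they produce when fitted (StringIndexerModel, StandardScalerModel) are transformers.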

Databricks documentation on transformers: Transformers in Spark ML
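To make the reasoning behind answer A concrete, here is a minimal sketch in plain Python (deliberately not the Spark ML API) of keeping a raw categorical column in the repository and choosing the encoding per model. The helper names `fit_categories`, `one_hot_encode`, and `index_encode`, and the `city` sample data, are hypothetical and only serve the illustration.

```python
from typing import List, Sequence


def fit_categories(values: Sequence[str]) -> List[str]:
    """Return the sorted distinct categories seen in `values`."""
    return sorted(set(values))


def one_hot_encode(values: Sequence[str], categories: List[str]) -> List[List[int]]:
    """Encode each value as a 0/1 vector with one slot per category."""
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        encoded.append(row)
    return encoded


def index_encode(values: Sequence[str], categories: List[str]) -> List[int]:
    """Encode each value as a single integer index (one column only)."""
    index = {cat: i for i, cat in enumerate(categories)}
    return [index[value] for value in values]


# Hypothetical raw column, stored unencoded in the feature repository.
city = ["Austin", "Boston", "Chicago", "Austin", "Denver"]

categories = fit_categories(city)

# A linear model's pipeline might choose one-hot encoding:
# one new column per distinct category.
one_hot = one_hot_encode(city, categories)

# A tree-based model's pipeline might prefer a single indexed column.
indexed = index_encode(city, categories)

print(len(one_hot[0]))  # number of one-hot columns equals number of categories
print(indexed)
```

Because the repository keeps the raw `city` values, each downstream pipeline stays free to pick the representation that suits its algorithm. Had the column been one-hot encoded up front, every consumer would inherit one column per category whether or not that helps its model.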

by Stephanie at Oct 20, 2024, 04:27 PM

Glenna

6 days ago

B) Hmm, that makes sense. The target variable can vary across different applications, so one-hot encoding shouldn't be done at the feature repository level.

upvoted 0 times

...