Databricks Exam Databricks Machine Learning Associate Topic 3 Question 26 Discussion

Actual exam question for Databricks's Databricks Machine Learning Associate exam

Question #: 26
Topic #: 3

A data scientist wants to use Spark ML to one-hot encode the categorical features in their PySpark DataFrame features_df. A list of the names of the string columns is assigned to the input_columns variable.

They have developed this code block to accomplish this task:

The code block is returning an error.

Which of the following adjustments does the data scientist need to make to accomplish this task?

A. They need to specify the method parameter to the OneHotEncoder.

B. They need to remove the line with the fit operation.

C. They need to use StringIndexer prior to one-hot encoding the features.

D. They need to use VectorAssembler prior to one-hot encoding the features.

Suggested Answer: C

Spark ML's OneHotEncoder expects numeric category indices as input, not raw strings, so passing the string columns listed in input_columns directly to it causes an error. StringIndexer first maps each distinct string value to a numeric index, and OneHotEncoder then converts those indices into sparse binary vectors. Option A does not apply because OneHotEncoder has no method parameter. Option B does not apply because, in Spark 3.0 and later, OneHotEncoder is an estimator that must be fit before it can transform data. Option D does not apply because VectorAssembler combines columns into a single vector column and does not index strings.

Reference: Apache Spark MLlib guide, "Extracting, transforming and selecting features" (StringIndexer and OneHotEncoder)

by Brock at Mar 04, 2025, 05:06 AM

Contribute your Thoughts:

Thurman

1 month ago

Maybe the error is because they forgot to add the 'sparkly' parameter to the OneHotEncoder. You know, to make it extra fabulous.

upvoted 0 times

...

Shannan

1 month ago

I heard the data scientist tried to one-hot encode their socks. Turns out they were just a bunch of ones and zeros!

upvoted 0 times

Daniela

11 days ago

C: They need to use VectorAssembler prior to one-hot encoding the features.

upvoted 0 times

...

Delisa

12 days ago

B: They need to use StringIndexer prior to one-hot encoding the features.

upvoted 0 times

...

Hannah

16 days ago

A: They need to specify the method parameter to the OneHotEncoder.

upvoted 0 times

...

Wilburn

16 days ago

A: They need to specify the method parameter to the OneHotEncoder.

upvoted 0 times

...

Alfred

1 month ago

VectorAssembler? Sounds like a superhero name. Maybe that's the solution, but I'm not sure.

upvoted 0 times

Pura

6 days ago

User3: Maybe they need to use StringIndexer before one-hot encoding the features.

upvoted 0 times

...

Lonny

8 days ago

User2: No, they should specify the method parameter to the OneHotEncoder.

upvoted 0 times

...

Dick

20 days ago

User1: I think the data scientist needs to use VectorAssembler before one-hot encoding.

upvoted 0 times

...

Rosendo

2 months ago

Ah, I see the issue. The method parameter is missing from the OneHotEncoder. We need to specify that.

upvoted 0 times

Essie

17 days ago

User1: Let's add that parameter and see if it works.

upvoted 0 times

...

Joaquin

1 month ago

User2: Yes, that's correct. That should fix the error.

upvoted 0 times

...

Eden

1 month ago

User1: I think we need to specify the method parameter to the OneHotEncoder.

upvoted 0 times

...

Jenelle

2 months ago

Wait, I think we need to use StringIndexer first to convert the string columns to numerical values. Then we can use OneHotEncoder.

upvoted 0 times

Latrice

4 days ago

D: And OneHotEncoder will then encode those numerical values as binary vectors.

upvoted 0 times

...

Osvaldo

5 days ago

C: That makes sense, StringIndexer will convert the strings to numerical values.

upvoted 0 times

...

Whitley

7 days ago

B: Then we can use OneHotEncoder to encode the categorical features.

upvoted 0 times

...

Reita

1 month ago

A: I think you're right, we should use StringIndexer first.

upvoted 0 times

...

Ligia

2 months ago

I believe they should also use StringIndexer before one-hot encoding the features to properly encode the categorical values.

upvoted 0 times

...

Quentin

2 months ago

Hmm, the error is probably due to the fit operation. Let's try removing that line and see if it works.

upvoted 0 times

Lawrence

1 month ago

User2: Yeah, let's try that and see if it fixes the error.

upvoted 0 times

...

Adaline

2 months ago

User1: I think we should remove the line with the fit operation.

upvoted 0 times

...

Essie

2 months ago

I agree with Daisy. Without specifying the method parameter, the code won't work properly.

upvoted 0 times

...

Daisy

2 months ago

I think the data scientist needs to specify the method parameter to the OneHotEncoder.

upvoted 0 times

...