Welcome to Pass4Success

Databricks Exam Databricks-Machine-Learning-Associate Topic 3 Question 26 Discussion

Actual exam question for Databricks's Databricks-Machine-Learning-Associate exam
Question #: 26
Topic #: 3

A data scientist wants to use Spark ML to one-hot encode the categorical features in their PySpark DataFrame features_df. A list of the names of the string columns is assigned to the input_columns variable.

They have developed this code block to accomplish this task:

The code block is returning an error.

Which of the following adjustments does the data scientist need to make to accomplish this task?

Suggested Answer: C

For large datasets, Spark ML distributes the training of a linear regression model with iterative optimization. The DataFrame-based LinearRegression estimator can train with Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) optimization, switching to OWL-QN when an L1 penalty is applied. The older RDD-based MLlib API also offered Stochastic Gradient Descent (SGD). These methods suit distributed computing environments because each iteration computes the gradient in parallel on every data partition and sends only the small partial results to the driver, which combines them and updates the model parameters. L-BFGS uses the full dataset in each iteration, while SGD can sample a mini-batch per iteration.
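The map-and-aggregate pattern described above can be illustrated with a toy, single-process sketch. Each "partition" computes a partial gradient where its data lives, and only those small sums are combined to update the parameters. For brevity, the sketch swaps L-BFGS for plain full-batch gradient descent and simulates partitions with Python lists. The data, function names, and learning rate are hypothetical, and this is not Spark's implementation.

```python
# Hypothetical toy data following y = 2x + 1, split into "partitions"
# the way Spark spreads a DataFrame across executors.
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
    [(4.0, 9.0)],
]


def partial_gradient(rows, w, b):
    """Summed squared-error gradient over one partition, plus its row count."""
    gw = gb = 0.0
    for x, y in rows:
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return gw, gb, len(rows)


def train(partitions, lr=0.05, iters=2000):
    w = b = 0.0
    for _ in range(iters):
        # "Map": every partition computes its own partial gradient.
        parts = [partial_gradient(p, w, b) for p in partitions]
        # "Reduce": only the small (gw, gb, n) tuples are combined.
        gw = sum(p[0] for p in parts)
        gb = sum(p[1] for p in parts)
        n = sum(p[2] for p in parts)
        # Full-batch gradient step on the mean squared error.
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


w, b = train(partitions)
print(f"w={w:.3f}, b={b:.3f}")
```

Because the toy data fit y = 2x + 1 exactly, the loop prints w=2.000, b=1.000. An SGD variant would call partial_gradient on a random sample of rows per iteration instead of every partition in full.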


Databricks documentation on linear regression: Linear Regression in Spark ML

Contribute your Thoughts:

Jenelle
6 days ago
Wait, I think we need to use StringIndexer first to convert the string columns to numerical values. Then we can use OneHotEncoder.
Ligia
10 days ago
I believe they should also use StringIndexer before one-hot encoding the features to properly encode the categorical values.
Quentin
10 days ago
Hmm, the error is probably due to the fit operation. Let's try removing that line and see if it works.
Essie
13 days ago
I agree with Daisy. Without specifying the method parameter, the code won't work properly.
Daisy
15 days ago
I think the data scientist needs to specify the method parameter to the OneHotEncoder.