Welcome to Pass4Success


Databricks Exam Databricks-Machine-Learning-Associate Topic 3 Question 23 Discussion

Actual exam question from the Databricks Databricks-Machine-Learning-Associate exam
Question #: 23
Topic #: 3

A data scientist is performing hyperparameter tuning using an iterative optimization algorithm. Each evaluation of a unique set of hyperparameter values is trained on its own compute node, for eight total evaluations across eight total compute nodes. While the model's accuracy varies over the eight evaluations, they notice no trend of improvement in accuracy. The data scientist believes this is due to the parallelization of the tuning process.

Which change could the data scientist make to improve their model accuracy over the course of their tuning process?
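On Databricks, this kind of adaptive tuning is commonly done with Hyperopt's TPE algorithm, where `SparkTrials(parallelism=...)` caps how many trials run at once and `fmin(..., max_evals=...)` sets the total. The sketch below is a library-free toy illustration of why full parallelism removes the feedback loop; the `objective` and `tune` helpers are hypothetical, not any real API:

```python
import random

def objective(lr):
    # Toy "model accuracy": peaks at lr = 0.1 (purely illustrative).
    return 1.0 - abs(lr - 0.1)

def tune(total_evals=8, parallelism=8, seed=0):
    """Run total_evals evaluations in batches of size `parallelism`.

    Candidates in the same batch are chosen before any of them finish,
    so they cannot learn from each other; only later batches can
    exploit the best result seen so far."""
    rng = random.Random(seed)
    best_lr, best_acc = None, float("-inf")
    history, n_batches = [], 0
    while len(history) < total_evals:
        batch = []
        for _ in range(min(parallelism, total_evals - len(history))):
            if best_lr is None:
                batch.append(rng.uniform(0.0, 1.0))            # no feedback yet: random
            else:
                batch.append(best_lr + rng.gauss(0.0, 0.05))   # exploit best so far
        for lr in batch:                                       # evaluated "in parallel"
            acc = objective(lr)
            history.append(acc)
            if acc > best_acc:
                best_lr, best_acc = lr, acc
        n_batches += 1
    return history, best_acc, n_batches

_, acc_par, batches_par = tune(parallelism=8)   # 1 batch: every trial is blind
_, acc_seq, batches_seq = tune(parallelism=1)   # 8 batches: each trial is informed
```

With parallelism equal to the number of evaluations, all eight candidates are drawn with no feedback, which matches the scenario's flat accuracy. Lowering parallelism below the evaluation count (e.g., to half or less) gives later batches earlier results to learn from, at the cost of some serial processing.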

Suggested Answer: C

For large datasets, Spark ML uses iterative optimization methods to distribute the training of a linear regression model. Specifically, Spark MLlib employs techniques like Stochastic Gradient Descent (SGD) and Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimization to iteratively update the model parameters. These methods are well-suited for distributed computing environments because they can handle large-scale data efficiently by processing mini-batches of data and updating the model incrementally.
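As a single-machine illustration of the incremental, mini-batch updates described above (a plain-Python toy sketch, not Spark MLlib's actual distributed implementation):

```python
import random

def sgd_linear_regression(xs, ys, lr=0.2, epochs=2000, batch_size=4, seed=0):
    """Fit y ≈ w*x + b with mini-batch SGD: each step estimates the
    gradient from a small batch and updates the parameters incrementally."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)                        # new mini-batches each epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            gw = gb = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]   # prediction error on one example
                gw += err * xs[i]
                gb += err
            w -= lr * gw / len(batch)           # incremental parameter update
            b -= lr * gb / len(batch)
    return w, b

# Noiseless data generated from y = 2x + 1; SGD should recover w ≈ 2, b ≈ 1.
xs = [i / 10 for i in range(10)]
ys = [2 * x + 1 for x in xs]
w, b = sgd_linear_regression(xs, ys)
```

Spark distributes the same idea across executors: each partition computes a partial gradient over its slice of the data, and the driver aggregates them to update the model.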


Databricks documentation on linear regression: Linear Regression in Spark ML

Contribute your Thoughts:

Danica
19 hours ago
I'd go with option A. Fewer compute nodes than evaluations might introduce some serial processing, but it could help the data scientist identify a more consistent trend in the results.
upvoted 0 times
Herman
6 days ago
Hmm, I'm not sure. The question mentions no trend of improvement, so maybe option B could work to explore a larger hyperparameter space. Worth a shot!
upvoted 0 times
Lilli
8 days ago
I think changing the iterative optimization algorithm used could also help improve the model accuracy.
upvoted 0 times
Maia
10 days ago
I disagree; I believe they should change the number of compute nodes to be double or more than double the number of evaluations.
upvoted 0 times
Avery
13 days ago
Option D seems like the way to go. Doubling the number of compute nodes should allow for more parallel evaluations and potentially improve the model accuracy.
upvoted 0 times
Nydia
18 days ago
I think option C is the best choice. Changing the optimization algorithm could help the data scientist explore the hyperparameter space more effectively and potentially find a better model.
upvoted 0 times
Emogene
2 days ago
I think option C is the best choice.
upvoted 0 times
Niesha
21 days ago
I think the data scientist should change the number of compute nodes to be half or less than half of the number of evaluations.
upvoted 0 times

