Databricks Exam Databricks-Machine-Learning-Associate Topic 3 Question 23 Discussion

Actual exam question for Databricks's Databricks-Machine-Learning-Associate exam

Question #: 23
Topic #: 3

A data scientist is performing hyperparameter tuning using an iterative optimization algorithm. Each evaluation of unique hyperparameter values is being trained on a single compute node. They are performing eight total evaluations across eight total compute nodes. While the accuracy of the model does vary over the eight evaluations, they notice there is no trend of improvement in the accuracy. The data scientist believes this is due to the parallelization of the tuning process.

Which change could the data scientist make to improve their model accuracy over the course of their tuning process?

A. Change the number of compute nodes to be half or less than half of the number of evaluations.

B. Change the number of compute nodes and the number of evaluations to be much larger but equal.

C. Change the iterative optimization algorithm used to facilitate the tuning process.

D. Change the number of compute nodes to be double or more than double the number of evaluations.

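Suggested Answer: A

An iterative optimization algorithm, such as the Tree-structured Parzen Estimator (TPE) used by Hyperopt, chooses each new set of hyperparameter values based on the results of evaluations that have already finished. When eight evaluations run at the same time on eight compute nodes, every evaluation is proposed before any results exist, so the algorithm has nothing to learn from and behaves like a random search. That is why the accuracy varies but shows no trend of improvement. Reducing the number of compute nodes to half or less than half of the number of evaluations forces the tuning to run in several rounds, so later evaluations can use the results of earlier ones. Option C does not address the problem, because any iterative algorithm run at full parallelism has the same limitation. Options B and D keep the parallelism equal to or greater than the number of evaluations, so they do not help either.

Databricks documentation on Hyperopt describes this as a trade-off between parallelism and adaptivity.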
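
The effect of parallelism in the question can be illustrated with a small sketch. It counts, for each evaluation, how many finished results an iterative tuner could look at when proposing that evaluation. The function name `results_visible_per_trial` and the assumption that evaluations run in synchronized batches (one batch per round, one evaluation per compute node) are illustrative simplifications, not part of any library.

```python
def results_visible_per_trial(max_evals: int, parallelism: int) -> list[int]:
    """Return, for each trial, the number of completed trials the tuner
    can see when it proposes that trial.

    Assumption (hypothetical simplification): trials run in synchronized
    batches of size `parallelism`, so a trial only sees results from
    batches that finished before its own batch started.
    """
    if max_evals < 0 or parallelism < 1:
        raise ValueError("max_evals must be >= 0 and parallelism >= 1")
    return [(i // parallelism) * parallelism for i in range(max_evals)]


if __name__ == "__main__":
    # Eight evaluations, as in the question, at different node counts.
    for nodes in (8, 4, 2, 1):
        print(nodes, results_visible_per_trial(8, nodes))
```

With eight nodes, every evaluation sees zero prior results, which matches the scenario in the question. With four nodes or fewer, the later evaluations are informed by earlier ones. If the tuner is Hyperopt with `SparkTrials` (an assumption, since the question does not name the library), the node count here corresponds to the `parallelism` argument of `SparkTrials` and the evaluation count to `max_evals` in `fmin`.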

by Lashandra at Jan 12, 2025, 12:56 PM

18 days ago

I think option C is the best choice. Changing the optimization algorithm could help the data scientist explore the hyperparameter space more effectively and potentially find a better model.

upvoted 0 times

Emogene

2 days ago

I think option C is the best choice.

upvoted 0 times

Niesha

21 days ago

I think the data scientist should change the number of compute nodes to be half or less than half of the number of evaluations.

upvoted 0 times
