The stakeholders of a machine learning model have confirmed that they understand the objective and purpose of the model, and ensured that the proposed model aligns with their business priorities. They have also selected a framework and a machine learning model that they will be using.
What should be the next step to progress along the machine learning workflow?
The machine learning (ML) workflow follows a structured sequence of steps. Once stakeholders have agreed on the objectives, business priorities, and the framework/model selection, the next logical step is to prepare and pre-process the data before training the model.
Data preparation is crucial because machine learning models depend heavily on the quality of their input data. Poor data can result in biased, inaccurate, or unreliable models.
The process involves data acquisition, cleaning, transformation, augmentation, and feature engineering.
Preparing the data ensures it is in the right format, free from errors, and representative of the problem domain, which helps the trained model generalize to data it has not seen.
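The cleaning, transformation, and feature engineering activities listed above can be sketched in a few lines of plain Python. This is only an illustrative outline: the records, the field names (`age`, `income`, `purchases`), the derived feature `income_per_purchase`, and the choice of min-max scaling are all assumptions made for the example, not something prescribed by the syllabus.

```python
# Hypothetical raw records; field names and values are illustrative only.
raw_records = [
    {"age": 34, "income": 52000.0, "purchases": 4},
    {"age": None, "income": 61000.0, "purchases": 7},  # missing value
    {"age": 34, "income": 52000.0, "purchases": 4},    # exact duplicate
    {"age": 51, "income": 88000.0, "purchases": 2},
    {"age": 23, "income": 31000.0, "purchases": 9},
]


def clean(records):
    """Cleaning: drop exact duplicates and records with missing values."""
    seen, cleaned = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key in seen or any(value is None for value in rec.values()):
            continue
        seen.add(key)
        cleaned.append(dict(rec))  # copy so the raw data stays untouched
    return cleaned


def add_features(records):
    """Feature engineering: derive a new feature from existing ones."""
    for rec in records:
        rec["income_per_purchase"] = rec["income"] / max(rec["purchases"], 1)
    return records


def min_max_scale(records, field):
    """Transformation: rescale one numeric field to the range [0, 1]."""
    values = [rec[field] for rec in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero for constant fields
    for rec in records:
        rec[field] = (rec[field] - lo) / span
    return records


def prepare(records):
    """Run the steps in order: clean, engineer features, then scale."""
    prepared = add_features(clean(records))
    for field in ("age", "income"):
        min_max_scale(prepared, field)
    return prepared


prepared = prepare(raw_records)
```

In this sketch the derived feature is computed before scaling so that it is based on the original units; a real project would pick the order and the techniques to suit its own data.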
Why Other Options Are Incorrect:
A (Tune the ML Algorithm): Hyperparameter tuning occurs after the model has been trained and evaluated, so it cannot come before the data has been prepared.
C (Agree on Acceptance Criteria): Acceptance criteria should already have been defined in the initial objective-setting phase before framework and model selection.
D (Evaluate the Framework and Model): The selection of the framework and ML model has already been completed. The next step is data preparation, not reevaluation.
Supporting Reference from ISTQB Certified Tester AI Testing Study Guide:
ISTQB CT-AI Syllabus (Section 3.2: ML Workflow - Data Preparation Phase)
'Data preparation comprises data acquisition, pre-processing, and feature engineering. Exploratory data analysis (EDA) may be performed alongside these activities.'
'The data used to train, tune, and test the model must be representative of the operational data that will be used by the model.'
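One simple way to probe the representativeness requirement quoted above is to compare how often each label appears in the training data and in a sample of operational data. The sketch below is a rough illustration under stated assumptions: the label values, the sample sizes, and the `TOLERANCE` threshold are hypothetical, and comparing label proportions is only one of many possible checks.

```python
from collections import Counter


def proportions(labels):
    """Return the share of each label in a list of labels."""
    total = len(labels)
    return {label: count / total for label, count in Counter(labels).items()}


def max_proportion_gap(train_labels, operational_labels):
    """Largest absolute difference in label share between the two datasets."""
    train = proportions(train_labels)
    operational = proportions(operational_labels)
    all_labels = set(train) | set(operational)
    return max(
        (abs(train.get(k, 0.0) - operational.get(k, 0.0)) for k in all_labels),
        default=0.0,
    )


# Hypothetical label samples, for illustration only.
train_labels = ["approve"] * 80 + ["reject"] * 20
operational_labels = ["approve"] * 55 + ["reject"] * 45

TOLERANCE = 0.10  # assumed threshold; a real project would set its own

gap = max_proportion_gap(train_labels, operational_labels)
is_representative = gap <= TOLERANCE
```

Here the training data contains far more "approve" examples than the operational sample, so the check flags the training set as not representative under the assumed tolerance.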
Conclusion:
Since the model selection is complete, the next step in the ML workflow is to prepare and pre-process the data to ensure it is ready for training and testing. Thus, the correct answer is B.