
Amazon Exam MLS-C01 Topic 4 Question 97 Discussion

Actual exam question for Amazon's MLS-C01 exam
Question #: 97
Topic #: 4

A Machine Learning Specialist observes several performance problems with the training portion of a machine learning solution on Amazon SageMaker. The solution uses a large training dataset, 2 TB in size, and the SageMaker k-means algorithm. The observed issues include an unacceptably long delay before the training job launches and poor I/O throughput while training the model.

What should the Specialist do to address the performance issues with the current solution?

Suggested Answer: A

Both symptoms point to the training job using File input mode. In File mode, SageMaker first downloads the entire dataset from Amazon S3 to the EBS volumes attached to the training instances before training begins; with a 2 TB dataset, that download accounts for the long delay before the job launches, and reading the data back from local disk during training limits I/O throughput. Switching the job to Pipe input mode addresses both problems: the data is streamed directly from S3 into the algorithm container as it is consumed, so training starts almost immediately and throughput improves because the dataset never has to land on local storage first. The built-in SageMaker k-means algorithm supports Pipe mode with data in protobuf recordIO format, so the Specialist should convert the training dataset to protobuf recordIO and train using Pipe input mode.
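As a rough sketch of what the change looks like in practice, the snippet below builds a boto3-style `create_training_job` parameter dict with `TrainingInputMode` set to `Pipe` and the channel content type set to protobuf recordIO. The image URI, role ARN, S3 URIs, and instance settings are placeholders, not values from this question:

```python
# Sketch: a create_training_job request (boto3 parameter shape) that switches
# the k-means job from File mode to Pipe mode. All URIs/ARNs are placeholders.

def build_kmeans_pipe_request(job_name, image_uri, role_arn,
                              s3_train_uri, s3_output_uri):
    """Return a boto3-style create_training_job parameter dict using Pipe mode."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            # Pipe mode streams data from S3 instead of downloading it first,
            # so the job launches quickly even with a 2 TB dataset.
            "TrainingInputMode": "Pipe",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": s3_train_uri,
                        "S3DataDistributionType": "ShardedByS3Key",
                    }
                },
                # Built-in k-means consumes protobuf recordIO in Pipe mode.
                "ContentType": "application/x-recordio-protobuf",
                "CompressionType": "None",
            }
        ],
        "OutputDataConfig": {"S3OutputPath": s3_output_uri},
        "ResourceConfig": {
            "InstanceType": "ml.m5.4xlarge",
            "InstanceCount": 4,
            # Small volume is enough: streamed data is never stored locally.
            "VolumeSizeInGB": 50,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 86400},
    }

request = build_kmeans_pipe_request(
    "kmeans-pipe-demo",
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/kmeans:1",
    "arn:aws:iam::123456789012:role/SageMakerRole",
    "s3://example-bucket/train/",
    "s3://example-bucket/output/",
)
print(request["AlgorithmSpecification"]["TrainingInputMode"])  # Pipe
```

With the higher-level SageMaker Python SDK the same idea is a one-line change: pass `input_mode="Pipe"` to the estimator instead of the default `"File"`.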


Contribute your Thoughts:

Gail
6 days ago
I hear you, 2 TB of data is no joke. Maybe the specialist should just print out the whole dataset and train the model by hand - it'd be faster than waiting for SageMaker!
upvoted 0 times
Felix
13 days ago
2 TB of data, yikes! That's like trying to train a model on the entire internet. I hope the specialist has a good internet connection, otherwise, they're going to be waiting a while for that job to launch.
upvoted 0 times
Polly
15 days ago
Batch transform, huh? That's an interesting idea, but I'm not sure it's the best fit for this scenario. I'd stick with option C and let that Pipe mode do its thing.
upvoted 0 times
Gennie
24 days ago
Hmm, compression is always a good idea, but I think option D is the way to go. Mounting that data on an EFS volume should give you the performance boost you need.
upvoted 0 times
Dick
6 days ago
I agree, mounting the data on an EFS volume could definitely improve performance.
upvoted 0 times
Jamal
8 days ago
Option D sounds like a good solution. EFS should help with the I/O throughput issue.
upvoted 0 times
Markus
25 days ago
Whoa, a 2 TB dataset? That's gotta be a real workout for SageMaker! I'd go with option C - the Pipe input mode, that should help with the I/O throughput issues.
upvoted 0 times
Leanora
1 month ago
I'm not sure, maybe we should also consider using the SageMaker batch transform feature.
upvoted 0 times
Aaron
1 month ago
I agree with Bette, that could help improve the performance.
upvoted 0 times
Bette
2 months ago
I think we should compress the training data into Apache Parquet format.
upvoted 0 times
