Amazon Exam MLS-C01 Topic 1 Question 112 Discussion

Actual exam question for Amazon's MLS-C01 exam

Question #: 112
Topic #: 1

A data scientist needs to create a model for predictive maintenance. The model will be based on historical data to identify rare anomalies in the data.

The historical data is stored in an Amazon S3 bucket. The data scientist needs to use Amazon SageMaker Data Wrangler to ingest the data. The data scientist also needs to perform exploratory data analysis (EDA) to understand the statistical properties of the data.

Which solution will meet these requirements with the LEAST amount of compute resources?

A. Import the data by using the None option.

B. Import the data by using the Stratified option.

C. Import the data by using the First K option. Infer the value of K from domain knowledge.

D. Import the data by using the Randomized option. Infer the random size from domain knowledge.
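Option B (Stratified) is the only choice the page does not unpack. A stratified sample keeps the proportion of each value of a chosen column, which matters when one value (here, an anomaly flag) is rare. The following is a minimal Python sketch of that idea using only the standard library; the `stratified_sample` helper, the `label` column, and the 1,000-row toy dataset are hypothetical and are not Data Wrangler's API or implementation.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(rows, key, fraction, seed=0):
    """Hypothetical helper: draw roughly `fraction` of the rows from each group
    of `key` so that group proportions in the sample track the full data."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    sample = []
    for members in groups.values():
        # Keep at least one row per group so a rare class is never dropped.
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample


# Toy data (made up): 1,000 rows, every 100th row is an anomaly (10 in total).
rows = [{"id": i, "label": "anomaly" if i % 100 == 0 else "normal"} for i in range(1000)]

sample = stratified_sample(rows, key="label", fraction=0.1)
counts = Counter(row["label"] for row in sample)
print(dict(sorted(counts.items())))  # {'anomaly': 1, 'normal': 99}
```

This only works when a column to stratify on actually exists; with unlabeled data, which is common in anomaly detection, there may be nothing to stratify by.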

Suggested Answer: C

To perform efficient exploratory data analysis (EDA) on a large dataset for anomaly detection, using the First K option in SageMaker Data Wrangler is an optimal choice. This option allows the data scientist to select the first K rows, limiting the data loaded into memory, which conserves compute resources.
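A rough illustration of why First K is cheap: reading can stop as soon as K rows have been parsed, so the cost depends on K rather than on the size of the file. The Python sketch below shows this with the standard `csv` module on a lazily generated, hypothetical sensor log; `CountingLines`, `first_k_rows`, and the column names are illustrative, and this is not how Data Wrangler reads from Amazon S3 internally.

```python
import csv
from itertools import chain, islice


class CountingLines:
    """Wraps an iterable of text lines and counts how many lines were consumed."""

    def __init__(self, lines):
        self._it = iter(lines)
        self.consumed = 0

    def __iter__(self):
        return self

    def __next__(self):
        line = next(self._it)
        self.consumed += 1
        return line


def first_k_rows(lines, k):
    """Parse the header plus the first k data rows; nothing after them is read."""
    reader = csv.reader(lines)
    header = next(reader)
    return header, list(islice(reader, k))


# Hypothetical sensor log with one million data rows, generated lazily.
log = chain(
    ["timestamp,sensor_value\n"],
    (f"{t},{t % 7}\n" for t in range(1_000_000)),
)
src = CountingLines(log)
header, rows = first_k_rows(src, k=5)
print(header, rows[0], src.consumed)  # ['timestamp', 'sensor_value'] ['0', '0'] 6
```

Only six lines (the header plus five rows) are pulled from the million-row source.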

Given that the First K option allows the data scientist to determine K based on domain knowledge, this approach provides a representative sample without requiring extensive compute resources. Other options like randomized sampling may not provide data samples that are as useful for initial analysis in a time-series or sequential dataset context.
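The compute side of the trade-off between First K and Randomized can be made concrete by counting how many rows each strategy must touch. In the sketch below (plain Python, not the Data Wrangler implementation), a fixed-size uniform random sample is drawn with reservoir sampling, one standard single-pass technique; whether Data Wrangler's Randomized option works this way is an assumption. The helper names and the sizes `N` and `K` are hypothetical.

```python
import random
from itertools import islice


def counted_rows(n, stats):
    """Yield row ids 0..n-1, recording in stats['rows_read'] how many were produced."""
    for i in range(n):
        stats["rows_read"] += 1
        yield i


def first_k(rows, k):
    """First K: keep the first k rows and stop pulling from the source."""
    return list(islice(rows, k))


def reservoir_sample(rows, k, seed=0):
    """Uniform random sample of k rows (reservoir sampling, Algorithm R).
    Every row has to be examined, because any of them could end up in the sample."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)
        else:
            j = rng.randint(0, i)  # inclusive bounds: i + 1 possible slots
            if j < k:
                reservoir[j] = row
    return reservoir


N, K = 100_000, 1_000  # made-up dataset and sample sizes

first_stats = {"rows_read": 0}
first = first_k(counted_rows(N, first_stats), K)

random_stats = {"rows_read": 0}
randomized = reservoir_sample(counted_rows(N, random_stats), K)

print(first_stats["rows_read"], random_stats["rows_read"])  # 1000 100000
```

Under these assumptions, First K reads K rows while the random sample reads all N, which is the intuition behind picking First K when compute is the main constraint.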

by Bette at Feb 07, 2025, 08:00 PM

Gerald

2 months ago

I'm going with option C. It's the 'First K' method, which is obviously the best choice since 'K' stands for 'Kool-Aid'.

upvoted 0 times

Gennie

2 months ago

I agree. It's important to choose the method that requires the least amount of compute resources.

upvoted 0 times

Malcom

2 months ago

That makes sense. The 'First K' method could help in understanding the data better.

upvoted 0 times

Juliana

2 months ago

I think 'K' in option C refers to the number of samples to import.

upvoted 0 times

Catrice

2 months ago

Option C sounds interesting. The 'First K' method could be a good choice.

upvoted 0 times

Fernanda

3 months ago

Woohoo, let's import the data using the 'Enchant' option! I heard it makes the data more magical and reduces compute needs by 420%.

upvoted 0 times

Carline

3 months ago

Option B sounds interesting, but I wonder if the data is truly stratified. Might be better to stick with a simpler approach like C or D.

upvoted 0 times

Helaine

2 months ago

Yeah, option D might also be a good choice if you can infer the random size accurately.

upvoted 0 times

Aleshia

2 months ago

I think option C could work well if you have good domain knowledge.

upvoted 0 times

Kris

3 months ago

But with option C, we can infer the value of K from domain knowledge, which could save on resources.

upvoted 0 times

Rozella

3 months ago

I disagree. I believe option D would require the least amount of compute resources.

upvoted 0 times

Rex

3 months ago

Hmm, I'm not sure. Option D might be better if we don't have much domain knowledge to infer the right sample size. Randomized sampling could be a safer bet.

upvoted 0 times