Amazon Exam MLS-C01 Topic 4 Question 80 Discussion

Actual exam question for Amazon's MLS-C01 exam
Question #: 80
Topic #: 4

A Data Scientist is working on an application that performs sentiment analysis. The validation accuracy is poor, and the Data Scientist thinks that the cause may be a rich vocabulary and a low average frequency of words in the dataset.

Which tool should be used to improve the validation accuracy?

A) Amazon Comprehend
B) Amazon SageMaker BlazingText
C) Natural Language Toolkit (NLTK) stemming and stop word removal
D) Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizers

Suggested Answer: A
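Most of the discussion below converges on TF-IDF: with a rich vocabulary and a low average word frequency, raw term counts produce many sparse, uninformative features, and TF-IDF re-weights each term by how discriminative it is across documents. Here is a minimal pure-Python sketch of the weighting (the toy corpus is an illustrative assumption; in practice Scikit-learn's TfidfVectorizer handles this, and the smoothed IDF formula below mirrors its default, before normalization):

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Compute smoothed TF-IDF weights for a list of tokenized documents.

    Uses idf(t) = ln((1 + N) / (1 + df(t))) + 1, the smoothed form that
    Scikit-learn's TfidfVectorizer defaults to (before normalization).
    """
    n_docs = len(corpus)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    weights = []
    for doc in corpus:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return weights

# Toy sentiment corpus -- illustrative only.
corpus = [
    ["the", "movie", "was", "great"],
    ["the", "movie", "was", "awful"],
    ["the", "plot", "was", "thin"],
]
w = tf_idf(corpus)
# "the" and "was" appear in every document, so their IDF bottoms out at 1.0,
# while rarer, more discriminative words like "great" get a higher weight.
print(w[0]["the"], w[0]["great"])
```

This is exactly the effect the question is after: common words that carry little sentiment signal are down-weighted, and rare-but-informative words are emphasized, despite the low average word frequency.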

Contribute your Thoughts:

Chaya
1 month ago
Stemming and stop word removal? Sounds like my high school English teacher's dream tool. Maybe we can throw in some thesaurus action too, just for fun.
upvoted 0 times
Linn
2 days ago
I think we should try using Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizers.
upvoted 0 times
Vallie
1 month ago
B) Amazon SageMaker BlazingText? Sounds like a made-up answer. I'd stick to the more well-known NLP tools and techniques.
upvoted 0 times
Novella
1 month ago
C) Natural Language Toolkit (NLTK) stemming and stop word removal could also be a good option to try.
upvoted 0 times
Josphine
1 month ago
A: I think using Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizers could help improve the validation accuracy.
upvoted 0 times
Sharika
2 months ago
A) Amazon Comprehend is probably overkill for this task. It's more suited for enterprise-level NLP tasks, not a simple sentiment analysis problem.
upvoted 0 times
Virgie
1 month ago
A) Amazon Comprehend is probably overkill for this task.
upvoted 0 times
Nicolette
1 month ago
D) Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizers
upvoted 0 times
Tyra
1 month ago
C) Natural Language Toolkit (NLTK) stemming and stop word removal
upvoted 0 times
Domitila
2 months ago
D) Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizers could also be a good choice. TF-IDF can help identify the most important words in the dataset and reduce the impact of common words.
upvoted 0 times
Essie
13 days ago
TF-IDF can help identify the most important words and reduce the impact of common words.
upvoted 0 times
Son
17 days ago
D) Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizers could also be a good choice.
upvoted 0 times
Coral
2 months ago
I agree, removing stop words can help focus on the more important words in the dataset.
upvoted 0 times
Joanne
2 months ago
I think we should use C) Natural Language Toolkit (NLTK) stemming and stop word removal to improve the validation accuracy.
upvoted 0 times
Jamal
2 months ago
I prefer using NLTK for stemming and stop word removal to improve accuracy.
upvoted 0 times
Chauncey
2 months ago
C) Natural Language Toolkit (NLTK) stemming and stop word removal seems like the best option to handle the issue of rich vocabulary and low average word frequency. Removing common words and reducing words to their base form can help improve the model's performance.
upvoted 0 times
Gianna
27 days ago
I agree, it can definitely help in handling the rich vocabulary and low word frequency.
upvoted 0 times
Delisa
1 month ago
I think using NLTK stemming and stop word removal could really help improve the accuracy.
upvoted 0 times
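Option C, favored in the thread above, attacks the same symptom from the other side: stop word removal and stemming shrink the vocabulary, so rare surface forms collapse into more frequent base forms. A crude pure-Python illustration of both steps (the tiny stop-word list and suffix rules are toy assumptions; in practice you would use NLTK's stopwords corpus and PorterStemmer):

```python
# Toy stop-word list and suffix-stripping stemmer -- illustrative only.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to"}
SUFFIXES = ("ing", "ed", "ly", "s")  # checked longest-first below

def naive_stem(word):
    """Strip one common suffix, keeping at least a 3-letter stem."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, drop stop words, and stem the remaining tokens."""
    tokens = text.lower().split()
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("The acting was amazing and the endings dragged"))
```

Note how "acting" and "amazing" collapse to shorter stems while "the" and "was" disappear entirely; a real stemmer handles far more suffix patterns, but the vocabulary-shrinking effect is the same.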
Kris
2 months ago
I agree with Florinda, TF-IDF can help with the low frequency of words issue.
upvoted 0 times
Florinda
3 months ago
I think we should use Scikit-learn TF-IDF vectorizers.
upvoted 0 times
