Question #165: A data scientist is using the Amazon SageMaker Neural Topic Model (NTM) algorithm to build a model that recommends tags from blog posts. The raw blog post data is stored in an Amazon S3 bucket in JSON format. During model evaluation, the data scientist discovered that the model recommends certain stopwords such as "a," "an," and "the" as tags to certain blog posts, along with a few rare words that are present only in certain blog entries. After a few iterations of tag review with the content team, the data scientist notices that the rare words are unusual but feasible. The data scientist also must ensure that the tag recommendations of the generated model do not include the stopwords. What should the data scientist do to meet these requirements?
A. Use the Amazon Comprehend entity recognition API operations. Remove the detected words from the blog post data. Replace the blog post data source in the S3 bucket.
B. Run the SageMaker built-in principal component analysis (PCA) algorithm with the blog post data from the S3 bucket as the data source. Replace the blog post data in the S3 bucket with the results of the training job.
C. Use the SageMaker built-in Object Detection algorithm instead of the NTM algorithm for the training job to process the blog post data.
D. Remove the stopwords from the blog post data by using the CountVectorizer function in the scikit-learn library. Replace the blog post data in the S3 bucket with the results of the vectorizer.
Published: 2025-07-01 20:39:55