Answer the following questions for the python requirements!

Daniel Kevins

1) Answer the following to the best of your ability: (10 points)

a) Define Corpus:

b) How might you build a corpus for the following problem: "I want to be able to learn characteristics of a politician’s language"?

2) a) Briefly describe 4 difficulties with identifying word boundaries algorithmically. (8 points)

i)

ii)

iii)

iv)

b) What are the possible differences between the following two implementations of a word identifier: tokens = nltk.word_tokenize(sentence) and tokens = sentence.split(" ")? (5 points)

c) Why do we use the term ‘tokens’ instead of ‘words’? (5 points)
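As a rough illustration for question 2 b), the sketch below compares a plain split on spaces with a small regular-expression tokenizer. The regex is only a hypothetical stand-in chosen so the example runs without extra downloads; it is not NLTK's algorithm, and nltk.word_tokenize may split some cases (contractions, for instance) differently.

```python
import re

sentence = "Dr. Smith didn't go,  did he?"

# Plain split on a single space: punctuation stays glued to the words,
# and the double space produces an empty string.
split_tokens = sentence.split(" ")

# Hypothetical stand-in tokenizer (NOT nltk.word_tokenize):
# runs of word characters/apostrophes, or single punctuation marks.
regex_tokens = re.findall(r"[\w']+|[^\w\s]", sentence)

print(split_tokens)
print(regex_tokens)
```

The split version keeps items such as "go," and "he?" as single units, while the tokenizer separates the punctuation into units of its own.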

3) Given the sentence “The Cat in the Hat”: (12 points)

a) List the Uni-grams

b) List the Bi-grams

c) List the Tri-grams

4) Answer the following about predictive models: (10 points)

a) What is a backoff model?

b) Give an example of how backoff may help your model.
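For questions 3 and 4, a minimal sketch of n-gram extraction and a toy backoff predictor might look like the following. The helper names (ngrams, predict_next) and the tiny training corpus are made up for illustration, and the backoff here is deliberately simplified: it falls back from bigram counts to unigram counts with no discounting or smoothing, unlike fuller backoff schemes.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token tuples, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "The Cat in the Hat".split()
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

# Toy training corpus (an assumption for this example only).
corpus = "the cat sat on the mat the cat ran".split()
unigram_counts = Counter(corpus)
bigram_counts = Counter(ngrams(corpus, 2))


def predict_next(prev):
    """Predict the next word from bigram counts, backing off to unigrams."""
    candidates = {b[1]: c for b, c in bigram_counts.items() if b[0] == prev}
    if candidates:
        # The previous word was seen in training: use the bigram evidence.
        return max(candidates, key=candidates.get)
    # Unseen context: back off to the most frequent single word.
    return unigram_counts.most_common(1)[0][0]


print(bigrams)
print(predict_next("the"), predict_next("dog"))
```

Without the fallback branch, a context that never appeared in training (such as "dog" here) would leave the model with no prediction at all.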

5) Why do we need sent_tokenize_list = sent_tokenize(text) in NLTK instead of just breaking sentences apart by punctuation? (5 points)
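To make question 5 concrete, the sketch below splits a short text on sentence-ending punctuation using only the standard library. The sample text is invented for this example, and the regular expression is just one naive way to break on punctuation.

```python
import re

# Two sentences as a human reader would count them.
text = "Dr. Smith paid $3.50 for coffee. Then he left."

# Naive approach: break after every '.', '!' or '?' that is followed by whitespace.
naive_sentences = re.split(r"(?<=[.!?])\s+", text)

print(naive_sentences)
```

The naive split returns three pieces because the period in the abbreviation "Dr." is treated as a sentence boundary, which is the kind of case a trained sentence tokenizer is meant to handle.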

6) Briefly explain Transformation-Based Tagging and how it differs from N-gram tagging for Part-of-Speech tagging. (8 points)

7) Answer the following: (12 points)

a) What is a False Negative?

b) What is a True Positive?

c) When should Accuracy be used as a metric?

d) What is the difference between Precision and Recall? When would you use them?

8) Fill in the 3 empty boxes for a typical machine learning cycle: (9 points)
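The metrics named in question 7 can be computed from the four confusion counts. The sketch below is a minimal illustration; the function name evaluate and the label lists are assumptions made up for this example, with 1 as the positive class.

```python
def evaluate(gold, predicted, positive=1):
    """Compute accuracy, precision and recall for one positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)  # true positives
    fp = sum(g != positive and p == positive for g, p in pairs)  # false positives
    fn = sum(g == positive and p != positive for g, p in pairs)  # false negatives
    tn = sum(g != positive and p != positive for g, p in pairs)  # true negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Imbalanced toy data: only 2 of 10 items are truly positive.
gold = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
predicted = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

scores = evaluate(gold, predicted)
print(scores)
```

On this data the accuracy is 0.8 even though the classifier finds only half of the positives (recall 0.5) and only half of its positive predictions are right (precision 0.5).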

9) What are the two differences between testing on your training data and testing on your test data? (4 points)

10) Explain (or draw) k-fold validation when k=5 (6 points)
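One way to picture question 10 is to generate the index splits directly. The sketch below is a simplified version with a hypothetical helper, k_fold_splits; it does no shuffling and gives any leftover items to the last fold.

```python
def k_fold_splits(n_items, k=5):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_items))
    fold_size = n_items // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs any remainder when n_items is not divisible by k.
        stop = start + fold_size if fold < k - 1 else n_items
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test


splits = list(k_fold_splits(10, k=5))
for train, test in splits:
    print("train:", train, "test:", test)
```

With 10 items and k=5, each item appears in the test set exactly once and in the training set for the other four rounds; the five scores are then typically averaged.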

11) Show 3 examples where a Named Entity system may get confused by ambiguity. (6 points)
