Given an input vector of features, a Random Forests model performs a classification task and ends in a tie. How does the model handle this outcome?
A. The model will be rebuilt
B. A winner is chosen at random
C. The tree that caused the tie is discarded
D. One more tree is added to the forest
Why would a company decide to use HBase to replace an existing relational database?
A. It is required for performing ad-hoc queries.
B. Varying formats of input data require columns to be added in real time.
C. The company's employees are already fluent in SQL.
D. Existing SQL code will run unchanged on HBase.
What is an ideal use case for HDFS?
A. Storing files that are updated frequently
B. Storing files that are written once and read many times
C. Storing results between Map steps and Reduce steps
D. Storing application files in memory
A marketing team creates a graph using a square for each data point, where the length of each side is set to the data value. The data values are 10 and 20.
What is the lie factor of the graph?
A. 1
B. 2
C. 3
D. 6
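The arithmetic behind a lie factor can be sketched as follows. This is a minimal sketch, assuming Tufte's definition (size of the effect shown in the graphic divided by size of the effect in the data) and assuming a viewer reads each square by its area. The function names are illustrative and do not come from the question.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return (second - first) / first


def lie_factor(data_values, shown_values):
    """Tufte's lie factor: effect shown in the graphic / effect in the data."""
    return effect_size(*shown_values) / effect_size(*data_values)


data = (10, 20)
# Assumption: the perceived size of a square is its area (side squared).
areas = tuple(side ** 2 for side in data)

result = lie_factor(data, areas)
print(result)
```

Under a different perception assumption (for example, readers comparing side lengths only) the shown effect, and therefore the lie factor, would change.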
How does Latent Dirichlet Allocation (LDA) interpret a document?
A. As a single pre-defined topic
B. As a mixture of pre-defined topics
C. As having a mixture of sentiments
D. As having a single pre-defined sentiment
What is an important simulation design consideration?
A. Ensure model inputs align with reality
B. Use different seed values to regenerate results
C. For rare event models, minimize number of trials
D. A complex model is better than a simple model
Assuming the node index starts at 1, what is the out-degree of node 3 in the adjacency matrix shown? Refer to the exhibit.
A. 0
B. 1
C. 2
D. 3
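Since the exhibit is not reproduced here, the following is only a sketch of how out-degree is read from an adjacency matrix. The 3-node matrix is hypothetical and is NOT the matrix from the exhibit. It also assumes the common convention that rows are source nodes and columns are target nodes; some texts use the transpose.

```python
# Hypothetical directed graph, not the exhibit's matrix.
# Assumed convention: adjacency[i][j] == 1 means an edge from node i+1 to node j+1.
adjacency = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]


def out_degree(matrix, node):
    """Outgoing edges of a 1-indexed node: sum of its row."""
    return sum(matrix[node - 1])


def in_degree(matrix, node):
    """Incoming edges of a 1-indexed node: sum of its column."""
    return sum(row[node - 1] for row in matrix)


node3_out = out_degree(adjacency, 3)
node3_in = in_degree(adjacency, 3)
print(node3_out, node3_in)
```

In this hypothetical matrix, node 3 has one outgoing edge (to node 1) and two incoming edges (from nodes 1 and 2). The subtraction `node - 1` converts the 1-based node index used in the question to Python's 0-based list index.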
What is a random subspace of features, as used by Random Forests?
A. A random subset of features that are chosen at each split in the decision tree
B. Filtration of data that does not meet a pre-defined weighting threshold
C. The creation of out-of-bag (OOB) data that is used to select features
D. Removal of highly correlated variables to randomize the features
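Per-split feature sampling in a tree ensemble can be sketched as below. This is a minimal illustration, not a full Random Forests implementation. The feature names are hypothetical, and the square-root subset size is an assumption (a commonly used default for classification), not something stated in the question.

```python
import random


def random_feature_subset(features, k, rng):
    """Draw k distinct candidate features to consider at one split."""
    return rng.sample(features, k)


# Hypothetical feature names, for illustration only.
features = ["age", "income", "tenure", "region", "plan",
            "usage", "score", "visits", "spend"]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
k = int(len(features) ** 0.5)   # assumed subset size: sqrt of feature count

# A fresh subset is drawn for every split, so different splits
# (and different trees) see different candidate features.
split_candidates = [random_feature_subset(features, k, rng) for _ in range(3)]
for candidates in split_candidates:
    print(candidates)
```

Each split then chooses its best feature only from its own candidate list.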
In the graph, which edge would be considered a weak tie? Refer to the exhibit.
A. C-E
B. E-F
C. B-C
D. G-I
What do first-order and second-order Markov processes have in common concerning next word prediction?
A. Both use WordNet to model the probability of the next word
B. Both are unsupervised methods
C. Both provide the foundation to build a trigram language model
D. Neither makes assumptions about the probability of the next word
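The difference between the two orders can be sketched with simple counts. In a first-order process the next word is conditioned on the one preceding word, and in a second-order process on the two preceding words. The toy corpus and function names below are illustrative assumptions, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def build_model(tokens, order):
    """Map each context of `order` words to counts of the word that follows."""
    model = defaultdict(Counter)
    for i in range(len(tokens) - order):
        context = tuple(tokens[i:i + order])
        model[context][tokens[i + order]] += 1
    return model


def next_word_probs(model, context):
    """Relative frequency of each word seen after the given context."""
    counts = model[tuple(context)]
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


# Toy corpus, for illustration only.
tokens = "the cat sat on the mat the cat ran".split()

first_order = build_model(tokens, 1)   # context: previous word
second_order = build_model(tokens, 2)  # context: previous two words

print(next_word_probs(first_order, ["the"]))
print(next_word_probs(second_order, ["the", "cat"]))
```

Both models are estimated from raw text counts alone. The only difference is how much preceding context each one conditions on.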