You have a large m x n data matrix M. You want to perform dimension reduction/clustering on your data and have decided to use the singular value decomposition (SVD; also called principal components analysis, PCA).
Refer to the passage above.
Which expression represents the SVD of the matrix M, given the following information:
U is m x m unitary
V is n x n unitary
S is m x n diagonal
Q is n x n invertible
D is n x n diagonal
L is m x m lower triangular
U is m x m upper triangular
A. M = U S V
B. M = U P
C. M = Q D Q^-1
D. M = L U
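As a rough illustration of the factorization this question refers to, the sketch below uses NumPy to decompose a small random matrix and rebuild it from its factors. The matrix sizes, the random seed, and the truncation rank k are arbitrary assumptions chosen for the example. Note that np.linalg.svd returns the conjugate transpose of V (named Vh here), not V itself.

```python
import numpy as np

# Small random stand-in for the data matrix M (sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
m, n = 6, 4
M = rng.standard_normal((m, n))

# full_matrices=True gives U (m x m), the singular values s, and Vh (n x n),
# where Vh is the conjugate transpose of V.
U, s, Vh = np.linalg.svd(M, full_matrices=True)

# Place the singular values on the diagonal of an m x n matrix S.
S = np.zeros((m, n))
S[:len(s), :len(s)] = np.diag(s)

# The product of the three factors reproduces M up to floating-point error.
M_rebuilt = U @ S @ Vh

# Dimension reduction: keep only the k largest singular values.
k = 2
M_rank_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]
```

Keeping only the leading k singular values gives a rank-k approximation of M, which is the usual way the SVD is used for dimension reduction.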
You are building a k-nearest neighbor classifier (k-NN) on a labeled set of points in a high-dimensional space. You determine that the classifier has a large error on the training data. What is the most likely problem?
A. High-dimensional spaces effectively make local neighborhoods global
B. k-NN computation does not converge in high dimensions
C. k was too small
D. The VC-dimension of a k-NN classifier is too high
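To make the idea of neighborhoods in high-dimensional spaces more concrete, here is a minimal sketch that compares the nearest and farthest distances from a query point to a set of uniformly random points. The function name, the point count, and the two dimensions tested are assumptions made for illustration; uniform random data is only a stand-in for a real labeled set.

```python
import math
import random

def nearest_to_farthest_ratio(dim, n_points=200, seed=0):
    """Ratio of the smallest to the largest distance from a random query
    point to n_points uniformly random points in the unit cube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return min(distances) / max(distances)

low_dim_ratio = nearest_to_farthest_ratio(dim=2)
high_dim_ratio = nearest_to_farthest_ratio(dim=1000)
print(low_dim_ratio, high_dim_ratio)
```

A ratio close to 1 means the nearest neighbor is almost as far away as the farthest point, so "nearest" carries little information. In this kind of experiment the ratio tends to move toward 1 as the dimension grows.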
You have just run a MapReduce job to filter user messages to only those of a selected geographical region. The output for this job is in a directory named westUsers, located just below your home directory in HDFS. Which command gathers these records into a single file on your local file system?
A. hadoop fs -getmerge westUsers WestUsers.txt
B. hadoop fs -get westUsers WestUsers.txt
C. hadoop fs -cp westUsers/* westUsers.txt
D. hadoop fs -getmerge -R westUsers westUsers.txt
What are two defining features of RMSE (root-mean-square error or root-mean-square deviation)?
A. It is sensitive to outliers
B. It is the mean value of recommendations of the K-equal partitions in the input data
C. It is the square of the median value of the error where error is the difference between predicted rating and actual ratings
D. It is appropriate for numeric data
E. It considers the order of recommendations
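For reference, RMSE is the square root of the mean of the squared differences between predicted and actual values. The sketch below computes it next to the mean absolute error (MAE) on two small, made-up sets of predicted ratings; the data and the helper names rmse and mae are assumptions for illustration only.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length numeric sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def mae(predicted, actual):
    """Mean absolute error, shown only for comparison with RMSE."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

actual = [3.0, 4.0, 5.0, 2.0]
evenly_off = [3.5, 3.5, 4.5, 2.5]   # every prediction is off by 0.5
one_outlier = [3.0, 4.0, 5.0, 4.0]  # three exact predictions, one off by 2.0

print(rmse(evenly_off, actual), mae(evenly_off, actual))    # 0.5 0.5
print(rmse(one_outlier, actual), mae(one_outlier, actual))  # 1.0 0.5
```

Both prediction sets have the same MAE, but the set containing a single large error has twice the RMSE, because squaring gives large errors more weight.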
From historical data, you know that 50% of students who take Cloudera's Introduction to Data Science: Building Recommender Systems training course pass this exam, while only 25% of students who did not take the training course pass this exam. You also know that 50% of this exam's candidates take Cloudera's Introduction to Data Science: Building Recommender Systems training course.
What is the probability that any individual exam candidate will pass the data science exam?
A. 3/8
B. 1/4
C. 1/8
D. 1/2
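The calculation behind this question is the law of total probability: P(pass) = P(pass | course) P(course) + P(pass | no course) P(no course). A minimal sketch using exact fractions follows; the variable names are chosen here for illustration, and the three input values are the ones stated in the question.

```python
from fractions import Fraction

# Values stated in the question.
p_course = Fraction(1, 2)                # candidates who took the course
p_pass_given_course = Fraction(1, 2)     # pass rate among course takers
p_pass_given_no_course = Fraction(1, 4)  # pass rate among everyone else

# Law of total probability over the two groups of candidates.
p_pass = (p_pass_given_course * p_course
          + p_pass_given_no_course * (1 - p_course))

print(p_pass)  # 3/8
```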
What is the best way to determine the learning rate parameters for stochastic gradient descent when the distribution of the input data shifts over time?
A. The learning rate should be adjusted periodically based on the setting that optimizes the objective function over a sample of recent observations
B. The learning rate should be a fixed number that decays as the number of observations in the data set increases
C. The learning rate should be the value that optimizes the value of the objective function over the first N samples in the dataset
D. The learning rate should be a fixed number with a constant decay factor
E. The learning rate should be continuously adjusted based on the value that optimizes the objective function for the most recent observation from the input data
Which two machine learning algorithms should you consider as likely to benefit from discretizing continuous features?
A. Support vector machine
B. Naïve Bayes
C. Decision trees
D. Logistic regression
E. Singular value decomposition
You've built a model that has ten different variables with complicated independence relationships between them, and both continuous and discrete variables that have complicated, multi-parameter distributions. Computing the joint probability distribution is complex, but it turns out that computing the conditional probabilities for the variables is easy. What is the most computationally efficient method for computing the expected value?
A. Method of moments
B. Markov Chain Monte Carlo
C. Gibbs sampling
D. Numerical quadrature
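As background on option C, Gibbs sampling draws each variable in turn from its conditional distribution given the current values of the others, then averages over the draws to estimate an expectation. The sketch below is a toy version under strong assumptions: a two-variable standard normal model with correlation rho, whose conditionals are known in closed form, stands in for the ten-variable model in the question. The function name and all parameter values are hypothetical.

```python
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in=1000, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Uses only the conditionals: x | y ~ N(rho * y, 1 - rho^2), and
    symmetrically for y | x. The joint density is never evaluated.
    """
    rng = random.Random(seed)
    cond_sd = (1 - rho ** 2) ** 0.5
    x, y = 0.0, 0.0
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_sd)  # draw x given the current y
        y = rng.gauss(rho * x, cond_sd)  # draw y given the new x
        if step >= burn_in:              # discard the warm-up draws
            samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)

# Monte Carlo estimate of E[x * y]; for this toy model the true value is rho.
estimate = sum(x * y for x, y in samples) / len(samples)
print(estimate)
```

The same loop structure extends to more variables by cycling through one conditional draw per variable on each step.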
Which three metrics are useful in measuring the accuracy and quality of a recommender system?
A. Mutual Information
B. RMSE
C. Tanimoto coefficient
D. Pearson correlation
E. Precision
F. Recall
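For context on options E and F, precision and recall are often computed over the top k items of a recommendation list. The sketch below shows one common way to do this; the function name, the item identifiers, and the choice of k are assumptions made for the example.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall over the top-k items of a ranked list.

    recommended: ranked list of item ids, best first.
    relevant: set of item ids the user actually liked.
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

recommended = ["a", "b", "c", "d", "e"]  # hypothetical ranked recommendations
relevant = {"a", "c", "f"}               # hypothetical items the user liked

precision, recall = precision_recall_at_k(recommended, relevant, k=4)
print(precision, recall)
```

Here two of the top four recommendations are relevant, so precision is 2/4, and two of the three relevant items were recommended, so recall is 2/3.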