What are the key outcomes of a successful analytics project?
A. Code of the model
B. Technical specifications
C. Presentation for Analysts
D. Presentation for Project Sponsors
What are the advantages of feature hashing?
A. Requires less memory
B. Requires fewer passes through the training data
C. Vectors can easily be reverse-engineered to determine which original feature mapped to a vector location
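For reference, feature hashing can be sketched in a few lines. This is a minimal illustration, not a library implementation: the function name `hash_features`, the bucket count of 16, and the use of MD5 as the hash are all assumptions made for the example.

```python
import hashlib

def hash_features(tokens, n_buckets=16):
    """Map string tokens to a fixed-length count vector (hypothetical helper)."""
    vec = [0] * n_buckets
    for tok in tokens:
        # Use a stable hash; Python's built-in hash() is salted per process.
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:8], "big") % n_buckets
        vec[idx] += 1  # colliding tokens simply share a bucket
    return vec

doc = ["data", "science", "data", "model"]
vec = hash_features(doc)
print(len(vec), sum(vec))
```

The vector length is fixed in advance and each token is processed once, with no vocabulary built or stored. Because no dictionary is kept, the vector alone does not record which token produced a given position.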
RMSE is a useful metric for evaluating which types of models?
A. Logistic regression
B. Naive Bayes classifier
C. Linear regression
D. All of the above
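As a reminder of what the metric measures, RMSE is the square root of the mean squared difference between predicted and observed numeric values. The sketch below uses made-up sample values; the function name `rmse` is chosen for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length numeric sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

actual = [3.0, 5.0, 7.0]      # illustrative values only
predicted = [2.0, 5.0, 9.0]
error = rmse(actual, predicted)
print(round(error, 4))
```

The computation subtracts predictions from observations, so it presupposes that both are numeric quantities on the same scale.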
In statistics, maximum-likelihood estimation (MLE) is a method of estimating the parameters of a statistical model. When applied to a data set and given a statistical model, maximum-likelihood estimation provides estimates for the model's parameters. The normalizing constant is usually ignored in MLE because:
A. The normalizing constant is always very close to 1
B. The normalizing constant only has a small impact on the maximum likelihood
C. The normalizing constant is often zero and can cause division by zero
D. The normalizing constant doesn't impact the maximizing value
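The effect of a constant factor on the maximizer can be checked numerically. The sketch below assumes a binomial likelihood with 7 successes in 10 trials (values chosen only for illustration) and treats the binomial coefficient as the factor that does not depend on the parameter p. It compares a grid search over p with and without that factor.

```python
import math

def unnormalized(p, k, n):
    """Likelihood of p without the constant binomial coefficient."""
    return p ** k * (1 - p) ** (n - k)

def full(p, k, n):
    """Full binomial likelihood, including the coefficient C(n, k)."""
    return math.comb(n, k) * unnormalized(p, k, n)

k, n = 7, 10                                # assumed example data
grid = [i / 100 for i in range(1, 100)]     # candidate values of p

best_full = max(grid, key=lambda p: full(p, k, n))
best_unnorm = max(grid, key=lambda p: unnormalized(p, k, n))
print(best_full, best_unnorm)
```

Multiplying the likelihood by a positive constant rescales its height but leaves the location of its maximum where it was, so both searches select the same p.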
Which of the following questions fall under the data science category?
A. What happened in the last six months?
B. How many products were sold in the last month?
C. Where is the problem with sales?
D. Which is the optimal scenario for selling this product?
E. What happens if these scenarios continue?
The method based on principal component analysis (PCA) evaluates the features according to:
A. The projection of the largest eigenvector of the correlation matrix on the initial dimensions
B. The magnitude of the components of the discriminant vector
C. The projection of the smallest eigenvector of the correlation matrix on the initial dimensions
D. None of the above
Which of the following best describes principal component analysis?
A. Dimensionality reduction
B. Collaborative filtering
C. Classification
D. Regression
E. Clustering
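To make the mechanics of PCA concrete, the following sketch projects 2-D points onto their first principal component. It is a simplified illustration under stated assumptions: the data points are invented, the function name `pca_first_component` is hypothetical, and power iteration on the 2x2 covariance matrix stands in for a full eigendecomposition. It also assumes the points are not all identical and that the starting vector is not orthogonal to the leading eigenvector.

```python
import math

def pca_first_component(points, iters=200):
    """Return the leading principal direction and the 1-D scores of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]

    # Entries of the sample covariance matrix.
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Power iteration converges to the eigenvector with the largest eigenvalue.
    vx, vy = 1.0, 0.0
    for _ in range(iters):
        wx = sxx * vx + sxy * vy
        wy = sxy * vx + syy * vy
        norm = math.hypot(wx, wy)
        vx, vy = wx / norm, wy / norm

    # Each 2-D point is summarized by one coordinate along that direction.
    scores = [x * vx + y * vy for x, y in centered]
    return (vx, vy), scores

points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.0), (5.0, 9.8)]
direction, scores = pca_first_component(points)
print(direction, scores)
```

Each input point has two coordinates and each output score has one, chosen along the direction in which the data vary most.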
Select the correct statements that apply to K-Nearest Neighbors:
A. Makes no assumptions about the data
B. Computationally expensive
C. Requires less memory
D. Works with numeric values
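A bare-bones K-Nearest Neighbors classifier helps when reasoning about these properties. The sketch below is illustrative only: the training points, labels, and the function name `knn_predict` are invented, and Euclidean distance with majority voting is one common choice among several.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Every prediction measures the distance to every stored training point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Illustrative training set: (numeric feature tuple, label).
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")]

label = knn_predict(train, (1.1, 1.0))
print(label)
```

There is no training step that fits parameters: the whole training set is kept and scanned at prediction time, and the distance function operates on numeric coordinates.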
What is one modeling or descriptive statistical function in MADlib that is typically not provided in a standard relational database?
A. Expected value
B. Variance
C. Linear regression
D. Quantiles
You are asked to create a model to predict the total number of monthly subscribers for a specific magazine. You are provided with 1 year's worth of subscription and payment data, user demographic data, and 10 years' worth of the magazine's content (articles and pictures). Which algorithm is the most appropriate for building a predictive model for subscribers?
A. Linear regression
B. Logistic regression
C. Decision trees
D. TF-IDF