Which of the following processes best covers all of the following characteristics?
Collecting descriptive statistics such as min, max, count, and sum.
Collecting data types, lengths, and recurring patterns.
Tagging data with keywords, descriptions, or categories.
Performing data quality assessment and assessing the risk of performing joins on the data.
Discovering metadata and assessing its accuracy.
Identifying distributions, key candidates, foreign-key candidates, functional dependencies, and embedded value dependencies, and performing inter-table analysis.
A. Data Visualization
B. Data Virtualization
C. Data Profiling
D. Data Collection
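For context, these characteristics describe data profiling. A minimal pandas sketch (with a hypothetical DataFrame) of a few of the profiling steps listed above:

import pandas as pd

df = pd.DataFrame({'price': [10.0, 12.5, None, 9.9],
                   'sku': ['A-1', 'B-2', 'A-3', 'C-4']})
# Descriptive statistics: count, mean, min, max, quartiles.
print(df.describe())
# Data types of each column.
print(df.dtypes)
# Recurring patterns: distribution of string lengths in a column.
print(df['sku'].str.len().value_counts())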
What is the formula for measuring skewness in a dataset?
A. MEAN - MEDIAN
B. MODE - MEDIAN
C. 3(MEAN - MEDIAN) / STANDARD DEVIATION
D. (MEAN - MODE) / STANDARD DEVIATION
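For reference, the quantity in option C is Pearson's second (median) skewness coefficient. A minimal NumPy sketch on a hypothetical sample:

import numpy as np

data = np.array([2.0, 3.0, 3.0, 4.0, 10.0])  # hypothetical sample
mean, median, std = data.mean(), np.median(data), data.std()
skew = 3 * (mean - median) / std
print(skew)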
Which option lists the correct steps for saving the contents of a DataFrame to a Snowflake table as part of moving data from Spark to Snowflake?
A. Step 1. Use the PUT() method of the DataFrame to construct a DataFrameWriter.
Step 2. Specify SNOWFLAKE_SOURCE_NAME using the NAME() method.
Step 3. Use the dbtable option to specify the table to which data is written.
Step 4. Specify the connector options using either the option() or options() method.
Step 5. Use the save() method to specify the save mode for the content.
B. Step 1. Use the PUT() method of the DataFrame to construct a DataFrameWriter.
Step 2. Specify SNOWFLAKE_SOURCE_NAME using the format() method.
Step 3. Specify the connector options using either the option() or options() method.
Step 4. Use the dbtable option to specify the table to which data is written.
Step 5. Use the save() method to specify the save mode for the content.
C. Step 1. Use the write() method of the DataFrame to construct a DataFrameWriter.
Step 2. Specify SNOWFLAKE_SOURCE_NAME using the format() method.
Step 3. Specify the connector options using either the option() or options() method.
Step 4. Use the dbtable option to specify the table to which data is written.
Step 5. Use the mode() method to specify the save mode for the content.
D. Step 1. Use the writer() method of the DataFrame to construct a DataFrameWriter.
Step 2. Specify SNOWFLAKE_SOURCE_NAME using the format() method.
Step 3. Use the dbtable option to specify the table to which data is written.
Step 4. Specify the connector options using either the option() or options() method.
Step 5. Use the save() method to specify the save mode for the content.
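For reference, a minimal PySpark sketch of this write path, assuming an existing DataFrame df, the Snowflake Spark connector on the classpath, and placeholder connection options:

SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",  # placeholder values
    "sfUser": "my_user",
    "sfPassword": "***",
    "sfDatabase": "my_db",
    "sfSchema": "public",
    "sfWarehouse": "my_wh",
}

(df.write                              # construct a DataFrameWriter
   .format(SNOWFLAKE_SOURCE_NAME)      # specify the Snowflake source
   .options(**sf_options)              # connector options
   .option("dbtable", "target_table")  # table to which data is written
   .mode("overwrite")                  # save mode
   .save())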
Which one is not a feature engineering technique used in the ML data science world?
A. Imputation
B. Binning
C. One hot encoding
D. Statistical
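For context, a minimal pandas sketch (hypothetical data) of the three techniques named in options A-C:

import pandas as pd

df = pd.DataFrame({'age': [25.0, None, 40.0, 31.0],
                   'city': ['NY', 'SF', 'NY', 'LA']})
# Imputation: fill the missing age with the column mean.
df['age'] = df['age'].fillna(df['age'].mean())
# Binning: discretize the continuous age column into ranges.
df['age_bin'] = pd.cut(df['age'], bins=[0, 30, 50], labels=['young', 'middle'])
# One-hot encoding: expand the categorical city column into indicator columns.
df = pd.get_dummies(df, columns=['city'])
print(df)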
In which of the following ways can a Data Scientist query, process, and transform data using Snowpark Python? Choose 2.
A. Query and process data with a DataFrame object.
B. Write a user-defined tabular function (UDTF) that processes data and returns data in a set of rows with one or more columns.
C. Snowpark currently does not support writing UDTFs.
D. Transform data using the Dataiku tool with the Snowpark API.
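For context, a minimal Snowpark Python sketch of options A and B; the connection parameters, table name, and UDTF name here are hypothetical:

from snowflake.snowpark import Session
from snowflake.snowpark.functions import col
from snowflake.snowpark.types import IntegerType, StructField, StructType

# Placeholder connection details.
connection_parameters = {"account": "...", "user": "...", "password": "..."}
session = Session.builder.configs(connection_parameters).create()

# A: query and process data with a DataFrame object.
df = session.table("sales").filter(col("amount") > 100)
df.show()

# B: a UDTF that returns a set of rows, one per integer below n.
class NumberRange:
    def process(self, n: int):
        for i in range(n):
            yield (i,)

session.udtf.register(
    NumberRange,
    output_schema=StructType([StructField("num", IntegerType())]),
    input_types=[IntegerType()],
    name="number_range",
    replace=True,
)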
A Data Scientist, acting as a data provider, needs to allow consumers to access all databases and database objects in a share by granting a single privilege on the shared database. Which of the following SnowSQL commands she uses for this task is incorrect?
Assuming:
A database named product_db exists with a schema named product_agg and a table named Item_agg.
The database, schema, and table will be shared with two accounts named xy12345 and yz23456.
1. USE ROLE accountadmin;
2. CREATE DIRECT SHARE product_s;
3. GRANT USAGE ON DATABASE product_db TO SHARE product_s;
4. GRANT USAGE ON SCHEMA product_db.product_agg TO SHARE product_s;
5. GRANT SELECT ON TABLE sales_db.product_agg.Item_agg TO SHARE product_s;
6. SHOW GRANTS TO SHARE product_s;
7. ALTER SHARE product_s ADD ACCOUNTS=xy12345, yz23456;
8. SHOW GRANTS OF SHARE product_s;
A. GRANT USAGE ON DATABASE product_db TO SHARE product_s;
B. CREATE DIRECT SHARE product_s;
C. GRANT SELECT ON TABLE sales_db.product_agg.Item_agg TO SHARE product_s;
D. ALTER SHARE product_s ADD ACCOUNTS=xy12345, yz23456;
Consider a DataFrame df with 10 rows and index ['r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10']. What does the aggregate method shown in the code below do?
g = df.groupby(df.index.str.len())
g.aggregate({'A': len, 'B': np.sum})
A. Computes the sum of column A values
B. Computes the length of column A
C. Computes the length of column A and the sum of column B values for each group
D. Computes the length of column A and the sum of column B values
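A minimal runnable sketch of this grouping, assuming hypothetical numeric columns A and B:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'A': range(10), 'B': range(10, 20)},
    index=['r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10'],
)
# Group rows by the character length of their index labels (2, 4, or 5).
g = df.groupby(df.index.str.len())
# Per group: len counts the values in column A, np.sum totals column B.
print(g.aggregate({'A': len, 'B': np.sum}))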
What can a Snowflake Data Scientist do in the Snowflake Marketplace as a consumer? Choose all that apply.
A. Discover and test third-party data sources.
B. Receive frictionless access to raw data products from vendors.
C. Combine new datasets with your existing data in Snowflake to derive new business insights.
D. Use the business intelligence (BI)/ML/Deep learning tools of her choice.
Which object records data manipulation language (DML) changes made to tables, including inserts, updates, and deletes, as well as metadata about each change, so that actions can be taken on the changed data in data science pipelines?
A. Task
B. Dynamic tables
C. Stream
D. Tags
E. Delta
F. OFFSET
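For context, a minimal Snowpark Python sketch of creating and reading a stream; the session, source table, and stream name are hypothetical:

# Assumes an existing Snowpark session and a source table named orders.
session.sql("CREATE OR REPLACE STREAM orders_stream ON TABLE orders").collect()
# Reading the stream returns the changed rows plus change metadata columns
# such as METADATA$ACTION, METADATA$ISUPDATE, and METADATA$ROW_ID.
changes = session.sql("SELECT * FROM orders_stream").collect()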
Which of the following is a useful tool for gaining insights into the relationship between features and predictions?
A. numpy plots
B. sklearn plots
C. Partial dependence plots (PDP)
D. Full dependence plots (FDP)
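For context, a minimal scikit-learn sketch of a partial dependence plot on synthetic data:

import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = make_regression(n_samples=200, n_features=4, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)
# Show how predictions vary with features 0 and 1, averaging over the rest.
PartialDependenceDisplay.from_estimator(model, X, features=[0, 1])
plt.show()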