Which of the following processes best covers all of the following characteristics?
Collecting descriptive statistics such as min, max, count, and sum.
Collecting data types, lengths, and recurring patterns.
Tagging data with keywords, descriptions, or categories.
Performing data quality assessment and assessing the risk of performing joins on the data.
Discovering metadata and assessing its accuracy.
Identifying distributions, key candidates, foreign-key candidates, functional dependencies, and embedded value dependencies, and performing inter-table analysis.
A. Data Visualization
B. Data Virtualization
C. Data Profiling
D. Data Collection
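For context, these characteristics describe data profiling. A minimal pandas sketch (with a hypothetical DataFrame) of a few of the profiling steps listed above:

import pandas as pd

df = pd.DataFrame({'price': [10.0, 12.5, None, 9.9],
                   'sku': ['A-1', 'B-2', 'A-3', 'C-4']})
# Descriptive statistics: count, mean, min, max, quartiles.
print(df.describe())
# Data types of each column.
print(df.dtypes)
# Recurring patterns: distribution of string lengths in a column.
print(df['sku'].str.len().value_counts())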
What is the formula for measuring skewness in a dataset?
A. MEAN - MEDIAN
B. MODE - MEDIAN
C. 3(MEAN - MEDIAN) / STANDARD DEVIATION
D. (MEAN - MODE) / STANDARD DEVIATION
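For reference, the quantity in option C is Pearson's second (median) skewness coefficient. A minimal NumPy sketch on a hypothetical sample:

import numpy as np

data = np.array([2.0, 3.0, 3.0, 4.0, 10.0])  # hypothetical sample
mean, median, std = data.mean(), np.median(data), data.std()
skew = 3 * (mean - median) / std
print(skew)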
Which option lists the correct steps for saving the contents of a DataFrame to a Snowflake table as part of moving data from Spark to Snowflake?
A. Step 1. Use the PUT() method of the DataFrame to construct a DataFrameWriter.
Step 2. Specify SNOWFLAKE_SOURCE_NAME using the NAME() method.
Step 3. Use the dbtable option to specify the table to which data is written.
Step 4. Specify the connector options using either the option() or options() method.
Step 5. Use the save() method to specify the save mode for the content.
B. Step 1. Use the PUT() method of the DataFrame to construct a DataFrameWriter.
Step 2. Specify SNOWFLAKE_SOURCE_NAME using the format() method.
Step 3. Specify the connector options using either the option() or options() method.
Step 4. Use the dbtable option to specify the table to which data is written.
Step 5. Use the save() method to specify the save mode for the content.
C. Step 1. Use the write() method of the DataFrame to construct a DataFrameWriter.
Step 2. Specify SNOWFLAKE_SOURCE_NAME using the format() method.
Step 3. Specify the connector options using either the option() or options() method.
Step 4. Use the dbtable option to specify the table to which data is written.
Step 5. Use the mode() method to specify the save mode for the content.
D. Step 1. Use the writer() method of the DataFrame to construct a DataFrameWriter.
Step 2. Specify SNOWFLAKE_SOURCE_NAME using the format() method.
Step 3. Use the dbtable option to specify the table to which data is written.
Step 4. Specify the connector options using either the option() or options() method.
Step 5. Use the save() method to specify the save mode for the content.
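For reference, a minimal PySpark sketch of this write path, assuming an existing DataFrame df, the Snowflake Spark connector on the classpath, and placeholder connection options:

SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",  # placeholder values
    "sfUser": "my_user",
    "sfPassword": "***",
    "sfDatabase": "my_db",
    "sfSchema": "public",
    "sfWarehouse": "my_wh",
}

(df.write                              # construct a DataFrameWriter
   .format(SNOWFLAKE_SOURCE_NAME)      # specify the Snowflake source
   .options(**sf_options)              # connector options
   .option("dbtable", "target_table")  # table to which data is written
   .mode("overwrite")                  # save mode
   .save())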
Which one is not a feature engineering technique used in the ML data science world?
A. Imputation
B. Binning
C. One hot encoding
D. Statistical
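For context, a minimal pandas sketch (hypothetical data) of the three techniques named in options A-C:

import pandas as pd

df = pd.DataFrame({'age': [25.0, None, 40.0, 31.0],
                   'city': ['NY', 'SF', 'NY', 'LA']})
# Imputation: fill the missing age with the column mean.
df['age'] = df['age'].fillna(df['age'].mean())
# Binning: discretize the continuous age column into ranges.
df['age_bin'] = pd.cut(df['age'], bins=[0, 30, 50], labels=['young', 'middle'])
# One-hot encoding: expand the categorical city column into indicator columns.
df = pd.get_dummies(df, columns=['city'])
print(df)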
In which of the following ways can a Data Scientist query, process, and transform data using Snowpark Python? Choose 2.
A. Query and process data with a DataFrame object.
B. Write a user-defined tabular function (UDTF) that processes data and returns data in a set of rows with one or more columns.
C. Snowpark currently does not support writing UDTFs.
D. Transform data using the Dataiku tool with the Snowpark API.
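For context, a minimal Snowpark Python sketch of options A and B; the connection parameters, table name, and UDTF name here are hypothetical:

from snowflake.snowpark import Session
from snowflake.snowpark.functions import col
from snowflake.snowpark.types import IntegerType, StructField, StructType

# Placeholder connection details.
connection_parameters = {"account": "...", "user": "...", "password": "..."}
session = Session.builder.configs(connection_parameters).create()

# A: query and process data with a DataFrame object.
df = session.table("sales").filter(col("amount") > 100)
df.show()

# B: a UDTF that returns a set of rows, one per integer below n.
class NumberRange:
    def process(self, n: int):
        for i in range(n):
            yield (i,)

session.udtf.register(
    NumberRange,
    output_schema=StructType([StructField("num", IntegerType())]),
    input_types=[IntegerType()],
    name="number_range",
    replace=True,
)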
A Data Scientist, acting as a data provider, needs to allow consumers to access all databases and database objects in a share by granting a single privilege on the shared database. Which of the following SnowSQL commands she uses for this task is incorrect?
Assuming:
A database named product_db exists with a schema named product_agg and a table named Item_agg.
The database, schema, and table will be shared with two accounts named xy12345 and yz23456.
1. USE ROLE accountadmin;
2. CREATE DIRECT SHARE product_s;
3. GRANT USAGE ON DATABASE product_db TO SHARE product_s;
4. GRANT USAGE ON SCHEMA product_db.product_agg TO SHARE product_s;
5. GRANT SELECT ON TABLE sales_db.product_agg.Item_agg TO SHARE product_s;
6. SHOW GRANTS TO SHARE product_s;
7. ALTER SHARE product_s ADD ACCOUNTS=xy12345, yz23456;
8. SHOW GRANTS OF SHARE product_s;
A. GRANT USAGE ON DATABASE product_db TO SHARE product_s;
B. CREATE DIRECT SHARE product_s;
C. GRANT SELECT ON TABLE sales_db.product_agg.Item_agg TO SHARE product_s;
D. ALTER SHARE product_s ADD ACCOUNTS=xy12345, yz23456;
Consider a DataFrame df with 10 rows and index ['r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10']. What does the aggregate method shown in the code below do?
g = df.groupby(df.index.str.len())
g.aggregate({'A': len, 'B': np.sum})
A. Computes the sum of column A values
B. Computes the length of column A
C. Computes the length of column A and the sum of column B values for each group
D. Computes the length of column A and the sum of column B values
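A minimal runnable sketch of this grouping, assuming hypothetical numeric columns A and B:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'A': range(10), 'B': range(10, 20)},
    index=['r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10'],
)
# Group rows by the character length of their index labels (2, 4, or 5).
g = df.groupby(df.index.str.len())
# Per group: len counts the values in column A, np.sum totals column B.
print(g.aggregate({'A': len, 'B': np.sum}))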
What can a Snowflake Data Scientist do in the Snowflake Marketplace as a consumer? Choose all that apply.
A. Discover and test third-party data sources.
B. Receive frictionless access to raw data products from vendors.
C. Combine new datasets with your existing data in Snowflake to derive new business insights.
D. Use the business intelligence (BI)/ML/Deep learning tools of her choice.
Which object records data manipulation language (DML) changes made to tables, including inserts, updates, and deletes, as well as metadata about each change, so that actions can be taken on the changed data in data science pipelines?
A. Task
B. Dynamic tables
C. Stream
D. Tags
E. Delta
F. OFFSET
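For context, a minimal Snowpark Python sketch of creating and reading a stream; the session, source table, and stream name are hypothetical:

# Assumes an existing Snowpark session and a source table named orders.
session.sql("CREATE OR REPLACE STREAM orders_stream ON TABLE orders").collect()
# Reading the stream returns the changed rows plus change metadata columns
# such as METADATA$ACTION, METADATA$ISUPDATE, and METADATA$ROW_ID.
changes = session.sql("SELECT * FROM orders_stream").collect()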
Which of the following is a useful tool for gaining insights into the relationship between features and predictions?
A. numpy plots
B. sklearn plots
C. Partial dependence plots (PDP)
D. Full dependence plots (FDP)
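For context, a minimal scikit-learn sketch of a partial dependence plot on synthetic data:

import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = make_regression(n_samples=200, n_features=4, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)
# Show how predictions vary with features 0 and 1, averaging over the rest.
PartialDependenceDisplay.from_estimator(model, X, features=[0, 1])
plt.show()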