You are working as a data scientist for a healthcare company that wants to find patterns in a large volume of electronic medical records. You are asked to build a PySpark solution to analyze these records in a JupyterLab notebook. What is the order of recommended steps to develop a PySpark application in Oracle Cloud Infrastructure (OCI) Data Science?
A. Launch a notebook session. Configure core-site.xml. Install a PySpark conda environment. Develop your PySpark application. Create a Data Flow application with the Accelerated Data Science (ADS) SDK.
B. Configure core-site.xml. Install a PySpark conda environment. Create a Data Flow application with the Accelerated Data Science (ADS) SDK. Develop your PySpark application. Launch a notebook session.
C. Launch a notebook session. Install a PySpark conda environment. Configure core-site.xml.
D. Develop your PySpark application. Create a Data Flow application with the Accelerated Data Science (ADS) SDK.
E. Install a Spark conda environment. Configure core-site.xml. Launch a notebook session. Create a Data Flow application with the Accelerated Data Science (ADS) SDK. Develop your PySpark application.
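As background for the final step in this workflow, below is a minimal sketch of creating a Data Flow application from a notebook session with the ADS jobs API. The compartment OCID, shapes, application name, and Object Storage script path are placeholders, and the builder method names should be verified against the installed ADS version.

```python
import ads
from ads.jobs import DataFlow, DataFlowRuntime, Job

# Authenticate with the notebook session's resource principal.
ads.set_auth("resource_principal")

# Define the Data Flow infrastructure (OCID and shapes are placeholders).
infrastructure = (
    DataFlow()
    .with_compartment_id("ocid1.compartment.oc1..<unique_id>")
    .with_driver_shape("VM.Standard2.1")
    .with_executor_shape("VM.Standard2.1")
    .with_num_executors(2)
)

# Point the runtime at the PySpark script uploaded to Object Storage.
runtime = DataFlowRuntime().with_script_uri("oci://bucket@namespace/analyze_records.py")

# Create the Data Flow application and submit a run.
job = Job(name="emr-pattern-analysis", infrastructure=infrastructure, runtime=runtime).create()
run = job.run()
```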
You want to make your model more parsimonious to reduce the cost of collecting and processing data. You plan to do this by removing features that are highly correlated. You would like to create a heat map that displays the correlation so that you can identify candidate features to remove. Which Accelerated Data Science (ADS) SDK method would be appropriate to display the correlation between Continuous and Categorical features?
A. corr()
B. correlation_ratio_plot()
C. pearson_plot()
D. cramersv_plot()
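As background, the ADS dataset API can render correlation heat maps directly in the notebook, and the correlation ratio is the measure used between continuous and categorical features. The sketch below assumes the ADSDataset show_corr() interface and its correlation_methods parameter; the file path and target column are placeholders.

```python
from ads.dataset.factory import DatasetFactory

# Open the data set; ADS infers continuous vs. categorical feature types.
# The path and target column are placeholders.
ds = DatasetFactory.open("medical_records.csv", target="readmission")

# Render a correlation heat map; the "correlation ratio" method measures
# association between a continuous feature and a categorical feature.
ds.show_corr(correlation_methods="correlation ratio")
```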
You are creating an Oracle Cloud Infrastructure (OCI) Data Science job that will run on a recurring basis in a production environment. This job will pick up sensitive data from an Object Storage bucket, train a model, and save it to the model catalog. How would you design the authentication mechanism for the job?
A. Package your personal OCI config file and keys in the job artifact.
B. Use the resource principal of the job run as the signer in the job code, ensuring there is a dynamic group for this job run with appropriate access to Object Storage and the model catalog.
C. Store your personal OCI config file and keys in the Vault, and access the Vault through the job run resource principal.
D. Create a pre-authenticated request (PAR) for the Object Storage bucket, and use that in the job code.
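For reference, a resource principal lets the job run authenticate without packaging any keys. A minimal sketch using the OCI Python SDK follows; the bucket and object names are placeholders, and the dynamic group policy granting access to Object Storage and the model catalog must already exist.

```python
import oci

# Inside the job run, obtain a signer from the run's resource principal;
# no config file or API keys are shipped with the job artifact.
signer = oci.auth.signers.get_resource_principals_signer()

# Read the sensitive training data from Object Storage.
object_storage = oci.object_storage.ObjectStorageClient(config={}, signer=signer)
namespace = object_storage.get_namespace().data
obj = object_storage.get_object(namespace, "training-data", "records.csv")  # placeholders

# The same signer can be passed to the Data Science client when saving
# the trained model to the model catalog.
data_science = oci.data_science.DataScienceClient(config={}, signer=signer)
```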
You are building a model and need input that represents data as morning, afternoon, or evening. However, the data contains a time stamp. What part of the Data Science life cycle would you be in when creating the new variable?
A. Model type selection
B. Model validation
C. Data access
D. Feature engineering
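To illustrate the concept, deriving a morning/afternoon/evening category from a raw timestamp is a typical feature engineering step. A small, self-contained pandas sketch follows; the column name, sample data, and hour boundaries are arbitrary.

```python
import pandas as pd

# Example records with a raw timestamp column (hypothetical data).
df = pd.DataFrame({
    "visit_ts": pd.to_datetime(["2023-05-01 08:15", "2023-05-01 14:40", "2023-05-01 20:05"])
})

# Engineer a categorical time-of-day feature from the hour of the timestamp:
# [0, 12) morning, [12, 17) afternoon, [17, 24) evening.
bins = [0, 12, 17, 24]
labels = ["morning", "afternoon", "evening"]
df["time_of_day"] = pd.cut(df["visit_ts"].dt.hour, bins=bins, labels=labels, right=False)

print(df)
```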
You have just received a new data set from a colleague. You want to quickly find out summary information about the data set, such as the types of features, total number of observations, and data distributions. Which Accelerated Data Science (ADS) SDK method from the ADSDataset class would you use?
A. show_in_notebook()
B. to_xgb()
C. compute()
D. show_corr()
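For reference, the sketch below shows how such a summary is typically produced with the ADS dataset API; the file path is a placeholder and the exact output depends on the installed ADS version.

```python
from ads.dataset.factory import DatasetFactory

# Open the colleague's data set (path is a placeholder).
ds = DatasetFactory.open("colleague_dataset.csv")

# Render an interactive summary in the notebook: feature types,
# number of observations, distributions, and data quality warnings.
ds.show_in_notebook()
```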
You are given the task of writing a program that sorts document images by language. Which Oracle AI service would you use?
A. Oracle Digital Assistant
B. OCI Speech
C. OCI Vision
D. OCI Language
You have an embarrassingly parallel or distributed batch job on a large amount of data running using Data Science Jobs.
What would be the best approach to run the workload?
A. Create the job in Data Science Jobs and then start the number of simultaneous job runs required for your workload.
B. Create the job in Data Science Jobs and start a job run. When it is done, start a new job run until you achieve the number of runs required.
C. Reconfigure the job run because Data Science Jobs does not support embarrassingly parallel workloads.
D. Create a new job for every job run that you have to run in parallel, because the Data Science Jobs service can have only one job run per job.
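As background, a single Data Science job definition can back many simultaneous job runs, which is how embarrassingly parallel workloads are usually fanned out. A hedged sketch with the ADS jobs API follows; the job OCID, run names, and the environment variable used to pass a partition index are assumptions for illustration.

```python
from ads.jobs import Job

# Load an existing Data Science job by OCID (placeholder).
job = Job.from_datascience_job("ocid1.datasciencejob.oc1..<unique_id>")

# Start several simultaneous runs, one per data partition; each run
# receives its partition index through an environment variable.
runs = []
for partition in range(10):
    run = job.run(
        name=f"batch-partition-{partition}",
        env_var={"PARTITION_INDEX": str(partition)},
    )
    runs.append(run)
```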
As you are working in your notebook session, you find that your notebook session does not have enough compute CPU and memory for your workload. How would you scale up your notebook session without losing your work?
A. Ensure your files and environments are written to the block volume storage under the /home/datascience directory, deactivate the notebook session, and activate the notebook session with a larger compute shape selected.
B. Download your files and data to your local machine, delete your notebook session, provision a new notebook session on a larger compute shape, and upload your files from your local machine to the new notebook session.
C. Deactivate your notebook session, provision a new notebook session on larger compute shape, and re-create all your file changes.
D. Create a temporary bucket in Object Storage, write all your files and data to Object Storage, delete your notebook session, provision a new notebook session on a larger compute shape, and copy your files and data from your temporary bucket onto your new notebook session.
You want to ensure that all stdout and stderr from your code are automatically collected and logged, without implementing additional logging in your code. How would you achieve this with Data Science Jobs?
A. Data Science Jobs does not support automatic log collection and storing.
B. On job creation, enable logging and select a log group. Then, select either a log or the option to enable automatic log creation.
C. You can implement custom logging in your code by using the Data Science Jobs logging service.
D. Make sure that your code is using the standard logging library and then store all the logs to Object Storage at the end of the job.
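For reference, log integration is configured on the job itself rather than in your code. The sketch below uses the ADS jobs builders to attach a log group when defining the job; OCIDs, the shape, and the script name are placeholders, and omitting an explicit log relies on the option to create one automatically.

```python
from ads.jobs import Job, DataScienceJob, ScriptRuntime

# Infrastructure with logging enabled: a log group is attached, and no
# explicit log is set so the service can create one automatically.
infrastructure = (
    DataScienceJob()
    .with_compartment_id("ocid1.compartment.oc1..<unique_id>")
    .with_project_id("ocid1.datascienceproject.oc1..<unique_id>")
    .with_shape_name("VM.Standard2.1")
    .with_log_group_id("ocid1.loggroup.oc1..<unique_id>")
)

# Any stdout/stderr emitted by train.py is captured in the attached log.
runtime = ScriptRuntime().with_source("train.py")

job = Job(name="scheduled-training", infrastructure=infrastructure, runtime=runtime).create()
run = job.run()
```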
Which of the following TWO non-open source JupyterLab extensions has Oracle Cloud Infrastructure (OCI) Data Science developed and added to the notebook session experience?
A. Environment Explorer
B. Table of Contents
C. Command Palette
D. Notebook Examples
E. Terminal