Get Started Professional-Machine-Learning-Engineer Exam [2023] Dumps Google PDF Questions [Q55-Q72]

Share

Get Started: Professional-Machine-Learning-Engineer Exam [2023] Dumps Google PDF Questions

Professional-Machine-Learning-Engineer Premium Exam Engine pdf Download

NEW QUESTION # 55
You work for a bank and are building a random forest model for fraud detection. You have a dataset that includes transactions, of which 1% are identified as fraudulent.
Which data transformation strategy would likely improve the performance of your classifier?

  • A. Oversample the fraudulent transaction 10 times.
  • B. Use one-hot encoding on all categorical features.
  • C. Write your data in TFRecords.
  • D. Z-normalize all the numeric features.

Answer: A


NEW QUESTION # 56
A Data Scientist is training a multilayer perception (MLP) on a dataset with multiple classes. The target class of interest is unique compared to the other classes within the dataset, but it does not achieve and acceptable recall metric. The Data Scientist has already tried varying the number and size of the MLP's hidden layers, which has not significantly improved the results. A solution to improve recall must be implemented as quickly as possible.
Which techniques should be used to meet these requirements?

  • A. Train an XGBoost model instead of an MLP
  • B. Train an anomaly detection model instead of an MLP
  • C. Add class weights to the MLP's loss function and then retrain
  • D. Gather more data using Amazon Mechanical Turk and then retrain

Answer: A


NEW QUESTION # 57
A Data Scientist needs to create a serverless ingestion and analytics solution for high-velocity, real-time streaming data.
The ingestion process must buffer and convert incoming records from JSON to a query-optimized, columnar format without data loss. The output datastore must be highly available, and Analysts must be able to run SQL queries against the data and connect to existing business intelligence dashboards.
Which solution should the Data Scientist build to satisfy the requirements?

  • A. Write each JSON record to a staging location in Amazon S3. Use the S3 Put event to trigger an AWS Lambda function that transforms the data into Apache Parquet or ORC format and writes the data to a processed data location in Amazon S3. Have the Analysts query the data directly from Amazon S3 using Amazon Athena, and connect to BI tools using the Athena Java Database Connectivity (JDBC) connector.
  • B. Write each JSON record to a staging location in Amazon S3. Use the S3 Put event to trigger an AWS Lambda function that transforms the data into Apache Parquet or ORC format and inserts it into an Amazon RDS PostgreSQL database. Have the Analysts query and run dashboards from the RDS database.
  • C. Create a schema in the AWS Glue Data Catalog of the incoming data format. Use an Amazon Kinesis Data Firehose delivery stream to stream the data and transform the data to Apache Parquet or ORC format using the AWS Glue Data Catalog before delivering to Amazon S3. Have the Analysts query the data directly from Amazon S3 using Amazon Athena, and connect to BI tools using the Athena Java Database Connectivity (JDBC) connector.
  • D. Use Amazon Kinesis Data Analytics to ingest the streaming data and perform real-time SQL queries to convert the records to Apache Parquet before delivering to Amazon S3. Have the Analysts query the data directly from Amazon S3 using Amazon Athena and connect to BI tools using the Athena Java Database Connectivity (JDBC) connector.

Answer: C

Explanation:
Explanation/Reference:


NEW QUESTION # 58
A Data Scientist is developing a machine learning model to classify whether a financial transaction is fraudulent. The labeled data available for training consists of 100,000 non-fraudulent observations and 1,000 fraudulent observations.
The Data Scientist applies the XGBoost algorithm to the data, resulting in the following confusion matrix when the trained model is applied to a previously unseen validation dataset. The accuracy of the model is 99.1%, but the Data Scientist has been asked to reduce the number of false negatives.

Which combination of steps should the Data Scientist take to reduce the number of false positive predictions by the model? (Choose two.)

  • A. Increase the XGBoost scale_pos_weight parameter to adjust the balance of positive and negative weights.
  • B. Change the XGBoost eval_metric parameter to optimize based on rmse instead of error.
  • C. Increase the XGBoost max_depth parameter because the model is currently underfitting the data.
  • D. Change the XGBoost eval_metric parameter to optimize based on AUC instead of error.
  • E. Decrease the XGBoost max_depth parameter because the model is currently overfitting the data.

Answer: D,E


NEW QUESTION # 59
You work for a magazine publisher and have been tasked with predicting whether customers will cancel their annual subscription. In your exploratory data analysis, you find that 90% of individuals renew their subscription every year, and only 10% of individuals cancel their subscription. After training a NN Classifier, your model predicts those who cancel their subscription with 99% accuracy and predicts those who renew their subscription with 82% accuracy. How should you interpret these results?

  • A. This is not a good result because the model should have a higher accuracy for those who renew their subscription than for those who cancel their subscription.
  • B. This is a good result because predicting those who cancel their subscription is more difficult, since there is less data for this group.
  • C. This is a good result because the accuracy across both groups is greater than 80%.
  • D. This is not a good result because the model is performing worse than predicting that people will always renew their subscription.

Answer: D

Explanation:
In this case, the model has a high accuracy of 99% for identifying customers who cancel their subscriptions, but a lower accuracy of 82% for identifying customers who renew their subscriptions. However, this does not necessarily mean that the model is performing well, because 90% of the customers renew their subscription, so if the model always predicts that customers will renew, it will be correct 90% of the time. Therefore, the model's performance is worse than the baseline of always predicting that customers will renew their subscription.
https://en.wikipedia.org/wiki/Imbalanced_data
https://machinelearningmastery.com/baseline-performance-machine-learning-algorithms/


NEW QUESTION # 60
A data scientist needs to identify fraudulent user accounts for a company's ecommerce platform. The company wants the ability to determine if a newly created account is associated with a previously known fraudulent user.
The data scientist is using AWS Glue to cleanse the company's application logs during ingestion.
Which strategy will allow the data scientist to identify fraudulent accounts?

  • A. Create a FindMatches machine learning transform in AWS Glue.
  • B. Search for duplicate accounts in the AWS Glue Data Catalog.
  • C. Create an AWS Glue crawler to infer duplicate accounts in the source data.
  • D. Execute the built-in FindDuplicates Amazon Athena query.

Answer: A

Explanation:
Explanation/Reference: https://docs.aws.amazon.com/glue/latest/dg/machine-learning.html


NEW QUESTION # 61
A company that promotes healthy sleep patterns by providing cloud-connected devices currently hosts a sleep tracking application on AWS. The application collects device usage information from device users. The company's Data Science team is building a machine learning model to predict if and when a user will stop utilizing the company's devices. Predictions from this model are used by a downstream application that determines the best approach for contacting users.
The Data Science team is building multiple versions of the machine learning model to evaluate each version against the company's business goals. To measure long-term effectiveness, the team wants to run multiple versions of the model in parallel for long periods of time, with the ability to control the portion of inferences served by the models.
Which solution satisfies these requirements with MINIMAL effort?

  • A. Build and host multiple models in Amazon SageMaker. Create an Amazon SageMaker endpoint configuration with multiple production variants. Programmatically control the portion of the inferences served by the multiple models by updating the endpoint configuration.
  • B. Build and host multiple models in Amazon SageMaker. Create multiple Amazon SageMaker endpoints, one for each model. Programmatically control invoking different models for inference at the application layer.
  • C. Build and host multiple models in Amazon SageMaker Neo to take into account different types of medical devices. Programmatically control which model is invoked for inference based on the medical device type.
  • D. Build and host multiple models in Amazon SageMaker. Create a single endpoint that accesses multiple models. Use Amazon SageMaker batch transform to control invoking the different models through the single endpoint.

Answer: D


NEW QUESTION # 62
You work for a biotech startup that is experimenting with deep learning ML models based on properties of biological organisms. Your team frequently works on early-stage experiments with new architectures of ML models, and writes custom TensorFlow ops in C++. You train your models on large datasets and large batch sizes. Your typical batch size has 1024 examples, and each example is about 1 MB in size. The average size of a network with all weights and embeddings is 20 GB. What hardware should you choose for your models?

  • A. A cluster with 2 a2-megagpu-16g machines, each with 16 NVIDIA Tesla A100 GPUs (640 GB GPU memory in total), 96 vCPUs, and 1.4 TB RAM
  • B. A cluster with 4 n1-highcpu-96 machines, each with 96 vCPUs and 86 GB RAM
  • C. A cluster with 2 n1-highcpu-64 machines, each with 8 NVIDIA Tesla V100 GPUs (128 GB GPU memory in total), and a n1-highcpu-64 machine with 64 vCPUs and 58 GB RAM
  • D. A cluster with an n1-highcpu-64 machine with a v2-8 TPU and 64 GB RAM

Answer: A


NEW QUESTION # 63
You are an ML engineer at a bank. You have developed a binary classification model using AutoML Tables to predict whether a customer will make loan payments on time. The output is used to approve or reject loan requests. One customer's loan request has been rejected by your model, and the bank's risks department is asking you to provide the reasons that contributed to the model's decision. What should you do?

  • A. Vary features independently to identify the threshold per feature that changes the classification.
  • B. Use the feature importance percentages in the model evaluation page.
  • C. Use the correlation with target values in the data summary page.
  • D. Use local feature importance from the predictions.

Answer: B


NEW QUESTION # 64
A data scientist wants to use Amazon Forecast to build a forecasting model for inventory demand for a retail company. The company has provided a dataset of historic inventory demand for its products as a .csv file stored in an Amazon S3 bucket. The table below shows a sample of the dataset.

How should the data scientist transform the data?

  • A. Use a Jupyter notebook in Amazon SageMaker to transform the data into the optimized protobuf recordIO format. Upload the dataset in this format to Amazon S3.
  • B. Use ETL jobs in AWS Glue to separate the dataset into a target time series dataset and an item metadata dataset. Upload both datasets as .csv files to Amazon S3.
  • C. Use AWS Batch jobs to separate the dataset into a target time series dataset, a related time series dataset, and an item metadata dataset. Upload them directly to Forecast from a local machine.
  • D. Use a Jupyter notebook in Amazon SageMaker to separate the dataset into a related time series dataset and an item metadata dataset. Upload both datasets as tables in Amazon Aurora.

Answer: D


NEW QUESTION # 65
Your team is building a convolutional neural network (CNN)-based architecture from scratch. The preliminary experiments running on your on-premises CPU-only infrastructure were encouraging, but have slow convergence. You have been asked to speed up model training to reduce time-to-market. You want to experiment with virtual machines (VMs) on Google Cloud to leverage more powerful hardware. Your code does not include any manual device placement and has not been wrapped in Estimator model-level abstraction. Which environment should you train your model on?

  • A. A Deep Learning VM with an n1-standard-2 machine and 1 GPU with all libraries pre-installed.
  • B. AVM on Compute Engine and 8 GPUs with all dependencies installed manually.
  • C. AVM on Compute Engine and 1 TPU with all dependencies installed manually.
  • D. A Deep Learning VM with more powerful CPU e2-highcpu-16 machines with all libraries pre-installed.

Answer: C


NEW QUESTION # 66
You work as an ML engineer at a social media company, and you are developing a visual filter for users' profile photos. This requires you to train an ML model to detect bounding boxes around human faces. You want to use this filter in your company's iOS-based mobile phone application. You want to minimize code development and want the model to be optimized for inference on mobile phones. What should you do?

  • A. Train a model using AutoML Vision and use the "export for Coral" option.
  • B. Train a model using AutoML Vision and use the "export for Core ML" option.
  • C. Train a custom TensorFlow model and convert it to TensorFlow Lite (TFLite).
  • D. Train a model using AutoML Vision and use the "export for TensorFlow.js" option.

Answer: B


NEW QUESTION # 67
You are a data scientist at an industrial equipment manufacturing company. You are developing a regression model to estimate the power consumption in the company's manufacturing plants based on sensor data collected from all of the plants. The sensors collect tens of millions of records every day. You need to schedule daily training runs for your model that use all the data collected up to the current date. You want your model to scale smoothly and require minimal development work. What should you do?

  • A. Develop a regression model using BigQuery ML.
  • B. Develop a custom scikit-learn regression model, and optimize it using Vertex Al Training
  • C. Develop a custom PyTorch regression model, and optimize it using Vertex Al Training
  • D. Develop a custom TensorFlow regression model, and optimize it using Vertex Al Training.

Answer: A

Explanation:
BigQuery ML is a powerful tool that allows you to build and deploy machine learning models directly within BigQuery, Google's fully-managed, serverless data warehouse. It allows you to create regression models using SQL, which is a familiar and easy-to-use language for many data scientists. It also allows you to scale smoothly and require minimal development work since you don't have to worry about cluster management and it's fully-managed by Google.
BigQuery ML also allows you to run your training on the same data where it's stored, this will minimize data movement, and thus minimize cost and time.
Reference:
BigQuery ML
BigQuery ML for regression
BigQuery ML for scalability


NEW QUESTION # 68
You are building a real-time prediction engine that streams files which may contain Personally Identifiable Information (Pll) to Google Cloud. You want to use the Cloud Data Loss Prevention (DLP) API to scan the files. How should you ensure that the Pll is not accessible by unauthorized individuals?

  • A. Periodically conduct a bulk scan of that bucket using the DLP API, and move the data to either the Sensitive or Non-Sensitive bucket
  • B. Create two buckets of data Sensitive and Non-sensitive Write all data to the Non-sensitive bucket Periodically conduct a bulk scan of that bucket using the DLP API, and move the sensitive data to the Sensitive bucket
  • C. Stream all files to Google CloudT and then write the data to BigQuery Periodically conduct a bulk scan of the table using the DLP API.
  • D. Create three buckets of data: Quarantine, Sensitive, and Non-sensitive Write all data to the Quarantine bucket.
  • E. Stream all files to Google Cloud, and write batches of the data to BigQuery While the data is being written to BigQuery conduct a bulk scan of the data using the DLP API.

Answer: C


NEW QUESTION # 69
You are training an LSTM-based model on Al Platform to summarize text using the following job submission script:

You want to ensure that training time is minimized without significantly compromising the accuracy of your model. What should you do?

  • A. Modify the 'scale-tier' parameter
  • B. Modify the 'learning rate' parameter
  • C. Modify the batch size' parameter
  • D. Modify the 'epochs' parameter

Answer: A

Explanation:
https://cloud.google.com/ai-platform/training/docs/machine-types#scale_tiers Google may optimize the configuration of the scale tiers for different jobs over time, based on customer feedback and the availability of cloud resources. Each scale tier is defined in terms of its suitability for certain types of jobs. Generally, the more advanced the tier, the more machines are allocated to the cluster, and the more powerful the specifications of each virtual machine. As you increase the complexity of the scale tier, the hourly cost of training jobs, measured in training units, also increases. See the pricing page to calculate the cost of your job.


NEW QUESTION # 70
Your team is working on an NLP research project to predict political affiliation of authors based on articles they have written. You have a large training dataset that is structured like this:

You followed the standard 80%-10%-10% data distribution across the training, testing, and evaluation subsets. How should you distribute the training examples across the train-test-eval subsets while maintaining the 80-10-10 proportion?
A)

B)

C)

D)

  • A. Option B
  • B. Option A
  • C. Option D
  • D. Option C

Answer: A

Explanation:
If we just put inside the Training set , Validation set and Test set , randomly Text, Paragraph or sentences the model will have the ability to learn specific qualities about The Author's use of language beyond just his own articles. Therefore the model will mixed up different opinions. Rather if we divided things up a the author level, so that given authors were only on the training data, or only in the test data or only in the validation data. The model will find more difficult to get a high accuracy on the test validation (What is correct and have more sense!). Because it will need to really focus in author by author articles rather than get a single political affiliation based on a bunch of mixed articles from different authors. https://developers.google.com/machine-learning/crash-course/18th-century-literature For example, suppose you are training a model with purchase data from a number of stores. You know, however, that the model will be used primarily to make predictions for stores that are not in the training data. To ensure that the model can generalize to unseen stores, you should segregate your data sets by stores. In other words, your test set should include only stores different from the evaluation set, and the evaluation set should include only stores different from the training set. https://cloud.google.com/automl-tables/docs/prepare#ml-use


NEW QUESTION # 71
You are building a linear model with over 100 input features, all with values between -1 and 1. You suspect that many features are non-informative. You want to remove the non-informative features from your model while keeping the informative ones in their original form. Which technique should you use?

  • A. After building your model, use Shapley values to determine which features are the most informative.
  • B. Use L1 regularization to reduce the coefficients of uninformative features to 0.
  • C. Use Principal Component Analysis to eliminate the least informative features.
  • D. Use an iterative dropout technique to identify which features do not degrade the model when removed.

Answer: A


NEW QUESTION # 72
......


Career Bonuses

The Google Professional Machine Learning Engineer certification proves that the successful candidates possess sufficient knowledge and skills to design and create scalable solutions for optimal performance. Some of the job roles that these individuals can consider include a Data Engineer, a Senior Data Engineer, a Machine Learning Engineer, a Technical Solutions Engineer, a Software Engineer, and a Cloud Infrastructure Engineer, among others. The median salary that the certificate holders can count on is around $140,000 per annum.


Google Professional Machine Learning Engineer exam is an advanced-level certification offered by Google Cloud. Google Professional Machine Learning Engineer certification is aimed at individuals who have extensive experience in machine learning and are seeking to become experts in this field. Professional-Machine-Learning-Engineer exam is designed to test the candidate's proficiency in designing, building, and deploying machine learning models using Google Cloud technologies.

 

Pass Your Google Exam with Professional-Machine-Learning-Engineer Exam Dumps: https://certtree.2pass4sure.com/Google-Cloud-Certified/Professional-Machine-Learning-Engineer-actual-exam-braindumps.html