Get Latest Jan-2022 Conduct effective penetration tests using ActualPDF Professional-Data-Engineer exam [Q127-Q147]

Google Professional Data Engineer Practice Test Questions, Google Professional Data Engineer Exam Practice Test Questions

The Google Professional Data Engineer certification is designed to evaluate candidates’ skills in designing data processing systems and ensuring solution quality. It also measures their competence in building and operationalizing data processing systems and in operationalizing ML models. Candidates must pass a single exam to get certified.

NEW QUESTION 127
Your company's on-premises Apache Hadoop servers are approaching end-of-life, and IT has decided to migrate the cluster to Google Cloud Dataproc. A like-for-like migration of the cluster would require 50 TB of Google Persistent Disk per node. The CIO is concerned about the cost of using that much block storage. You want to minimize the storage cost of the migration. What should you do?

A. Tune the Cloud Dataproc cluster so that there is just enough disk for all data.
B. Use preemptible virtual machines (VMs) for the Cloud Dataproc cluster.
C. Migrate some of the cold data into Google Cloud Storage, and keep only the hot data in Persistent Disk.
D. Put the data into Google Cloud Storage.

Answer: D

NEW QUESTION 128
MJTelco needs you to create a schema in Google Bigtable that will allow for the historical analysis of the last 2 years of records. Each device sends a record every 15 minutes, and each record contains the device's unique identifier and a data record. The most common query is for all the data for a given device for a given day. Which schema should you use?

A. Rowkey: date; Column data: device_id, data_point
B. Rowkey: date#data_point; Column data: device_id
C. Rowkey: data_point; Column data: device_id, date
D. Rowkey: device_id; Column data: date, data_point
E. Rowkey: date#device_id; Column data: data_point

Answer: E
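The `date#device_id` layout in option E can be sketched in memory to show how it serves the most common query: every reading for one device on one day lands in a single row, so that query becomes one row read. This is a plain-Python stand-in, not the Bigtable client API. It assumes one row per device per day, each 15-minute reading stored as its own column, a `YYYYMMDD` date format and a `data_point` column family; all helper names are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from typing import Dict, Union

# In-memory stand-in for a Bigtable table: row key -> {column: value}.
# Not the Bigtable client API; names and formats are illustrative assumptions.
table: Dict[str, Dict[str, int]] = defaultdict(dict)


def row_key(day: Union[date, datetime], device_id: str) -> str:
    """Build a 'date#device_id' row key, e.g. '20220115#dev-1'."""
    return f"{day:%Y%m%d}#{device_id}"


def write_reading(ts: datetime, device_id: str, data_point: int) -> None:
    """Store one 15-minute reading as a column in the device's row for that day."""
    table[row_key(ts, device_id)][f"data_point:{ts:%H%M}"] = data_point


def read_device_day(day: date, device_id: str) -> Dict[str, int]:
    """The common query (one device, one day) is a single row lookup."""
    return dict(table.get(row_key(day, device_id), {}))


if __name__ == "__main__":
    start = datetime(2022, 1, 15)
    for i in range(96):  # 96 readings = one every 15 minutes for 24 hours
        write_reading(start + timedelta(minutes=15 * i), "dev-1", i)
    write_reading(start, "dev-2", 999)
    print(len(read_device_day(start.date(), "dev-1")))  # 96
```

Because Bigtable stores rows sorted by key, a leading date also keeps one day's rows contiguous; in production, keys that start with the date can concentrate writes on a narrow key range, which is worth checking against Bigtable's schema-design guidance.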

NEW QUESTION 129
The Dataflow SDKs have been recently transitioned into which Apache service?

A. Apache Beam
B. Apache Hadoop
C. Apache Kafka
D. Apache Spark

Answer: A

Explanation:
The Dataflow SDKs have transitioned to Apache Beam.
Reference: https://cloud.google.com/dataflow/docs/

NEW QUESTION 130
Which of the following statements is NOT true regarding Bigtable access roles?

A. To give a user access to only one table in a project, grant the user the Bigtable Editor role for that table.
B. To give a user access to only one table in a project, you must configure access through your application.
C. Using IAM roles, you cannot give a user access to only one table in a project, rather than all tables in a project.
D. You can configure access control only at the project level.

Answer: A

Explanation:
For Cloud Bigtable, you can configure access control at the project level. For example, you can grant the ability to:
Read from, but not write to, any table within the project.
Read from and write to any table within the project, but not manage instances.
Read from and write to any table within the project, and manage instances.
Reference: https://cloud.google.com/bigtable/docs/access-control

NEW QUESTION 131
To give a user read permission for only the first three columns of a table, which access control method would you use?

A. Predefined role
B. Primitive role
C. Authorized view
D. It's not possible to give access to only the first three columns of a table.

Answer: C

Explanation:
An authorized view allows you to share query results with particular users and groups without giving them read access to the underlying tables. Authorized views can only be created in a dataset that does not contain the tables queried by the view.
When you create an authorized view, you use the view's SQL query to restrict access to only the rows and columns you want the users to see.
Reference: https://cloud.google.com/bigquery/docs/views#authorized-views

NEW QUESTION 132
You are designing the database schema for a machine learning-based food ordering service that will predict what users want to eat. Here is some of the information you need to store:
The user profile: What the user likes and doesn't like to eat

The user account information: Name, address, preferred meal times

The order information: When orders are made, from where, to whom

The database will be used to store all the transactional data of the product. You want to optimize the data schema. Which Google Cloud Platform product should you use?

A. BigQuery
B. Cloud SQL
C. Cloud Datastore
D. Cloud Bigtable

Answer: B

NEW QUESTION 133
You are a retailer that wants to integrate your online sales capabilities with different in-home assistants, such as Google Home. You need to interpret customer voice commands and issue an order to the backend systems.
Which solution should you choose?

A. Dialogflow Enterprise Edition
B. Cloud Speech-to-Text API
C. Cloud AutoML Natural Language
D. Cloud Natural Language API

Answer: A

NEW QUESTION 134
You are planning to use Google's Dataflow SDK to analyze customer data such as that shown below. Your project requirement is to extract only the customer name from the data source and then write it to an output PCollection.
Tom,555 X street
Tim,553 Y street
Sam, 111 Z street
Which operation is best suited for the above data processing requirement?

A. Data extraction
B. Sink API
C. Source API
D. ParDo

Answer: D

Explanation:
In the Google Cloud Dataflow SDK, you can use ParDo to extract only the customer name from each element in your PCollection.
Reference: https://cloud.google.com/dataflow/model/par-do
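ParDo applies a user function to each element of a PCollection and may emit zero or more outputs per element. The sketch below imitates that contract in plain Python so it runs without the SDK. In the Beam Python SDK the class would instead subclass `apache_beam.DoFn` and be applied with `beam.ParDo(ExtractNameFn())`; the `par_do` helper here is hypothetical.

```python
from typing import Iterable, Iterator, List


class ExtractNameFn:
    """Plain-Python stand-in for a Beam DoFn that emits the customer name."""

    def process(self, element: str) -> Iterator[str]:
        # Split on the first comma only; everything after it is the address.
        name, _, _address = element.partition(",")
        name = name.strip()
        if name:  # skip blank lines instead of emitting empty names
            yield name


def par_do(elements: Iterable[str], fn: ExtractNameFn) -> List[str]:
    """Hypothetical helper: call fn.process on every element and flatten the outputs, as ParDo does."""
    return [out for element in elements for out in fn.process(element)]


if __name__ == "__main__":
    customers = ["Tom,555 X street", "Tim,553 Y street", "Sam, 111 Z street"]
    print(par_do(customers, ExtractNameFn()))  # ['Tom', 'Tim', 'Sam']
```

Yielding from `process` rather than returning a single value is what lets one input produce zero outputs (a blank line) or several.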

NEW QUESTION 135
You work for a manufacturing plant that batches application log files together into a single log file once a day at 2:00 AM. You have written a Google Cloud Dataflow job to process that log file. You need to make sure the log file is processed once per day as inexpensively as possible. What should you do?

A. Manually start the Cloud Dataflow job each morning when you get into the office.
B. Configure the Cloud Dataflow job as a streaming job so that it processes the log data immediately.
C. Change the processing job to use Google Cloud Dataproc instead.
D. Create a cron job with Google App Engine Cron Service to run the Cloud Dataflow job.

Answer: D

NEW QUESTION 136
The _________ for Cloud Bigtable makes it possible to use Cloud Bigtable in a Cloud Dataflow pipeline.

A. Cloud Dataflow connector
B. BigQuery Data Transfer Service
C. Dataflow SDK
D. BigQuery API

Answer: A

Explanation:
The Cloud Dataflow connector for Cloud Bigtable makes it possible to use Cloud Bigtable in a Cloud Dataflow pipeline. You can use the connector for both batch and streaming operations.
Reference: https://cloud.google.com/bigtable/docs/dataflow-hbase

NEW QUESTION 137
Which of these operations can you perform from the BigQuery Web UI?

A. Upload multiple files using a wildcard.
B. Upload a 20 MB file.
C. Load data with nested and repeated fields.
D. Upload a file in SQL format.

Answer: C

Explanation:
You can load data with nested and repeated fields using the Web UI.
You cannot use the Web UI to:
- Upload a file greater than 10 MB in size
- Upload multiple files at the same time
- Upload a file in SQL format
All three of the above operations can be performed using the "bq" command.
Reference: https://cloud.google.com/bigquery/loading-data

NEW QUESTION 138
An external customer provides you with a daily dump of data from their database. The data flows into Google Cloud Storage (GCS) as comma-separated values (CSV) files. You want to analyze this data in Google BigQuery, but the data could have rows that are formatted incorrectly or corrupted. How should you build this pipeline?

A. Import the data into BigQuery using the gcloud CLI and set max_bad_records to 0.
B. Run a Google Cloud Dataflow batch pipeline to import the data into BigQuery, and push errors to another dead-letter table for analysis.
C. Enable BigQuery monitoring in Google Stackdriver and create an alert.
D. Use federated data sources, and check data in the SQL query.

Answer: B
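The dead-letter pattern from option B can be sketched without Dataflow: each record is either parsed into a row for the main table or routed, together with its error, to a separate collection destined for the dead-letter table. This stdlib-only version assumes a hypothetical three-column schema (customer_id, name, amount); in a Beam pipeline the two lists would typically be tagged outputs of one ParDo, each written to its own BigQuery table.

```python
import csv
import io
from typing import Any, Dict, List, Tuple

# Assumed schema for the hypothetical daily dump: customer_id, name, amount.
EXPECTED_COLUMNS = 3


def split_records(raw_csv: str) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Route parseable records to `good` and malformed ones to `dead_letter`."""
    good: List[Dict[str, Any]] = []
    dead_letter: List[Dict[str, Any]] = []
    for record_no, fields in enumerate(csv.reader(io.StringIO(raw_csv)), start=1):
        try:
            if len(fields) != EXPECTED_COLUMNS:
                raise ValueError(f"expected {EXPECTED_COLUMNS} fields, got {len(fields)}")
            customer_id, name, amount = fields
            good.append({"customer_id": customer_id, "name": name, "amount": float(amount)})
        except ValueError as err:
            # Keep the raw fields and the reason so the record can be analyzed later.
            dead_letter.append({"record": record_no, "raw": fields, "error": str(err)})
    return good, dead_letter


if __name__ == "__main__":
    good, dead = split_records("1,Alice,10.5\n2,Bob\n3,Carol,abc\n")
    print(len(good), "loaded;", len(dead), "sent to dead letter")
```

Unlike setting a bad-record limit at load time, nothing is silently dropped: every rejected record survives with the reason it failed.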

NEW QUESTION 139
You work for a car manufacturer and have set up a data pipeline using Google Cloud Pub/Sub to capture anomalous sensor events. You are using a push subscription in Cloud Pub/Sub that calls a custom HTTPS endpoint that you have created to take action on these anomalous events as they occur. Your custom HTTPS endpoint keeps getting an inordinate amount of duplicate messages. What is the most likely cause of these duplicate messages?

A. Your custom endpoint is not acknowledging messages within the acknowledgement deadline.
B. The Cloud Pub/Sub topic has too many messages published to it.
C. Your custom endpoint has an out-of-date SSL certificate.
D. The message body for the sensor event is too large.

Answer: A
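A push endpoint acknowledges a message by returning a success status code; if it returns an error or does not respond before the acknowledgement deadline, Pub/Sub redelivers the message. The sketch below is a hypothetical handler, independent of any web framework, that acknowledges quickly by deferring work to a queue and also drops repeats by `messageId`, because Pub/Sub delivery is at-least-once. The in-memory `processed_ids` set and `work_queue` are assumptions; a real service would need shared, durable storage.

```python
import base64
import json
from collections import deque

processed_ids = set()  # assumption: in-memory dedupe store; production needs durable storage
work_queue = deque()   # hypothetical queue drained by a separate worker


def handle_push(body: bytes) -> int:
    """Return the HTTP status code to send back for one Pub/Sub push request."""
    try:
        envelope = json.loads(body)
        message = envelope["message"]
        message_id = message["messageId"]
        payload = base64.b64decode(message.get("data", ""))
    except (ValueError, KeyError, TypeError):
        # Non-success status: Pub/Sub treats it as a nack and will redeliver.
        return 400
    if message_id not in processed_ids:
        processed_ids.add(message_id)
        work_queue.append(payload)  # defer slow work so the ack goes out well before the deadline
    return 204  # a success status acknowledges the message
```

Acknowledging first and processing afterwards keeps response time short; the `messageId` check handles the duplicates that at-least-once delivery can still produce.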

NEW QUESTION 140
Flowlogistic Case Study
Company Overview
Flowlogistic is a leading logistics and supply chain provider. They help businesses throughout the world manage their resources and transport them to their final destination. The company has grown rapidly, expanding their offerings to include rail, truck, aircraft, and oceanic shipping.
Company Background
The company started as a regional trucking company, and then expanded into other logistics markets. Because they have not updated their infrastructure, managing and tracking orders and shipments has become a bottleneck. To improve operations, Flowlogistic developed proprietary technology for tracking shipments in real time at the parcel level. However, they are unable to deploy it because their technology stack, based on Apache Kafka, cannot support the processing volume. In addition, Flowlogistic wants to further analyze their orders and shipments to determine how best to deploy their resources.
Solution Concept
Flowlogistic wants to implement two concepts using the cloud:
* Use their proprietary technology in a real-time inventory-tracking system that indicates the location of their loads
* Perform analytics on all their orders and shipment logs, which contain both structured and unstructured data, to determine how best to deploy resources and which markets to expand into. They also want to use predictive analytics to learn earlier when a shipment will be delayed.
Existing Technical Environment
Flowlogistic architecture resides in a single data center:
* Databases
  * 8 physical servers in 2 clusters
    * SQL Server - user data, inventory, static data
  * 3 physical servers
    * Cassandra - metadata, tracking messages
  * 10 Kafka servers - tracking message aggregation and batch insert
* Application servers - customer front end, middleware for order/customs
  * 60 virtual machines across 20 physical servers
    * Tomcat - Java services
    * Nginx - static content
    * Batch servers
* Storage appliances
  * iSCSI for virtual machine (VM) hosts
  * Fibre Channel storage area network (FC SAN) - SQL server storage
  * Network-attached storage (NAS) - image storage, logs, backups
* 10 Apache Hadoop/Spark servers
  * Core Data Lake
  * Data analysis workloads
* 20 miscellaneous servers
  * Jenkins, monitoring, bastion hosts
Business Requirements
* Build a reliable and reproducible environment with scaled parity with production.
* Aggregate data in a centralized Data Lake for analysis
* Use historical data to perform predictive analytics on future shipments
* Accurately track every shipment worldwide using proprietary technology
* Improve business agility and speed of innovation through rapid provisioning of new resources
* Analyze and optimize architecture for performance in the cloud
* Migrate fully to the cloud if all other requirements are met
Technical Requirements
* Handle both streaming and batch data
* Migrate existing Hadoop workloads
* Ensure architecture is scalable and elastic to meet the changing demands of the company.
* Use managed services whenever possible
* Encrypt data in flight and at rest
* Connect a VPN between the production data center and cloud environment
CEO Statement
We have grown so quickly that our inability to upgrade our infrastructure is really hampering further growth and efficiency. We are efficient at moving shipments around the world, but we are inefficient at moving data around.
We need to organize our information so we can more easily understand where our customers are and what they are shipping.
CTO Statement
IT has never been a priority for us, so as our data has grown, we have not invested enough in our technology. I have a good staff to manage IT, but they are so busy managing our infrastructure that I cannot get them to do the things that really matter, such as organizing our data, building the analytics, and figuring out how to implement the CFO's tracking technology.
CFO Statement
Part of our competitive advantage is that we penalize ourselves for late shipments and deliveries. Knowing where our shipments are at all times has a direct correlation to our bottom line and profitability. Additionally, I don't want to commit capital to building out a server environment.
Flowlogistic wants to use Google BigQuery as their primary analysis system, but they still have Apache Hadoop and Spark workloads that they cannot move to BigQuery. Flowlogistic does not know how to store the data that is common to both workloads. What should they do?

A. Store the common data in BigQuery and expose authorized views.
B. Store the common data in the HDFS storage for a Google Cloud Dataproc cluster.
C. Store the common data in BigQuery as partitioned tables.
D. Store the common data encoded as Avro in Google Cloud Storage.

Answer: D

NEW QUESTION 141
Which of the following are examples of hyperparameters? (Select 2 answers.)

A. Number of hidden layers
B. Number of nodes in each hidden layer
C. Biases
D. Weights

Answer: A,B

Explanation:
If model parameters are variables that get adjusted by training with existing data, your hyperparameters are the variables about the training process itself. For example, part of setting up a deep neural network is deciding how many "hidden" layers of nodes to use between the input layer and the output layer, as well as how many nodes each layer should use. These variables are not directly related to the training data at all.
They are configuration variables. Another difference is that parameters change during a training job, while the hyperparameters are usually constant during a job.
Weights and biases are variables that get adjusted during the training process, so they are not hyperparameters.
Reference: https://cloud.google.com/ml-engine/docs/hyperparameter-tuning-overview
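The parameter/hyperparameter split can be seen in a few lines of gradient descent. This sketch swaps the deep network for a single linear unit, so its hyperparameters are the learning rate and epoch count rather than layer and node counts; the weight and bias play the role of the trainable parameters in options C and D. The function name and default values are illustrative choices.

```python
from typing import List, Tuple


def train(xs: List[float], ys: List[float],
          learning_rate: float = 0.05, epochs: int = 2000) -> Tuple[float, float]:
    """Fit y = weight * x + bias by batch gradient descent on mean squared error.

    learning_rate and epochs are hyperparameters: chosen before training, constant during it.
    weight and bias are parameters: learned from the data and updated every epoch.
    """
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [weight * x + bias - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2 * x + 1 for x in xs]  # data generated from y = 2x + 1
    print(train(xs, ys))  # approximately (2.0, 1.0)
```

Changing `learning_rate` or `epochs` changes how training runs; the data only ever changes `weight` and `bias`.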

NEW QUESTION 142
How would you query specific partitions in a BigQuery table?

A. Use the DAY column in the WHERE clause
B. Use the EXTRACT(DAY) clause
C. Use the _PARTITIONTIME pseudo-column in the WHERE clause
D. Use DATE BETWEEN in the WHERE clause

Answer: C

Explanation:
Partitioned tables include a pseudo-column named _PARTITIONTIME that contains a date-based timestamp for data loaded into the table. To limit a query to particular partitions (such as January 1 and 2, 2017), use a clause similar to this:
WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2017-01-01') AND TIMESTAMP('2017-01-02')
Reference: https://cloud.google.com/bigquery/docs/partitioned-tables#the_partitiontime_pseudo_column
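Filtering on `_PARTITIONTIME` lets BigQuery skip partitions outside the range, which is what reduces the bytes a query scans. The toy model below, which is not BigQuery code, keeps one list of rows per load date and shows that a date-range filter touches only the matching partitions; the table layout and helper names are assumptions.

```python
from datetime import date
from typing import Dict, List, Tuple

# Toy ingestion-time partitioned table: load date -> rows loaded that day.
partitions: Dict[date, List[dict]] = {}


def load(day: date, rows: List[dict]) -> None:
    partitions.setdefault(day, []).extend(rows)


def query_between(start: date, end: date) -> Tuple[List[dict], List[date]]:
    """Read only partitions in [start, end], mirroring _PARTITIONTIME BETWEEN ... AND ..."""
    scanned = [day for day in sorted(partitions) if start <= day <= end]
    rows = [row for day in scanned for row in partitions[day]]
    return rows, scanned


if __name__ == "__main__":
    load(date(2016, 12, 31), [{"order": 1}])
    load(date(2017, 1, 1), [{"order": 2}, {"order": 3}])
    load(date(2017, 1, 2), [{"order": 4}])
    load(date(2017, 1, 3), [{"order": 5}])
    rows, scanned = query_between(date(2017, 1, 1), date(2017, 1, 2))
    print(len(rows), scanned)
```

The rows on December 31 and January 3 are never read, which is the effect partition pruning has on cost.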

NEW QUESTION 143
Flowlogistic's CEO wants to gain rapid insight into their customer base so his sales team can be better informed in the field. This team is not very technical, so they've purchased a visualization tool to simplify the creation of BigQuery reports. However, they've been overwhelmed by all the data in the table, and are spending a lot of money on queries trying to find the data they need. You want to solve their problem in the most cost-effective way. What should you do?

A. Export the data into a Google Sheet for visualization.
B. Create an additional table with only the necessary columns.
C. Create identity and access management (IAM) roles on the appropriate columns, so only they appear in a query.
D. Create a view on the table to present to the visualization tool.

Answer: D

NEW QUESTION 144
Which Cloud Dataflow / Beam feature should you use to aggregate data in an unbounded data source every hour based on the time when the data entered the pipeline?

A. An hourly watermark
B. The withAllowedLateness method
C. A processing time trigger
D. An event time trigger

Answer: C

Explanation:
When collecting and grouping data into windows, Beam uses triggers to determine when to emit the aggregated results of each window.
Processing time triggers. These triggers operate on the processing time - the time when the data element is processed at any given stage in the pipeline.
Event time triggers. These triggers operate on the event time, as indicated by the timestamp on each data element. Beam's default trigger is event time-based.
Reference: https://beam.apache.org/documentation/programming-guide/#triggers
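The difference between the two clocks shows up as soon as data arrives late. In this sketch each element carries an event time (when it happened) and an arrival time (when it entered the pipeline); hourly counts keyed by arrival time correspond to processing-time grouping, while counts keyed by event time correspond to event-time grouping. It illustrates the grouping only, not Beam's trigger machinery; in the Beam Python SDK, processing-time firing would be configured with triggers such as `AfterProcessingTime` wrapped in `Repeatedly`. The helper names are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def aggregate_by_hour(events: Iterable[Tuple[datetime, datetime]],
                      use_processing_time: bool) -> Counter:
    """Count (event_time, arrival_time) pairs per hour, using the chosen clock."""
    counts: Counter = Counter()
    for event_time, arrival_time in events:
        ts = arrival_time if use_processing_time else event_time
        counts[hour_bucket(ts)] += 1
    return counts


if __name__ == "__main__":
    late = (datetime(2022, 1, 15, 9, 50), datetime(2022, 1, 15, 10, 5))
    on_time = (datetime(2022, 1, 15, 10, 10), datetime(2022, 1, 15, 10, 11))
    print(aggregate_by_hour([late, on_time], use_processing_time=True))
    print(aggregate_by_hour([late, on_time], use_processing_time=False))
```

The late element counts toward the 10:00 hour by arrival time but toward the 9:00 hour by event time; the question asks for the former.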

NEW QUESTION 145
You are building a new application that you need to collect data from in a scalable way. Data arrives continuously from the application throughout the day, and you expect to generate approximately 150 GB of JSON data per day by the end of the year. Your requirements are:
* Decoupling producer from consumer
* Space and cost-efficient storage of the raw ingested data, which is to be stored indefinitely
* Near real-time SQL query
* Maintain at least 2 years of historical data, which will be queried with SQL
Which pipeline should you use to meet these requirements?

A. Create an application that provides an API. Write a tool to poll the API and write data to Cloud Storage as gzipped JSON files.
B. Create an application that publishes events to Cloud Pub/Sub, and create a Cloud Dataflow pipeline that transforms the JSON event payloads to Avro, writing the data to Cloud Storage and BigQuery.
C. Create an application that writes to a Cloud SQL database to store the data. Set up periodic exports of the database to write to Cloud Storage and load into BigQuery.
D. Create an application that publishes events to Cloud Pub/Sub, and create Spark jobs on Cloud Dataproc to convert the JSON data to Avro format, stored on HDFS on Persistent Disk.

Answer: B

NEW QUESTION 146
You are working on a niche product in the image recognition domain. Your team has developed a model that is dominated by custom C++ TensorFlow ops your team has implemented. These ops are used inside your main training loop and are performing bulky matrix multiplications. It currently takes up to several days to train a model. You want to decrease this time significantly and keep the cost low by using an accelerator on Google Cloud. What should you do?

A. Stay on CPUs, and increase the size of the cluster you're training your model on.
B. Use Cloud TPUs without any additional adjustment to your code.
C. Use Cloud TPUs after implementing GPU kernel support for your custom ops.
D. Use Cloud GPUs after implementing GPU kernel support for your custom ops.

Answer: D

NEW QUESTION 147
......

Introduction

Data engineers are responsible for finding trends in data sets and developing algorithms to help make raw data more useful to the enterprise. This IT role requires a significant set of technical skills, including a deep knowledge of SQL database design and multiple programming languages. Data engineers collect, transform, and visualize data. The Data Engineer designs, builds, maintains, and troubleshoots data processing systems with a particular emphasis on the security, reliability, fault tolerance, scalability, fidelity, and efficiency of such systems.
