
Databricks Certified Data Engineer Associate Practice Questions: Data Governance and Security Domain


Master the Data Governance and Security Domain

Test your knowledge in the Data Governance and Security domain with these 10 practice questions. Each question is designed to help you prepare for the Databricks Certified Data Engineer Associate certification exam with detailed explanations to reinforce your learning.

Question 1

A retail company is launching a formal data governance program. The CIO says, "We already have firewalls, encryption, and strict IAM policies, so governance is basically covered." The data governance lead disagrees and explains that additional elements are needed beyond security controls. Which activity best illustrates data governance rather than data security?

A) Configuring database encryption for all customer tables

B) Defining business-approved rules for how customer address data must be validated and corrected

C) Restricting VPN access to the corporate network to approved devices

D) Enforcing multi-factor authentication for all analytics users

Correct Answer: B

Explanation:

Data governance focuses on how data is managed, defined, and used, including quality standards and business rules. Defining business-approved rules for validating and correcting customer address data is a governance activity centered on data quality and standardization. The other options are technical security controls that protect access and confidentiality but do not define how data should be managed or used.

Question 2

A media company uses serverless compute for interactive BI dashboards and ad-hoc analytics on a governed data lakehouse. Over the last week, finance noticed a sudden spike in serverless compute charges. At the same time, analysts report occasional query throttling and longer wait times during peak hours. The platform team confirms that RBAC, row-level security, and masking policies are correctly configured. They also see that a new marketing campaign has led to many more concurrent dashboard users and several complex ad-hoc queries. Which action should the platform team prioritize to address both the cost spike and performance issues while preserving governance?

A) Implement cost and usage monitoring with alerts and per-team budgets for serverless workloads, then review and optimize the most expensive and highly concurrent queries.

B) Disable row-level security and masking policies to reduce overhead, assuming this will lower serverless costs and reduce throttling.

C) Switch all workloads from serverless to a single large provisioned cluster without monitoring, assuming fixed capacity will automatically control cost and performance.

D) Increase serverless concurrency limits indefinitely so that all queries run immediately, which will eliminate throttling and reduce cost per query.

Correct Answer: A

Explanation:

The spike is driven by increased concurrency and complex queries. Implementing monitoring, budgets, and alerts provides visibility and governance over serverless usage, and optimizing expensive or highly concurrent queries can reduce both cost and throttling. Governance controls remain intact because they are independent of the compute model.
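
The monitoring-and-budgets approach in option A can be sketched as a simple check over usage records. This is a toy illustration, not the Databricks billing API; the record fields, team names, and budget figures are all hypothetical.

```python
# Hypothetical sketch: flag teams whose serverless spend exceeds a per-team
# budget, and surface the most expensive queries for optimization review.
from collections import defaultdict

def check_budgets(usage_records, budgets):
    """usage_records: list of dicts with 'team', 'query_id', 'cost_usd'.
    Returns (per-team totals, sorted list of teams over budget)."""
    totals = defaultdict(float)
    for rec in usage_records:
        totals[rec["team"]] += rec["cost_usd"]
    over = [team for team, spent in totals.items() if spent > budgets.get(team, 0.0)]
    return dict(totals), sorted(over)

def top_queries(usage_records, n=3):
    """Most expensive queries first -- candidates for optimization."""
    return sorted(usage_records, key=lambda r: r["cost_usd"], reverse=True)[:n]

usage = [
    {"team": "marketing", "query_id": "q1", "cost_usd": 420.0},
    {"team": "marketing", "query_id": "q2", "cost_usd": 310.0},
    {"team": "finance", "query_id": "q3", "cost_usd": 80.0},
]
totals, over_budget = check_budgets(usage, {"marketing": 500.0, "finance": 200.0})
print(over_budget)  # marketing has spent 730.0 against a 500.0 budget
```

The key design point is that this visibility layer sits alongside, not instead of, the existing governance controls.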

Question 3

A vendor has been consuming data from your Delta Sharing share for several months using an open recipient. Their contract has ended, and your legal team requires that the vendor must not be able to access any new data from your environment. However, you understand that the vendor may already have downloaded some of the data into their own systems. What is the most appropriate action to take in your Databricks environment to meet the legal requirement?

A) Revoke or delete the open recipient associated with the vendor so they can no longer query the shared data.

B) Delete the underlying Unity Catalog tables so the vendor cannot access any of the data they previously received.

C) Change the storage location of the underlying Delta tables so that the vendor’s existing credentials no longer work.

D) Rotate the vendor’s open recipient token but keep the share and recipient active so they can still query historical data only.

Correct Answer: A

Explanation:

Revoking or deleting the open recipient immediately prevents the vendor from making any future queries via Delta Sharing, satisfying the requirement that they cannot access new data. This does not delete data they have already copied, but it stops further access from your environment. Deleting tables or changing storage locations is unnecessary and disruptive, and rotating the token while keeping the recipient active would still allow ongoing access.
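
A toy model (not the Delta Sharing API) illustrates why revoking the recipient satisfies the legal requirement: future queries against your environment fail, while data the vendor already copied is outside your control and untouched.

```python
class ShareServer:
    """Minimal stand-in for a sharing endpoint: recipients may query
    only while their registration is active."""
    def __init__(self):
        self.recipients = set()
        self.table = ["row1", "row2"]

    def add_recipient(self, name):
        self.recipients.add(name)

    def revoke_recipient(self, name):
        self.recipients.discard(name)

    def query(self, name):
        if name not in self.recipients:
            raise PermissionError(f"recipient {name!r} is not authorized")
        return list(self.table)

server = ShareServer()
server.add_recipient("vendor")
vendor_local_copy = server.query("vendor")   # data the vendor already downloaded

server.revoke_recipient("vendor")            # contract ended: revoke access
try:
    server.query("vendor")
except PermissionError:
    print("future access blocked")

print(vendor_local_copy)  # the previously copied data still exists elsewhere
```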

Question 4

A global e-commerce company has introduced a four-level data classification scheme: Public, Internal, Confidential, and Restricted. They have labeled their customer transaction tables as "Confidential" but have not yet changed any technical settings. Which next step best demonstrates using classification to drive concrete controls?

A) Publishing a slide deck explaining the four classification levels to all employees

B) Requiring that all "Confidential" tables, including customer transactions, are encrypted at rest and have access limited to specific roles

C) Renaming all "Confidential" tables with a CONF_ prefix in the database

D) Archiving all "Confidential" data to cheaper storage to reduce infrastructure costs

Correct Answer: B

Explanation:

A classification scheme is effective only when it drives specific controls. Requiring encryption at rest and restricting access to defined roles for "Confidential" tables directly ties the label to concrete security measures. Simply communicating the scheme (A), renaming tables (C), or archiving for cost reasons (D) does not ensure that confidential data receives stronger protection.
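
One way to make a label drive controls is to map each classification level to a required control set and audit tables against it. This is a hypothetical sketch; the control names and table metadata are illustrative, not a Databricks feature.

```python
# Hypothetical sketch: tie classification labels to required technical
# controls, so a label is auditable rather than purely documentation.
REQUIRED_CONTROLS = {
    "Public":       set(),
    "Internal":     {"access_roles"},
    "Confidential": {"access_roles", "encryption_at_rest"},
    "Restricted":   {"access_roles", "encryption_at_rest", "masking"},
}

def compliance_gaps(table):
    """table: dict with 'name', 'classification', 'controls' (set of
    enabled controls). Returns the controls the label requires but the
    table lacks."""
    required = REQUIRED_CONTROLS[table["classification"]]
    return sorted(required - table["controls"])

transactions = {
    "name": "customer_transactions",
    "classification": "Confidential",
    "controls": {"access_roles"},   # labeled, but encryption not yet enabled
}
print(compliance_gaps(transactions))  # -> ['encryption_at_rest']
```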

Question 5

An analytics team reports frequent inconsistencies in key business metrics between departments, even though they are all sourcing data from the same enterprise data warehouse. A data catalog exists, but many critical tables lack clear business definitions, owners, or data quality rules. Users are starting to distrust the data. From a governance perspective, what is the most effective action to address this issue?

A) Increase warehouse compute resources so that queries run faster and reduce user frustration

B) Assign data stewards and data owners to critical domains to define standard business definitions, data quality rules, and accountability

C) Turn off the data catalog because incomplete metadata is confusing users

D) Grant all users administrative access so they can fix data issues directly in the warehouse

Correct Answer: B

Explanation:

The core problem is a lack of governance over definitions and data quality, not platform performance. Assigning data owners and stewards to critical domains establishes accountability for standardizing business definitions, defining data quality rules, and maintaining metadata in the catalog, which directly addresses inconsistent metrics and trust issues.

Question 6

An insurance company has implemented role-based access control for its cloud data warehouse. Actuaries have a role that grants them access to detailed policy and claims data, including some sensitive attributes. An internal audit finds that several actuaries also have permissions to modify access control policies and grant roles to others, because they were added as administrators to speed up onboarding of new team members. From a governance and security standpoint, what is the best remediation?

A) Remove all access to sensitive data from actuaries and require them to request extracts from IT for each analysis

B) Keep actuaries as administrators but require them to log all access changes in a shared spreadsheet for transparency

C) Separate duties by revoking administrative privileges from actuaries, keeping their analytical data access via roles, and assigning access administration to a designated technical or security team

D) Do nothing, because actuaries need flexibility and are trusted professionals

Correct Answer: C

Explanation:

Good governance and security practice requires segregation of duties and least privilege. Actuaries should retain the analytical access they need via roles, but administrative privileges to grant or modify access should be handled by a designated technical or security team. Removing all sensitive access (A) is overly restrictive, logging changes in a spreadsheet (B) does not fix the core risk, and doing nothing (D) ignores segregation of duties principles.
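
Segregation of duties can be modeled as keeping data permissions and administrative permissions in disjoint roles. The role and permission names below are hypothetical, chosen only to mirror the scenario.

```python
# Hypothetical sketch of segregation of duties: analytical roles carry
# data permissions, while only a separate admin role may grant or modify
# access.
ROLE_PERMISSIONS = {
    "actuary":        {"read:claims", "read:policies"},
    "security_admin": {"grant:roles", "modify:policies"},
}

def can(user_roles, permission):
    """True if any of the user's roles carries the permission."""
    return any(permission in ROLE_PERMISSIONS[r] for r in user_roles)

# Remediation: actuaries keep analytical access but lose admin privileges.
actuary_roles = {"actuary"}
assert can(actuary_roles, "read:claims")        # analysis still works
assert not can(actuary_roles, "grant:roles")    # administration is separated
print("segregation of duties enforced")
```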

Question 7

A healthcare analytics team wants to use production patient data in a non-production environment to test new machine learning models. Regulations require that individuals must not be identifiable in test environments. The team proposes to replace patient IDs with a hash of the ID, without any additional changes. As the data privacy steward, what should you recommend?

A) Approve the proposal because hashing the patient IDs makes the data anonymous and compliant

B) Reject the proposal and require full anonymization so that individuals cannot be re-identified, even when combined with other attributes

C) Approve the proposal but add encryption at rest to the test environment to ensure anonymity

D) Approve the proposal as long as access to the test environment is restricted to data scientists

Correct Answer: B

Explanation:

Hashing identifiers alone typically results in pseudonymization, not anonymization, because records can still be linked to individuals when combined with other attributes. Where regulations require that individuals must not be identifiable in test environments, data must be anonymized so that re-identification is not reasonably possible, even when other data is available. Option B correctly reflects this requirement.
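
The re-identification risk can be demonstrated concretely: even with IDs hashed, quasi-identifiers such as ZIP code and birth year can link a record back to a named individual in an auxiliary dataset. The records below are fabricated for illustration.

```python
# Sketch of why hashing IDs alone is pseudonymization, not anonymization:
# quasi-identifiers can still link a "de-identified" record to a person,
# without ever reversing the hash.
import hashlib

def hash_id(patient_id):
    return hashlib.sha256(patient_id.encode()).hexdigest()

test_env = [  # production data with IDs hashed, nothing else changed
    {"id": hash_id("P-1001"), "zip": "94110", "birth_year": 1972, "dx": "E11"},
    {"id": hash_id("P-1002"), "zip": "60601", "birth_year": 1990, "dx": "J45"},
]

# Attacker's auxiliary data (e.g., a public dataset) with real names.
auxiliary = [{"name": "A. Jones", "zip": "94110", "birth_year": 1972}]

reidentified = [
    (aux["name"], rec["dx"])
    for rec in test_env
    for aux in auxiliary
    if (rec["zip"], rec["birth_year"]) == (aux["zip"], aux["birth_year"])
]
print(reidentified)  # the hash never had to be reversed
```

True anonymization would also have to generalize or suppress the quasi-identifiers so that this linkage is not reasonably possible.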

Question 8

A healthcare analytics team wants to share a dataset with an external research partner. The dataset includes patient demographics, diagnosis codes, and a persistent patient ID that can be linked back to the hospital’s systems. The partner needs to perform longitudinal analysis over time but must not be able to identify individual patients. Which approach best balances analytical usefulness with privacy requirements?

A) Provide the full dataset as-is since the partner is under contract and uses secure networks

B) Replace patient IDs with randomly generated tokens that only the hospital can map back to real identities

C) Mask all diagnosis codes with generic placeholders such as 'Condition A', 'Condition B', etc.

D) Remove only names and addresses and keep the original patient IDs for easier analysis

Correct Answer: B

Explanation:

Replacing patient IDs with randomly generated tokens that only the hospital can reverse is pseudonymization. It preserves the ability to track patients over time for longitudinal analysis while preventing the external partner from directly identifying individuals, achieving a good balance between utility and privacy.
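
The tokenization approach can be sketched as follows: each real ID maps to a stable random token, the tokenized records go to the partner, and only the hospital retains the reverse map. This is an illustrative sketch, not a production tokenization service.

```python
# Sketch of pseudonymization via random tokens: the partner can do
# longitudinal analysis on stable tokens, while only the hospital holds
# the map back to real identities.
import secrets

def tokenize(records, token_map=None):
    """Replace 'patient_id' with a stable random token.
    Returns (records to share, hospital-held reverse map)."""
    token_map = {} if token_map is None else token_map
    shared = []
    for rec in records:
        pid = rec["patient_id"]
        if pid not in token_map:
            token_map[pid] = secrets.token_hex(8)
        shared.append({**rec, "patient_id": token_map[pid]})
    return shared, token_map

visits = [
    {"patient_id": "P-1001", "visit": 1, "dx": "E11"},
    {"patient_id": "P-1001", "visit": 2, "dx": "E11"},
    {"patient_id": "P-2002", "visit": 1, "dx": "J45"},
]
shared, hospital_map = tokenize(visits)

# Same patient keeps the same token across visits, so trends are analyzable...
assert shared[0]["patient_id"] == shared[1]["patient_id"]
# ...but the real ID appears nowhere in the shared data.
assert all("P-" not in rec["patient_id"] for rec in shared)
print("longitudinal analysis preserved without real identities")
```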

Question 9

In a financial services company, a new customer analytics platform is being built. The head of infrastructure claims that because their team manages the databases, they are the data owners and will decide who can access customer data. The Chief Risk Officer disagrees. According to good data governance practice, who should be accountable for defining access and usage requirements for customer data?

A) The infrastructure/DBA team, because they control the physical databases

B) The cloud provider, because they host the data platform

C) A designated business data owner from the customer operations or product area

D) The data engineering team, because they design the data pipelines

Correct Answer: C

Explanation:

Data owners are typically business stakeholders who are accountable for how data is used, including access, quality, and retention requirements. For customer data, this is usually a leader from customer operations, product, or a similar business domain. Infrastructure/DBA and engineering teams act as custodians implementing controls, and the cloud provider is responsible only for platform infrastructure under the shared responsibility model.

Question 10

A healthcare analytics platform uses both provisioned clusters and serverless compute. Data engineers run scheduled ELT pipelines on provisioned clusters, and analysts use serverless SQL for interactive queries. All datasets contain regulated patient information protected by RBAC, row-level security, masking, and audit logging. The team plans to move some ELT pipelines to serverless compute to reduce cluster management overhead. These pipelines must:

  • Run under a managed identity or service principal with least-privilege access
  • Maintain existing data governance controls
  • Avoid surprises in performance when large backfills are triggered

Which design best meets these requirements?

A) Run ELT pipelines on serverless compute using a dedicated service principal with only the required data permissions, keep all existing governance policies on the datasets, and define monitoring and alerting for job duration and resource usage.

B) Run ELT pipelines on serverless compute using a highly privileged shared admin account so that permissions do not block any transformations, and disable masking to simplify access.

C) Keep all ELT pipelines on provisioned clusters because serverless cannot use managed identities or service principals and cannot support existing governance controls.

D) Move ELT pipelines to serverless compute and rely on serverless to automatically optimize performance and enforce all necessary governance, so explicit RBAC and row-level security are no longer needed.

Correct Answer: A

Explanation:

Using a dedicated service principal or managed identity with least-privilege access aligns with secure automation practices. Governance controls such as RBAC, row-level security, masking, and audit logging remain on the datasets and apply regardless of compute. Monitoring job duration and resource usage helps manage performance variability, especially during large backfills.

Ready to Accelerate Your Databricks Certified Data Engineer Associate Preparation?

Join thousands of professionals who are advancing their careers through expert certification preparation with FlashGenius.

  • ✅ Unlimited practice questions across all Databricks Certified Data Engineer Associate domains
  • ✅ Full-length exam simulations with real-time scoring
  • ✅ AI-powered performance tracking and weak area identification
  • ✅ Personalized study plans with adaptive learning
  • ✅ Mobile-friendly platform for studying anywhere, anytime
  • ✅ Expert explanations and study resources
Start Free Practice Now


About Databricks Certified Data Engineer Associate Certification

The Databricks Certified Data Engineer Associate certification validates your expertise in data governance and security and other critical domains. Our comprehensive practice questions are carefully crafted to mirror the actual exam experience and help you identify knowledge gaps before test day.

Practice Resources for Databricks DEA Certification

Strengthen your DB-DEA prep with focused practice questions across the most important exam domains.

Recommended Guide

Databricks Data Engineer Associate: Your Complete 2026 Guide

Preparing for the DB-DEA exam? This complete guide covers exam structure, key topics, study strategy, and real-world preparation tips to help you pass on your first attempt.

  • ✔️ Full exam breakdown (latest blueprint)
  • ✔️ Key domains and high-weight topics
  • ✔️ Study roadmap + preparation strategy
  • ✔️ Tips to avoid common exam mistakes
📘 Read the Complete Guide