Databricks Certified Data Engineer Associate Practice Questions: Data Governance and Security Domain
Test your Databricks Certified Data Engineer Associate knowledge with 10 practice questions from the Data Governance and Security domain. Includes detailed explanations and answers.
Master the Data Governance and Security Domain
Test your knowledge in the Data Governance and Security domain with these 10 practice questions. Each question is designed to help you prepare for the Databricks Certified Data Engineer Associate certification exam with detailed explanations to reinforce your learning.
Question 1
A retail company is launching a formal data governance program. The CIO says, "We already have firewalls, encryption, and strict IAM policies, so governance is basically covered." The data governance lead disagrees and explains that additional elements are needed beyond security controls. Which activity best illustrates data governance rather than data security?
Correct Answer: B
Data governance focuses on how data is managed, defined, and used, including quality standards and business rules. Defining business-approved rules for validating and correcting customer address data is a governance activity centered on data quality and standardization. The other options are technical security controls that protect access and confidentiality but do not define how data should be managed or used.
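To make the idea of "business-approved rules for validating and correcting customer address data" concrete, here is a minimal Python sketch. The field names, the abbreviation table, and the rules themselves are illustrative assumptions, not part of the question or of any Databricks feature; in practice the rules would come from the business owners of the data.

```python
import re

# Hypothetical business-approved rules for US-style customer addresses.
# Field names and rules are illustrative assumptions.
ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")
STREET_ABBREVIATIONS = {"street": "St", "avenue": "Ave", "road": "Rd"}


def standardize_address(record: dict) -> dict:
    """Correction rules: trim whitespace, uppercase the state, abbreviate street types."""
    street_words = record["street"].strip().split()
    street = " ".join(STREET_ABBREVIATIONS.get(w.lower(), w) for w in street_words)
    return {
        "street": street,
        "state": record["state"].strip().upper(),
        "zip": record["zip"].strip(),
    }


def validate_address(record: dict) -> list[str]:
    """Validation rules: return the violations found; an empty list means the record passes."""
    problems = []
    if not record["street"]:
        problems.append("street is empty")
    if len(record["state"]) != 2 or not record["state"].isalpha():
        problems.append("state must be a two-letter code")
    if not ZIP_PATTERN.match(record["zip"]):
        problems.append("zip must be 5 digits or ZIP+4")
    return problems


raw = {"street": " 12 Main street ", "state": "ca", "zip": "9410"}
clean = standardize_address(raw)
print(clean)
print(validate_address(clean))
```

The point of the sketch is that the rules say what a valid address is and how to correct one, which is a governance decision; none of them restrict who can read the data.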
Question 2
A media company uses serverless compute for interactive BI dashboards and ad-hoc analytics on a governed data lakehouse. Over the last week, finance noticed a sudden spike in serverless compute charges. At the same time, analysts report occasional query throttling and longer wait times during peak hours. The platform team confirms that RBAC, row-level security, and masking policies are correctly configured. They also see that a new marketing campaign has led to many more concurrent dashboard users and several complex ad-hoc queries. Which action should the platform team prioritize to address both the cost spike and performance issues while preserving governance?
Correct Answer: A
The spike is driven by increased concurrency and complex queries. Implementing monitoring, budgets, and alerts provides visibility and governance over serverless usage, and optimizing expensive or highly concurrent queries can reduce both cost and throttling. Governance controls remain intact because they are independent of the compute model.
Question 3
A vendor has been consuming data from your Delta Sharing share for several months using an open recipient. Their contract has ended, and your legal team requires that the vendor must not be able to access any new data from your environment. However, you understand that the vendor may already have downloaded some of the data into their own systems. What is the most appropriate action to take in your Databricks environment to meet the legal requirement?
Correct Answer: A
Revoking the recipient's access to the share, or deleting the open recipient, immediately prevents the vendor from making any future queries via Delta Sharing, satisfying the requirement that they cannot access new data. This does not delete data they have already copied, but it stops further access from your environment. Deleting tables or changing storage locations is unnecessary and disruptive, and rotating the token while keeping the recipient active leaves the recipient authorized on the share, so access can continue with the newly issued token.
Question 4
A global e-commerce company has introduced a four-level data classification scheme: Public, Internal, Confidential, and Restricted. They have labeled their customer transaction tables as "Confidential" but have not yet changed any technical settings. Which next step best demonstrates using classification to drive concrete controls?
Correct Answer: B
A classification scheme is effective only when it drives specific controls. Requiring encryption at rest and restricting access to defined roles for "Confidential" tables directly ties the label to concrete security measures. Simply communicating the scheme (A), renaming tables (C), or archiving for cost reasons (D) does not ensure that confidential data receives stronger protection.
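One way to picture a label driving controls is a lookup from classification level to a minimum control set, plus a check that reports what a given table is still missing. The sketch below assumes hypothetical control names and a hypothetical table record; only the four classification levels come from the question.

```python
# Hypothetical mapping from classification label to minimum required controls.
# The control names are assumptions chosen for illustration.
REQUIRED_CONTROLS = {
    "Public": set(),
    "Internal": {"authenticated_access"},
    "Confidential": {
        "authenticated_access",
        "encryption_at_rest",
        "role_restricted_access",
    },
    "Restricted": {
        "authenticated_access",
        "encryption_at_rest",
        "role_restricted_access",
        "access_audit_logging",
    },
}


def missing_controls(classification: str, applied_controls: set[str]) -> set[str]:
    """Return the controls the classification requires that are not yet applied."""
    return REQUIRED_CONTROLS[classification] - applied_controls


# A table that has been labeled but whose technical settings are unchanged.
table = {
    "name": "customer_transactions",
    "classification": "Confidential",
    "controls": {"authenticated_access"},
}

gaps = missing_controls(table["classification"], table["controls"])
print(sorted(gaps))
```

A labeled table with a non-empty gap set is exactly the situation the question describes: the classification exists, but it has not yet changed how the data is protected.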
Question 5
An analytics team reports frequent inconsistencies in key business metrics between departments, even though they are all sourcing data from the same enterprise data warehouse. A data catalog exists, but many critical tables lack clear business definitions, owners, or data quality rules. Users are starting to distrust the data. From a governance perspective, what is the most effective action to address this issue?
Correct Answer: B
The core problem is a lack of governance over definitions and data quality, not platform performance. Assigning data owners and stewards to critical domains establishes accountability for standardizing business definitions, defining data quality rules, and maintaining metadata in the catalog, which directly addresses inconsistent metrics and trust issues.
Question 6
An insurance company has implemented role-based access control for its cloud data warehouse. Actuaries have a role that grants them access to detailed policy and claims data, including some sensitive attributes. An internal audit finds that several actuaries also have permissions to modify access control policies and grant roles to others, because they were added as administrators to speed up onboarding of new team members. From a governance and security standpoint, what is the best remediation?
Correct Answer: C
Good governance and security practice requires segregation of duties and least privilege. Actuaries should retain the analytical access they need via roles, but administrative privileges to grant or modify access should be handled by a designated technical or security team. Removing all sensitive access (A) is overly restrictive, logging changes in a spreadsheet (B) does not fix the core risk, and doing nothing (D) ignores segregation of duties principles.
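The audit finding can be expressed as a simple segregation-of-duties check: flag anyone who holds a business role and, through any of their roles, also holds privileges to administer access. The role names, privilege names, and users below are hypothetical and do not correspond to any specific warehouse's permission model.

```python
# Privileges that should be held only by a designated technical or security team.
# All names here are hypothetical.
ADMIN_PRIVILEGES = {"MANAGE_ACCESS_POLICIES", "GRANT_ROLES"}


def find_sod_violations(
    assignments: dict[str, set[str]],
    role_privileges: dict[str, set[str]],
    business_role: str,
) -> dict[str, set[str]]:
    """Return users who hold the business role and also hold admin privileges."""
    violations = {}
    for user, roles in assignments.items():
        if business_role not in roles:
            continue
        # Union of the privileges granted by every role the user holds.
        privileges = set().union(*(role_privileges[r] for r in roles))
        admin_held = privileges & ADMIN_PRIVILEGES
        if admin_held:
            violations[user] = admin_held
    return violations


role_privileges = {
    "actuary": {"READ_POLICY_DATA", "READ_CLAIMS_DATA"},
    "warehouse_admin": {"MANAGE_ACCESS_POLICIES", "GRANT_ROLES"},
}
assignments = {
    "alice": {"actuary"},
    "bob": {"actuary", "warehouse_admin"},  # added as admin to speed up onboarding
    "carol": {"warehouse_admin"},
}

violations = find_sod_violations(assignments, role_privileges, "actuary")
print(violations)
```

Remediation in this sketch would mean removing the admin role from the flagged users while leaving their actuary role untouched.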
Question 7
A healthcare analytics team wants to use production patient data in a non-production environment to test new machine learning models. Regulations require that individuals must not be identifiable in test environments. The team proposes to replace patient IDs with a hash of the ID, without any additional changes. As the data privacy steward, what should you recommend?
Correct Answer: B
Hashing identifiers alone typically results in pseudonymization, not anonymization, because records can still be linked to individuals when combined with other attributes. Where regulations require that individuals must not be identifiable in test environments, data must be anonymized so that re-identification is not reasonably possible, even when other data is available. Option B correctly reflects this requirement.
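Beyond linkage through other attributes, a plain hash of an ID has a second weakness that is easy to demonstrate: anyone who knows or can guess the ID format can hash every candidate and look the values up. The sketch below assumes a hypothetical ID format (`P` followed by six digits) and an unsalted SHA-256 hash; neither detail is given in the question.

```python
import hashlib


def hash_id(patient_id: str) -> str:
    """Unsalted SHA-256 of the ID, as in the team's proposal (algorithm is an assumption)."""
    return hashlib.sha256(patient_id.encode("utf-8")).hexdigest()


def build_lookup(candidate_ids) -> dict[str, str]:
    """Map hash -> original ID for every candidate an attacker can enumerate."""
    return {hash_id(pid): pid for pid in candidate_ids}


# A row as it would appear in the test environment (made-up record).
test_rows = [{"patient": hash_id("P000123"), "diagnosis": "E11.9"}]

# The attacker enumerates the ID space; kept small here so the demo runs instantly.
candidates = (f"P{n:06d}" for n in range(1000))
lookup = build_lookup(candidates)

recovered = [lookup.get(row["patient"]) for row in test_rows]
print(recovered)
```

Because the original ID is recovered from the hash alone, the hashed column still functions as an identifier, which is why the proposal amounts to pseudonymization at best.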
Question 8
A healthcare analytics team wants to share a dataset with an external research partner. The dataset includes patient demographics, diagnosis codes, and a persistent patient ID that can be linked back to the hospital’s systems. The partner needs to perform longitudinal analysis over time but must not be able to identify individual patients. Which approach best balances analytical usefulness with privacy requirements?
Correct Answer: B
Replacing patient IDs with randomly generated tokens that only the hospital can reverse is pseudonymization. It preserves the ability to track patients over time for longitudinal analysis while preventing the external partner from directly identifying individuals, achieving a good balance between utility and privacy.
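A minimal sketch of this kind of token-based pseudonymization follows. The `Pseudonymizer` class, the `MRN-` ID format, and the visit records are hypothetical; the key property is that tokens are random (not derived from the ID) and the mapping table never leaves the hospital.

```python
import secrets


class Pseudonymizer:
    """Replaces IDs with random tokens; the mapping stays with the data owner."""

    def __init__(self) -> None:
        self._token_by_id: dict[str, str] = {}
        self._id_by_token: dict[str, str] = {}

    def tokenize(self, patient_id: str) -> str:
        # Reuse the same token for the same patient so records stay linkable over time.
        if patient_id not in self._token_by_id:
            token = secrets.token_hex(16)  # random, carries no information about the ID
            self._token_by_id[patient_id] = token
            self._id_by_token[token] = patient_id
        return self._token_by_id[patient_id]

    def reidentify(self, token: str) -> str:
        # Only possible with the private mapping, which is never shared.
        return self._id_by_token[token]


# Made-up visit records for illustration.
visits = [
    {"patient_id": "MRN-001", "visit": 1, "diagnosis": "I10"},
    {"patient_id": "MRN-002", "visit": 1, "diagnosis": "E11.9"},
    {"patient_id": "MRN-001", "visit": 2, "diagnosis": "I10"},
]

pseudonymizer = Pseudonymizer()
shared = [{**v, "patient_id": pseudonymizer.tokenize(v["patient_id"])} for v in visits]

# The partner can still group rows by token to follow a patient across visits.
print(shared[0]["patient_id"] == shared[2]["patient_id"])
```

Note that the sketch only handles the persistent ID. Demographics and diagnosis codes in the shared dataset can still contribute to re-identification and would need their own treatment.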
Question 9
In a financial services company, a new customer analytics platform is being built. The head of infrastructure claims that because their team manages the databases, they are the data owners and will decide who can access customer data. The Chief Risk Officer disagrees. According to good data governance practice, who should be accountable for defining access and usage requirements for customer data?
Correct Answer: C
Data owners are typically business stakeholders who are accountable for how data is used, including access, quality, and retention requirements. For customer data, this is usually a leader from customer operations, product, or a similar business domain. Infrastructure/DBA and engineering teams act as custodians implementing controls, and the cloud provider is responsible only for platform infrastructure under the shared responsibility model.
Question 10
A healthcare analytics platform uses both provisioned clusters and serverless compute. Data engineers run scheduled ELT pipelines on provisioned clusters, and analysts use serverless SQL for interactive queries. All datasets contain regulated patient information protected by RBAC, row-level security, masking, and audit logging. The team plans to move some ELT pipelines to serverless compute to reduce cluster management overhead. These pipelines must:
- Run under a managed identity or service principal with least-privilege access
- Maintain existing data governance controls
- Avoid surprises in performance when large backfills are triggered
Which design best meets these requirements?
Correct Answer: A
Using a dedicated service principal or managed identity with least-privilege access aligns with secure automation practices. Governance controls such as RBAC, row-level security, masking, and audit logging remain on the datasets and apply regardless of compute. Monitoring job duration and resource usage helps manage performance variability, especially during large backfills.
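As a rough illustration of monitoring job duration, the sketch below flags runs that take much longer than the typical run, which is how a large backfill would show up. The run names, durations, and the 2x threshold are made-up assumptions; a real setup would read run history from the platform's own job monitoring rather than a hard-coded dictionary.

```python
from statistics import median


def flag_slow_runs(durations_by_run: dict[str, float], factor: float = 2.0) -> list[str]:
    """Return runs whose duration exceeds `factor` times the median duration."""
    baseline = median(durations_by_run.values())
    return [run for run, seconds in durations_by_run.items() if seconds > factor * baseline]


# Hypothetical run durations in seconds; the last run is a large backfill.
runs = {
    "run-1": 610.0,
    "run-2": 595.0,
    "run-3": 640.0,
    "run-4": 2900.0,
}

flagged = flag_slow_runs(runs)
print(flagged)
```

The median is used as the baseline so that a single very long backfill does not inflate the threshold it is being compared against.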
About Databricks Certified Data Engineer Associate Certification
The Databricks Certified Data Engineer Associate certification validates your expertise in data governance and security and other critical domains. Our comprehensive practice questions are carefully crafted to mirror the actual exam experience and help you identify knowledge gaps before test day.