Building Better Data Security using BI Analytics
Updated: Sep 5, 2021
Securing data access is a pathway problem, and that makes it hard. On a toy database presented in a blog article, you might have two or three pathways, a clean setup, well-known users. In the real world of N users, you have a factorial number of paths.
Nailing down authentication is a good start, but it is only a start. Sensitive data has monetary value, and securing the data starts with authentication and then creating and monitoring trusted paths to ensure only trusted paths access the data and detect any deviation from normal pathways.
Trusted path analysis starts with gathering information on the pathways of trusted users because they are the weakest link in securing your data.
IP Source Address
Data SELECT Access Grants
Sensitive Data Destinations
Build Security Dashboards to Understand Trusted Paths
Build the Trusted Path Analysis by creating a Dashboard containing important security questions on your database.
Find Trusted Paths by asking Questions and Grouping by Domain
In the brainstorming process, I like to create domain groups for each of the questions I would like to answer. The goal is to improve the data navigation experience by grouping questions by domain.
This step is brainstorming. You may throw out some of these questions as impractical or you may consolidate some of the questions into a single question. The idea is to get a list of questions and then build data constructs to answer those questions.
How many Administrative Users do I have?
How many overall Users?
What person are users not using MFA?
How many service users do I have?
How many service users have administrative permissions?
What new IP addresses are accessing my data?
What new applications are accessing my data?
What are the counts of users by data type?
What are my recent failed login counts grouped by IP?
What users have recently modified data and on what tables?
What are the top queries?
Are there any new queries this week?
Do I have any new OWASP security alerts?
What Cloud Data Lakes Contain my Data?
What external keys access my data lakes?
What is the recent DDL count by database and object?
Create SQL Views to answer Security Questions
Start off by creating views to answer the questions. By using views, you push the logic into the database so that you can maintain single, consistent sources of truth across any applications that access the data.
Question 1: What Authorizations do my Users have?
To answer question 1, we use the data gathered from the stored procedures that synchronize the SNOWFLAKE internal schemas to an ADMIN schema.
The DDL to create the view contains business logic delineating SUPER_USER, ADMIN_USER, POWER_USER, and READ_ONLY_USER.
To ensure the view supports your security needs, build a dashboard of the view on an Admin BI tool and share it with your colleagues. Here, I am using my current favorite BI tool, Metabase, but the use of views should make your choice of BI tool arbitrary.
Here I am viewing my view dashboard in Metabase to get an overview of the users and their relative authorization levels.
Take it a Step Further and Make your Views Actionable
A common security objective for a Snowflake administrator is to separate roles into Functional and Access roles. Functional roles correspond to job descriptions and the associated application: BI_DEVELOPER and BI_SERVICE for example. Access roles have to do with data. Separating the roles makes it easy to provision new employees.
Suppose we have several applications, then each of the apps might have a DEVELOPER and a SERVICE role.
BI - A BI Application (in this case Metabase)
ML - A Machine Learning Application
ETL - The ETL system
We associate each of the Functional roles with Access roles on the data.
Functional and Access Role Heirarchy
Say the database is called CORE then we have CORE_CRUD and CORE_READ_ONLY. For other databases or schemas, an Access role deals specifically with how we access the data.
We want to ensure users only have Functional roles and Functional roles only have Access roles.
The questions we want to ask in order to migrate to these new roles are:
What are all the roles in my database?
What Functional Roles use what Access Roles
What Functional and Default Roles use data directly?
What Access roles use what data?
What Access roles reference other Access roles?
What Access roles reference Functional roles?
What are all the roles in my database?
I have a set of SQL Scripts that create a Snowflake based on a role. Here is a set of SQL scripts that create a table ADMIN.SNOWFLAKE_ROLE for building actionable role queries in a BI tool
What Functional Roles use what Access Roles?
This question is about understanding the relationship between the functional role (Admin Analyst) and the access level (Admin Schema Read Only). Here is the SQL to take a snapshot of the database internals.
This question ultimately should tell what Functional roles are configured correctly, because it only shows the Functional roles that connect to an Access role.
What Roles Violate the Functional and Access Role Hierarchy?
We want to know what roles in our database are not in compliance with the rules of the Functional and Access Role Hierarchy.
Functional Roles should assign Users to Access Roles
Access Roles should assign Functional Roles to system and data privileges
To visualize what rules are in compliance, create a view that summarizes any roles by compliance. We break out the base system roles of Snowflake (ACCOUNTADMIN, SYSADMIN, ..) and roles we create (OWLMTN_ADMIN, ADMIN_READ_ONLY, ..)
Once we have the access roles, then we can remediate creating functional roles with the same users as the access roles, assign the functional role to the access role, and revoke the user grant on the access role.