You are viewing documentation for Immuta version 2021.5.

Native Workspaces

Audience: Project members

Content Summary: This page provides an overview of native workspaces in projects.

Immuta Native Workspace Types:

  • Cloudera
  • Databricks
  • EMR
  • Snowflake

Prerequisite

Overview

Users with different levels of access can work together securely in a project. Project workspaces extend this by allowing users not only to read data but also to write data back to Immuta.

With project equalization enabled, project members' levels of access are balanced to match each other, ensuring that no one has more or less access than the people they are working with and protecting against data leaks. After an Application Administrator connects a Cloudera, EMR, Databricks, or Snowflake integration, project owners can create project workspaces. Within workspaces, members can query equalized data, collaborate, and write data back to Immuta, all within their native technology. Project equalization restricts data access to the equalized project and to the data sources within that project, so that data written to the project workspace does not leak information.

Once that derived data is ready to be shared outside the workspace, it can be exposed as a derived data source in Immuta. The derived data source will inherit policies from its parent source(s), and it will then be available through Immuta outside the project.

Configuring Workspaces

Each of the workspaces outlined below must be configured by an Application Admin before a project owner can enable it. To do so, the Application Admin configures a root location in Immuta to which all workspace data will be written. Then, when a user creates an Immuta project, Immuta automatically generates a subfolder in that root path and a remote database associated with the project. Immuta supports only a single root location, so all projects write to a subdirectory under it.
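As a rough illustration of the layout described above, the sketch below derives one subdirectory per project under a single root. The root URI, the project names, and the slug rule are all assumptions made for this example; Immuta generates the actual subfolder and database names itself.

```python
import posixpath
import re


def project_workspace_path(root: str, project_name: str) -> str:
    """Return a hypothetical per-project subdirectory under the single root.

    The slug rule (lowercase, runs of non-alphanumerics collapsed to "_") is
    an assumption for illustration; Immuta's real naming scheme may differ.
    """
    slug = re.sub(r"[^a-z0-9]+", "_", project_name.lower()).strip("_")
    if not slug:
        raise ValueError("project name must contain at least one letter or digit")
    # Keep "/" as the root if the caller passed only slashes.
    base = root.rstrip("/") or "/"
    return posixpath.join(base, slug)


# Hypothetical root location and project names.
root = "hdfs:///user/immuta/workspace"
paths = [project_workspace_path(root, name) for name in ("Fraud Analysis", "Churn Model")]
```

Every path produced this way sits under the same root, which mirrors the single-root constraint: changing the root would relocate all projects at once.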

Note: If an administrator changes the default directory, the Immuta user must have full access to that directory. Once any workspace is created, this directory can no longer be modified.

Administrators can place a configuration value in the cluster configuration (core-site.xml) to mark that cluster as unavailable for use as a workspace.

Immuta Native Workspace Types

Each of the Immuta native workspace types is described below and includes details about the configuration options and available data source types, when applicable. To learn how to create a native workspace, navigate to Create a Native Workspace.

Cloudera

This workspace allows native access to data on the cluster without having to go through the Immuta SparkSession or Immuta Query Engine.

Accessing Data

Users will only be able to access the directory and database created for the workspace when acting under the project. The Immuta Spark SQL Session will apply policies to the data, so any data written to the workspace will already be compliant with the restrictions of the equalized project, where all members see data at the same level of access. When users are ready to write data back to Immuta, they should use the SparkSQL session to copy data into the workspace.
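A minimal sketch of such a copy, assuming a Spark-style session whose sql method accepts a statement string. The table names are hypothetical, and the helper only builds the statement so that it can be inspected before it is run.

```python
import re

# One or two dot-separated plain identifiers, e.g. "db.table".
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def build_copy_statement(source_table: str, workspace_table: str) -> str:
    """Build a CREATE TABLE AS SELECT that copies a source into the workspace.

    Both names are checked against a conservative identifier pattern so that
    arbitrary text cannot be spliced into the statement.
    """
    for name in (source_table, workspace_table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unexpected table name: {name!r}")
    return f"CREATE TABLE {workspace_table} AS SELECT * FROM {source_table}"


# Hypothetical names; run the result through the policy-enforcing session:
# session.sql(build_copy_statement("immuta.customers", "fraud_ws.customers_clean"))
```

Because the SELECT runs through the policy-enforcing session, the rows that land in the workspace table are the ones the equalized project is allowed to see.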

Workspace Configuration Options

  • Cloudera HDFS
  • Cloudera S3A

Available Data Source Types

  • Amazon S3 (Cloudera S3A)
  • Apache HDFS (Cloudera HDFS)

For more details, see Native Hadoop Workspaces.

For installation instructions, see Cloudera Native Workspace Configuration.

Databricks

Databricks Cluster Configuration

Before you create a workspace, the cluster must send its configuration to Immuta. To do this, run a simple query on the cluster (e.g., show tables). Otherwise, an error will occur when you attempt to create the workspace.
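In a notebook attached to the cluster, this can be as small as the sketch below. It assumes the notebook's usual Spark session object is passed in; the function name is made up for this example.

```python
def send_cluster_configuration(spark) -> int:
    """Run a trivial query so the cluster has executed at least one statement.

    `spark` is assumed to be a SparkSession-like object. Returns the number
    of rows the query produced, which is incidental to the goal.
    """
    rows = spark.sql("show tables").collect()
    return len(rows)


# In a notebook, where `spark` is already defined:
# send_cluster_configuration(spark)
```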

This workspace allows native access to data on the cluster without having to go through the Immuta SparkSession or Immuta Query Engine.

Accessing Data

Users will only be able to access the directory and database created for the workspace when acting under the project. The Immuta Spark SQL Session will apply policies to the data, so any data written to the workspace will already be compliant with the restrictions of the equalized project, where all members see data at the same level of access. When users are ready to write data back to Immuta, they should use the SparkSQL session to copy data into the workspace.

When acting in the workspace project, users can read data using calls like spark.read.parquet("immuta:///some/path/to/a/workspace").

To write Delta Lake data to a workspace and then expose that Delta table as a data source in Immuta, you must specify a table (rather than a directory) in the workspace when creating the derived data source.
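One way to end up with a table rather than a bare directory is to save the DataFrame with saveAsTable. The sketch below assumes a PySpark DataFrame and a cluster with Delta Lake available; the database and table names are hypothetical, and the helper name is made up for this example.

```python
def write_delta_table(df, database: str, table: str, mode: str = "errorifexists") -> str:
    """Save a DataFrame as a Delta table and return its qualified name.

    Saving as a table (instead of writing files to a path) registers the
    data in the metastore, so there is a table to point at later.
    """
    qualified = f"{database}.{table}"
    df.write.format("delta").mode(mode).saveAsTable(qualified)
    return qualified


# Hypothetical workspace database and table:
# write_delta_table(scores_df, "fraud_ws", "scores")
```

The default mode refuses to overwrite an existing table; pass mode="overwrite" or mode="append" only when that is the intent.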

Workspace Configuration Options

  • AWS S3
  • Microsoft Azure
  • Google Cloud Platform

For more details, see Native Databricks Workspaces.

For installation instructions, see the App Settings tutorial.

EMR

This workspace allows native access to data on the cluster without having to go through the Immuta SparkSession or Immuta Query Engine.

Accessing Data

Users will only be able to access the directory and database created for the workspace when acting under the project. The Immuta Spark SQL Session will apply policies to the data, so any data written to the workspace will already be compliant with the restrictions of the equalized project, where all members see data at the same level of access. When users are ready to write data back to Immuta, they should use the SparkSQL session to copy data into the workspace.

Workspace Configuration Options

  • EMR HDFS
  • EMR S3

Available Data Source Types

  • Apache HDFS (EMR HDFS)
  • Amazon S3 (EMR S3)

For more details, see Native Hadoop Workspaces.

For installation instructions, see Native Workspace Configuration for EMR.

Snowflake

This workspace allows native access to data within Snowflake.

Accessing Data

Users will only be able to access the data sources in the workspace when acting under the project, which is represented as a Snowflake Session Context. Immuta will apply policies to the data and render them as secure views within a project-specific schema. Any data written to the workspace will already be compliant with the restrictions of the equalized project, where all members see data at the same level of access.
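As a sketch only, switching a Snowflake session into a project context might look like the statements built below. This assumes the context is expressed as a role plus a schema; the role, database, and schema names are hypothetical, and the names Immuta actually creates should be taken from the project itself.

```python
def quote_ident(name: str) -> str:
    """Quote a Snowflake identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def project_context_statements(role: str, database: str, schema: str) -> list:
    """Return USE statements that select a role and a project schema."""
    return [
        f"USE ROLE {quote_ident(role)}",
        f"USE SCHEMA {quote_ident(database)}.{quote_ident(schema)}",
    ]


# Hypothetical names for illustration only.
statements = project_context_statements("PROJECT_ROLE", "IMMUTA_DB", "PROJECT_SCHEMA")
```

Each statement can then be executed in order with whichever Snowflake client is in use.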

Derived Data

As users write derived data within the workspace, it inherits all appropriate policies. That data can then be shared outside the project as a derived data source. Additionally, derived data sources use the credentials of the Immuta system Snowflake account, which will allow them to persist after a workspace is disconnected.

For more details, see Snowflake Workspaces.

For installation instructions, see the Snowflake integration guide.

Disabling Immuta Native Workspaces

Workspaces can be temporarily disconnected by disabling the project.

Alternatively, workspaces can be permanently deleted in one of two ways:

  • Delete the workspace while preserving the data used by derived data sources. Note: If a derived data source references a view built on top of a Snowflake table that is not itself a derived data source, that table will be deleted, which breaks the derived data source.
  • Delete the workspace and purge all of its data.