Information
By default, data exchanged between worker nodes in an Azure Databricks cluster is not encrypted. To ensure that data is encrypted at all times, whether at rest or in transit, you can create an initialization script that configures your clusters to encrypt traffic between worker nodes using AES 256-bit encryption over a TLS 1.3 connection.
- Protects sensitive data during transit between cluster nodes, mitigating risks of data interception or unauthorized access.
- Aligns with organizational security policies and compliance requirements that mandate encryption of data in transit.
- Enhances overall security posture by ensuring that all inter-node communications within the cluster are encrypted.
NOTE: Nessus has not performed this check. Please review the benchmark to ensure target compliance.
Solution
Create a JKS keystore:
- Generate a Java KeyStore (JKS) file that will be used for SSL/TLS encryption.
- Upload the keystore file to a secure directory in DBFS (e.g. /dbfs/<keystore-directory>/jetty_ssl_driver_keystore.jks); example commands are shown below.
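For example, the keystore can be generated with the JDK's keytool utility and uploaded with the Databricks CLI. This is a minimal sketch: the alias, distinguished name, password, and <keystore-directory> values below are illustrative placeholders, not values mandated by the benchmark.

# Generate a JKS keystore containing a self-signed key pair (placeholder values).
keytool -genkeypair \
  -keystore jetty_ssl_driver_keystore.jks \
  -storetype JKS \
  -alias spark-ssl \
  -keyalg RSA -keysize 2048 -validity 365 \
  -dname "CN=<cluster-name>" \
  -storepass "<keystore-password>" -keypass "<keystore-password>"

# Upload the keystore to DBFS (assumes a configured Databricks CLI).
databricks fs cp jetty_ssl_driver_keystore.jks "dbfs:/<keystore-directory>/jetty_ssl_driver_keystore.jks"

If you generate your own keystore, update the local_keystore_password value in the init script below to match the password chosen here.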
Develop an init script:
- Create an init script that performs the following tasks:
- Retrieves the JKS keystore file and password.
- Derives a shared encryption secret from the keystore.
- Configures Spark driver and executor settings to enable encryption.
- Example init script:

#!/bin/bash
set -euo pipefail

keystore_dbfs_file="/dbfs/<keystore-directory>/jetty_ssl_driver_keystore.jks"

# Wait up to ~60 seconds for the keystore to appear on DBFS.
max_attempts=30
while [ ! -f ${keystore_dbfs_file} ]; do
  if [ "$max_attempts" == 0 ]; then
    echo "ERROR: Unable to find the file : $keystore_dbfs_file. Failing the script."
    exit 1
  fi
  sleep 2s
  ((max_attempts--))
done

# Derive the shared authentication secret from the keystore's SHA-256 hash.
sasl_secret=$(sha256sum $keystore_dbfs_file | cut -d' ' -f1)
if [ -z "${sasl_secret}" ]; then
  echo "ERROR: Unable to derive the secret. Failing the script."
  exit 1
fi

local_keystore_file="$DB_HOME/keys/jetty_ssl_driver_keystore.jks"
local_keystore_password="gb1gQqZ9ZIHS"

# On the driver, write the encryption settings to the Spark driver conf.
if [[ $DB_IS_DRIVER = "TRUE" ]]; then
  driver_conf=${DB_HOME}/driver/conf/spark-branch.conf
  echo "Configuring driver conf at $driver_conf"
  if [ ! -e $driver_conf ]; then
    echo "spark.authenticate true" >> $driver_conf
    echo "spark.authenticate.secret $sasl_secret" >> $driver_conf
    echo "spark.authenticate.enableSaslEncryption true" >> $driver_conf
    echo "spark.network.crypto.enabled true" >> $driver_conf
    echo "spark.network.crypto.keyLength 256" >> $driver_conf
    echo "spark.network.crypto.keyFactoryAlgorithm PBKDF2WithHmacSHA1" >> $driver_conf
    echo "spark.io.encryption.enabled true" >> $driver_conf
    echo "spark.ssl.enabled true" >> $driver_conf
    echo "spark.ssl.keyPassword $local_keystore_password" >> $driver_conf
    echo "spark.ssl.keyStore $local_keystore_file" >> $driver_conf
    echo "spark.ssl.keyStorePassword $local_keystore_password" >> $driver_conf
    echo "spark.ssl.protocol TLSv1.3" >> $driver_conf
  fi
fi

# On every node, pass the same settings to executors as Java options.
executor_conf=${DB_HOME}/conf/spark.executor.extraJavaOptions
echo "Configuring executor conf at $executor_conf"
if [ ! -e $executor_conf ]; then
  echo "-Dspark.authenticate=true" >> $executor_conf
  echo "-Dspark.authenticate.secret=$sasl_secret" >> $executor_conf
  echo "-Dspark.authenticate.enableSaslEncryption=true" >> $executor_conf
  echo "-Dspark.network.crypto.enabled=true" >> $executor_conf
  echo "-Dspark.network.crypto.keyLength=256" >> $executor_conf
  echo "-Dspark.network.crypto.keyFactoryAlgorithm=PBKDF2WithHmacSHA1" >> $executor_conf
  echo "-Dspark.io.encryption.enabled=true" >> $executor_conf
  echo "-Dspark.ssl.enabled=true" >> $executor_conf
  echo "-Dspark.ssl.keyPassword=$local_keystore_password" >> $executor_conf
  echo "-Dspark.ssl.keyStore=$local_keystore_file" >> $executor_conf
  echo "-Dspark.ssl.keyStorePassword=$local_keystore_password" >> $executor_conf
  echo "-Dspark.ssl.protocol=TLSv1.3" >> $executor_conf
fi
- Save the script and attach it to your clusters as a cluster-scoped init script (or configure it as a global init script) so it runs on every node at cluster startup, for example via the Clusters API as sketched below.
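The sketch below shows one way to attach the script through the Clusters API. It assumes the init script was saved to DBFS under the illustrative name encrypt-traffic.sh and that a personal access token is available in DATABRICKS_TOKEN; note that clusters/edit expects the cluster's full specification, so the placeholder fields must be filled in from your existing cluster definition.

# Attach the init script to an existing cluster (values in <> are placeholders).
curl -X POST "https://<databricks-instance>/api/2.0/clusters/edit" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "cluster_id": "<cluster-id>",
    "cluster_name": "<cluster-name>",
    "spark_version": "<spark-version>",
    "node_type_id": "<node-type-id>",
    "num_workers": 2,
    "init_scripts": [
      { "dbfs": { "destination": "dbfs:/<script-directory>/encrypt-traffic.sh" } }
    ]
  }'

The script then runs on each node the next time the cluster starts, applying the encryption settings before Spark comes up.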
Impact:
- Enabling encryption may introduce a performance penalty due to the computational overhead associated with encrypting and decrypting traffic. This can result in longer query execution times, especially for data-intensive operations.
- Implementing encryption requires creating and managing init scripts, which adds complexity to cluster configuration and maintenance.
- The shared encryption secret is derived from the hash of the keystore stored in DBFS. If the keystore is updated or rotated, all running clusters must be restarted so that drivers and workers derive the same new secret; otherwise authentication between them will fail.