Securing Sensitive Data: A Technical Guide to Encrypting Folders with Python
A Practical Approach to Encrypting Sensitive Data Using Python and pypyr
Data security is now a crucial concern for businesses, software projects, and individuals because more and more sensitive information is stored in digital form, which significantly increases the risk of data breaches and theft. GitHub is a popular platform for version control and collaboration, but storing unprotected sensitive data on it can have serious security consequences.
One effective way to mitigate these risks is to encrypt folders containing sensitive data before uploading them to GitHub. Encryption transforms data into a secure format that can only be read with a specific key. By encrypting folders, you ensure that anyone who gains unauthorized access to them sees only unreadable ciphertext.
In this article, you will learn how to use Python and pypyr, a simple yet robust automation tool, to encrypt folders. You will be guided through setting up your environment and writing Python scripts and pypyr pipelines for encryption and decryption. You will also learn how to automate securing your sensitive data by integrating the pipelines into your workflow, for example by running them from a Dockerfile. By the end of this article, you will have the knowledge and tools you need to protect your data.
Background
Encryption is the act of transforming plain text or data into a coded or ciphered form that can be read only by authorized individuals or entities. This technique is utilized to secure sensitive information like passwords, financial data, and personal information from being accessed, stolen, or intercepted by unauthorized individuals. Encryption requires the use of an algorithm and a key to convert the original data into an unreadable format. The encrypted data can only be converted back into its original form using the appropriate key.
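To make the algorithm-plus-key idea concrete, here is a minimal sketch using Fernet, the symmetric scheme from the cryptography library that this article uses later (it assumes cryptography is installed). The same key both encrypts and decrypts, and a token produced with one key cannot be decrypted with a different key. The sample plaintext is made up for illustration.

```python
from cryptography.fernet import Fernet, InvalidToken

# Symmetric encryption: one key is used both to encrypt and to decrypt.
key = Fernet.generate_key()
fer = Fernet(key)

plaintext = b"account=12345; pin=0000"
token = fer.encrypt(plaintext)   # unreadable ciphertext (a "Fernet token")
recovered = fer.decrypt(token)   # the original bytes again
print(recovered == plaintext)    # True

# A different key cannot decrypt the token.
other = Fernet(Fernet.generate_key())
try:
    other.decrypt(token)
except InvalidToken:
    print("wrong key: decryption refused")
```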
Objectives
The objectives of this article are:
To provide an overview of encryption and how it can help protect sensitive data.
To guide readers on how to encrypt and decrypt folders using Python and pypyr.
To demonstrate the benefits of automating the encryption and decryption process with pypyr.
To offer best practices and tips for ensuring data security.
Requirements
To encrypt and decrypt folders with Python, you'll need several tools and modules. These include:
Python 3.x: The latest version of Python is recommended.
pypyr: A simple yet powerful automation tool that allows you to create and run pipelines to automate tasks.
cryptography: A Python library for encrypting and decrypting data using various encryption algorithms.
A text editor or IDE: Any text editor or integrated development environment (IDE) that supports Python development can be used.
Using pypyr is a good choice for automating the encryption and decryption process for several reasons. Firstly, pypyr is easy to install and use, even for those with little knowledge of Python, because pipelines are written in YAML format. Secondly, pypyr lets you create reusable pipelines, making it easy to automate repetitive tasks. Lastly, because a pypyr pipeline runs with a single command, it integrates easily with other tools and platforms, such as shell scripts and Dockerfiles, which makes pypyr a versatile choice for automation tasks.
Setting Up the Environment
To start, ensure that you have Python 3.x installed on your computer. You can acquire the most recent version of Python by visiting the official website.
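Afterward, open a terminal (bash or PowerShell), navigate to your work directory, and run the following commands.
Create and activate a virtual environment:
$ python -m venv venv
$ source venv/bin/activate
On Windows PowerShell, activate the environment with venv\Scripts\Activate.ps1 instead of the source command.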
Next, install the required libraries:
$ pip install pypyr
$ pip install cryptography
These commands install pypyr and cryptography, which you'll use to write pipelines and to encrypt and decrypt folders.
Lastly, you'll need a secret key. Use the cryptography module to generate a secret key for encrypting and decrypting folders. A folder can only be decrypted with the same key that was used to encrypt it.
# command to generate secret key
$ python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
Be sure to store the secret key that was generated in a secure location, as it will be needed later in the script.
Keep in mind that whoever has access to the key can decrypt the folder and view its contents, which is a situation you want to avoid.
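It can help to know what the generated key actually is. Fernet.generate_key() returns 32 random bytes encoded with URL-safe base64 (44 characters); per the Fernet specification, the first 16 bytes are the signing (HMAC) key and the last 16 the AES encryption key. The standard-library sketch below mirrors the length check Fernet's own constructor performs, so you can sanity-check a key before storing it. The helper name looks_like_fernet_key is hypothetical.

```python
import base64
import binascii
import os


def looks_like_fernet_key(key: str) -> bool:
    """Return True if `key` decodes (URL-safe base64) to exactly 32 bytes.

    looks_like_fernet_key is a hypothetical helper for illustration only.
    """
    try:
        raw = base64.urlsafe_b64decode(key.encode("ascii"))
    except (binascii.Error, UnicodeEncodeError):
        return False
    return len(raw) == 32


# Same construction Fernet.generate_key() uses: 32 random bytes, base64url-encoded.
example_key = base64.urlsafe_b64encode(os.urandom(32)).decode()
print(len(example_key))                     # 44
print(looks_like_fernet_key(example_key))   # True
print(looks_like_fernet_key("not-a-key"))   # False
```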
Now that the environment is configured, you can proceed with encrypting and decrypting folders.
The Script
In this section, you'll write the Python script that encrypts and decrypts folders (directories); the pypyr pipeline will call this script. The script contains two functions:
The encrypt_dir function, which encrypts a folder using the secret key.
The decrypt_dir function, which decrypts a folder using the secret key.
To start, create a Python file in your current work directory. You can name it whatever you want; I'll name mine crypt.py. (If you choose a different name, adjust the import crypt lines in the pipelines accordingly.)
For a better understanding, let's populate the file contents incrementally.
Firstly, import the required libraries:
import os
import pathlib
from cryptography.fernet import Fernet
The os module provides a portable way to use operating-system-dependent features, while the pathlib module provides classes for representing filesystem paths with the appropriate semantics for different operating systems. (The functions below only need pathlib; os becomes useful if you load the secret key from an environment variable.)
Secondly, include the secret key in the script as follows:
secret_key: str = "0TY8Cdx3qAQGk3z6c9PxtQKgoUx3WZWDDaFpG-RJBT0="
This key will be employed for encrypting and decrypting folders.
However, embedding confidential data in your code is not good software practice; it is done here only to avoid unnecessary complexity. In real projects, keep your secrets in a .env file (listed in .gitignore so it is never committed) and load them into your program at runtime.
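As a minimal sketch of the runtime approach, assuming the key has been exported as an environment variable (your shell, or a helper package such as python-dotenv, can load a .env file into the environment), crypt.py could read it like this. The variable name CRYPT_SECRET_KEY and the function load_secret_key are hypothetical.

```python
import os


def load_secret_key(var_name: str = "CRYPT_SECRET_KEY") -> str:
    """Read the Fernet key from an environment variable instead of source code.

    CRYPT_SECRET_KEY is a hypothetical variable name used for this sketch.
    """
    key = os.environ.get(var_name)
    if not key:
        # Fail loudly rather than silently falling back to a hard-coded key.
        raise RuntimeError(f"environment variable {var_name} is not set")
    return key


# In crypt.py you could then replace the hard-coded value with:
# secret_key: str = load_secret_key()
```

Failing with an explicit error keeps a missing key from turning into a confusing Fernet error deep inside encrypt_dir or decrypt_dir.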
Thirdly, the function to perform encryption on a folder;
def encrypt_dir(input_dir, output_dir):
    # get key
    key = secret_key
    # create Fernet object with key
    fer = Fernet(key)
    # folder to be encrypted
    input_dir = pathlib.Path(input_dir)
    # encrypted folder
    output_dir = pathlib.Path(output_dir)
    # create output dir if it does not exist
    output_dir.mkdir(exist_ok=True, parents=True)
    # iterate over input dir (including sub-folders) and encrypt every file
    for path in input_dir.rglob("*"):
        if not path.is_file():
            continue  # skip directories; read_bytes() only works on files
        _path_bytes = path.read_bytes()
        data = fer.encrypt(_path_bytes)
        rel_path = path.relative_to(input_dir)
        dest_path = output_dir / rel_path
        # make sure the destination sub-folder exists
        dest_path.parent.mkdir(exist_ok=True, parents=True)
        # write encrypted data to output dir
        dest_path.write_bytes(data)
The function encrypt_dir(input_dir, output_dir) performs the following tasks:
It accepts two arguments, input_dir and output_dir, which represent the input and output directories, respectively.
It retrieves the key from the global variable secret_key.
It creates a Fernet object using the key and converts the input and output directories into pathlib.Path objects.
It creates the output directory if it does not already exist.
It iterates over the files in the input directory using a pathlib glob pattern and encrypts the contents of each file using the Fernet object.
It writes the encrypted data to the output directory, keeping the same relative file paths as in the input directory.
To sum up, this function encrypts every file in a directory using Fernet, the cryptography library's symmetric encryption scheme (AES-128 in CBC mode with an HMAC-SHA256 integrity check), and saves the encrypted data to another directory with the same relative file paths as the input directory.
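One detail worth knowing when walking a folder: Path.glob("*") returns only the top-level entries, and sub-folders appear in that list as directories, which read_bytes() cannot read. Path.rglob("*") walks the whole tree, and an is_file() check skips the directories. The standard-library sketch below builds a small temporary tree with made-up file names to show the difference.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "sub").mkdir()
    (root / "top.txt").write_text("a")
    (root / "sub" / "nested.txt").write_text("b")

    # glob("*"): top-level entries only, and the sub-folder itself is included
    shallow = sorted(p.relative_to(root).as_posix() for p in root.glob("*"))
    # rglob("*") + is_file(): every file in the tree, directories skipped
    deep_files = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )

print(shallow)     # ['sub', 'top.txt']
print(deep_files)  # ['sub/nested.txt', 'top.txt']
```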
Lastly, the function to perform decryption on a folder;
def decrypt_dir(input_dir, output_dir):
    key = secret_key
    fer = Fernet(key)
    input_dir = pathlib.Path(input_dir)
    output_dir = pathlib.Path(output_dir)
    output_dir.mkdir(exist_ok=True, parents=True)
    # iterate over input dir (including sub-folders) and decrypt every file
    for path in input_dir.rglob("*"):
        if not path.is_file():
            continue  # skip directories
        _path_bytes = path.read_bytes()
        # raises cryptography.fernet.InvalidToken if the key is wrong
        # or the file has been tampered with
        data = fer.decrypt(_path_bytes)
        rel_path = path.relative_to(input_dir)
        dest_path = output_dir / rel_path
        dest_path.parent.mkdir(exist_ok=True, parents=True)
        # write decrypted data to output dir
        dest_path.write_bytes(data)
Overall, this function takes an input directory containing encrypted files, decrypts them using the Fernet encryption algorithm, and writes the decrypted files to an output directory.
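encrypt_dir and decrypt_dir differ only in which Fernet method they call. As a design note, the sketch below factors the shared folder walk into one hypothetical helper, transform_dir, that takes the transformation as a parameter, and then checks a full encrypt/decrypt round trip on a throwaway directory tree. It assumes cryptography is installed and generates a fresh key instead of reading secret_key; the file names and contents are made up.

```python
import pathlib
import tempfile
from typing import Callable

from cryptography.fernet import Fernet


def transform_dir(input_dir, output_dir, transform: Callable[[bytes], bytes]) -> None:
    """Apply `transform` to every file under input_dir, mirroring the layout.

    transform_dir is a hypothetical helper, not part of the article's crypt.py.
    """
    input_dir, output_dir = pathlib.Path(input_dir), pathlib.Path(output_dir)
    for path in input_dir.rglob("*"):
        if not path.is_file():
            continue
        dest_path = output_dir / path.relative_to(input_dir)
        dest_path.parent.mkdir(parents=True, exist_ok=True)
        dest_path.write_bytes(transform(path.read_bytes()))


fer = Fernet(Fernet.generate_key())
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "plain" / "db").mkdir(parents=True)
    (root / "plain" / "api.txt").write_bytes(b"API_TOKEN=abc123")
    (root / "plain" / "db" / "creds.json").write_bytes(b'{"user": "admin"}')

    # Encrypt plain/ into enc/, then decrypt enc/ into dec/.
    transform_dir(root / "plain", root / "enc", fer.encrypt)
    transform_dir(root / "enc", root / "dec", fer.decrypt)

    scrambled = (root / "enc" / "db" / "creds.json").read_bytes()
    restored = (root / "dec" / "db" / "creds.json").read_bytes()

print(restored)               # b'{"user": "admin"}'
print(scrambled != restored)  # True
```

Whether this refactor is worth it is a judgment call: two explicit functions are easier to read in a tutorial, while a single walk avoids the two copies drifting apart.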
Now you have a script that uses the cryptography library to encrypt and decrypt folders. You can use it as is, but consider encrypting and decrypting several folders, or running the process from a Dockerfile: each time, you would have to copy the script, import it, call the right function, and point it at the right output folder. Instead, you can write a simple pipeline that automates the process.
The Pipeline
A pipeline is a set of instructions that automate a series of tasks or steps. In this case, the pipeline automates the process of encrypting and decrypting a folder.
pypyr is a powerful tool that lets you define and execute pipelines easily. With it, you can write the encryption and decryption logic as reusable steps and chain them together to form a pipeline, which keeps the encryption and decryption process simple and manageable.
Encryption Pipeline
Create an encrypt.yaml file in your work directory and add the following code:
steps:
  - name: pypyr.steps.pyimport
    in:
      pyImport: |
        import crypt
  - name: pypyr.steps.set
    in:
      set:
        toEncrypt:
          - input_dir: <folder to be encrypted>
            output_dir: <output directory>
          - input_dir: <folder to be encrypted>
            output_dir: <output directory>
  - name: pypyr.steps.py
    run: !py crypt.encrypt_dir(i["input_dir"], i["output_dir"])
    foreach: "{toEncrypt}"
The pipeline has three steps:
import step: This step imports the crypt module, which contains the encryption function.
set argument step: This step sets a variable called toEncrypt to a list of dictionaries. Each dictionary in the list represents a folder that needs to be encrypted and contains two keys, input_dir and output_dir, which specify the input and output directory paths for that folder.
py call step: This step runs the encrypt_dir() function from the crypt module on each folder in the toEncrypt list. The foreach parameter tells pypyr to run this step once for each dictionary in the toEncrypt list; pypyr exposes the current dictionary as i, so its input_dir and output_dir values are passed as arguments to the encrypt_dir() function.
In summary, the pipeline imports the crypt module, sets a list of folders to be encrypted, and then runs the encrypt_dir() function on each folder.
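Conceptually, the foreach step behaves like the plain Python loop below: each dictionary in the list is bound to i in turn, and the expression is evaluated once per item. In this sketch, encrypt_dir is a stand-in stub that only records its arguments, and the folder names are hypothetical; it illustrates the iteration pattern, not pypyr's internals.

```python
# Stand-in for crypt.encrypt_dir that just records what it was called with.
calls = []


def encrypt_dir(input_dir, output_dir):
    calls.append((input_dir, output_dir))


# Mirrors the toEncrypt list set by the pypyr.steps.set step (hypothetical paths).
to_encrypt = [
    {"input_dir": "secrets", "output_dir": "secrets_enc"},
    {"input_dir": "configs", "output_dir": "configs_enc"},
]

# Equivalent of foreach: "{toEncrypt}", with each dict bound to i in turn.
for i in to_encrypt:
    encrypt_dir(i["input_dir"], i["output_dir"])

print(calls)  # [('secrets', 'secrets_enc'), ('configs', 'configs_enc')]
```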
Decryption Pipeline
Create a decrypt.yaml file in your work directory and add the following code:
steps:
  - name: pypyr.steps.pyimport
    in:
      pyImport: |
        import crypt
  - name: pypyr.steps.set
    in:
      set:
        toDecrypt:
          - input_dir: <folder to be decrypted>
            output_dir: <output directory>
          - input_dir: <folder to be decrypted>
            output_dir: <output directory>
  - name: pypyr.steps.py
    run: !py crypt.decrypt_dir(i["input_dir"], i["output_dir"])
    foreach: "{toDecrypt}"
The decryption pipeline follows the same steps as the encryption pipeline, except that it calls decrypt_dir() to decrypt each folder.
In summary, the pipeline imports the crypt module, sets a list of folders to be decrypted, and then runs the decrypt_dir() function on each folder.
Now you have a simple and maintainable pipeline you can use in your work.
Run the Pipeline
To run a pipeline, add the corresponding command to the file or script you intend to execute:
python -m pypyr encrypt #to encrypt
python -m pypyr decrypt #to decrypt
For example, in a Dockerfile you can place the command in a RUN instruction, such as RUN python -m pypyr decrypt.
You can also run the commands directly from the command line:
$ python -m pypyr encrypt #to encrypt
$ python -m pypyr decrypt #to decrypt
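As written, these pipelines hard-code their folder lists with pypyr.steps.set, so extra command-line arguments do not change which folders are processed. Overriding the folders from the command line requires extra pipeline configuration, such as declaring a context parser (for example, pypyr.parser.keyvaluepairs, which turns key=value arguments into context values) and using pypyr.steps.default instead of pypyr.steps.set, because set always overwrites existing values. See the pypyr documentation on context parsers for details.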
Wherever you run the pipeline command, make sure you run it from your work directory, the folder that contains encrypt.yaml, decrypt.yaml, and crypt.py, so that pypyr can find the pipelines and import the crypt module.
Conclusion
Safeguarding data is of utmost importance in today's digital era, and it is crucial to take the necessary measures to protect confidential information. Using encryption as a means of securing data is a simple yet effective way to do so.
Python and pypyr make automating the encryption and decryption process straightforward. By following the step-by-step instructions presented in this article, you can easily encrypt and decrypt folders and help keep your sensitive data confidential and secure.
Remember to always keep your encryption key secure and change it frequently for maximum protection. Additionally, keep your sensitive data in encrypted form, even during development, and refrain from sharing unencrypted data with unauthorized individuals.
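If you rotate keys as recommended, the cryptography library's MultiFernet can re-encrypt existing tokens under a new key: it encrypts with the first key in its list, tries every key when decrypting, and rotate() decrypts a token with whichever key works and re-encrypts it under the first one. A minimal sketch with made-up data follows; to rotate an encrypted folder you would apply rotate() to each file's contents, an extension not shown here.

```python
from cryptography.fernet import Fernet, InvalidToken, MultiFernet

old_key = Fernet.generate_key()
new_key = Fernet.generate_key()

# Data that was encrypted some time ago with the old key.
old_token = Fernet(old_key).encrypt(b"DB_PASSWORD=hunter2")

# The first key in the list is the primary key used for (re-)encryption.
rotator = MultiFernet([Fernet(new_key), Fernet(old_key)])
new_token = rotator.rotate(old_token)

# Once every token has been rotated, the old key can be retired.
print(Fernet(new_key).decrypt(new_token))  # b'DB_PASSWORD=hunter2'
```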
By following these best practices and applying the techniques outlined in this article, you can significantly improve the security of your confidential data on GitHub and other platforms.
Source Code: https://github.com/princewilling/folder-encryption-python