Securing Sensitive Data: A Technical Guide to Encrypting Folders with Python

A Practical Approach to Encrypting Sensitive Data Using Python and pypyr

Data security is now a crucial concern for businesses, software, and individuals due to the increasing storage of sensitive information in digital form. This has led to a significant rise in the risk of data breaches and theft. GitHub is a popular platform for version control and collaboration, but storing unprotected sensitive data on it can have serious security consequences.

One effective way to mitigate these risks is to encrypt folders containing sensitive data before uploading them to GitHub. Encryption transforms data into a format that can only be read with a specific key. If an unauthorized party gains access to an encrypted folder, its contents remain unreadable without that key.

In this article, you will learn how to use Python and pypyr, a simple yet robust automation tool, to encrypt folders. You will be guided through setting up your environment and writing Python scripts and pypyr pipelines for encryption and decryption. You will also learn how to automate securing your sensitive data, including integrating the pipelines into your workflow, for example from a Dockerfile. By the end of this article, you will have the knowledge and tools you need to protect your data.

Background

Encryption is the act of transforming plain text or data into a coded or ciphered form that can be read only by authorized individuals or entities. This technique is utilized to secure sensitive information like passwords, financial data, and personal information from being accessed, stolen, or intercepted by unauthorized individuals. Encryption requires the use of an algorithm and a key to convert the original data into an unreadable format. The encrypted data can only be converted back into its original form using the appropriate key.
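As a minimal sketch of this idea, assuming the cryptography package (introduced later in this article) is installed, the round trip below encrypts a short message with a freshly generated key and recovers it with the same key. The sample message is made up for illustration.

```python
from cryptography.fernet import Fernet

# Generate a fresh symmetric key (illustrative only; a real key must be stored safely).
key = Fernet.generate_key()
fernet = Fernet(key)

plaintext = b"card number: 0000-0000-0000-0000"  # made-up sample data
token = fernet.encrypt(plaintext)   # ciphertext, unreadable without the key
recovered = fernet.decrypt(token)   # succeeds only with the same key
```

The token is what you would store or upload; the key is what you keep private.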

Objectives

The objectives of this article are:

  • To provide an overview of encryption and how it can help protect sensitive data.

  • To guide readers on how to encrypt and decrypt folders using Python and pypyr.

  • To demonstrate the benefits of automating the encryption and decryption process with pypyr.

  • To offer best practices and tips for ensuring data security.

Requirements

To encrypt and decrypt folders with Python, you'll need several tools and modules. These include:

  • Python 3.x: The latest version of Python is recommended.

  • pypyr: A simple yet powerful automation tool that allows you to create and run pipelines to automate tasks.

  • cryptography: A Python library for encrypting and decrypting data using various encryption algorithms.

  • A text editor or IDE: Any text editor or integrated development environment (IDE) that supports Python development can be used.

Using pypyr is a good choice for automating the encryption and decryption process for several reasons. Firstly, pypyr is easy to install and use, even for those with little knowledge of Python, as the pipelines are written in YAML format. Secondly, pypyr allows you to create reusable pipelines, making it easy to automate repetitive tasks. Lastly, pypyr integrates well with other tools and platforms, making it a versatile choice for automation tasks.

Setting Up the Environment

To start, ensure that you have Python 3.x installed on your computer. You can acquire the most recent version of Python by visiting the official website.

Afterward, open a terminal, navigate to your work directory, and run the following commands. They are shown for a bash shell; in PowerShell, the activation script is venv\Scripts\Activate.ps1 instead.

Create and activate a virtual environment:

$ python -m venv venv
$ source venv/bin/activate

Next, install the required libraries:

$ pip install pypyr 
$ pip install cryptography

These commands install pypyr and cryptography, which you'll use for writing pipelines and for encrypting and decrypting folders.

Lastly, you'll need a secret key. Use the cryptography module to generate a secret key for encrypting and decrypting folders. A folder can only be decrypted with the same key that was used to encrypt it.

# command to generate secret key
$ python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"

Be sure to store the secret key that was generated in a secure location, as it will be needed later in the script.

Keep in mind that whoever has access to the key can decrypt the folder and view its contents, which is a situation you want to avoid.
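One possible way to keep the key out of sight of other users on the same machine is to write it to a file that only its owner can read. The sketch below is an assumption about how you might do this, not part of the original script; the helper names save_key and load_key are hypothetical, and the permission bits only have their full effect on POSIX systems.

```python
import os
from pathlib import Path

from cryptography.fernet import Fernet


def save_key(key: bytes, key_path: Path) -> None:
    """Write the key to key_path, readable and writable by the owner only."""
    # os.open lets us request the permission bits when the file is created (POSIX).
    fd = os.open(key_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(key)


def load_key(key_path: Path) -> bytes:
    """Read the key back, ignoring any trailing newline."""
    return key_path.read_bytes().strip()
```

Keep the key file outside the folder you upload, and never commit it to the repository.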

Now that the environment is configured, you can proceed with encrypting and decrypting folders.

The Script

In this section, you'll focus on writing the Python script to encrypt and decrypt folders/directories. You'll use this script in the pypyr pipeline. The script will contain two distinct functions:

  • The encrypt_dir function to encrypt folder/dir using the secret key.

  • The decrypt_dir function to decrypt folder/dir using the secret key.

To start, create a Python file in your current work directory. You can name it whatever you want; I'll name mine crypt.py. Be aware that Python 3.12 and earlier also ship a standard-library module called crypt, so pick a different name if you want to avoid any ambiguity.

For a better understanding, let's populate the file contents incrementally.

Firstly, import the required libraries:

import os
import pathlib
from cryptography.fernet import Fernet

The os module provides a portable way to use operating system-dependent features, while the pathlib module offers classes that represent filesystem paths with the appropriate semantics for each operating system. The functions below rely on pathlib for all path handling.

Secondly, include the secret key in the script as follows:

secret_key: str = "0TY8Cdx3qAQGk3z6c9PxtQKgoUx3WZWDDaFpG-RJBT0="

This key will be employed for encrypting and decrypting folders.

However, it is not considered good software practice to embed confidential data into your code. In this case, we have done so to avoid unnecessary complexity. It is recommended to keep your secrets in a .env file and load them into your program during runtime.
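A minimal sketch of that recommendation, assuming the key is made available as an environment variable: the variable name FOLDER_SECRET_KEY and the helper load_secret_key are hypothetical, chosen only for illustration. A tool such as python-dotenv can populate the environment from a .env file; the sketch itself uses only the standard library.

```python
import os

ENV_VAR = "FOLDER_SECRET_KEY"  # hypothetical variable name, not from the original script


def load_secret_key(env_var: str = ENV_VAR) -> str:
    """Return the key stored in the environment, or fail loudly if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running")
    return key
```

In crypt.py, the hard-coded assignment could then become secret_key = load_secret_key(), so the key never appears in the source file.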

Thirdly, add the function that encrypts a folder:

def encrypt_dir(input_dir, output_dir):
    # get key
    key = secret_key

    # create Fernet object with key
    fer = Fernet(key)

    # folder to be encrypted
    input_dir = pathlib.Path(input_dir)
    # encrypted folder
    output_dir = pathlib.Path(output_dir)
    # create output dir if it does not exist
    output_dir.mkdir(exist_ok=True, parents=True)

    # iterate over input dir and encrypt content
    for path in input_dir.glob("*"):
        # glob("*") also yields subfolders; skip them, since only files can be read
        if not path.is_file():
            continue
        _path_bytes = path.read_bytes()
        data = fer.encrypt(_path_bytes)
        rel_path = path.relative_to(input_dir)
        dest_path = output_dir / rel_path
        # write encrypted data to output dir
        dest_path.write_bytes(data)

The function encrypt_dir(input_dir, output_dir) performs the following tasks:

  1. It accepts two arguments as input, input_dir and output_dir, which represent the input and output directories, respectively.

  2. It retrieves the key from a global variable, secret_key.

  3. It creates a Fernet object using the key and converts the input and output directories into pathlib.Path objects.

  4. It creates the output directory if it does not already exist.

  5. It iterates over the files in the input directory using the glob method and encrypts the contents of each file using the Fernet object.

  6. It writes the encrypted data to the output directory with the same relative file paths as in the input directory.

To sum up, this function encrypts the data in all the files present in a directory using the Fernet symmetric encryption algorithm and saves the encrypted data to another directory with the same file paths as the input directory.
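Because glob("*") matches only the direct children of input_dir, files inside subfolders are not picked up. If your folders are nested, a recursive variant is one option. The sketch below is an assumption about how such a variant might look, not part of the original script; the name encrypt_tree is hypothetical, and it takes the key as a parameter so that it runs on its own.

```python
import pathlib

from cryptography.fernet import Fernet


def encrypt_tree(input_dir, output_dir, key):
    """Encrypt every file under input_dir, including files in subfolders."""
    fer = Fernet(key)
    input_dir = pathlib.Path(input_dir)
    output_dir = pathlib.Path(output_dir)

    # rglob("*") walks the whole tree rather than just the top level
    for path in input_dir.rglob("*"):
        if not path.is_file():
            continue  # subfolders are recreated below when their files are written
        dest_path = output_dir / path.relative_to(input_dir)
        dest_path.parent.mkdir(parents=True, exist_ok=True)
        dest_path.write_bytes(fer.encrypt(path.read_bytes()))
```

Keep output_dir outside input_dir, otherwise the walk may pick up files that were just written.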

Lastly, add the function that decrypts a folder:

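def decrypt_dir(input_dir, output_dir):
    # get key
    key = secret_key

    # create Fernet object with key
    fer = Fernet(key)

    # folder holding the encrypted files
    input_dir = pathlib.Path(input_dir)
    # folder that receives the decrypted files
    output_dir = pathlib.Path(output_dir)
    # create output dir if it does not exist
    output_dir.mkdir(exist_ok=True, parents=True)

    # iterate over input dir and decrypt content
    for path in input_dir.glob("*"):
        # glob("*") also yields subfolders; skip them, since only files can be read
        if not path.is_file():
            continue
        _path_bytes = path.read_bytes()
        data = fer.decrypt(_path_bytes)
        rel_path = path.relative_to(input_dir)
        dest_path = output_dir / rel_path
        # write decrypted data to output dir
        dest_path.write_bytes(data)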

Overall, this function takes an input directory containing encrypted files, decrypts them using the Fernet encryption algorithm, and writes the decrypted files to an output directory.
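Decryption fails if the key does not match or the file was not produced by Fernet; in that case the cryptography library raises InvalidToken. The sketch below shows one way you might handle this; the helper name try_decrypt is hypothetical and not part of the original script.

```python
from cryptography.fernet import Fernet, InvalidToken


def try_decrypt(token: bytes, key: bytes):
    """Return the decrypted bytes, or None if the key or token is not valid."""
    try:
        return Fernet(key).decrypt(token)
    except InvalidToken:
        return None


right_key = Fernet.generate_key()
wrong_key = Fernet.generate_key()
token = Fernet(right_key).encrypt(b"hello")

with_right_key = try_decrypt(token, right_key)  # the original bytes
with_wrong_key = try_decrypt(token, wrong_key)  # None, because the key does not match
```

In decrypt_dir, a similar try/except around fer.decrypt would let you report which file failed instead of stopping with a traceback.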

Now you have a script that uses the cryptography library to encrypt and decrypt folders. You can use it as it is, but imagine having to encrypt and decrypt multiple folders, or doing so from a Dockerfile: each time you would have to copy the script, import it, call the required function, and direct the output. A simple pipeline can automate the process instead.

The Pipeline

A pipeline is a set of instructions that automate a series of tasks or steps. In this case, the pipeline automates the process of encrypting and decrypting a folder.

pypyr is a powerful tool that allows you to define and execute pipelines easily. With this, we can write our encryption and decryption logic as reusable steps and then chain them together to form a pipeline. By doing this, we can simplify the encryption and decryption process and make it more manageable.

Encryption Pipeline

Create an encrypt.yaml file in your work directory and add the following code:

steps:
  - name: pypyr.steps.pyimport
    in:
      pyImport: |
        import crypt
  - name: pypyr.steps.set
    in:
      set:
        toEncrypt:
          - input_dir: <folder to be encrypted>
            output_dir: <output directory>
          - input_dir: <folder to be encrypted>
            output_dir: <output directory>
  - name: pypyr.steps.py
    run: !py crypt.encrypt_dir(i["input_dir"], i["output_dir"])
    foreach: "{toEncrypt}"

The pipeline has three steps:

  1. import step: This step imports the crypt module which contains the encryption function.

  2. set argument step: This step sets a variable called toEncrypt to a list of dictionaries. Each dictionary in the list represents a folder that needs to be encrypted. The dictionary contains two keys, input_dir and output_dir, which specify the input and output directory paths for each folder.

  3. py call step: This step runs the encrypt_dir() function from the crypt module on each folder in the toEncrypt list. The foreach parameter tells pypyr to run this step once for each dictionary in the toEncrypt list, passing the input_dir and output_dir values from the dictionary as arguments to the encrypt_dir() function.

In summary, the pipeline imports the crypt module, sets a list of folders to be encrypted, and then runs the encrypt_dir() function on each folder.
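If it helps to see the foreach behaviour in plain Python, the sketch below mirrors what the pipeline does on each iteration. It is only an illustration: encrypt_dir here is a stand-in that records its arguments so the example runs on its own, and the folder names are hypothetical.

```python
# Stand-in for crypt.encrypt_dir; it only records the arguments it receives.
calls = []


def encrypt_dir(input_dir, output_dir):
    calls.append((input_dir, output_dir))


# Same shape as the toEncrypt list in the pipeline; the folder names are hypothetical.
to_encrypt = [
    {"input_dir": "secrets", "output_dir": "secrets_encrypted"},
    {"input_dir": "config", "output_dir": "config_encrypted"},
]

# foreach: pypyr exposes the current item as `i` on each iteration.
for i in to_encrypt:
    encrypt_dir(i["input_dir"], i["output_dir"])
```

Adding another folder to the pipeline is therefore just a matter of adding another dictionary to the list.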

Decryption Pipeline

Create a decrypt.yaml file in your work directory and add the following code:

steps:
  - name: pypyr.steps.pyimport
    in:
      pyImport: |
        import crypt
  - name: pypyr.steps.set
    in:
      set:
        toDecrypt:
          - input_dir: <folder to be decrypted>
            output_dir: <output directory>
          - input_dir: <folder to be decrypted>
            output_dir: <output directory>
  - name: pypyr.steps.py
    run: !py crypt.decrypt_dir(i["input_dir"], i["output_dir"])
    foreach: "{toDecrypt}"

The decrypt pipeline follows the same steps as the encrypt pipeline, except that it decrypts the folders instead.

In summary, the pipeline imports the crypt module, sets a list of folders to be decrypted, and then runs the decrypt_dir() function on each folder.

Now you have a simple and maintainable pipeline you can use in your work.

Run the Pipeline

To run the pipelines, add the following lines to the file or script you intend to execute, for example as commands in a Dockerfile:

python -m pypyr encrypt  # to encrypt
python -m pypyr decrypt  # to decrypt

You can also run the same commands directly from the CLI:

$ python -m pypyr encrypt  # to encrypt
$ python -m pypyr decrypt  # to decrypt

pypyr can also pass extra arguments given after the pipeline name into the pipeline:

python -m pypyr encrypt input-dir output-dir

For these arguments to have any effect, the pipeline has to be written to read them. The pipelines above set their folder lists in the set step, so they would need to be adapted first.

Wherever you run the pipeline command, make sure you run it from your work directory, so that pypyr can find the pipeline files and crypt.py.

Conclusion

Safeguarding data is of utmost importance in today's digital era, and it is crucial to take the necessary measures to protect confidential information. Using encryption as a means of securing data is a simple yet effective way to do so.

Python and pypyr make it straightforward to automate the encryption and decryption process. By following the step-by-step instructions presented in this article, you can encrypt and decrypt folders and keep your sensitive data confidential, as long as the key itself stays secret.

Remember to always keep your encryption key secure and change it frequently for maximum protection. Additionally, keep your sensitive data in encrypted form, even during development, and refrain from sharing unencrypted data with unauthorized individuals.
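Changing the key means that data encrypted under the old key has to be re-encrypted under the new one. The cryptography library provides MultiFernet for this; the sketch below shows the idea on a single token, with made-up sample data. Applying it to a whole folder would mean rotating each file's contents in the same way.

```python
from cryptography.fernet import Fernet, MultiFernet

old_key = Fernet.generate_key()
new_key = Fernet.generate_key()

# Data that was encrypted before the key change
token = Fernet(old_key).encrypt(b"sensitive bytes")

# The first key in the list is used for encryption; all keys are tried for decryption.
rotator = MultiFernet([Fernet(new_key), Fernet(old_key)])
rotated = rotator.rotate(token)  # re-encrypts the payload under new_key

# After rotation, the new key alone is enough to read the data.
recovered = Fernet(new_key).decrypt(rotated)
```

Once every file has been rotated, the old key can be retired.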

By adhering to these best practices and employing the techniques outlined in this article, you can ensure the security of your confidential data on GitHub and other platforms.

Source Code: https://github.com/princewilling/folder-encryption-python

References and Further Reading