Data Analysis with NumPy

Learn how to use the NumPy library to perform basic image processing, converting images into numerical arrays and applying filters.

Our Project: Basic Image Analysis Tool

We will create a script that loads an image, analyzes it to extract basic information (dimensions, average color values), and applies simple filters like grayscale conversion and color inversion.

Core Technologies We'll Use:

Python

NumPy

Pillow

Matplotlib

Step 1: Environment Setup and Library Installation

We prepare our workspace by creating a virtual environment and installing the necessary libraries: NumPy, Pillow, and Matplotlib.

For this project, we will need three main libraries. Let's set up our environment correctly.

1. Create Folder and Virtual Environment

Open your terminal and run the following commands:

# Create project folder
mkdir numpy_image_analyzer
cd numpy_image_analyzer

# Create and activate virtual environment
python -m venv venv
# For Windows: .\venv\Scripts\activate
# For macOS/Linux: source venv/bin/activate

2. Install Libraries

With the virtual environment activated, we install the libraries using pip. Pillow is the actively maintained fork of PIL (the original Python Imaging Library) and is still imported under the name PIL.

pip install numpy Pillow matplotlib
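To confirm the installation worked, a quick check like the following (run inside the activated virtual environment; the filename check_install.py is only a suggestion) should print the installed versions. The exact numbers depend on when you install.

```python
# check_install.py (hypothetical name): confirm the three libraries import
import matplotlib
import numpy as np
import PIL  # Pillow is imported under the name PIL

print("NumPy:", np.__version__)
print("Pillow:", PIL.__version__)
print("Matplotlib:", matplotlib.__version__)
```

If any import fails with ModuleNotFoundError, the virtual environment is probably not activated in the current terminal.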

3. Create File and Prepare Image

Create an empty Python file named image_analyzer.py. Then, find any image in JPEG or PNG format and save it in the same folder (numpy_image_analyzer) with a simple name, e.g., my_image.png. This is the image we will analyze.

Step 2: Loading an Image and Converting to a NumPy Array

We write the code to open an image file and convert it into a three-dimensional NumPy array, which is the basis for any further processing.

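The core of image processing with NumPy is representing the image as a numerical array. A color image is essentially a three-dimensional array of shape (height, width, color_channels). There are usually 3 color channels (Red, Green, Blue: RGB), but PNG files may add a fourth alpha (transparency) channel (RGBA), and grayscale images load as two-dimensional arrays.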
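As a quick illustration (a sketch using tiny generated images rather than a real photo), note that Pillow takes sizes as (width, height), while the resulting NumPy array is indexed as (height, width, channels). The image mode decides how many channels, or dimensions, you get:

```python
import numpy as np
from PIL import Image

# A tiny image 4 pixels wide and 2 pixels tall, filled with red
img = Image.new('RGB', (4, 2), color='red')
arr = np.array(img)
print(arr.shape)           # (2, 4, 3): height, width, channels
print(arr.dtype)           # uint8: values from 0 to 255
print(arr[0, 0].tolist())  # [255, 0, 0]: the top-left pixel (R, G, B)

# Other modes change the number of channels or dimensions
rgba = Image.new('RGBA', (4, 2), color='red')
print(np.array(rgba).shape)                 # (2, 4, 4): extra alpha channel
print(np.array(rgba.convert('RGB')).shape)  # (2, 4, 3): alpha dropped
gray = Image.new('L', (4, 2), color=128)
print(np.array(gray).shape)                 # (2, 4): single channel, no third axis
```

Calling convert('RGB') before np.array() is a simple way to guarantee the (height, width, 3) layout that the filters in this project expect.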

We will use the Pillow library (imported as PIL) to open the image file and the np.array() function to convert it into an array.


# image_analyzer.py
import os

import numpy as np
from PIL import Image

def analyze_image(image_path):
    """
    Loads an image from the path and converts it to a NumPy array.
    """
    try:
        with Image.open(image_path) as img:
            # Convert to RGB so the array always has 3 channels
            # (PNG files may otherwise be RGBA or grayscale)
            img_array = np.array(img.convert('RGB'))
            print("Image loaded and converted to NumPy array successfully!")
            return img_array
    except FileNotFoundError:
        print(f"Error: The file '{image_path}' was not found.")
        return None

if __name__ == '__main__':
    image_file = 'my_image.png'

    # Create a small test image programmatically if it doesn't exist
    if not os.path.exists(image_file):
        print(f"Creating an image '{image_file}' for testing...")
        # Image.new takes (width, height), so the array shape will be (30, 60, 3)
        img = Image.new('RGB', (60, 30), color='red')
        img.save(image_file)

    image_data = analyze_image(image_file)

    if image_data is not None:
        print(f"Array dimensions: {image_data.shape}")
        print(f"Data type: {image_data.dtype}")

Step 3: Extracting Basic Information

We leverage NumPy's properties and functions to extract and print useful information about the image, such as its dimensions and average color values.

Now that we have the image as a NumPy array, we can use its attributes and methods, such as shape, dtype, min(), max(), and mean(), to analyze it. We will extend the analyze_image function to calculate and print some basic statistics.


# image_analyzer.py (Updated)
import numpy as np
from PIL import Image

def analyze_image(image_path):
    try:
        with Image.open(image_path) as img:
            # Convert to RGB so the array is always (height, width, 3)
            img_array = np.array(img.convert('RGB'))
    except FileNotFoundError:
        print(f"Error: The file '{image_path}' was not found.")
        return None

    print("--- Basic Image Analysis ---")
    print(f"Image dimensions (Height, Width, Channels): {img_array.shape}")
    print(f"Data type: {img_array.dtype}")
    print(f"Minimum pixel value: {img_array.min()}")
    print(f"Maximum pixel value: {img_array.max()}")
    
    # The axis=(0, 1) tells mean to calculate the average along the height and width axes
    mean_colors = np.mean(img_array, axis=(0, 1))
    print(f"Average color values (R, G, B): {mean_colors.astype(int)}")
    
    return img_array

# (The if __name__ ... block remains the same)
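To see what axis=(0, 1) does, here is a small hand-built 2x2 "image" (hypothetical pixel values chosen so the averages are easy to check by hand). Averaging over axes 0 and 1 collapses height and width, leaving one value per color channel:

```python
import numpy as np

# Hypothetical 2x2 RGB image: two red pixels, one green, one blue
pixels = np.array([[[255, 0, 0], [255, 0, 0]],
                   [[0, 255, 0], [0, 0, 255]]], dtype=np.uint8)

print(pixels.shape)                      # (2, 2, 3)
print(pixels.min(), pixels.max())        # 0 255
mean_colors = pixels.mean(axis=(0, 1))   # one average per channel
print(mean_colors.tolist())              # [127.5, 63.75, 63.75]
print(mean_colors.astype(int).tolist())  # [127, 63, 63]: astype truncates
```

Note that astype(int) truncates rather than rounds; if you prefer rounding, np.round(mean_colors).astype(int) is a drop-in alternative.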

Step 4: Applying Filters - Grayscale

We implement our first filter, converting the color image to grayscale using numerical operations on the NumPy array.

Converting to grayscale is a classic filter. A simple way to achieve this is to calculate the average of the three channel values (R, G, B) for each pixel. The result will be a two-dimensional array.


    # Inside the analyze_image function, after printing the statistics...

    # mean(axis=2) averages along the third axis (the color channels)
    grayscale_array = img_array.mean(axis=2).astype(np.uint8)

    print("\nGrayscale array created.")
    print(f"New dimensions: {grayscale_array.shape}")

    return img_array, grayscale_array  # We now return both arrays

Tip!

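There are perceptually weighted formulas for grayscale conversion that give each channel a different weight, because the human eye is most sensitive to green and least sensitive to blue (e.g., the ITU-R BT.601 luma formula 0.299*R + 0.587*G + 0.114*B). The plain average is simpler and works well enough for this project.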
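A minimal sketch of the weighted version, assuming a uint8 array shaped like img_array (RGB, or RGBA whose alpha is ignored). The helper name weighted_grayscale is hypothetical. It takes a matrix product of the color channels with the weight vector and rounds instead of truncating. Pillow's own img.convert('L') is documented to use the same 299/587/114 weights, so it is a ready-made alternative.

```python
import numpy as np

def weighted_grayscale(img_array):
    """Hypothetical helper: BT.601 luma weights instead of a plain average."""
    weights = np.array([0.299, 0.587, 0.114])
    # Keep only the first three channels (drops alpha if present), then
    # contract the channel axis with the weights via a matrix product
    gray = img_array[..., :3].astype(np.float64) @ weights
    # Round to the nearest integer and keep values inside the uint8 range
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

# Pure red, green, blue, and white pixels in a 1x4 image
sample = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255]]],
                  dtype=np.uint8)
print(weighted_grayscale(sample).tolist())            # [[76, 150, 29, 255]]
print(sample.mean(axis=2).astype(np.uint8).tolist())  # [[85, 85, 85, 255]]
```

The plain average renders pure red, green, and blue as the same gray, while the weighted version makes green appear much lighter than blue, closer to how we perceive them.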

Step 5: Applying More Filters

We create additional filters, such as color inversion (invert) and brightness increase, by leveraging NumPy's vectorized operations.

Let's add two more classic filters. With NumPy, each of these per-pixel operations fits in a single line of code: thanks to vectorized operations and broadcasting, a scalar such as 255 or 50 is applied to every element of the array at once, without an explicit Python loop.

Invert: We subtract the value of each pixel from the maximum possible value (255).
Brightness: We add a constant value to each pixel, using the np.clip function to ensure that the values do not exceed the valid range [0, 255].


    # Inside the analyze_image function...

    inverted_array = 255 - img_array

    brightness_increase = 50
    # Convert to int temporarily: uint8 arithmetic would wrap around
    # (e.g., 250 + 50 becomes 44) instead of going above 255
    bright_array = np.clip(img_array.astype(int) + brightness_increase, 0, 255).astype(np.uint8)

    print("Invert and brightness arrays created.")

    # Return all the arrays
    return img_array, grayscale_array, inverted_array, bright_array
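The cast to int matters because NumPy's uint8 arithmetic wraps around modulo 256 instead of saturating. A small check with hand-picked values (hypothetical, for illustration) shows the difference, and also shows why 255 - x needs no cast: for any x in [0, 255] the result already stays in [0, 255].

```python
import numpy as np

values = np.array([0, 200, 250], dtype=np.uint8)

wrapped = values + np.uint8(50)  # stays uint8, so 250 + 50 wraps to 44
clipped = np.clip(values.astype(int) + 50, 0, 255).astype(np.uint8)
inverted = 255 - values          # never leaves the valid range

print(wrapped.tolist())   # [50, 250, 44]
print(clipped.tolist())   # [50, 250, 255]
print(inverted.tolist())  # [255, 55, 5]
```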

Step 6: Visualizing Results with Matplotlib

We use the Matplotlib library to display the original image and its processed versions in a grid for easy comparison.

After creating all the new arrays, we need to visualize them, and Matplotlib is well suited to this. We will use the plt.subplots(2, 2) function to create a 2x2 grid of axes. In each cell, we will display one of our images with imshow(). It is important to pass the cmap='gray' argument to imshow for the grayscale image: a two-dimensional array is otherwise mapped through Matplotlib's default colormap (viridis) and appears in purple-to-yellow tones instead of shades of gray.


# image_analyzer.py (Final version)
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# ... define analyze_image function as before (it returns all four arrays) ...

if __name__ == '__main__':
    # ... (code to create the test image, as in Step 2) ...
    results = analyze_image(image_file)

    if results is not None:
        original, grayscale, inverted, bright = results
        
        # Create a 2x2 grid for the images
        fig, axes = plt.subplots(2, 2, figsize=(10, 10))
        
        axes[0, 0].imshow(original)
        axes[0, 0].set_title("Original Image")
        axes[0, 0].axis('off') # Hide the axes

        axes[0, 1].imshow(grayscale, cmap='gray')
        axes[0, 1].set_title("Grayscale")
        axes[0, 1].axis('off')

        axes[1, 0].imshow(inverted)
        axes[1, 0].set_title("Inverted Colors")
        axes[1, 0].axis('off')

        axes[1, 1].imshow(bright)
        axes[1, 1].set_title("Increased Brightness")
        axes[1, 1].axis('off')

        plt.tight_layout() # Improves appearance
        plt.show() # Displays the window with the plots
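The four nearly identical plotting blocks can also be written as a loop over axes.flat. This is a sketch of that refactor; the helper name show_grid and its optional save_path parameter are assumptions for illustration, not part of the original project. Passing cmap=None for the RGB images lets Matplotlib use their colors directly, while 'gray' is applied to the two-dimensional array.

```python
import matplotlib.pyplot as plt
import numpy as np

def show_grid(images, titles, cmaps, save_path=None):
    """Hypothetical helper: plot up to four images in a 2x2 grid."""
    fig, axes = plt.subplots(2, 2, figsize=(10, 10))
    # axes.flat walks the grid row by row: [0, 0], [0, 1], [1, 0], [1, 1]
    for ax, image, title, cmap in zip(axes.flat, images, titles, cmaps):
        ax.imshow(np.asarray(image), cmap=cmap)
        ax.set_title(title)
        ax.axis('off')  # hide ticks and frame
    fig.tight_layout()
    if save_path is not None:
        fig.savefig(save_path)  # write to a file instead of opening a window
    else:
        plt.show()
    return fig
```

In the Step 6 script, the call would be show_grid([original, grayscale, inverted, bright], ["Original Image", "Grayscale", "Inverted Colors", "Increased Brightness"], [None, 'gray', None, None]). Saving to a file is handy on machines without a display.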

Project Completion & Next Steps

Congratulations! You have completed the project and now have its full code.

This is the final, complete code for the application. You can copy it, run it locally on your computer (after installing the necessary libraries with `pip`), and experiment by adding your own features!


# image_analyzer.py
import os

import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

def analyze_image(image_path):
    """
    Loads an image, analyzes it, and applies filters.
    """
    try:
        # Load the image and convert it to a 3-channel (RGB) NumPy array
        with Image.open(image_path) as img:
            img_array = np.array(img.convert('RGB'))
    except FileNotFoundError:
        print(f"Error: The file '{image_path}' was not found.")
        return

    print("--- Basic Image Analysis ---")
    print(f"Image dimensions (Height, Width, Channels): {img_array.shape}")
    print(f"Data type: {img_array.dtype}")
    print(f"Minimum pixel value: {img_array.min()}")
    print(f"Maximum pixel value: {img_array.max()}")

    # Compute the mean value of each color channel (R, G, B)
    mean_colors = np.mean(img_array, axis=(0, 1))
    print(f"Average color values (R, G, B): {mean_colors.astype(int)}")

    # --- Apply Filters ---
    grayscale_array = img_array.mean(axis=2).astype(np.uint8)
    inverted_array = 255 - img_array
    brightness_increase = 50
    # Convert to int before adding so uint8 values don't wrap around past 255
    bright_array = np.clip(img_array.astype(int) + brightness_increase, 0, 255).astype(np.uint8)

    # --- Display Results with Matplotlib ---
    fig, axes = plt.subplots(2, 2, figsize=(10, 10))

    axes[0, 0].imshow(img_array)
    axes[0, 0].set_title("Original Image")
    axes[0, 0].axis('off')

    axes[0, 1].imshow(grayscale_array, cmap='gray')
    axes[0, 1].set_title("Grayscale")
    axes[0, 1].axis('off')

    axes[1, 0].imshow(inverted_array)
    axes[1, 0].set_title("Inverted Colors")
    axes[1, 0].axis('off')

    axes[1, 1].imshow(bright_array)
    axes[1, 1].set_title("Increased Brightness")
    axes[1, 1].axis('off')

    plt.tight_layout()
    plt.show()

if __name__ == '__main__':
    image_to_process = 'my_image.png'

    # Fall back to a generated demo image if my_image.png doesn't exist
    if not os.path.exists(image_to_process):
        image_to_process = 'test_image.png'
        Image.new('RGB', (100, 100), color='red').save(image_to_process)
        print(f"\nCreated '{image_to_process}' for demonstration.")

    analyze_image(image_to_process)
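One possible extension, in the spirit of adding your own features, is writing the filtered arrays back to image files. This sketch assumes the arrays are uint8 with shape (H, W) or (H, W, 3), as produced above; Image.fromarray infers the mode ('L' or 'RGB') from that shape and dtype. The helper name save_results and the output filenames are hypothetical.

```python
import numpy as np
from PIL import Image

def save_results(arrays, prefix="output"):
    """Hypothetical helper: save each named uint8 array as a PNG file."""
    paths = []
    for name, array in arrays.items():
        path = f"{prefix}_{name}.png"
        # fromarray picks mode 'L' for 2D uint8 arrays, 'RGB' for (H, W, 3)
        Image.fromarray(np.asarray(array, dtype=np.uint8)).save(path)
        paths.append(path)
    return paths
```

Inside analyze_image, a call such as save_results({"grayscale": grayscale_array, "inverted": inverted_array, "bright": bright_array}) before plt.show() would write three PNG files. Because PNG is lossless, loading them back with np.array(Image.open(path)) returns the same pixel values.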