🟡 Creating AI-Generated Art with Style Transfer

stemaway · November 6, 2024, 4:36pm

Objective

Develop an application that generates artistic images by applying the style of one image to the content of another using Neural Style Transfer. This project focuses on utilizing existing models to create AI-generated art, exploring the creative applications of generative AI without building models from scratch.

Learning Outcomes

By completing this project, you will:

Understand the concept of Neural Style Transfer.
Learn to use pre-trained models for image processing tasks.
Develop skills in image manipulation and visualization.
Build an application or tool that allows users to generate art.
Explore the intersection of AI and creativity.

Prerequisites and Theoretical Foundations

1. Basic Python Programming

Libraries: Familiarity with PyTorch or TensorFlow.
Image Processing: Understanding of image arrays and transformations.

2. Understanding of Convolutional Neural Networks (CNNs)

Pre-trained Models: Knowledge of models like VGG19.
Feature Extraction: Using CNNs to extract content and style representations.

3. Experience with Jupyter Notebooks or IDEs

Visualization: Ability to display images and results within notebooks.

Tools Required

Programming Language: Python 3.7+
Libraries:
- PyTorch and torchvision: For the model and image transforms (pip install torch torchvision)
- Matplotlib: For visualization (pip install matplotlib)
- Pillow: For image handling (pip install pillow)
Environment:
- Jupyter Notebook or any Python IDE.

Project Structure

neural_style_transfer/
│
├── style_transfer.py
├── content_image.jpg
├── style_image.jpg
└── output/
    └── generated_image.jpg

Steps and Tasks

1. Setting Up the Environment

Tasks:

Install Required Libraries.
Prepare Content and Style Images.

Implementation:

# Imports
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
import torchvision.models as models
from PIL import Image
import matplotlib.pyplot as plt

2. Loading and Preprocessing Images

Tasks:

Define Image Transformations.
Load Images and Apply Transforms.
Ensure Images are the Same Size.

Implementation:

# Image loading function
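def load_image(img_path, max_size=400, shape=None):
    """Load an image as a (1, 3, H, W) float tensor with values in [0, 1]."""
    image = Image.open(img_path).convert('RGB')
    # Cap the resize target at max_size; Resize with an int scales the shorter edge
    size = min(max(image.size), max_size)
    # An explicit (height, width) overrides the computed size
    if shape is not None:
        size = shape
    transform = transforms.Compose([
        transforms.Resize(size),
        transforms.ToTensor(),
    ])
    # Add a batch dimension so the tensor can be passed to the model
    image = transform(image).unsqueeze(0)
    return image

content_img = load_image('content_image.jpg')
# Resize the style image to the content image's height and width
style_img = load_image('style_image.jpg', shape=tuple(content_img.shape[-2:]))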
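
A note on the resize step: when transforms.Resize receives a single integer, it scales the shorter edge to that value and keeps the aspect ratio, so the longer edge of a non-square image can end up larger than max_size. The sketch below reproduces that arithmetic in plain Python. resized_dims is a hypothetical helper written for illustration, not a torchvision function, and the truncation of the longer edge is an assumption about how the library rounds.

```python
def resized_dims(width, height, size):
    """Approximate the output (width, height) of Resize(size) for an integer size.

    Assumption: the shorter edge becomes `size` and the longer edge is scaled
    by the same factor and truncated to an integer.
    """
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size

# An 800x600 photo with size=400: the shorter edge (height) becomes 400
print(resized_dims(800, 600, 400))  # (533, 400)
# A square image is unaffected by the distinction
print(resized_dims(500, 500, 400))  # (400, 400)
```

If the longer edge must stay within max_size, one option is to pass an explicit (height, width) pair to Resize instead of a single integer.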

3. Defining the Model and Layers

Tasks:

Use a Pre-trained CNN (e.g., VGG19).
Identify Layers for Content and Style Extraction.
Build the Model for Style Transfer.

Implementation:

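# Load the pre-trained VGG19 feature extractor (convolutional layers only).
# torchvision 0.13+ uses the `weights` argument; older releases use pretrained=True instead.
vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.eval()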

# Freeze model parameters
for param in vgg.parameters():
    param.requires_grad_(False)

# Move model to device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
vgg.to(device)

# Define content and style layers
content_layers = ['conv4_2']
style_layers = ['conv1_1', 'conv2_1', 'conv3_1', 'conv4_1', 'conv5_1']
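
Names such as 'conv4_2' follow the block_position convention from the style transfer literature, whereas torchvision keys the modules in vgg19().features by integer position ('0', '1', ...). The sketch below derives the positions from the VGG19 layer configuration. It assumes the non-batch-norm layout, where every convolution is followed by a ReLU and each 'M' entry is a single max-pool layer. VGG19_CFG and conv_layer_indices are illustrative names, not part of torchvision.

```python
# VGG19 configuration: numbers are conv output channels, 'M' is a max-pool layer
VGG19_CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M',
             512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M']

def conv_layer_indices(cfg):
    """Map names like 'conv4_2' to their index in a conv/ReLU/pool sequence."""
    names = {}
    index = 0            # position in the flattened feature stack
    block, conv = 1, 1   # block number and conv number within the block
    for entry in cfg:
        if entry == 'M':
            index += 1                  # the pooling layer takes one slot
            block, conv = block + 1, 1  # a pool closes the current block
        else:
            names[f'conv{block}_{conv}'] = index
            index += 2                  # the conv plus the ReLU that follows it
            conv += 1
    return names

indices = conv_layer_indices(VGG19_CFG)
for name in ['conv1_1', 'conv2_1', 'conv3_1', 'conv4_1', 'conv4_2', 'conv5_1']:
    print(name, indices[name])
# conv1_1 0
# conv2_1 5
# conv3_1 10
# conv4_1 19
# conv4_2 21
# conv5_1 28
```

These positions are what a feature extraction function has to match against when it walks the model layer by layer.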

4. Extracting Features

Tasks:

Define Functions to Get Content and Style Features.
Compute Gram Matrix for Style Representation.

Implementation:

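# Map torchvision's numeric layer indices in vgg19().features to the conventional conv names
LAYER_NAMES = {'0': 'conv1_1', '5': 'conv2_1', '10': 'conv3_1',
               '19': 'conv4_1', '21': 'conv4_2', '28': 'conv5_1'}

def get_features(image, model, layers):
    features = {}
    x = image.to(device)
    for name, layer in model.named_children():
        x = layer(x)
        # Modules are keyed '0', '1', ...; translate before matching against `layers`
        layer_name = LAYER_NAMES.get(name)
        if layer_name in layers:
            features[layer_name] = x
    return features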

def gram_matrix(tensor):
    _, d, h, w = tensor.size()
    tensor = tensor.view(d, h * w)
    gram = torch.mm(tensor, tensor.t())
    return gram
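
To make the Gram matrix concrete, here is a small worked example in plain Python that can be checked by hand. It performs the same computation as gram_matrix above (each entry is the dot product of two flattened channels), with nested lists in place of tensors. The gram helper and the feature values are made up for illustration.

```python
def gram(features):
    """Gram matrix of a list of flattened channels (one list of values per channel)."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in features]
            for ci in features]

# Two channels of a 2x2 feature map, each flattened to length 4
features = [
    [1.0, 0.0, 2.0, 0.0],   # channel 0
    [0.0, 1.0, 1.0, 0.0],   # channel 1
]
for row in gram(features):
    print(row)
# [5.0, 2.0]
# [2.0, 2.0]
```

The diagonal entries measure how strongly each channel fires overall, and the off-diagonal entries measure how often two channels fire at the same positions. Because spatial positions are summed out, the result describes texture rather than layout, which is why it serves as a style representation.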

5. Defining Loss Functions and Optimizer

Tasks:

Compute Content Loss.
Compute Style Loss.
Set Up Optimizer for the Generated Image.

Implementation:

# Get features
content_features = get_features(content_img, vgg, content_layers + style_layers)
style_features = get_features(style_img, vgg, content_layers + style_layers)

# Calculate gram matrices for style features
style_grams = {layer: gram_matrix(style_features[layer]) for layer in style_layers}

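# Create the target image on the device first, then enable gradients,
# so that it stays a leaf tensor the optimizer can update
target = content_img.clone().to(device).requires_grad_(True)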

# Define optimizer
optimizer = optim.Adam([target], lr=0.003)
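
Unlike ordinary training, the optimizer here updates the pixels of an image rather than the weights of a network. The toy sketch below shows that idea in isolation, assuming only that PyTorch is installed: a small random tensor stands in for the target to match, and a plain mean squared error replaces the content and style losses. The names goal and image are illustrative.

```python
import torch

torch.manual_seed(0)
goal = torch.rand(1, 3, 8, 8)                        # stand-in for what we want to match
image = torch.zeros(1, 3, 8, 8, requires_grad=True)  # leaf tensor: the "pixels" being optimized
optimizer = torch.optim.Adam([image], lr=0.05)

initial_loss = torch.mean((image - goal) ** 2).item()
for step in range(200):
    loss = torch.mean((image - goal) ** 2)
    optimizer.zero_grad()   # clear gradients from the previous step
    loss.backward()         # gradients flow back to the pixels themselves
    optimizer.step()        # nudge the pixels to reduce the loss
final_loss = torch.mean((image - goal) ** 2).item()

print(image.is_leaf)               # True: the optimizer can update it
print(final_loss < initial_loss)   # True: the pixels moved toward the goal
```

The tensor passed to the optimizer has to be a leaf tensor. Calling .to(device) after requires_grad_(True) returns a non-leaf copy whenever the tensor actually moves to another device, which the optimizer rejects.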

6. Running the Style Transfer

Tasks:

Iteratively Update the Target Image.
Compute Total Loss (content + style).
Display and Save the Output Image.

Implementation:

style_weights = {'conv1_1': 1.0, 'conv2_1': 0.75,
                 'conv3_1': 0.5, 'conv4_1': 0.25,
                 'conv5_1': 0.1}
content_weight = 1  # alpha
style_weight = 1e6  # beta

for epoch in range(1, 1001):
    target_features = get_features(target, vgg, content_layers + style_layers)
    content_loss = torch.mean((target_features['conv4_2'] - content_features['conv4_2']) ** 2)
    style_loss = 0
    for layer in style_layers:
        target_gram = gram_matrix(target_features[layer])
        style_gram = style_grams[layer]
        layer_style_loss = style_weights[layer] * torch.mean((target_gram - style_gram) ** 2)
        style_loss += layer_style_loss
    total_loss = content_weight * content_loss + style_weight * style_loss
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Total Loss: {total_loss.item()}")
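
The loop combines the losses as content_weight * content_loss + style_weight * (sum of per-layer style losses, each scaled by its layer weight). The sketch below isolates that arithmetic in plain Python to show how much style_weight dominates the balance. total_loss is a hypothetical helper, and the loss values are illustrative numbers, not measurements from a real run.

```python
def total_loss(content_loss, layer_style_losses, style_weights,
               content_weight=1, style_weight=1e6):
    """Combine losses as alpha * content + beta * sum(w_l * style_l)."""
    style_loss = sum(style_weights[layer] * loss
                     for layer, loss in layer_style_losses.items())
    return content_weight * content_loss + style_weight * style_loss

style_weights = {'conv1_1': 1.0, 'conv2_1': 0.75}
# Illustrative numbers only
layer_losses = {'conv1_1': 2e-6, 'conv2_1': 4e-6}

# Style term: (1.0 * 2e-6 + 0.75 * 4e-6) * 1e6, roughly 5.0; plus content 0.5
print(total_loss(0.5, layer_losses, style_weights))
```

Raising style_weight, or the weights of the early layers, pushes the result toward the style image's textures. Lowering them keeps more of the content image's structure.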

7. Displaying and Saving the Result

Tasks:

Convert Tensor to Image.
Display Using Matplotlib.
Save the Generated Image.

Implementation:

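import os

# Convert the optimized tensor back to a PIL image
def im_convert(tensor):
    image = tensor.to('cpu').clone().detach()
    image = image.squeeze(0)
    # The optimizer can push pixel values outside [0, 1]; clamp before converting
    image = image.clamp(0, 1)
    image = transforms.ToPILImage()(image)
    return image

final_image = im_convert(target)
# Create the output folder if it does not exist yet
os.makedirs('output', exist_ok=True)
final_image.save('output/generated_image.jpg')
plt.imshow(final_image)
plt.axis('off')
plt.show()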

Further Enhancements

Build a User Interface:
- Create a web app using Streamlit or Gradio for users to upload images.
Experiment with Different Models:
- Use alternative architectures or pre-trained models for different styles.
Optimize Performance:
- Reduce computation time, for example by using smaller images, running fewer iterations, or running on a GPU.
Explore Real-Time Style Transfer:
- Investigate methods for faster inference suitable for video processing.
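
Before building a web interface, a command-line wrapper can be a lighter first step toward a reusable tool. The sketch below uses Python's standard argparse module and is a substitute for, not an example of, the Streamlit or Gradio approach listed above. The argument names and defaults are assumptions chosen to mirror the values used in this project, and the style transfer call is left as a placeholder.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Neural style transfer (sketch)")
    parser.add_argument("content", help="path to the content image")
    parser.add_argument("style", help="path to the style image")
    parser.add_argument("--output", default="output/generated_image.jpg",
                        help="where to save the result")
    parser.add_argument("--steps", type=int, default=1000,
                        help="number of optimization steps")
    parser.add_argument("--style-weight", type=float, default=1e6,
                        help="weight of the style loss (beta)")
    return parser

def main(argv=None):
    args = build_parser().parse_args(argv)
    # Placeholder: call your own style transfer routine here with these settings
    print(f"Would stylize {args.content} with {args.style} "
          f"for {args.steps} steps -> {args.output}")
    return args

# Example invocation with explicit arguments instead of the real command line
args = main(["content_image.jpg", "style_image.jpg", "--steps", "300"])
```

In a script, calling main() with no arguments reads the settings from the real command line instead.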

Conclusion

In this project, you have:

Applied neural style transfer to generate artistic images.
Used a pre-trained VGG19 network as a fixed feature extractor instead of training a model.
Developed an application that bridges AI and creativity.
Explored the practical usage of generative AI without extensive model training.

This project showcases how generative AI technologies can be used in creative applications, focusing on usage and user experience.