cursor
en
Intermediate
Best Practices for Deep Learning and Model Development in Python

This rule outlines essential principles and practices for deep learning, focusing on model development with Python libraries such as PyTorch, Transformers, Diffusers, and Gradio.

Installation Instructions

Save this file in the `.cursor/rules` directory.

Rule Content

# Key Principles
- Provide concise, technical responses with accurate Python examples.
- Emphasize clarity, efficiency, and best practices in deep learning workflows.
- Use OOP for model architectures and functional programming for data processing.
- Ensure proper GPU utilization and mixed precision when applicable.
- Choose descriptive variable names that reflect the components they represent.
- Adhere to PEP 8 style guidelines.

# Deep Learning and Model Development
- Use PyTorch as the primary framework.
- Create custom `nn.Module` classes for models.
- Leverage PyTorch's autograd for differentiation.
- Apply proper weight initialization and normalization techniques.
- Select appropriate loss functions and optimizers.
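A minimal sketch of these points in a toy classification setup — the `MLPClassifier` name, layer sizes, and choice of `AdamW` are illustrative, not prescriptive:

```python
import torch
import torch.nn as nn

class MLPClassifier(nn.Module):
    """Small feed-forward classifier as a custom nn.Module."""

    def __init__(self, input_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.LayerNorm(hidden_dim),   # normalization before the activation
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )
        self.apply(self._init_weights)

    @staticmethod
    def _init_weights(module: nn.Module) -> None:
        # Xavier initialization for linear weights, zeros for biases.
        if isinstance(module, nn.Linear):
            nn.init.xavier_uniform_(module.weight)
            nn.init.zeros_(module.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = MLPClassifier(input_dim=16, hidden_dim=32, num_classes=4)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

logits = model(torch.randn(8, 16))
loss = criterion(logits, torch.randint(0, 4, (8,)))
loss.backward()   # autograd computes gradients for every parameter
optimizer.step()
```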

# Transformers and LLMs
- Utilize the Transformers library for pre-trained models and tokenizers.
- Correctly implement attention mechanisms and positional encodings.
- Use fine-tuning techniques like LoRA or P-tuning when suitable.
- Ensure proper tokenization and sequence handling for text data.
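The attention and positional-encoding bullets can be sketched from scratch in plain PyTorch; the Transformers library provides the production implementations, so this is only to show the mechanics:

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Fixed sine/cosine positional encodings (d_model must be even here)."""
    position = torch.arange(seq_len).unsqueeze(1).float()
    div_term = torch.exp(
        torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights

# Self-attention over a batch of 2 sequences of length 10, model dim 64.
x = torch.randn(2, 10, 64) + sinusoidal_positional_encoding(10, 64)
out, attn = scaled_dot_product_attention(x, x, x)
```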

# Diffusion Models
- Implement diffusion models using the Diffusers library.
- Understand forward and reverse diffusion processes.
- Use appropriate noise schedulers and sampling methods.
- Use the appropriate pipeline for the task, such as `StableDiffusionPipeline` or `StableDiffusionXLPipeline`.
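The forward (noising) process can be sketched in a few lines; in practice the Diffusers schedulers handle this, and the beta schedule and tensor shapes below are illustrative:

```python
import torch

def add_noise(x0, t, alphas_cumprod):
    """Forward diffusion: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    noise = torch.randn_like(x0)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)  # one alpha_bar per sample
    return a.sqrt() * x0 + (1 - a).sqrt() * noise, noise

# Linear beta schedule as in the original DDPM paper (values illustrative).
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

x0 = torch.randn(4, 3, 8, 8)         # a batch of toy "images"
t = torch.randint(0, 1000, (4,))     # a random timestep per sample
xt, noise = add_noise(x0, t, alphas_cumprod)
```

The reverse process is then a learned denoiser that predicts `noise` from `xt` and `t`.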

# Model Training and Evaluation
- Employ efficient data loading with PyTorch's `DataLoader`.
- Maintain proper train/validation/test splits; use cross-validation where it is practical.
- Implement early stopping and learning rate scheduling.
- Choose evaluation metrics suited to the task.
- Handle gradient clipping and NaN/Inf values appropriately.
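Early stopping from the list above can be implemented as a small framework-agnostic helper; the `patience` and `min_delta` defaults are illustrative and should be tuned per task:

```python
class EarlyStopping:
    """Stop training when validation loss stops improving."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.counter = 0
        self.should_stop = False

    def step(self, val_loss: float) -> bool:
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss   # improvement: reset the counter
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                self.should_stop = True
        return self.should_stop

stopper = EarlyStopping(patience=2)
for val_loss in [0.9, 0.8, 0.85, 0.84]:
    if stopper.step(val_loss):
        break  # two epochs without improvement over 0.8 -> stop
```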

# Gradio Integration
- Create interactive demos with Gradio for inference and visualization.
- Design user-friendly interfaces to showcase model capabilities.
- Ensure error handling and input validation in Gradio apps.
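A minimal sketch of a Gradio app with input validation; `generate_caption` is a hypothetical stand-in for real model inference, and the import is deferred so the validation logic stays testable without Gradio installed:

```python
def generate_caption(text: str) -> str:
    """Placeholder inference function; a real app would call the model here."""
    if not text or not text.strip():
        raise ValueError("Input text must not be empty.")
    if len(text) > 512:
        raise ValueError("Input text is too long (max 512 characters).")
    return text.upper()  # stand-in for actual model output

if __name__ == "__main__":
    import gradio as gr

    demo = gr.Interface(
        fn=generate_caption,
        inputs=gr.Textbox(label="Prompt"),
        outputs=gr.Textbox(label="Model output"),
        title="Model demo",
    )
    demo.launch()
```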

# Error Handling and Debugging
- Use try-except blocks for error-prone operations, especially in data loading and inference.
- Implement logging for training progress and errors.
- Utilize PyTorch's debugging tools like `autograd.detect_anomaly()` when necessary.
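A small sketch of guarded data loading with logging — the `load_sample` helper and file path are illustrative:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("training")

def load_sample(path: str):
    """Load one sample, logging and skipping files that fail to open."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except OSError as exc:
        logger.warning("Skipping %s: %s", path, exc)
        return None

sample = load_sample("nonexistent.bin")  # logs a warning, returns None
```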

# Performance Optimization
- Prefer `DistributedDataParallel` over `DataParallel` for multi-GPU training.
- Implement gradient accumulation for large batch sizes.
- Apply mixed precision training with `torch.cuda.amp` when appropriate.
- Profile code to identify and optimize bottlenecks, particularly in data loading.
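Gradient accumulation with clipping can be sketched as follows; the model, batch sizes, and `accumulation_steps` value are illustrative, and mixed precision would additionally wrap the forward pass in `torch.autocast`:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

accumulation_steps = 4  # effective batch = micro-batch size * 4
optimizer.zero_grad()
for step in range(8):
    x, y = torch.randn(2, 10), torch.randn(2, 1)
    loss = criterion(model(x), y) / accumulation_steps  # scale for averaging
    loss.backward()                                     # gradients accumulate
    if (step + 1) % accumulation_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        optimizer.zero_grad()
```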

# Dependencies
- `torch`, `transformers`, `diffusers`, `gradio`, `numpy`, `tqdm`, `tensorboard` or `wandb`.

# Key Conventions
1. Start with a clear problem definition and dataset analysis.
2. Organize code modularly with separate files for models, data, training, and evaluation.
3. Use configuration files (e.g., YAML) for hyperparameters.
4. Implement experiment tracking and model checkpointing.
5. Use version control (e.g., git) for tracking code changes.
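Convention 3 might look like the following YAML config; every field name and value here is illustrative:

```yaml
# config.yaml — hyperparameters for one experiment
model:
  hidden_dim: 256
  num_layers: 4
training:
  batch_size: 32
  learning_rate: 3.0e-4
  epochs: 20
  seed: 42
logging:
  checkpoint_dir: checkpoints/
  tracker: wandb
```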

Refer to the official documentation of PyTorch, Transformers, Diffusers, and Gradio for best practices and current APIs.

Tags

DeepLearning
PyTorch
Transformers
DiffusionModels
Gradio
Python