- File Organizer Documentation
- Testing the File Organizer
- Maintenance Guide for File Organizer
- Decision Record and Implementation Notes
- Python script code
#File Organizer Documentation
A Python utility for organizing files from source directories into categorized target folders.
#Project Overview
#Purpose
This utility solves the problem of disorganized file directories by automatically sorting files based on their types or extensions. It addresses common issues like:
- Mixed file types in download or document folders
- Difficulty finding specific file types
- Managing large collections of files
- Batch organizing files while preserving their content and metadata
#Core Functionality
- Sort files by predefined categories (images, documents, etc.)
- Sort files by their extensions
- Copy or move files from source to destination
- Handle duplicate filenames
- Process hidden files and follow symbolic links if requested
- Track progress during file operations
#Design Principles
- Configurability: Users can customize how files are categorized
- Reliability: Careful handling of edge cases like duplicates and long paths
- Transparency: Clear feedback on what’s happening during operation
- Simplicity: Straightforward command-line interface
#Installation
The utility requires only Python standard library modules and no external dependencies.
- Download the script:
- Make the script executable (Linux/macOS):
#Usage
#Basic Command Structure
#Command Line Options
Option | Description |
---|---|
--source, -s | Source directory to organize files from (required) |
--target, -t | Target directory to organize files into (required) |
--organize-by | Organization method: ‘category’ or ‘extension’ (default: ‘category’) |
--no-timestamp | Disable adding timestamps to duplicate filenames |
--move | Move files instead of copying them |
--config, -c | Path to a JSON configuration file for custom categories |
-i, --include_hidden | Include hidden files and directories |
-l, --follow_links | Follow symbolic links during directory traversal |
-sk, --skip_existing | Skip existing files instead of timestamping |
#Configuration
#Default Categories
The utility uses these default file categories:
Category | File Extensions |
---|---|
images | .jpg, .jpeg, .png, .gif, .bmp, .webp |
documents | .pdf, .docx, .doc, .txt, .rtf, .odt, .xlsx, .xls, .csv, .pptx, .ppt |
videos | .mp4, .avi, .mkv, .mov, .wmv, .flv |
audio | .mp3, .wav, .flac, .aac, .ogg, .m4a |
archives | .zip, .rar, .tar, .gz, .bz2, .7z |
code | .py, .java, .c, .cpp, .h, .html, .css, .js, .xml, .json |
apps | .exe, .msi, .apk, .dmg |
other | Any file extension not listed above |
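As a minimal sketch of how the table above can drive categorization, the Python below copies the default mapping into a dictionary and looks up a file's extension. The function name `lookup_category` is hypothetical, and the case-insensitive match is an assumption rather than documented behavior of the utility.

```python
from pathlib import Path

# Default mapping copied from the table above; the dictionary layout is assumed.
DEFAULT_CATEGORIES_CONFIG = {
    "images": [".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"],
    "documents": [".pdf", ".docx", ".doc", ".txt", ".rtf", ".odt",
                  ".xlsx", ".xls", ".csv", ".pptx", ".ppt"],
    "videos": [".mp4", ".avi", ".mkv", ".mov", ".wmv", ".flv"],
    "audio": [".mp3", ".wav", ".flac", ".aac", ".ogg", ".m4a"],
    "archives": [".zip", ".rar", ".tar", ".gz", ".bz2", ".7z"],
    "code": [".py", ".java", ".c", ".cpp", ".h", ".html", ".css",
             ".js", ".xml", ".json"],
    "apps": [".exe", ".msi", ".apk", ".dmg"],
}


def lookup_category(filename, categories=DEFAULT_CATEGORIES_CONFIG):
    """Return the category for a filename, or 'other' if nothing matches."""
    # Assumption: extensions are compared case-insensitively.
    extension = Path(filename).suffix.lower()
    for category, extensions in categories.items():
        if extension in extensions:
            return category
    return "other"


print(lookup_category("holiday.JPG"))
print(lookup_category("noextension"))
```

Only the final suffix is considered in this sketch, so a name such as `backup.tar.gz` is matched on `.gz`.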
#Custom Categories
You can define your own categories using a JSON configuration file:
Example custom config file:
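As a hedged illustration, the Python sketch below writes a small config file and loads it with a fallback to defaults. The schema (category names mapped to lists of extensions), the file name `custom_categories.json`, and the function name `load_categories` are assumptions. The category names are borrowed from Test Case 7.

```python
import json
import tempfile
from pathlib import Path

# Stand-in defaults for this sketch only.
FALLBACK_CATEGORIES = {"documents": [".pdf", ".txt"], "images": [".jpg", ".png"]}


def load_categories(config_path, fallback=FALLBACK_CATEGORIES):
    """Return categories from a JSON file, or the fallback if loading fails."""
    try:
        with open(config_path, encoding="utf-8") as handle:
            data = json.load(handle)
    except (OSError, ValueError) as exc:
        # ValueError covers json.JSONDecodeError and decoding problems.
        print(f"Could not load {config_path} ({exc}); using defaults")
        return fallback
    # Assumed schema: {"category_name": [".ext", ...]}
    valid = isinstance(data, dict) and all(
        isinstance(exts, list) and all(isinstance(ext, str) for ext in exts)
        for exts in data.values()
    )
    if not valid:
        print(f"Unexpected structure in {config_path}; using defaults")
        return fallback
    return {name: [ext.lower() for ext in exts] for name, exts in data.items()}


# Usage: write an example config, then load it and a deliberately broken file.
example = {
    "text_files": [".pdf", ".docx"],
    "media": [".jpg", ".png", ".mp4"],
    "code_files": [".py"],
}
with tempfile.TemporaryDirectory() as workdir:
    config_file = Path(workdir) / "custom_categories.json"
    config_file.write_text(json.dumps(example, indent=2), encoding="utf-8")
    loaded = load_categories(config_file)

    broken_file = Path(workdir) / "broken.json"
    broken_file.write_text("{not valid json", encoding="utf-8")
    recovered = load_categories(broken_file)
```

Returning the defaults on any load error keeps a typo in the config from aborting a run, at the cost of silently ignoring the custom categories unless the printed message is noticed.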
#Examples
- Basic organization by category:
- Organize by file extension:
- Move files instead of copying:
- Skip duplicate files instead of timestamping:
- Include hidden files and follow symbolic links:
#Troubleshooting
#Permission Errors
- Ensure you have read permissions for the source directory
- Ensure you have write permissions for the target directory
- On Unix systems, run with sudo for system directories (use with caution)
#Long Paths on Windows
The utility automatically handles long paths (>255 characters) on Windows by prefixing them with `\\?\`. If you still encounter issues:
- Use shorter directory names
- Move files to a less deeply nested location before organizing
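The prefixing described above can be sketched as follows. This is not the utility's own `handle_long_path()`: the function name `add_long_path_prefix`, the `platform_name` parameter (added so the logic can be exercised on any OS), and the UNC handling are assumptions. The 255-character threshold is the one quoted in this document.

```python
import os

LONG_PATH_PREFIX = "\\\\?\\"   # the literal characters \\?\
LONG_PATH_THRESHOLD = 255      # threshold quoted in this document


def add_long_path_prefix(path, platform_name=os.name):
    """Return path with the Windows extended-length prefix when it is needed."""
    if platform_name != "nt":
        return path  # other platforms need no prefix
    if path.startswith(LONG_PATH_PREFIX) or len(path) <= LONG_PATH_THRESHOLD:
        return path
    if path.startswith("\\\\"):
        # UNC paths (\\server\share\...) take the \\?\UNC\ form instead.
        return LONG_PATH_PREFIX + "UNC\\" + path[2:]
    return LONG_PATH_PREFIX + path


# Usage: a deeply nested absolute path gets the prefix on Windows only.
deep_path = "C:\\" + "\\".join(["nested_folder"] * 30) + "\\file.txt"
prefixed = add_long_path_prefix(deep_path, platform_name="nt")
unchanged = add_long_path_prefix(deep_path, platform_name="posix")
```

The prefix only works with absolute paths that use backslashes, so a real implementation would normalize the path before applying it.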
#Performance with Large Directories
- For very large directories (thousands of files), the initial scan may take time
- Consider organizing subdirectories separately if performance is an issue
#Duplicate Files
When a file with the same name exists in the target directory:
- Default behavior: Add timestamp to filename
- With `--skip_existing`: Skip the file
- With `--no-timestamp`: Overwrite existing file (use with caution)
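The three behaviors above can be sketched as a single decision function. The name `resolve_destination`, the `now` parameter, and the choice to check `skip_existing` before `no_timestamp` are assumptions. The timestamp format follows the example file name used in Test Case 4.

```python
from datetime import datetime
from pathlib import Path


def resolve_destination(dest, skip_existing=False, no_timestamp=False, now=None):
    """Return the path to write to, or None when the file should be skipped."""
    dest = Path(dest)
    if not dest.exists():
        return dest          # no conflict, use the name as-is
    if skip_existing:
        return None          # caller should skip this file
    if no_timestamp:
        return dest          # caller overwrites the existing file
    # Default: prefix the name with a timestamp so both files are kept.
    stamp = (now or datetime.now()).strftime("%Y-%m-%dT%H-%M-%S")
    return dest.with_name(f"{stamp}-{dest.name}")
```

Keeping the decision separate from the copy or move call makes each branch easy to test without touching real files beyond an existence check.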
#Development Notes
#Design Decisions
- File Operations (Copy vs. Move)
- Copy is the default to prevent accidental data loss
- Move functionality provided for efficiency when source files aren’t needed
- Categorization System
- Default categories cover common file types
- Custom categories supported via JSON for flexibility
- Extension-based organization added for users who prefer that system
- Handling Duplicates
- Timestamp approach preserves both old and new files
- Skip option added for incremental organization tasks
- Error Handling
- Individual file errors don’t halt the entire process
- Errors are reported but the utility continues processing other files
#Error Handling Strategy
The utility employs an “attempt and continue” error handling strategy:
- Each file operation is wrapped in a try/except block
- Errors with individual files are reported but don’t stop the overall process
- This ensures that as many files as possible are processed, even if some fail
#Security Considerations
- File Operations
- The utility doesn’t attempt to open or read file contents (only metadata)
- No execution of files occurs during organization
- When Using Move Operations
- Be aware that move operations permanently change your file system
- Always verify the target directory before using --move
#Testing
Refer to TESTING.md for detailed testing procedures and scenarios.
#Testing the File Organizer
This document outlines procedures for testing the File Organizer utility to ensure it functions correctly.
#Test Environment Setup
Create a test directory structure with various file types:
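One way to build such a structure is sketched below in Python. The file names are the ones referenced by the test cases in this document. The directory name `test_source`, the temporary-directory prefix, and the placeholder file contents are assumptions. Any subdirectories used by the original setup are not recreated here.

```python
import tempfile
from pathlib import Path

# File names referenced by the test cases in this document.
TEST_FILES = [
    "document1.pdf", "document2.docx", "image1.jpg", "image2.png",
    "video1.mp4", "script.py", "archive.zip", "noextension", ".hidden_file",
]


def create_test_environment(root):
    """Create a flat source directory containing one small file per test name."""
    source = Path(root) / "test_source"   # hypothetical directory name
    source.mkdir(parents=True, exist_ok=True)
    for name in TEST_FILES:
        (source / name).write_text(f"test content for {name}\n", encoding="utf-8")
    return source


# Usage: the temporary directory is not deleted automatically.
workdir = tempfile.mkdtemp(prefix="file_organizer_test_")
source_dir = create_test_environment(workdir)
print(f"Test files created in {source_dir}")
```

The files contain placeholder text rather than real PDF or image data, which is enough for a tool that sorts by extension.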
#Test Cases
#Test Case 1: Basic Category Organization
Purpose: Verify that files are correctly organized into category folders
Command:
Expected Results:
- Target directory should contain category folders: documents, images, videos, code, archives, other
- Files should be placed in their correct category folders:
- documents: document1.pdf, document2.docx
- images: image1.jpg, image2.png
- videos: video1.mp4
- code: script.py
- archives: archive.zip
- other: noextension
- Hidden files should be skipped (.hidden_file)
- All files should be copied, not moved (source files should still exist)
Verification:
#Test Case 2: Extension-based Organization
Purpose: Verify that files are correctly organized by their extensions
Command:
Expected Results:
- Target directory should contain extension folders: pdf, docx, jpg, png, mp4, py, zip, no_extension
- Files should be placed in their respective extension folders
- Files without extension should be in the no_extension folder
- All files should be copied, not moved
Verification:
#Test Case 3: Move Operation
Purpose: Verify that files are moved instead of copied when using the --move flag
Command:
Expected Results:
- Files should be moved to their category folders in the target directory
- Source directory should no longer contain the moved files
- Subdirectories in source should remain (unless empty on your OS)
Verification:
#Test Case 4: Duplicate File Handling with Timestamps
Purpose: Verify that duplicate files are handled correctly with timestamps
Preparation:
Command:
Expected Results:
- document1.pdf should be copied with a timestamp in the name (e.g., 2023-01-01T12-30-45-document1.pdf)
- Original document1.pdf should remain unchanged in target directory
- Console output should indicate that a timestamp was added
Verification:
#Test Case 5: Skip Existing Files
Purpose: Verify that existing files are skipped with the --skip_existing flag
Command:
Expected Results:
- document1.pdf should be skipped (not copied again)
- Console output should indicate that document1.pdf was skipped
- Other files should be processed normally
Verification:
#Test Case 6: Include Hidden Files
Purpose: Verify that hidden files are processed when using the --include_hidden flag
Command:
Expected Results:
- .hidden_file should be processed and copied to the “other” category
- Console output should indicate that .hidden_file was processed
Verification:
#Test Case 7: Custom Categories
Purpose: Verify that custom category configurations work correctly
Preparation:
Command:
Expected Results:
- Files should be organized according to the custom categories:
- text_files: document1.pdf, document2.docx
- media: image1.jpg, image2.png, video1.mp4
- code_files: script.py
- other: archive.zip, noextension
- Console output should indicate custom categories are being used
Verification:
#Test Result Interpretation
Each test case should result in files being organized according to the expected results. If any test fails:
- Check console output for error messages
- Verify file permissions in source and target directories
- Ensure test environment was set up correctly
- Check if target directories were created as expected
- Verify file contents to ensure they weren’t corrupted during copy/move
#Cleanup
After testing, remove the test environment:
#Example config
#Maintenance Guide for File Organizer
This document provides guidelines for maintaining and extending the File Organizer utility. It is designed for developers who may be unfamiliar with the original implementation but need to maintain or enhance the codebase.
#Project Structure
The File Organizer consists of:
- `file_organizer.py`: Main script containing all functionality
- Configuration files: JSON files that define custom category mappings
#Code Architecture
The utility follows a simple procedural design with these core components:
- Argument Parsing: Uses `argparse` to process command-line options
- Configuration Management: Loads and validates category definitions
- File Processing: Traverses directories and processes files
- File Operations: Handles copying, moving, and naming of files
#Key Functions
Function | Purpose | Implementation Notes |
---|---|---|
`categorize_file_by_category()` | Maps files to categories | Performs simple extension lookup |
`create_folders()` | Prepares target directory structure | Creates folders only when needed for ‘category’ mode |
`handle_long_path()` | Handles Windows path limitations | Windows-specific fix for paths >255 chars |
`sort_files()` | Main file processing logic | Contains the core logic and most complex function |
`load_config_file()` | Loads custom category definitions | Includes fallback to defaults on error |
`main()` | Entry point and argument processing | Sets up and initiates the process |
#Causality Chain
Understanding why certain implementation choices were made:
- Why copy files by default?
- To prevent accidental data loss
- Move operation is available but requires explicit flag
- Why use timestamps for duplicates?
- Preserves both original and new files
- Maintains file history
- Prevents unintentional overwrites
- Why separate extension handling?
- Some users prefer organization by extension
- Provides flexibility for different workflows
- Why include Windows long path handling?
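- Windows traditionally limits full paths to about 260 characters (MAX_PATH); the utility applies its workaround to paths longer than 255 characters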
- Without this, deeply nested files would fail to process
#Common Maintenance Tasks
#Adding New File Categories
To add new categories to the default configuration:
- Modify the `DEFAULT_CATEGORIES_CONFIG` dictionary:
#Adding New Command Line Options
To add a new command line option:
- Add the option to the argument parser in `main()`:
- Extract the option value:
- Pass the option to functions that need it:
- Update the function signatures and implementations to use the new option
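The four steps above can be sketched in Python as follows. The `--dry_run` flag is hypothetical and is not an existing option of the utility. The `sort_files()` shown here is a stand-in with an assumed signature, included only so the example runs on its own.

```python
import argparse


def sort_files(source, target, dry_run=False):
    """Stand-in for the real sort_files(); only the new parameter matters here."""
    action = "Would process" if dry_run else "Processing"
    return f"{action} {source} -> {target}"


def main(argv=None):
    parser = argparse.ArgumentParser(description="Option-plumbing sketch")
    parser.add_argument("-s", "--source", required=True)
    parser.add_argument("-t", "--target", required=True)
    # Step 1: add the new option to the parser (hypothetical flag).
    parser.add_argument("-d", "--dry_run", action="store_true",
                        help="Report what would be done without touching files")
    args = parser.parse_args(argv)

    # Step 2: extract the option value.
    dry_run = args.dry_run

    # Steps 3 and 4: pass it to the function, whose signature now accepts it.
    return sort_files(args.source, args.target, dry_run=dry_run)


print(main(["-s", "downloads", "-t", "sorted", "--dry_run"]))
```

Giving the new parameter a default value in the function signature keeps existing callers working, which fits the backward-compatibility guidance elsewhere in this guide.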
#Error Handling
The current error handling strategy is:
- Individual file errors are caught and reported
- The process continues with the next file
- Overall process doesn’t terminate on individual file errors
When adding new functionality, maintain this pattern:
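A minimal sketch of the pattern, assuming a simple copy loop: each file operation sits in its own try/except, failures are reported, and the loop carries on. The function name `copy_all` and its return value are illustrative and are not taken from `file_organizer.py`.

```python
import shutil
from pathlib import Path


def copy_all(files, target_dir):
    """Copy each file, reporting failures without stopping the loop."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    processed = 0
    failed = []
    for path in files:
        try:
            shutil.copy2(path, target_dir / Path(path).name)
            processed += 1
        except OSError as exc:
            # Report the problem and continue with the next file.
            print(f"Error processing {path}: {exc}")
            failed.append(path)
    return processed, failed
```

Catching `OSError` covers missing files, permission problems, and `shutil.Error`, while leaving programming errors such as `TypeError` visible.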
#Testing
When making changes, ensure you test:
- Basic functionality with default options
- Any specific options you’ve modified
- Edge cases like:
- Empty directories
- Files with unusual names or extremely long paths
- Very large directories
- Permission-restricted files
Follow the testing guide in TESTING.md to verify your changes.
#Performance Considerations
The utility was designed for moderate-sized directories. For very large directories (thousands of files), consider:
- Adding progress indicators for lengthy operations
- Implementing batch processing
- Adding resume capabilities for interrupted operations
#Security Considerations
When modifying the code, maintain these security principles:
- Never execute file contents
- Validate all user inputs, especially paths and configuration files
- Be careful with move operations that permanently alter file systems
- Maintain appropriate error handling to prevent information leakage
#Documentation Updates
When changing functionality, update these documentation components:
- Function docstrings in the code
- README.md for user-facing changes
- MAINTENANCE.md for developer-facing changes
- TESTING.md for new test cases
#Decision Record and Implementation Notes
#Key Design Decisions
#1. File Organization Approach
Decision: Implement two organization methods (category and extension)
Context: Different users have different preferences for file organization
Consequences: More flexible tool but more complex implementation and testing required
#2. Default Copy vs. Move
Decision: Make copy the default operation and move optional
Context: Moving files is destructive and could lead to data loss if not used carefully
Consequences: Safer operation but may require more disk space temporarily
#3. Duplicate File Handling
Decision: Implement three strategies: timestamp, skip, or overwrite
Context: Users need different approaches based on their specific use cases
Consequences: More complexity but greater flexibility for different scenarios
#4. Error Handling Strategy
Decision: Catch and report individual file errors but continue processing
Context: A single problematic file shouldn’t prevent organizing all other files
Consequences: More robust operation but may mask underlying issues
#5. Custom Configuration System
Decision: Use JSON for category definitions
Context: Provides flexibility while using a standard format
Consequences: Requires error handling for invalid JSON but enables easy customization
#Implementation Notes
#Platform Compatibility
- Windows long path handling was added specifically to address the 255-character path limit
- The utility uses path handling that works across Windows, macOS, and Linux
- File metadata preservation is implemented using `shutil.copy2()` instead of regular copy
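The difference between the two copy functions can be seen in a short Python sketch that compares modification times after `shutil.copy()` and `shutil.copy2()`. The file names and the chosen timestamp are arbitrary.

```python
import os
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "report.txt"
    source.write_text("example\n", encoding="utf-8")

    # Give the source an old modification time so the difference is visible.
    old_time = 1_600_000_000
    os.utime(source, (old_time, old_time))

    plain = Path(shutil.copy(source, Path(workdir) / "plain.txt"))
    preserved = Path(shutil.copy2(source, Path(workdir) / "preserved.txt"))

    plain_mtime = int(plain.stat().st_mtime)          # time of the copy
    preserved_mtime = int(preserved.stat().st_mtime)  # same as the source

print(preserved_mtime == old_time, plain_mtime == old_time)
```

`shutil.copy2()` attempts to preserve metadata such as timestamps, but how much survives depends on the platform and filesystem.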
#Progress Reporting
- Real-time progress updates were implemented to provide feedback during long operations
- The counter system shows both files processed and total files for context
#Security Considerations
- The tool only examines file metadata, not contents
- No execution of files occurs during the organization process
- User input validation is performed for all paths and configuration options
#Performance Optimization
- Directory walking is optimized by filtering directories early when hidden files are excluded
- Folders are created only as needed in extension mode to minimize filesystem operations
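Early directory filtering with `os.walk()` can be sketched as below. Treating names that start with a dot as hidden is an assumption (Windows marks hidden files with an attribute instead), and the function name `iter_files` is hypothetical.

```python
import os


def iter_files(source, include_hidden=False, follow_links=False):
    """Yield file paths under source, pruning hidden directories early."""
    for root, dirs, files in os.walk(source, followlinks=follow_links):
        if not include_hidden:
            # Assigning to dirs[:] edits the list in place, so os.walk
            # never descends into the directories that were removed.
            dirs[:] = [d for d in dirs if not d.startswith(".")]
            files = [f for f in files if not f.startswith(".")]
        for name in files:
            yield os.path.join(root, name)
```

Pruning `dirs` in place means the contents of a hidden directory are never listed at all, rather than being listed and then discarded.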
#Maintenance Approach
- Code is documented thoroughly for third-party maintenance
- Functions have clear purposes and interfaces
- Error handling is consistent across the codebase
- Testing procedures cover both common and edge cases
#Third-Party Maintenance Guidelines
For developers maintaining this code:
- Understanding the Core Logic:
- The main functionality is in the `sort_files()` function
- File categorization happens in `categorize_file_by_category()`
- Configuration loading is handled by `load_config_file()`
- Adding New Features:
- Maintain the existing error handling pattern
- Document all changes thoroughly
- Update tests to cover new functionality
- Consider backward compatibility
- Fixing Issues:
- Check for edge cases with unusual filenames or paths
- Verify platform-specific behavior (especially Windows long paths)
- Test with large directories and various file types
- Refactoring Guidelines:
- Maintain clear function purposes
- Preserve the current error handling strategy
- Ensure backward compatibility
- Update documentation to reflect changes