A Python-based image processing tool for automatically removing blue ink from scanned worksheets and documents, while preserving the original printed black text.
This tool was designed to clean up scanned worksheets that have been completed with blue ink.
Below is a side-by-side comparison showing the effectiveness of the blue ink removal process:
| Original Scan (with blue ink) | Processed Result (blue ink removed) |
|---|---|
| ![]() | ![]() |
As demonstrated in these images, the program effectively removes blue handwritten annotations while maintaining the quality of the original printed content.
You can install the required packages using:
pip install opencv-python numpy matplotlib
Set the two path variables in the script (the example paths point into docs/):
- image_path: Path to your input image with blue ink
- output_path: Where you want to save the cleaned image

Then run the script:
python clean.py
Or if using a Jupyter notebook:
jupyter notebook clean.ipynb
The program processes images in several carefully engineered stages:
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Load the image
image_path = "docs/scan.jpeg" # Replace with your image path
output_path = "docs/output_cleaned.jpg" # Replace with desired output path
image = cv2.imread(image_path)
This section loads the required libraries and reads the input image.
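`cv2.imread` returns `None` instead of raising when a file is missing or unreadable, so a wrong path only surfaces later as a confusing error in `cv2.cvtColor`. A minimal guard, assuming a hypothetical helper named `require_image` that is not part of the original script, might look like this:

```python
def require_image(image, path):
    """Return the image unchanged, or raise if loading failed.

    cv2.imread signals failure by returning None rather than raising,
    so this check turns a silent failure into an explicit error.
    """
    if image is None:
        raise FileNotFoundError(f"Could not read image: {path}")
    return image

# Intended use in the script (require_image is a hypothetical helper):
# image = require_image(cv2.imread(image_path), image_path)
```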
# Convert to HSV color space for better blue detection
hsv_image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
# Define range for blue color (adjust these values based on your image)
lower_blue = np.array([100, 50, 50]) # Lower bound for blue in HSV
upper_blue = np.array([130, 255, 255]) # Upper bound for blue in HSV
blue_mask = cv2.inRange(hsv_image, lower_blue, upper_blue)
# Invert the blue mask to use it for removing blue ink
blue_mask_inv = cv2.bitwise_not(blue_mask)
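This stage detects blue ink by converting the image to HSV, selecting pixels whose values fall between lower_blue and upper_blue with cv2.inRange, and inverting that mask so blue regions become 0.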
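To make the masking rule concrete, here is a NumPy-only sketch of the per-pixel test that `cv2.inRange` performs: a pixel is marked 255 only when every channel lies within its inclusive bounds. The `in_range` function and the three sample pixels are illustrative assumptions, not part of the original script. Note that OpenCV stores hue for 8-bit images in the range 0 to 179, which is why blue sits around 100 to 130.

```python
import numpy as np

def in_range(hsv, lower, upper):
    """NumPy stand-in for cv2.inRange: 255 where all channels are in bounds."""
    inside = np.all((hsv >= lower) & (hsv <= upper), axis=-1)
    return np.where(inside, 255, 0).astype(np.uint8)

lower_blue = np.array([100, 50, 50])
upper_blue = np.array([130, 255, 255])

# One row of three HSV pixels: saturated blue, orange, washed-out blue.
hsv = np.array([[[115, 200, 180], [20, 200, 180], [115, 30, 180]]], dtype=np.uint8)

blue_mask = in_range(hsv, lower_blue, upper_blue)   # [[255, 0, 0]]
blue_mask_inv = 255 - blue_mask                     # [[0, 255, 255]]
```

The third pixel has a blue hue but a saturation of 30, below the lower bound of 50, so it is not treated as ink. Faded ink can fail the test in the same way.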
# Create a mask for black content (low intensity in all channels)
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
black_mask = cv2.threshold(gray_image, 110, 255, cv2.THRESH_BINARY_INV)[1] # Adjust threshold (110) as needed
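This stage identifies the black printed text by converting the image to grayscale and applying an inverse binary threshold, so pixels with intensity at or below 110 become 255 in black_mask.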
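The inverse threshold can be illustrated on a tiny array. This NumPy-only sketch mirrors the rule used by `cv2.THRESH_BINARY_INV` (values above the threshold become 0, everything else becomes the maximum value). The helper name `threshold_binary_inv` is an assumption made for illustration.

```python
import numpy as np

def threshold_binary_inv(gray, thresh, maxval=255):
    """Pixels brighter than thresh become 0; all others become maxval."""
    return np.where(gray > thresh, 0, maxval).astype(np.uint8)

# Four grayscale pixels: pure black, exactly at the threshold, just above, white.
gray = np.array([[0, 110, 111, 255]], dtype=np.uint8)
black_mask = threshold_binary_inv(gray, 110)   # [[255, 255, 0, 0]]
```

A pixel exactly at the threshold (110) still counts as dark, so raising the threshold by even a few levels can pull in mid-gray pixels such as pencil marks or paper texture.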
# Combine masks to keep black content and remove blue ink
combined_mask = cv2.bitwise_and(black_mask, blue_mask_inv)
# Apply the mask to the original image to retain only black content
clean_image = cv2.bitwise_and(image, image, mask=combined_mask)
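This step combines the two masks so that only pixels that are dark and not blue remain, then applies the combined mask to the original image; all other pixels are set to black.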
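The mask combination is a per-pixel logical AND, which can be sketched on a four-pixel example using NumPy only. The sample values are assumptions chosen to cover each case (dark and not blue, dark and blue, bright and not blue, bright and blue).

```python
import numpy as np

# 255 = "dark pixel" in black_mask, 255 = "not blue" in blue_mask_inv.
black_mask    = np.array([[255, 255,   0, 0]], dtype=np.uint8)
blue_mask_inv = np.array([[255,   0, 255, 0]], dtype=np.uint8)

# Keep a pixel only if it is dark AND not blue.
combined_mask = np.bitwise_and(black_mask, blue_mask_inv)   # [[255, 0, 0, 0]]

# Apply the mask to a uniform dark-gray BGR image: unselected pixels become 0.
image = np.full((1, 4, 3), 40, dtype=np.uint8)
clean_image = np.where(combined_mask[..., None] > 0, image, 0).astype(np.uint8)
```

Only the first pixel survives. The second pixel shows why dark blue ink is still removed: it passes the darkness test but fails the not-blue test.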
# Convert to grayscale for enhancement
clean_gray = cv2.cvtColor(clean_image, cv2.COLOR_BGR2GRAY)
# Enhance contrast using CLAHE
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(clean_gray)
# Denoise the image
denoised = cv2.fastNlMeansDenoising(enhanced, None, 10, 7, 21)
# Apply binary thresholding to get black text on white background
_, binary = cv2.threshold(denoised, 200, 255, cv2.THRESH_BINARY)
# Apply morphological operations with an adjustable kernel
kernel_size = 51 # Starting value; can be increased (e.g., to 101)
kernel = np.ones((kernel_size, kernel_size), np.uint8)
cleaned = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel, iterations=1)
cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_OPEN, kernel, iterations=1)
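This enhancement stage converts the masked image to grayscale, boosts local contrast with CLAHE, denoises with non-local means, binarizes at a threshold of 200, and applies morphological closing followed by opening. As written, the final output in the next stage is built from combined_mask, so cleaned is an intermediate result that the saved image does not use.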
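Kernel size matters a great deal for the morphological steps: an opening removes any white (foreground) feature narrower than the kernel. The following NumPy-only sketch approximates erosion and dilation with a square kernel to show this effect on a 3-pixel-wide stroke. The helper functions are illustrative assumptions and are not the OpenCV implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _windows(img, k, pad_value):
    """Return every k x k neighborhood of img (k odd), padding the border."""
    pad = k // 2
    padded = np.pad(img, pad, mode="constant", constant_values=pad_value)
    return sliding_window_view(padded, (k, k))

def erode(img, k):
    # A pixel stays white only if its whole neighborhood is white.
    return _windows(img, k, 255).min(axis=(2, 3))

def dilate(img, k):
    # A pixel becomes white if any pixel in its neighborhood is white.
    return _windows(img, k, 0).max(axis=(2, 3))

def opening(img, k):
    return dilate(erode(img, k), k)

# A 3-pixel-wide vertical white stroke on a black background.
img = np.zeros((20, 20), dtype=np.uint8)
img[5:15, 8:11] = 255

kept = opening(img, 3)      # stroke survives: kernel fits inside it
removed = opening(img, 5)   # stroke vanishes: kernel is wider than the stroke
```

By the same reasoning, a 51x51 kernel erases any white feature narrower than 51 pixels, so it is worth checking the stroke width of the content you want to keep before choosing a size.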
# Create the final output image (white background with black text)
output_image = np.ones_like(image) * 255
output_image[combined_mask > 0] = 0 # Set black regions
# Convert to grayscale for simplicity
output_image = cv2.cvtColor(output_image, cv2.COLOR_BGR2GRAY)
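This stage creates a white canvas the same size as the input, sets every pixel selected by combined_mask to black, and converts the result to a single-channel grayscale image.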
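Because the canvas contains only pure white and pure black pixels, the same result should be obtainable by building the single-channel image directly from the mask. This is a sketch of that alternative, assuming a hypothetical helper named `render_output`; it is not how the original script is written.

```python
import numpy as np

def render_output(combined_mask):
    """Black (0) where the mask is set, white (255) everywhere else."""
    return np.where(combined_mask > 0, 0, 255).astype(np.uint8)

combined_mask = np.array([[255, 0], [0, 255]], dtype=np.uint8)
output_image = render_output(combined_mask)   # [[0, 255], [255, 0]]
```

This skips the three-channel canvas and the color conversion, which saves memory on large scans.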
# Save the result
cv2.imwrite(output_path, output_image)
print(f"Image saved to {output_path}")
This final step saves the cleaned image to the specified output path.
You can tune the following parameters to optimize for different scanned documents:
lower_blue = np.array([100, 50, 50]) # Lower bound for blue in HSV
upper_blue = np.array([130, 255, 255]) # Upper bound for blue in HSV
black_mask = cv2.threshold(gray_image, 110, 255, cv2.THRESH_BINARY_INV)[1]
kernel_size = 51 # Size of the kernel for morphological operations
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
Adjust clipLimit to control contrast enhancement.
denoised = cv2.fastNlMeansDenoising(enhanced, None, 10, 7, 21)
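Rather than guessing the blue range, one option is to sample a few HSV pixels from known ink strokes and derive the bounds from them. The sketch below assumes a hypothetical helper `hsv_bounds_from_samples` and arbitrary margin values; it uses OpenCV's 8-bit HSV limits (hue 0 to 179, saturation and value 0 to 255) and does not handle colors whose hue wraps around 0, such as red.

```python
import numpy as np

def hsv_bounds_from_samples(hsv_pixels, margin=(5, 40, 40)):
    """Derive (lower, upper) HSV bounds from sampled ink pixels.

    hsv_pixels: sequence of [H, S, V] values in OpenCV's 8-bit convention.
    margin: per-channel slack added around the sampled min and max (assumed values).
    """
    pixels = np.asarray(hsv_pixels, dtype=np.int32)
    slack = np.asarray(margin, dtype=np.int32)
    limits = np.array([179, 255, 255], dtype=np.int32)
    lower = np.clip(pixels.min(axis=0) - slack, 0, limits)
    upper = np.clip(pixels.max(axis=0) + slack, 0, limits)
    return lower.astype(np.uint8), upper.astype(np.uint8)

# Three illustrative samples taken from blue pen strokes.
samples = [[108, 120, 150], [118, 200, 90], [112, 160, 200]]
lower_blue, upper_blue = hsv_bounds_from_samples(samples)
```

The returned arrays can be passed to `cv2.inRange` in place of the hard-coded defaults.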
This tool may not work effectively for all documents and may require fine-tuning depending on specific conditions. Below are some boundary conditions and limitations to consider:
Varied Blue Shades: The tool relies on HSV color ranges (default: 100-130 hue) to detect blue ink. If the blue ink varies significantly in shade (e.g., light blue pens or faded ink), the default range may miss some annotations. Fine-tuning the lower_blue and upper_blue values is necessary.
Low Contrast Documents: If the original black text has low contrast with the background (e.g., faded prints or colored paper), the black mask threshold (default: 110) may fail to distinguish text, requiring adjustment or manual intervention.
Complex Backgrounds: Documents with patterned or non-white backgrounds may confuse the thresholding and masking stages, leading to incomplete blue ink removal or text loss. Preprocessing to normalize the background or adjusting the black mask threshold can help.
High Noise Levels: Scanned images with significant noise (e.g., from poor scanning quality) may require stronger denoising (e.g., increasing the filter strength h, the third argument to fastNlMeansDenoising, from 10 to 15-20) or larger kernel sizes, though this risks blurring fine details.
Overlapping Colors: If blue ink overlaps with black text or other colors, the tool may inadvertently remove parts of the black content. Careful tuning of the blue mask and black mask thresholds, or manual editing, may be needed.
Large Document Size: For very high-resolution scans, the default tileGridSize=(8, 8) in CLAHE or a 51x51 kernel may be insufficient or overly aggressive. Scaling these parameters proportionally to the image size is recommended.
To address these limitations, test the tool on a sample document and adjust the parameters in the “Tuning and Modification” section as needed. For complex cases, consider preprocessing the image (e.g., background removal) or using additional image processing techniques.
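Scaling the morphological kernel with image size can be sketched as a small helper. The function name `scaled_kernel_size` and the reference width of 2480 pixels (roughly an A4 page at 300 dpi) are assumptions for illustration; the source does not state which resolution the default of 51 was tuned for.

```python
def scaled_kernel_size(image_width, reference_width=2480, reference_kernel=51):
    """Scale the kernel proportionally to image width and keep it odd.

    reference_width and reference_kernel are assumed calibration values.
    An odd size keeps a well-defined center pixel in the kernel.
    """
    scaled = round(reference_kernel * image_width / reference_width)
    scaled = max(3, scaled)
    return scaled if scaled % 2 == 1 else scaled + 1

size_same = scaled_kernel_size(2480)     # 51
size_double = scaled_kernel_size(4960)   # 103
size_small = scaled_kernel_size(1000)    # 21
```

In the script, the width would come from `image.shape[1]`, and the result would replace the fixed `kernel_size = 51`.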
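You can extend this tool to handle other ink colors by changing the HSV bounds (lower_blue and upper_blue) passed to cv2.inRange so that they cover the target color.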
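Most colors need only a different pair of bounds, but red is a special case: in OpenCV's 8-bit HSV representation, hue runs from 0 to 179 and red straddles both ends. A common approach is to build two masks and OR them together. The sketch below uses a NumPy stand-in for `cv2.inRange`, and the specific bounds are illustrative assumptions that would need tuning on a real scan.

```python
import numpy as np

def in_range(hsv, lower, upper):
    """NumPy stand-in for cv2.inRange: 255 where all channels are in bounds."""
    inside = np.all((hsv >= lower) & (hsv <= upper), axis=-1)
    return np.where(inside, 255, 0).astype(np.uint8)

def red_mask(hsv):
    """Combine the low-hue and high-hue red bands (assumed bounds)."""
    low_band = in_range(hsv, np.array([0, 50, 50]), np.array([10, 255, 255]))
    high_band = in_range(hsv, np.array([170, 50, 50]), np.array([179, 255, 255]))
    return np.bitwise_or(low_band, high_band)

# Three HSV pixels: red near hue 0, red near hue 179, and blue.
hsv = np.array([[[5, 200, 200], [175, 200, 200], [115, 200, 200]]], dtype=np.uint8)
mask = red_mask(hsv)   # [[255, 255, 0]]
```

In the actual script, the two bands would be computed with `cv2.inRange` and merged with `cv2.bitwise_or` before inverting the mask.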
If your results aren’t optimal, try these adjustments:
Ravishankar Sivasubramaniam
Contributions to improve Blue Ink Remover are welcome. Here’s how you can contribute:
1. Create a feature branch (git checkout -b feature/amazing-feature)
2. Commit your changes (git commit -m 'Add some amazing feature')
3. Push the branch (git push origin feature/amazing-feature)

Please make sure to update tests as appropriate and adhere to the existing coding style.
This project is licensed under the MIT License - see below for details:
MIT License
Copyright (c) 2025 Ravishankar Sivasubramaniam
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.