watermark pdf and images

Rootbeer
Level 3
Posts: 115
Joined: Sun Nov 24, 2019 3:22 am

watermark pdf and images

Post by Rootbeer »

Is there a way to add a watermark to PDFs and images?

Let's say that I have 10,000 PDFs that would each need:

a) a text-based watermark with a serial number,
b) or an image-based watermark, where the image itself would need to be copied 10,000 times with a different serial number each time, and finally, in each round, it is added as a watermark to all the PDFs.

1) The watermark needs to be added to all the pages inside the PDF.
2) The watermark needs to be added to a specific part of the PDF.

Due to the volume (around 10,000), this has to be done with some kind of automated script.
Can this be done?
Last edited by LockBot on Wed Dec 28, 2022 7:16 am, edited 1 time in total.
Reason: Topic automatically closed 6 months after creation. New replies are no longer allowed.
cday
Level 4
Posts: 244
Joined: Mon Dec 02, 2019 12:18 pm
Location: Cheltenham, U.K.

Re: watermark pdf and images

Post by cday »

Are you familiar with scripting in Linux, or for that matter in Windows, if you have that available and just want a solution?

I think what you are requesting could probably be done reasonably easily using the command-line tool nconvert. It's cross-platform, although I've only used it in Windows, as scripting in Mint is still on a long to-do list.

Writing text, in your case a numeric value, directly onto an image is possible, with control over the positioning and other parameters. That wouldn't strictly be 'watermarking', but it would seem to meet your need. It would also be necessary to define a variable for the number that could be incremented, which should be easy enough in a script. The only way to confirm that there are no unanticipated issues would be to try it, though.

Your second option, writing a number onto an image that could then be watermarked onto an image, would probably also be possible, with a similar reservation, but it would of course be correspondingly more complex.

I'll be interested to see what more experienced Linux users suggest. ImageMagick is a better-known and very comprehensive command-line image processing tool, but I have only ever had a very quick look at it.
BenTrabetere
Level 7
Posts: 1887
Joined: Sat Jul 19, 2014 12:04 am
Location: Hattiesburg, MS USA

Re: watermark pdf and images

Post by BenTrabetere »

Rootbeer wrote: Sun Jan 03, 2021 10:51 am Is there a way to add a watermark on pdf and images?

Lets say that I have 10.000 pdf's that would need:
I would use the ImageMagick convert tool. Here is a bash script that inserts a watermark and serial number into JPGs; it should help you get started. This is not my script: I shamelessly lifted it a long time ago and did not note the author at the time.

You can change the appearance and position of the watermark by editing the options. It also works on PDFs if you change *.jpg to *.pdf.

#!/bin/bash

# Change the working directory to the one specified as argument.
if ! cd "$*"; then
    echo "error: the folder '$*' doesn't exist." >&2
    exit 1
fi

# Create a directory called "output" in the working directory
# (-p: no error if it already exists).
mkdir -p output

# If no .jpg files match, skip the loop instead of processing "*.jpg" literally.
shopt -s nullglob

# Start counting at 1.
counter=1

# For each file that ends in .jpg:
for image in *.jpg; do
    convert "$image" \
            -background transparent \
            -fill grey \
            -font ubuntu \
            -size 280x160 \
            -pointsize 28 \
            -gravity southeast \
            -annotate +0+0 "TEXT: $counter" \
            "output/$image"

    # Increment counter by one.
    ((counter++))
done
Patreon sponsor since August 2022
Rootbeer
Level 3

Re: watermark pdf and images

Post by Rootbeer »

BenTrabetere wrote: Sun Jan 03, 2021 1:11 pm I would use Imagemagick convert tool. Here is a bash script to insert a watermark and serial number to JPGs that should help to get you started. This is not my script - I shamelessly lifted it a long time ago and I did not note the author at the time.

You can change the appearance and position of the watermark by editing the options. It also works on PDFs if you change *.jpg to *.pdf.
IMAGES:
I had to make some changes, but it worked wonderfully!

- At this time, I only have one picture, and I applied a watermark to it according to my list of customers, "customers.txt".

So the image got a watermark (a couple of thousand times), each with a different serial number, written to output files such as /output/sn1000.png

PDF
I also tried to "cheat" by skipping the PNG step and inserting a watermark directly into the PDF, changing *.jpg to *.pdf as you described, but it didn't work. I got this error message:

convert-im6.q16: attempt to perform an operation not allowed by the security policy `PDF' @ error/constitute.c/IsCoderAuthorized/408.
convert-im6.q16: no images defined `output/mybook.pdf' @ error/convert.c/ConvertImageCommand/3258.
Perhaps it simply isn't possible to add a watermark directly to a PDF?

It would be simpler if I could just take the book and add the watermark directly to all pages in a specific area, like the top-right corner.

Alternatively, now that the watermarking script works for the PNG images, is there a way to simply insert a PNG image into the PDF (on all pages, top-right corner)?

Many thanks!
BenTrabetere
Level 7

Re: watermark pdf and images

Post by BenTrabetere »

Rootbeer wrote: Sun Jan 03, 2021 3:30 pm PDF
I also tried to "cheat" by skipping watermarking the png-file directly and instead try to insert a watermark directly into the pdf by changing *.jpg to *.pdf like you described but it didn't work, got error message:

convert-im6.q16: attempt to perform an operation not allowed by the security policy `PDF' @ error/constitute.c/IsCoderAuthorized/408.
convert-im6.q16: no images defined `output/mybook.pdf' @ error/convert.c/ConvertImageCommand/3258.
Perhaps it simply cannot be done adding watermark directly to the pdf ?
No, this is caused by an ImageMagick security policy related to a (now patched) Ghostscript security issue. It is easy to correct. Start by confirming that you have Ghostscript 9.24 or higher with

gs --version
Next, edit your /etc/ImageMagick-6/policy.xml by entering

xed admin:///etc/ImageMagick-6/policy.xml
Look for <policy domain="coder" rights="none" pattern="PDF" /> and replace it with
<policy domain="coder" rights="read|write" pattern="PDF" />
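Both checks can also be scripted before a large batch run. This is only a sketch: it assumes gs --version prints a dotted version such as 9.50, and that the policy file uses the <policy domain="coder" ... pattern="PDF" /> form quoted above. The script name and function names are hypothetical.

```python
import re
import subprocess
import sys
import xml.etree.ElementTree as ET


def gs_version_ok(version_text, minimum=(9, 24)):
    """Return True if a version string such as '9.50' is at least `minimum`."""
    match = re.match(r"(\d+)\.(\d+)", version_text.strip())
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) >= minimum


def pdf_coder_blocked(policy_xml):
    """Return True if the policy XML sets rights="none" for a coder pattern containing PDF."""
    root = ET.fromstring(policy_xml)
    for policy in root.iter("policy"):
        if (policy.get("domain") == "coder"
                and "PDF" in (policy.get("pattern") or "")
                and policy.get("rights") == "none"):
            return True
    return False


def main(policy_path):
    version = subprocess.run(["gs", "--version"], capture_output=True,
                             text=True, check=True).stdout
    print("Ghostscript", version.strip(),
          "is new enough" if gs_version_ok(version) else "is older than 9.24")
    with open(policy_path, "rb") as fh:  # bytes: lets the XML declaration set the encoding
        print("PDF coder blocked:", pdf_coder_blocked(fh.read()))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])  # e.g. /etc/ImageMagick-6/policy.xml
    else:
        print("usage: python3 check_policy.py /etc/ImageMagick-6/policy.xml")
```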
Alternatively, now when the watermarking script works for the png images, is there a way to simply insert an png image into the pdf? (to all pages, top-right corner)
Yes, it is possible to use an image as a watermark in a PDF or another image, but I do not possess the ImageMagick fu to explain how to do it. I can tell you that the -gravity setting controls the location of the watermark. Use -gravity NorthEast to place it in the top-right corner.
See: https://www.imagemagick.org/script/comm ... hp#gravity

Fred Weinhaus is a master of IM-fu. http://www.fmwconcepts.com/imagemagick/index.php
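For the image-on-image case, ImageMagick's -composite operator also respects -gravity, so a PNG stamp can be overlaid on the top-right corner of another image. A minimal sketch that builds and runs that command from Python; it assumes ImageMagick 6's convert is on the PATH, and the script name, file names and 10-pixel margin are hypothetical. It is meant for raster images; given a PDF, ImageMagick would rasterize the pages first.

```python
import subprocess
import sys


def composite_command(base, stamp, output, margin=10):
    """Build a `convert` call that overlays `stamp` on the top-right of `base`."""
    return [
        "convert", base, stamp,
        "-gravity", "NorthEast",             # anchor the overlay at the top-right corner
        "-geometry", f"+{margin}+{margin}",  # offset inward from that corner, in pixels
        "-composite",                        # merge the second image onto the first
        output,
    ]


if __name__ == "__main__":
    if len(sys.argv) == 4:
        # python3 overlay.py page.png watermarkstamp.png page-stamped.png
        subprocess.run(composite_command(*sys.argv[1:]), check=True)
    else:
        print("usage: python3 overlay.py BASE STAMP OUTPUT")
```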
Rootbeer
Level 3

Re: watermark pdf and images

Post by Rootbeer »

I've done some experimenting.

I found that ImageMagick's convert command was very bad, at least for PDFs:
When I inserted a text-only "watermark" on every page of a PDF, the file grew from 4.1MB to 88MB.

Luckily I found another program called pdfstamp that handles this much better. It can insert an image-based watermark into a PDF.

https://www.crossref.org/labs/pdfstamp/

I first used the convert command to generate a PNG image from a text string:

convert -size 230x60 xc:transparent \
-font ubuntu \
-pointsize 60 \
-fill red \
-gravity center \
-annotate +0+0 "SN:1234" watermarkstamp.png

Now I have the watermark stamp with the unique ID (generated from a text file within a loop).

After that, I used the pdfstamp program to insert watermarkstamp.png into the PDF file (every page). The PDF grew from 4.2MB to 4.2MB.
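The stamp-generating loop described above could look roughly like this in Python. It is a sketch, assuming customers.txt holds one serial number per line and that each stamp is saved as stamps/sn<serial>.png (a hypothetical naming scheme); the convert options are copied from the command above.

```python
import subprocess
import sys
from pathlib import Path


def parse_serials(lines):
    """Return the non-empty, stripped lines (one serial number per line)."""
    return [line.strip() for line in lines if line.strip()]


def stamp_command(serial, output):
    """Build the `convert` call that renders "SN:<serial>" on a transparent PNG."""
    return [
        "convert", "-size", "230x60", "xc:transparent",
        "-font", "ubuntu", "-pointsize", "60", "-fill", "red",
        "-gravity", "center",
        "-annotate", "+0+0", f"SN:{serial}",
        str(output),
    ]


def make_stamps(serial_file, out_dir="stamps"):
    """Create one stamp PNG per serial, named <out_dir>/sn<serial>.png."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    with open(serial_file, encoding="utf-8") as fh:
        for serial in parse_serials(fh):
            subprocess.run(stamp_command(serial, out / f"sn{serial}.png"), check=True)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        make_stamps(sys.argv[1])  # e.g. customers.txt
    else:
        print("usage: python3 make_stamps.py customers.txt")
```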
tuxoneseven
Level 2
Posts: 89
Joined: Mon Dec 14, 2020 8:08 am

Re: watermark pdf and images

Post by tuxoneseven »

Rootbeer wrote: Tue Jan 05, 2021 9:38 am And after that I used the pdfstamp -program to insert the watermarkstamp.png into the pdf-file (every page). The pdf grew from 4.2MB to 4.2MB.
From 4.2MB to 4.2MB?
Rootbeer wrote: Tue Jan 05, 2021 9:38 am I found that imagemagick with the convert -command was very bad, atleast regarding pdf:
Some other options are:

1. pdftk

pdftk original.pdf stamp watermark.pdf output final.pdf

2. Use Python and PyPDF2

#!/usr/bin/env python3
# Adding a watermark to a single-page PDF (PyPDF2 1.x API; later releases renamed these methods)

import PyPDF2

input_file = "example.pdf"
output_file = "example-drafted.pdf"
watermark_file = "draft.pdf"

with open(input_file, "rb") as filehandle_input:
    # read content of the original file
    pdf = PyPDF2.PdfFileReader(filehandle_input)
    
    with open(watermark_file, "rb") as filehandle_watermark:
        # read content of the watermark
        watermark = PyPDF2.PdfFileReader(filehandle_watermark)
        
        # get first page of the original PDF
        first_page = pdf.getPage(0)
        
        # get first page of the watermark PDF
        first_page_watermark = watermark.getPage(0)
        
        # merge the two pages
        first_page.mergePage(first_page_watermark)
        
        # create a pdf writer object for the output file
        pdf_writer = PyPDF2.PdfFileWriter()
        
        # add page
        pdf_writer.addPage(first_page)
        
        with open(output_file, "wb") as filehandle_output:
            # write the watermarked file to the new file
            pdf_writer.write(filehandle_output)
3. Python with MuPDF and fitz

#!/usr/bin/env python3

import fitz  # PyMuPDF

input_file = "example.pdf"
output_file = "example-with-barcode.pdf"
barcode_file = "barcode.png"

# define the position in PDF points, measured from the page's top-left corner
# (upper-right corner on an A4 or US Letter page)
image_rectangle = fitz.Rect(450,20,550,120)

# retrieve the first page of the PDF
file_handle = fitz.open(input_file)
first_page = file_handle[0]

# add the image (the keyword is "filename"; newer PyMuPDF names the method insert_image)
first_page.insertImage(image_rectangle, filename=barcode_file)

file_handle.save(output_file)
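Going back to option 1: pdftk's stamp operation overlays the first page of the stamp PDF on every page of the input, so the one-book, many-serials case could be driven from a loop. A sketch, assuming pdftk is installed, that each serial already has its own one-page stamp PDF at stamps/sn<serial>.pdf (a hypothetical layout; PNG stamps would first need converting to PDF), and that serials come one per line from a file such as customers.txt.

```python
import subprocess
import sys
from pathlib import Path


def pdftk_stamp_command(original, stamp_pdf, output):
    """Build `pdftk ORIGINAL stamp STAMP output OUTPUT`.

    pdftk overlays the first page of STAMP on every page of ORIGINAL.
    """
    return ["pdftk", str(original), "stamp", str(stamp_pdf), "output", str(output)]


def batch_jobs(book, serials, stamp_dir="stamps", out_dir="output"):
    """Pair the book with one stamp PDF and one output file name per serial."""
    book = Path(book)
    return [
        (book, Path(stamp_dir) / f"sn{serial}.pdf",
         Path(out_dir) / f"{book.stem}-sn{serial}.pdf")
        for serial in serials
    ]


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # python3 stamp_all.py mybook.pdf customers.txt
        with open(sys.argv[2], encoding="utf-8") as fh:
            serials = [line.strip() for line in fh if line.strip()]
        Path("output").mkdir(exist_ok=True)
        for job in batch_jobs(sys.argv[1], serials):
            subprocess.run(pdftk_stamp_command(*job), check=True)
    else:
        print("usage: python3 stamp_all.py BOOK.pdf SERIALS.txt")
```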
Active Python user - PyQt
Rootbeer
Level 3

Re: watermark pdf and images

Post by Rootbeer »

tuxoneseven wrote: Sat Jan 09, 2021 6:54 am
3. Python with MuPDF and fitz

This one seems really good, with the position being defined.
How do I get the number of pages in the book?
I know almost nothing about programming Python scripts, unfortunately.

I'm thinking that I want to do a loop and put the watermark on all pages.
Rootbeer
Level 3

Re: watermark pdf and images

Post by Rootbeer »

Also, I can't get the Python script to work; I'm getting errors saying it contains lots of syntax errors.
tuxoneseven
Level 2

Re: watermark pdf and images

Post by tuxoneseven »

Use your package manager to install the module, something like:

sudo apt install python3-fitz

If you install it with pip, it starts to complain. Probably I'm overlooking something, but with the package manager it works, so why not do it the lazy way :lol:

Loops in Python are written like this:

for page in pages:
    statement
    statement
    statement
where the statements inside the loop must be indented, conventionally by four spaces. There's a long explanation of for loops here. But in short, if the indentation is missing or inconsistent, Python starts to complain.
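A small, self-contained illustration of the indentation rule, which also touches on the page-count question: len() returns the number of items in a sequence, and for a document opened with fitz.open(), len(doc) is its page count. The list of strings below is only a hypothetical stand-in for real pages.

```python
# Hypothetical stand-in for the pages of a PDF; with PyMuPDF you would
# loop over the opened document itself, and len(doc) gives its page count.
pages = ["page 1", "page 2", "page 3"]

print("number of pages:", len(pages))

stamped = []
for page in pages:
    # Each indented line belongs to the loop and runs once per page.
    stamped.append(page + " + watermark")

# This line is not indented, so it runs once, after the loop has finished.
print(stamped)
```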

I made the program below. Make a backup of your original pdf first, in case anything goes wrong. Save the file below as theprogram.py

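import fitz  # PyMuPDF

doc = fitz.open("input.pdf")
# Position in PDF points: top-left (0, 0) to bottom-right (100, 100).
rect = fitz.Rect(0, 0, 100, 100)

# Insert the image on every page (insert_image() in newer PyMuPDF).
for page in doc:
    page.insertImage(rect, filename="logo.jpeg")

doc.save("output.pdf")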
Then install the module python3-fitz using the package manager. On Ubuntu I did

sudo apt install python3-fitz
Change "input.pdf" to your file, "output.pdf" to your output file and "logo.jpeg" to your watermark; all files should be in the same directory.
Then run

python3 theprogram.py
Rootbeer
Level 3

Re: watermark pdf and images

Post by Rootbeer »

tuxoneseven wrote: Sat Jan 09, 2021 10:53 am ....
Excellent! Thank you!

I tried to put the barcode in the top-right corner like you described:

image_rectangle = fitz.Rect(450,20,550,120)

but I'm getting errors:


Traceback (most recent call last):
  File "./stamp.py", line 12, in <module>
    page.insertImage(rect, filename="stamp.png")
  File "/usr/lib/python3/dist-packages/fitz/utils.py", line 252, in insertImage
    raise ValueError("rect must be finite and not empty")
ValueError: rect must be finite and not empty
tuxoneseven
Level 2

Re: watermark pdf and images

Post by tuxoneseven »

Is stamp.png in the same directory as the Python file, and is it actually a PNG? Try to see if it runs with the original parameters or with another format (JPEG).
Rootbeer
Level 3

Re: watermark pdf and images

Post by Rootbeer »

tuxoneseven wrote: Sat Jan 09, 2021 1:24 pm Is stamp.png in the same directory as the python file and is it a png? Try to see if it runs with the original parameters or a other formats (jpeg).
Yes, and the script works fine if I have:
image_rectangle = fitz.Rect(0, 0, 100, 30)

but it doesn't work with:
image_rectangle = fitz.Rect(450,20,550,120)

I've also tried experimenting with these numbers, but they simply don't make sense, and suddenly the image stamp is enlarged and covers the whole PDF page.

What do the numbers represent?
tuxoneseven
Level 2

Re: watermark pdf and images

Post by tuxoneseven »

Rootbeer wrote: Sat Jan 09, 2021 1:57 pm
Yes, and the script works fine if I have:
image_rectangle = fitz.Rect(0, 0, 100, 30)

but it doesn't work with:
image_rectangle = fitz.Rect(450,20,550,120)

I've also tried to experiment with these numbers but they simply don't make sense and suddenly I have the imagestamp enlarged and covers the whole PDF-page.

What does the numbers represent?
Try moving it a few pixels; perhaps the numbers mean something else. From the docs:
Rect represents a rectangle defined by four floating point numbers x0, y0, x1, y1. They are treated as being coordinates of two diagonally opposite points. The first two numbers are regarded as the “top left” corner P(x0,y0) and P(x1,y1) as the “bottom right” one
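In PyMuPDF these numbers are PDF points (1/72 inch) measured from the top-left corner of the page, with y growing downward, and the image is scaled to fit whatever rectangle it is given. So Rect(450, 20, 550, 120) only lies fully on the page if the page is at least 550 points wide, and a large rectangle produces a large stamp. Computing the rectangle from each page's width avoids guessing. A sketch under those assumptions: the helper names, the 20-point margin and the 100 x 30 stamp size are hypothetical, and insertImage is the older method name used in this thread (newer PyMuPDF releases call it insert_image).

```python
import sys


def top_right_rect(page_width, stamp_width, stamp_height, margin=20):
    """Return (x0, y0, x1, y1) for a stamp in the page's top-right corner.

    Units are PDF points (1/72 inch); the origin is the page's top-left
    corner and y grows downward, as with PyMuPDF's fitz.Rect.
    """
    x1 = page_width - margin
    x0 = x1 - stamp_width
    if x0 < 0:
        raise ValueError("stamp plus margin is wider than the page")
    return (x0, margin, x1, margin + stamp_height)


def stamp_pdf(input_pdf, stamp_image, output_pdf, width=100, height=30):
    """Put `stamp_image` in the top-right corner of every page."""
    import fitz  # PyMuPDF; imported here so the helper above works without it

    doc = fitz.open(input_pdf)
    for page in doc:
        # Use each page's own width, so mixed page sizes are handled too.
        rect = fitz.Rect(*top_right_rect(page.rect.width, width, height))
        page.insertImage(rect, filename=stamp_image)  # insert_image() in newer PyMuPDF
    doc.save(output_pdf)


if __name__ == "__main__":
    if len(sys.argv) == 4:
        stamp_pdf(*sys.argv[1:])  # input.pdf stamp.png output.pdf
    else:
        print("usage: python3 stamp.py INPUT.pdf STAMP.png OUTPUT.pdf")
```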
Rootbeer
Level 3

Re: watermark pdf and images

Post by Rootbeer »

tuxoneseven wrote: Sat Jan 09, 2021 2:15 pm
Try moving it a few pixels, perhaps the numbers mean something else. From the docs:
Rect represents a rectangle defined by four floating point numbers x0, y0, x1, y1. They are treated as being coordinates of two diagonally opposite points. The first two numbers are regarded as the “top left” corner P(x0,y0) and P(x1,y1) as the “bottom right” one
I got it to work, thanks a million! :)