Cleaning up and shrinking a PDF file

Catdaddy

Cleaning up and shrinking a PDF file

Post by Catdaddy »

Although I have been using Linux Mint for 15 years, there is a lot I am not familiar with, so I am posting this here. I have a PDF file, a scanned manual, and I need to: (1) clean up the background around the text so it doesn't print as a gray page with black text (hard to read, and it uses too much ink), and (2) shrink it afterwards so I can share it. It is 1.1 GB, and comparable manuals for later vehicles are half the size or smaller, some MUCH smaller. I can use the terminal in a newbie cut-and-paste fashion, but I would prefer to do this with a GUI. Can anybody point me to a suitable tutorial? THANKS!!!
Last edited by LockBot on Wed Dec 28, 2022 7:16 am, edited 1 time in total.
Reason: Topic automatically closed 6 months after creation. New replies are no longer allowed.
hcentaur13

Re: Cleaning up and shrinking a PDF file

Post by hcentaur13 »

Shrinking a PDF means losing content. The eBook reader clementine comes with a PDF editor; install it from your repository, then use its editor to remove the content of the PDF you're not interested in.
wallyUSA
Level 6
Posts: 1439
Joined: Thu Jun 08, 2017 2:31 pm
Location: Top of Georgia

Re: Cleaning up and shrinking a PDF file

Post by wallyUSA »

Catdaddy wrote:... Can anybody point me to a suitable tutorial to do this? THANKS!!!
I use smallpdf.com. I have used it for several years to reduce PDF file sizes before I post on the web. It has several other file conversion options, such as 'pdf to jpg'. The free option allows two conversions per day.
> If your query has been resolved, edit your original post and add <SOLVED> to the beginning of the subject line. This may help others find solutions. <

Dell Latitude 7490 Mint 21.3 Ker 5.15.0-105 Cinn 6.0.4
lmuserx4849

Re: Cleaning up and shrinking a PDF file

Post by lmuserx4849 »

I have the same issue. I came across https://gist.github.com/hubgit/6078384, which uses a combination of pdftk, exiftool, and qpdf. To shrink, most examples use Ghostscript (gs), like https://askubuntu.com/questions/207447/ ... a-pdf-file. If you search, there are GUIs for these commands if that is your preference. I had some success. I'm on 17.3, and the tools aren't current. When I moved to a different machine with current versions, I got different results and it ran much faster. This is still a work in progress for me...
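
For reference, the Ghostscript approach usually comes down to a single command. What follows is only a minimal sketch, assuming Ghostscript is installed as `gs`; the function name `shrink_pdf` and the file names in the usage line are made up for illustration.

```shell
# Hypothetical helper: re-compress a PDF with Ghostscript's pdfwrite device.
shrink_pdf() {
    src=$1
    dst=$2
    # Presets trade quality for size: /screen (smallest), /ebook, /printer, /prepress.
    preset=${3:-/ebook}
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
       -dPDFSETTINGS="$preset" \
       -dNOPAUSE -dQUIET -dBATCH \
       -sOutputFile="$dst" "$src"
}
```

It would be called as `shrink_pdf manual.pdf manual-small.pdf`, or with `/screen` as a third argument for a smaller, rougher result. This mainly downsamples and re-compresses the embedded images; it should not be expected to remove a gray background.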
GS3
Level 8
Posts: 2384
Joined: Fri Jan 06, 2017 7:51 am

Re: Cleaning up and shrinking a PDF file

Post by GS3 »

This question is really not about Linux at all but is a very interesting question and I think people would benefit from knowing a bit more about computer graphics.

Scanning directly to PDF is an abomination which should never be done. It would only be forgivable for grannies scanning a photo of their baby taken decades ago.

Scanning graphics should be done first to the appropriate format depending on the type of graphic.

Black and white text and line drawings compress best when scanned at 300 DPI into TIFF-CCITT-Fax4. With this you can get perfectly readable pages which only take around 50 KBytes per letter/A4 page.

Not only that but B&W pages like that can be OCR'd which is really what should be done if possible.

Photo images compress best using 300 DPI into JPG with a compression level adequate for the picture in question.

In between we have palette-based images. These are not the result of a direct scan, but a scan can be processed to yield a good palette-based image, which will compress best using PNG (or GIF if you must).

Once you have the best image for each page you can put it in a PDF if you absolutely must.

It really pays to understand the differences between compression methods and what type of images they are best suited for.

Scanning a B&W document direct to JPG and PDF is an abomination. I have little patience with people who send me a huge file of a document which can hardly be read when with a little training they could produce a file ten times more readable and one tenth the size.
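
To illustrate the B&W case, here is a minimal sketch of converting one scanned page to a 1-bit TIFF with CCITT Group 4 (Fax4) compression. It assumes ImageMagick's `convert` is installed; the helper name `to_fax4` and the 60% cutoff are arbitrary choices for the example, not recommended values.

```shell
# Hypothetical helper: turn one scanned page into a 1-bit, Group 4 compressed TIFF.
to_fax4() {
    src=$1
    dst=$2
    # Pixels brighter than the cutoff become white, the rest black.
    cutoff=${3:-60%}
    convert "$src" -colorspace Gray -threshold "$cutoff" -compress Group4 "$dst"
}
```

For example, `to_fax4 page.png page.tif`. The right cutoff depends on the scan, so it is worth trying a few values on a single page before processing a whole document.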
Please do not use animated GIFs in avatars because many of us find them distracting and obnoxious. Thank you.
lmuserx4849

Re: Cleaning up and shrinking a PDF file

Post by lmuserx4849 »

GS3 wrote:This question is really not about Linux at all but is a very interesting question and I think people would benefit from knowing a bit more about computer graphics.

Scanning directly to PDF is an abomination which should never be done. It would only be forgivable for grannies scanning a photo of their baby taken decades ago.
...
GS3, what tools do you use on Linux?

Can you even scan directly to PDF??? I'm looking at the XSane add-on in GIMP, and it can export as PDF. I've used scanimage, and it can only save as TIFF, PNG, or JPEG. The scanning options are independent of the saved format. It is probably best to do cleanup before combining images into a PDF. But can anything be done to an existing PDF?

A Google search brought up gscan2pdf and a Linux Journal article, http://www.linuxjournal.com/article/9676.
I've never had much luck with OCR and tesseract.

Catdaddy, what application did you use to scan and create your pdf?
GS3

Re: Cleaning up and shrinking a PDF file

Post by GS3 »

lmuserx4849 wrote:GS3, what tools do you use on linux?

Can you even scan directly to PDF???
I have never scanned directly to PDF but when I have received a photo in a PDF and complained to the sender I have sometimes been told that is what the scanner produced. I suspect some scanners have an "easy" setting for the computer illiterate and the "advantage" is that it allows several images to be scanned directly into one file. It probably depends on the scanner and its driver and software.

I have no experience doing all this in Linux. I am trying to transition from Windows XP but finding it quite hard, precisely because I need to learn new tools. On Windows the main program I use for this type of graphics work is IrfanView, which I really like and am quite efficient with. Unfortunately there is no Linux version, so for now I have settled on running IrfanView under WINE. It works reasonably well, but normally I just go to my Windows computer, where it runs much better.
Pilosopong Tasyo
Level 6
Posts: 1432
Joined: Mon Jun 22, 2009 3:26 am
Location: Philippines

Re: Cleaning up and shrinking a PDF file

Post by Pilosopong Tasyo »

AFAIK, shrinking a PDF file that contains scanned text (as opposed to typed text) and images, with fine-grained control over the result, is a manual and potentially laborious process. But in the end, you may get a smaller and cleaner PDF file that meets your exact specifications.

You can use GIMP to import the PDF file, select the pages you want to retain and edit, open them as separate images instead of layers, and set the preferred output resolution. Higher resolution = more detail retained, which potentially means more disk space consumed, depending on whether the pages in question are in color, grayscale, or monochrome, and on the file format and compression you plan to use when saving the results.
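
If opening pages one at a time in GIMP is impractical, the pages can also be exported in bulk from the command line. A minimal sketch, assuming `pdftoppm` from poppler-utils is installed; the helper name `extract_pages`, the `pages/` directory, and the 150 DPI default are illustrative assumptions.

```shell
# Hypothetical helper: render every page of a PDF to a grayscale PNG.
extract_pages() {
    pdf=$1
    dpi=${2:-150}
    mkdir -p pages
    # Writes one numbered file per page: pages/page-<number>.png
    pdftoppm -r "$dpi" -gray -png "$pdf" pages/page
}
```

For example, `extract_pages manual.pdf 200` renders at 200 DPI. As with the GIMP import, a higher resolution keeps more detail at the cost of larger files.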

Use the available tools in GIMP to make the adjustments (white balance, contrast, color corrections, convert to grayscale/monochrome, etc.). Which tool and what settings to use depend entirely on how fine-grained you want each page to come out.
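
For a document with many pages, a similar contrast adjustment can be applied in one pass with ImageMagick instead of by hand. This is a rough sketch, not a tuned recipe: it assumes the pages already exist as PNG files in a directory called `pages/` (a made-up name), and the 20% and 75% levels are guesses that would need adjusting per document.

```shell
# Hypothetical helper: push a gray background to white on every page image.
clean_pages() {
    black=${1:-20%}
    white=${2:-75%}
    mkdir -p clean
    for f in pages/*.png; do
        # Values darker than $black become black, lighter than $white become white.
        convert "$f" -colorspace Gray -level "$black,$white" "clean/$(basename "$f")"
    done
}
```

Running `clean_pages` writes the adjusted copies to `clean/` and leaves the originals untouched, so different levels can be tried, e.g. `clean_pages 15% 65%`.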

Save them as separate files in sequential order (e.g. scan1.jpg, scan2.jpg, scan3.jpg, etc.). As mentioned earlier, certain image characteristics are better suited to other file formats, not just JPEG. As always, YMMV. Once you're done, you can use ImageMagick's convert CLI tool (e.g. convert scan*.jpg whatever.pdf) to combine those images into a single PDF file.
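
One caveat with the scan*.jpg pattern: the shell expands it in alphabetical order, so scan10.jpg comes before scan2.jpg. A minimal sketch of one way around that, assuming GNU sort (for `-V`) and file names without spaces; the helper name `combine_scans` is made up for the example.

```shell
# Hypothetical helper: combine scan*.jpg into one PDF in numeric page order.
combine_scans() {
    dst=$1
    # sort -V puts scan2.jpg before scan10.jpg; a plain glob would not.
    # Word splitting is intentional here, so names must not contain spaces.
    convert $(printf '%s\n' scan*.jpg | sort -V) "$dst"
}
```

For example, `combine_scans whatever.pdf`. Zero-padding the names when saving (scan001.jpg, scan002.jpg, ...) avoids the problem altogether.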
o Give a man a fish and he will eat for a day. Teach him how to fish and he will eat for a lifetime!
o If an issue has been fixed, please edit your first post and add the word [SOLVED].
GS3

Re: Cleaning up and shrinking a PDF file

Post by GS3 »

Pilosopong Tasyo wrote:AFAIK, a fine-grained control to shrink a PDF file embedded with scanned text (as opposed to typed text) and images is a manual and potentially laborious process.
To the point that it is almost always better to re-scan it right. Saves time and the result will be better.
Pilosopong Tasyo

Re: Cleaning up and shrinking a PDF file

Post by Pilosopong Tasyo »

GS3 wrote:To the point that it is almost always better to re-scan it right. Saves time and the result will be better.
Yes. However, if you're the recipient of such a PDF file, you'll probably have to clean it up one page at a time.
Spearmint2
Level 16
Posts: 6900
Joined: Sat May 04, 2013 1:41 pm
Location: Maryland, USA

Re: Cleaning up and shrinking a PDF file

Post by Spearmint2 »

What I do is tell my printer program not to print the background image.
All things go better with Mint. Mint julep, mint jelly, mint gum, candy mints, pillow mints, peppermint, chocolate mints, spearmint,....
TI58C
Level 4
Posts: 389
Joined: Tue Jul 18, 2017 5:57 am

Re: Cleaning up and shrinking a PDF file

Post by TI58C »

Catdaddy,

What isn't clear to me is this: pdf-files can be purely image, but can also contain the text in a separate layer. All depends on how it was scanned. My HP-officejet 8500 can do both.

If your pdf is image-only and of bad quality (dark grey background, black text), I think the GIMP-route explained above is your best option. Since you mention it is hard to read, sadly I think this is the case.

If it is a light-grey background with black text (that is readable / OCR-able), you might try PDF-XChange Viewer. I use it with PlayOnLinux. The free version includes an option to OCR the image-only PDF and superimpose the extracted text in a separate layer. Then you can cut and paste this text to a new PDF file without the background.
Yes, the program is for Windows only. But I'm not a purist; use whatever is easiest.


If (or once) your PDF has a text layer, you can also extract the text with pdftotext.

For combining several pdf-files and several other things, I use pdftk.
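
As a rough illustration of those two tools, here is a minimal sketch, assuming `pdftotext` (poppler-utils) and `pdftk` are installed. The helper names `extract_text` and `join_pdfs` and the file names in the usage lines are invented for the example.

```shell
# Hypothetical helper: dump the text layer to a plain-text file, keeping the page layout.
extract_text() {
    pdftotext -layout "$1" "$2"
}

# Hypothetical helper: join several PDFs into one. First argument is the output file.
join_pdfs() {
    dst=$1
    shift
    pdftk "$@" cat output "$dst"
}
```

Usage would look like `extract_text manual.pdf manual.txt` and `join_pdfs combined.pdf part1.pdf part2.pdf`. If the PDF is image-only, the text file will come out empty or nearly so.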

Good luck,
Robert
Last edited by TI58C on Thu Aug 03, 2017 4:19 pm, edited 1 time in total.
Linux is like my late labrador lady-dog: loyal and loving if you treat her lady-like, disbehaving princess if you don't.
Spearmint2

Re: Cleaning up and shrinking a PDF file

Post by Spearmint2 »

The easiest way to check for a text layer is to try to highlight some text and copy/paste it. If you can't, then it's an image of text, not actual text.
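
The same check can be done from a terminal. A minimal sketch, assuming `pdftotext` from poppler-utils is installed; the helper name `has_text_layer` and the choice of the first five pages are arbitrary.

```shell
# Hypothetical helper: succeeds if the first five pages contain any extractable text.
has_text_layer() {
    # "-" sends the extracted text to stdout; grep looks for any letter or digit.
    pdftotext -f 1 -l 5 "$1" - 2>/dev/null | grep -q '[[:alnum:]]'
}
```

For example: `has_text_layer manual.pdf && echo "text layer found" || echo "image only"`. It can give a false "image only" if the first few pages are covers or blank, so it is a quick hint rather than proof.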
Catdaddy

Re: Cleaning up and shrinking a PDF file

Post by Catdaddy »

lmuserx4849 wrote:Catdaddy, what application did you use to scan and create your pdf?

I didn't create it. It is an automotive manual I downloaded a while back. I'm trying to share it with others, but due to the immense file size, I can't easily do so. For now, I uploaded it to my Dropbox and share it from there. I would still like to fix it, though. Please forgive the delay in response; I had to have kidney surgery and I was down for a while.
Catdaddy

Re: Cleaning up and shrinking a PDF file

Post by Catdaddy »

Pilosopong Tasyo wrote:Yes. However, if you're the recipient of such a PDF file, you'll probably have to clean it up one page at a time.

Yeah, that ain't happening. It's over 1000 pages. I just put it in dropbox to share.
kukamuumuka

Re: Cleaning up and shrinking a PDF file

Post by kukamuumuka »

Not a GUI, but a script. Copy the PDF into an otherwise empty folder and run the following script there.

Code: Select all

# Render each PDF page to a JPEG (zero-padded names keep the page order), then rebuild a smaller PDF.
mkdir PICTURES && convert -density 200 *.pdf -trim -quality 100 PICTURES/picture-%04d.jpg \
&& convert -density 85 PICTURES/*.jpg -trim -quality 75 picture.pdf \
&& rm -rf PICTURES
http://puolanka.info/goto/convert-and-r ... e-command/