Cleaning up and shrinking a PDF file

Catdaddy
Level 1
Posts: 29
Joined: Wed Aug 17, 2011 8:46 pm

Postby Catdaddy » Fri Jun 23, 2017 1:14 am

Although I have been using Linux Mint for 15 years, I am not familiar with many things, so I started this here. I have a PDF file, a scanned manual, that I need to (1) clean up, removing the gray background around the text so it doesn't print as a gray page with black text (hard to read, and it uses too much ink), and (2) shrink afterwards so I can share it. It is 1.1 GB, and comparable manuals for later vehicles are half the size or smaller, some MUCH smaller. Although I can use the terminal in a newbie cut-and-paste fashion, I would prefer to do this with a GUI. Can anybody point me to a suitable tutorial? THANKS!!!

hcentaur13
Level 3
Posts: 112
Joined: Sat May 18, 2013 5:13 pm

Postby hcentaur13 » Fri Jun 23, 2017 3:05 am

Shrinking a PDF means losing content. The eBook reader Clementine comes with a PDF editor; use your repository to install it. Then use its editor to remove the content of the PDF you're not interested in.

wallyUSA
Level 3
Posts: 196
Joined: Thu Jun 08, 2017 2:31 pm
Location: Georgia USA (Zulu-4/5)

Postby wallyUSA » Fri Jun 23, 2017 7:23 am

Catdaddy wrote:... Can anybody point me to a suitable tutorial to do this? THANKS!!!

I use smallpdf.com. I have used it for several years to reduce PDF file sizes before posting on the web. It has several other file conversion options, like 'pdf to jpg'. The free option allows two conversions per day.
Mint 18.2 Sonya Cinnamon 3.4.6 Kernel: 4.10.0-33-generic x86_64 (64 bit) (in VBox 5.1.28)
Please, if your query has been resolved, edit your first post and add [SOLVED] to the title.

lmuserx4849
Level 4
Posts: 461
Joined: Wed Dec 17, 2014 2:55 am

Postby lmuserx4849 » Tue Jul 11, 2017 2:54 am

I have the same issue. I came across https://gist.github.com/hubgit/6078384, which uses a combination of pdftk, exiftool, and qpdf. To shrink, most examples used Ghostscript (gs), like https://askubuntu.com/questions/207447/ ... a-pdf-file. If you search, there are GUIs for these commands if that is your preference. I had some success. I'm on 17.3, and the tools aren't current. I went to a different machine with current versions and got different results, and it ran much faster. This is still a work in progress for me...
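
For reference, the Ghostscript examples mentioned above usually boil down to one command. Here is a minimal sketch in Python that builds it. The file names are placeholders and the function names are made up for illustration; /ebook is one of Ghostscript's standard -dPDFSETTINGS presets (/screen is smaller, /printer and /prepress are larger). How much it shrinks a scanned manual depends on the scan and on the Ghostscript version, so treat this as a starting point only.

```python
import subprocess

# Standard Ghostscript pdfwrite presets, roughly from smallest to largest output.
PRESETS = {"screen", "ebook", "printer", "prepress", "default"}

def gs_shrink_command(src, dst, preset="ebook"):
    """Build the Ghostscript command that rewrites src as a recompressed PDF at dst."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{preset}",
        "-dNOPAUSE", "-dQUIET", "-dBATCH",
        f"-sOutputFile={dst}",
        src,
    ]

def shrink(src, dst, preset="ebook"):
    """Run the command; needs the gs binary installed (hypothetical helper)."""
    subprocess.run(gs_shrink_command(src, dst, preset), check=True)

# Placeholder file names, only to show the resulting command line.
print(" ".join(gs_shrink_command("manual.pdf", "manual-small.pdf")))
```

Calling shrink("manual.pdf", "manual-small.pdf") would actually run it. Always write to a new file and compare the result against the original before deleting anything.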

GS3
Level 2
Posts: 77
Joined: Fri Jan 06, 2017 7:51 am

Postby GS3 » Sat Jul 15, 2017 7:13 pm

This question is really not about Linux at all but is a very interesting question and I think people would benefit from knowing a bit more about computer graphics.

Scanning directly to PDF is an abomination which should never be done. It would only be forgivable for grannies scanning a photo of their baby taken decades ago.

Scanning graphics should be done first to the appropriate format depending on the type of graphic.

Black and white text and line drawings compress best when scanned at 300 DPI as 1-bit TIFF with CCITT Group 4 (Fax4) compression. With this you can get perfectly readable pages which only take around 50 KBytes per letter/A4 page.
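
To put the 50 KByte figure in perspective, here is a rough back-of-the-envelope calculation in Python of the uncompressed size of a letter page at 300 DPI. The function name is made up for illustration, and 50 KBytes is taken as 50,000 bytes.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one scanned page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

# US letter page (8.5 x 11 inches) at 300 DPI.
letter_bw = raw_page_bytes(8.5, 11, 300, 1)     # 1-bit black and white
letter_gray = raw_page_bytes(8.5, 11, 300, 8)   # 8-bit grayscale
letter_rgb = raw_page_bytes(8.5, 11, 300, 24)   # 24-bit color

print(letter_bw)    # 1051875  (about 1 MB)
print(letter_gray)  # 8415000  (about 8.4 MB)
print(letter_rgb)   # 25245000 (about 25 MB)

# Compression ratio implied by a 50,000-byte Group 4 page.
print(round(letter_bw / 50_000))  # 21
```

So a page scanned as 1-bit black and white starts out 8 times smaller than grayscale and 24 times smaller than color before any compression is applied, and the 50 KByte figure corresponds to roughly 21:1 compression on top of that.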

Not only that but B&W pages like that can be OCR'd which is really what should be done if possible.

Photo images compress best using 300 DPI into JPG with a compression level adequate for the picture in question.

In between we have palette-based images. These would not come straight from a scan, but a scan can be processed to yield a good palette-based image, which will compress best using PNG (or GIF if you must).

Once you have the best image for each page you can put it in a PDF if you absolutely must.

It really pays to understand the differences between compression methods and what type of images they are best suited for.

Scanning a B&W document directly to JPG and PDF is an abomination. I have little patience with people who send me a huge file of a document which can hardly be read when, with a little training, they could produce a file ten times more readable and one tenth the size.
HP Compaq Elite 8300 CMT - Linux Mint 18.2 Sonya - Kernel 4.4.0-53-generic X64 - Cinnamon 3.4.4 - Nemo

lmuserx4849
Level 4
Posts: 461
Joined: Wed Dec 17, 2014 2:55 am

Postby lmuserx4849 » Sun Jul 16, 2017 4:58 am

GS3 wrote:This question is really not about Linux at all but is a very interesting question and I think people would benefit from knowing a bit more about computer graphics.

Scanning directly to PDF is an abomination which should never be done. It would only be forgivable for grannies scanning a photo of their baby taken decades ago.
...


GS3, what tools do you use on Linux?

Can you even scan directly to PDF??? I'm looking at the XSane plug-in in GIMP, and you can export as PDF. I've used scanimage, and you can only save as TIFF, PNG, or JPEG. The scanning options are independent of the saved format. Probably best to do cleanup before combining images into a PDF. But can anything be done to an existing PDF?

A Google search brought up gscan2pdf and a Linux Journal article, http://www.linuxjournal.com/article/9676.
I've never had much luck with OCR and tesseract.

Catdaddy, what application did you use to scan and create your pdf?

GS3
Level 2
Posts: 77
Joined: Fri Jan 06, 2017 7:51 am

Postby GS3 » Sun Jul 16, 2017 8:00 am

lmuserx4849 wrote:GS3, what tools do you use on linux?

Can you even scan directly to PDF???
I have never scanned directly to PDF but when I have received a photo in a PDF and complained to the sender I have sometimes been told that is what the scanner produced. I suspect some scanners have an "easy" setting for the computer illiterate and the "advantage" is that it allows several images to be scanned directly into one file. It probably depends on the scanner and its driver and software.

I have no experience doing all this in Linux. I am trying to transition from Windows XP but finding it quite hard, precisely because I need to learn new tools. On Windows the main program I use for this type of graphics work is IrfanView, which I really like and am quite efficient with. Unfortunately there is no Linux version, so for now I have settled on running IrfanView under Wine. It works reasonably well, but normally I just go to my Windows computer, where it runs much better.

Pilosopong Tasyo
Level 6
Posts: 1418
Joined: Mon Jun 22, 2009 3:26 am
Location: Philippines

Postby Pilosopong Tasyo » Sun Jul 16, 2017 9:24 am

AFAIK, shrinking a PDF file that contains scanned text (as opposed to typed text) and images with fine-grained control is a manual and potentially laborious process. But in the end, you may end up with a smaller and cleaner PDF file suited to your exact specifications.

You can use GIMP to import the PDF file, select the pages you want to retain and edit, open them as separate images instead of layers, and set the preferred output resolution. Higher resolution = more detail retained, which potentially means more disk space consumed, depending on whether the pages in question are in color, grayscale, or monochrome, and on the file format and compression you use to save the results.

Use the available tools in GIMP to make the adjustments (white balance, contrast, color corrections, convert to grayscale/monochrome, et al.). Which tool and what setting entirely depends on how fine-grained you want each page to come out.
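
To illustrate what such an adjustment does to a gray background, here is a minimal sketch in Python of a levels-style stretch on 8-bit grayscale values. This is not GIMP's actual implementation; the function name and the two threshold values are assumptions picked only for the example, and real pages need thresholds chosen by eye.

```python
def clean_gray_page(pixels, black_point=60, white_point=180):
    """Stretch 0-255 gray values so a gray background becomes pure white.

    Values at or below black_point become 0 (solid black text), values at
    or above white_point become 255 (clean white paper), and values in
    between are scaled linearly.
    """
    span = white_point - black_point
    cleaned = []
    for row in pixels:
        new_row = []
        for value in row:
            if value <= black_point:
                new_row.append(0)
            elif value >= white_point:
                new_row.append(255)
            else:
                new_row.append((value - black_point) * 255 // span)
        cleaned.append(new_row)
    return cleaned

# Tiny made-up "page": light gray background (185-200), dark text (30-40),
# and one mid-gray pixel on the edge of a letter (120).
page = [[190, 185, 40],
        [200, 120, 30]]
print(clean_gray_page(page))  # [[255, 255, 0], [255, 127, 0]]
```

The background pixels all end up pure white, so they print no ink, and large uniform areas also compress much better.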

Save them as separate files in sequential order (e.g. scan1.jpg, scan2.jpg, scan3.jpg, etc.). As mentioned earlier, some images are better saved in a file format other than JPEG, depending on their characteristics. As always, YMMV. Once you're done, you can use ImageMagick's convert CLI tool (e.g. convert scan*.jpg whatever.pdf) to combine those images into a single PDF file.
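
One thing to watch with scan*.jpg: the shell expands the wildcard in plain alphabetical order, so scan10.jpg lands before scan2.jpg once you have ten or more pages. Zero-padded names (scan001.jpg) avoid that. Otherwise, here is a small sketch in Python that sorts the names numerically and builds the convert command. The helper names are made up for illustration, and on newer ImageMagick 7 installs the command is called magick instead of convert.

```python
import re

def natural_key(name):
    """Split a file name into text and number chunks so 'scan10' sorts after 'scan2'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]

def convert_command(image_names, output_pdf):
    """Build an ImageMagick command that joins the images, in page order, into one PDF."""
    return ["convert", *sorted(image_names, key=natural_key), output_pdf]

pages = ["scan10.jpg", "scan2.jpg", "scan1.jpg"]
print(sorted(pages))
# ['scan1.jpg', 'scan10.jpg', 'scan2.jpg']   <- what the shell wildcard gives you
print(convert_command(pages, "whatever.pdf"))
# ['convert', 'scan1.jpg', 'scan2.jpg', 'scan10.jpg', 'whatever.pdf']
```

The resulting list can be passed to subprocess.run(..., check=True) to actually create the PDF.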
o Give a man a fish and he will eat for a day. Teach him how to fish and he will eat for a lifetime!
o If an issue has been fixed, please edit your first post and add the word [SOLVED].

GS3
Level 2
Posts: 77
Joined: Fri Jan 06, 2017 7:51 am

Postby GS3 » Mon Jul 17, 2017 3:22 am

Pilosopong Tasyo wrote:AFAIK, shrinking a PDF file that contains scanned text (as opposed to typed text) and images with fine-grained control is a manual and potentially laborious process.
To the point that it is almost always better to re-scan it right. Saves time and the result will be better.

Pilosopong Tasyo
Level 6
Posts: 1418
Joined: Mon Jun 22, 2009 3:26 am
Location: Philippines

Postby Pilosopong Tasyo » Mon Jul 17, 2017 10:35 pm

GS3 wrote:To the point that it is almost always better to re-scan it right. Saves time and the result will be better.

Yes. However, if you're the recipient of such a PDF file, you'll probably have to clean it up one page at a time.

Spearmint2
Level 14
Posts: 5043
Joined: Sat May 04, 2013 1:41 pm
Location: Maryland, USA

Postby Spearmint2 » Mon Jul 17, 2017 11:46 pm

What I do is, in my printer program, tell it not to print the background image.
All things go better with Mint. Mint julep, mint jelly, mint gum, candy mints, pillow mints, peppermint, chocolate mints, spearmint,....

TI58C
Level 3
Posts: 119
Joined: Tue Jul 18, 2017 5:57 am

Postby TI58C » Wed Jul 19, 2017 6:46 am

Catdaddy,

What isn't clear to me is which kind of PDF you have: PDF files can be image-only, but they can also contain the text in a separate layer. It all depends on how the document was scanned. My HP Officejet 8500 can do both.

If your PDF is image-only and of bad quality (dark grey background, black text), I think the GIMP route explained above is your best option. Since you mention it is hard to read, sadly I think this is the case.

If it is light-grey bg + black text (that is readable / OCR-able), you might try PDF-XChange Viewer. I use it with PlayOnLinux. The free version includes an option to OCR the image-only PDF and superimpose the extracted text in a separate layer. Then you can cut and paste this text into a new PDF file without the bg.
Yes, the prog is for Windows only. But I'm not a purist; use whatever is easiest.


If (or once) your PDF has a text layer, you can also extract the text with pdftotext.

For combining several pdf-files and several other things, I use pdftk.

Good luck,
Robert
Last edited by TI58C on Thu Aug 03, 2017 4:19 pm, edited 1 time in total.
Linux is like my late labrador lady-dog: loyal and loving if you treat her lady-like, disbehaving princess if you don't.

Spearmint2
Level 14
Posts: 5043
Joined: Sat May 04, 2013 1:41 pm
Location: Maryland, USA

Postby Spearmint2 » Wed Jul 19, 2017 12:55 pm

The easiest way to check for a text layer is to try to highlight the text and copy/paste it. If you can't, then it's image text, not actual text.
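
The same check can be done without opening a viewer, using the pdftotext tool mentioned earlier in the thread. A minimal sketch in Python, assuming pdftotext (from poppler-utils) is installed; the function names and the default page range are made up for illustration. An image-only PDF typically yields nothing but whitespace and page-break characters.

```python
import subprocess

def has_text(extracted):
    """True if the extracted text contains anything besides whitespace and page breaks."""
    return bool(extracted.strip())

def pdf_has_text_layer(pdf_path, first_page=1, last_page=5):
    """Run pdftotext on a few pages and report whether any real text came out."""
    result = subprocess.run(
        ["pdftotext", "-f", str(first_page), "-l", str(last_page), pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return has_text(result.stdout)

# What the helper sees for an image-only scan versus a page with a text layer.
print(has_text("\f\n\f"))               # False
print(has_text("Torque settings\n\f"))  # True
```

Checking only the first few pages keeps it fast on a very large file, at the cost of missing text that only appears later in the document.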

