File Content Search and Indexing?

Quick-to-answer questions about finding your way around LMDE as a new user.
Forum rules
There is no such thing as a "stupid" question. However, if you think your question is a bit stupid, then this is the right place for you to post it. Stick to easy, to-the-point questions that you feel people can answer quickly. For long and complicated questions, use the other forums within the support section.
Before you post, read how to get help. Topics in this forum are automatically closed 6 months after creation.
limotux
Level 4
Posts: 224
Joined: Wed Aug 25, 2010 2:55 pm

File Content Search and Indexing?

Post by limotux »

Hi guys,

I'm finally back after almost 8 years away. I hope I'm welcome.

I just installed and configured LMDE (with a separate partition for data, just in case).

I hope you can tell me more about file content indexing and search. I prefer to search from the file manager rather than install special software.

I'm on Cinnamon. A while ago (on Ubuntu) I found an option somewhere to include file content in indexing.
But in Cinnamon, searching the menu for "search" or "index" gives no results.

How can I go about it?

Edit: Honestly, I've seen some solutions (downloading software), but they were a bit old. Hopefully there is something built in.
Edit 2: I found this:
"Mint devs also add improved search features to ‘Nemo’, the Cinnamon file manager. As well as searching file names the file manager is also able to search file contents for matches. It supports regular expressions and recursive folder searches, and can be configured to show favourited files first in results."

How do I do that?
It's from here: https://www.omgubuntu.co.uk/2021/07/lin ... -whats-new
Last edited by LockBot on Wed Dec 28, 2022 7:16 am, edited 1 time in total.
Reason: Topic automatically closed 6 months after creation. New replies are no longer allowed.
Lenovo G580
Desktop: Cinnamon 4.8.6 - LMDE 4 Debbie
In Love With Linux
Moonstone Man
Level 16
Posts: 6054
Joined: Mon Aug 27, 2012 10:17 pm

Re: File Content Search and Indexing?

Post by Moonstone Man »

limotux wrote: Fri Jul 16, 2021 12:22 pm "Mint devs also add improved search features to ‘Nemo’, the Cinnamon file manager. As well as searching file names the file manager is also able to search file contents for matches. It supports regular expressions and recursive folder searches, and can be configured to show favourited files first in results."

How to do that?
The linked article says "Linux Mint 20.2" no fewer than 12 times.
limotux
Level 4
Posts: 224
Joined: Wed Aug 25, 2010 2:55 pm

Re: File Content Search and Indexing?

Post by limotux »

Thanks, pal.
I know it mentions that.

I just thought it had already been added to Nemo, regardless of distro or version.
Thanks.

Just to report back:
I tried Nautilus, and it does the job.

Thank you.
Last edited by limotux on Sat Jul 24, 2021 9:57 am, edited 1 time in total.
axisofevil
Level 4
Posts: 382
Joined: Mon Nov 14, 2011 12:22 pm

Re: File Content Search and Indexing?

Post by axisofevil »

Updates will go into LMDE4 "shortly".
limotux
Level 4
Posts: 224
Joined: Wed Aug 25, 2010 2:55 pm

Re: File Content Search and Indexing?

Post by limotux »

Wow… great news!
I reinstalled the whole thing after I messed up my previous installation.
I was just curious, playing around installing and removing lots of things!

Still, searching for files in the file manager isn't working well: I can't search text inside PDFs, and in Dolphin I can only search “from here”, not “everywhere”. I installed Recoll, though, and it's working fine.

Hopefully the update will let me search inside all files from the file manager, and search “everywhere”.

Thanks all
axisofevil
Level 4
Posts: 382
Joined: Mon Nov 14, 2011 12:22 pm

Re: File Content Search and Indexing?

Post by axisofevil »

There are various programs which extract text from non-text files; Nemo 5.0 uses these to do text searches.
Even the LMDE4 xreader can search for text in PDFs, but it's not integrated in any way.
axisofevil
Level 4
Posts: 382
Joined: Mon Nov 14, 2011 12:22 pm

Re: File Content Search and Indexing?

Post by axisofevil »

Breaking news!

Cinnamon 5.0.5, including most of those Mint 20.2 changes, is available as an update now.
However, the feature for renaming multiple files at once (a program called Bulky) was omitted.

clefebvre commented 7 minutes ago:
This is planned and not directly related to nemo. Bulky will be made available this week in LMDE though you will need to install it yourself.
limotux
Level 4
Posts: 224
Joined: Wed Aug 25, 2010 2:55 pm

Re: File Content Search and Indexing?

Post by limotux »

Wow… just to be sure I got it right:
After I update my LMDE, I will have full-text search:
- in PDF (mainly), and in ZIP or RAR?
- searching “everywhere”?
- available in Nemo, so I can uninstall the Dolphin file manager?

That's a dream come true, as I don't want KDE: in my latest experience (Kubuntu) it was so unresponsive and… well… I prefer Cinnamon.
axisofevil
Level 4
Posts: 382
Joined: Mon Nov 14, 2011 12:22 pm

Re: File Content Search and Indexing?

Post by axisofevil »

Doesn't include EPUB - the format is too variable/complex.

I don't know the full list, but...
Look at the files in /usr/share/nemo/search-helpers/
Each one specifies a program to run and a list of relevant MIME types.
You can also add your own helpers in ~/.local/share/nemo/search-helpers/

My current list reads:

exif.nemo_search_helper
id3.nemo_search_helper
libreoffice.nemo_search_helper
mso-doc.nemo_search_helper
mso.nemo_search_helper
mso-ppt.nemo_search_helper
mso-xls.nemo_search_helper
pdf2txt.nemo_search_helper
pdftotext.nemo_search_helper
ps2ascii.nemo_search_helper
untex.nemo_search_helper
Note that there is a priority setting which influences which search helper is chosen (there are two choices for PDF to text).
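To make the priority idea concrete, here is a minimal Python sketch of how a file manager could pick a helper for a given MIME type. It assumes each *.nemo_search_helper file is an INI-style keyfile with Exec, MimeTypes (semicolon-separated) and Priority keys; those key names, the group name used in the example, and the priority numbers are assumptions for illustration, so check the real files in /usr/share/nemo/search-helpers/ before relying on them. This is not Nemo's own code.

```python
import configparser
from dataclasses import dataclass
from pathlib import Path

# Directories named in the post above; user-written helpers go in the second one.
HELPER_DIRS = [
    Path("/usr/share/nemo/search-helpers"),
    Path.home() / ".local/share/nemo/search-helpers",
]


@dataclass
class Helper:
    name: str
    exec_line: str
    mime_types: list
    priority: int


def parse_helper(name: str, text: str) -> Helper:
    """Parse one helper definition. Key names (Exec, MimeTypes, Priority) are ASSUMED."""
    # interpolation=None keeps a '%s' placeholder in Exec as literal text
    cp = configparser.ConfigParser(interpolation=None)
    cp.read_string(text)
    section = cp[cp.sections()[0]]  # use whatever group the file declares
    mimes = [m.strip() for m in section.get("MimeTypes", "").split(";") if m.strip()]
    return Helper(name, section.get("Exec", ""), mimes, section.getint("Priority", fallback=0))


def load_helpers(dirs=HELPER_DIRS) -> list:
    """Read every *.nemo_search_helper file found in the given directories."""
    helpers = []
    for d in dirs:
        if not d.is_dir():
            continue
        for f in sorted(d.glob("*.nemo_search_helper")):
            try:
                helpers.append(parse_helper(f.name, f.read_text()))
            except (configparser.Error, IndexError, ValueError):
                pass  # skip files that don't match the assumed layout
    return helpers


def choose_helper(helpers, mime_type: str):
    """Return the highest-priority helper that handles mime_type, or None."""
    candidates = [h for h in helpers if mime_type in h.mime_types]
    return max(candidates, key=lambda h: h.priority, default=None)


if __name__ == "__main__":
    best = choose_helper(load_helpers(), "application/pdf")
    print(best.name if best else "no PDF helper found")
```

When two helpers both claim application/pdf, the one with the larger Priority value wins in this sketch; whether Nemo treats larger or smaller numbers as "higher" is not stated in the thread.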
limotux
Level 4
Posts: 224
Joined: Wed Aug 25, 2010 2:55 pm

Re: File Content Search and Indexing?

Post by limotux »

Great update!

I just installed and updated.
Nemo seems great at searching for a word in a PDF file.
I just noticed it is a bit slow; maybe it needs time for indexing.

I'll leave it for a while and report back.

Edit: I double-checked all the helpers you mentioned, and I have the same ones.
limotux
Level 4
Posts: 224
Joined: Wed Aug 25, 2010 2:55 pm

Re: File Content Search and Indexing?

Post by limotux »

I'd appreciate any help.

I left my laptop on overnight. I have only about 10 PDF files. A Nemo search for a word inside files brings results only after a long time, as if nothing is indexed.
As I mentioned, I have double-checked that the helper files are there.
I have Recoll installed and it brings results instantly.
What can be done to get Nemo to bring results as instantly as Recoll does?
limotux
Level 4
Posts: 224
Joined: Wed Aug 25, 2010 2:55 pm

Re: File Content Search and Indexing?

Post by limotux »

limotux wrote: Sat Jul 24, 2021 1:43 am I'll appreciate any help.

I left my laptop overnight. I have only like 10 pdf files. Nemo search for a word inside files brings the results after long time as if it is not indexed.
As I mentioned I have double checked the helper files are there.
I have Recoll installed and it brings results instantly.
What can be done to get nemo to bring results instant as Recoll?
Any help?!
axisofevil
Level 4
Posts: 382
Joined: Mon Nov 14, 2011 12:22 pm

Re: File Content Search and Indexing?

Post by axisofevil »

The default is to search for plain text strings, or you can use regular expressions; the search is recursive by default.
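The difference between the two search modes can be sketched in a few lines of Python: a plain string is escaped so characters like "." match only themselves, while a regular expression is used as-is, so "." matches any character. This only illustrates the concept; it is not Nemo's implementation, and the function names are made up.

```python
import re


def make_matcher(query: str, use_regex: bool):
    """Compile the query; plain strings are escaped so '.' matches only a dot."""
    pattern = query if use_regex else re.escape(query)
    return re.compile(pattern)


def matching_lines(text: str, matcher) -> list:
    """Return the lines of text that contain a match."""
    return [line for line in text.splitlines() if matcher.search(line)]


sample = "Ubuntu 20.04 release notes\nbuild 20104 changelog"
print(matching_lines(sample, make_matcher("20.04", use_regex=False)))
# ['Ubuntu 20.04 release notes']
print(matching_lines(sample, make_matcher("20.04", use_regex=True)))
# ['Ubuntu 20.04 release notes', 'build 20104 changelog']
```

So a search for '20.04' with regular expressions switched on can return extra hits such as "20104".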

Indexing gains its speed at the expense of many disk reads/writes, which is great if you do lots of searching by keyword.
limotux
Level 4
Posts: 224
Joined: Wed Aug 25, 2010 2:55 pm

Re: File Content Search and Indexing?

Post by limotux »

axisofevil wrote: Sun Jul 25, 2021 7:34 am The default is to search for text strings or by using regular expressions; by default recursively.

Indexing gains its speed at the expense of many read/writes to disk which is great if you do lots of searching by key word.
Welcome back, axisofevil

It’s been 2 days, and there are only about 10 PDF files. It still seems not to be indexing! I tried searching for just one word!

What might be wrong? Why isn't it working?
axisofevil
Level 4
Posts: 382
Joined: Mon Nov 14, 2011 12:22 pm

Re: File Content Search and Indexing?

Post by axisofevil »

There is NO indexing.
If you have 10 PDF files in a directory without too many levels of sub-directories, or you switch off the recursive search, then it will be very fast.

<edit> I tried it on a 1.6GB directory with 200 large PDF files, searching for '20.04': 24 hits, and it took about 5 minutes!
It seems Nemo uses a background thread, which limits the search to a single core.
</edit>

I tried it on a smallish PDF directory and it was virtually instant.
If you hover your mouse over a result, it will show the first few lines of the contents...

If you select a result and press the space bar, it shows the first page as a preview, including images.
To get rid of the preview, press the space bar again.
(Actually, this works in all circumstances, for most file types.)

Nemo needs documentation.
limotux
Level 4
Posts: 224
Joined: Wed Aug 25, 2010 2:55 pm

Re: File Content Search and Indexing?

Post by limotux »

Sorry, excuse my ignorance. I've been away from Linux for a long time.
"There is NO indexing": how is that? How is searching inside files done, then? It seems a lot has changed in the 10 years I've been away! viewtopic.php?f=180&t=353724
There aren't too many files or sub-directories. (Edit: that's for now; I have tons of PDFs and documents that I'll copy over once I get everything working.)
So it seems to be the "recursive" search (though I don't really understand it; to me, a search is a search).

How do I switch off "recursive" search?
limotux
Level 4
Posts: 224
Joined: Wed Aug 25, 2010 2:55 pm

Re: File Content Search and Indexing?

Post by limotux »

I believe I found it! The button to the right of the file name field. Right?

I still don’t understand searching without indexing. Still trying to learn!
axisofevil
Level 4
Posts: 382
Joined: Mon Nov 14, 2011 12:22 pm

Re: File Content Search and Indexing?

Post by axisofevil »

Searching inside a file is done by Nemo running a program that "converts" each file to text on the fly; that text is then searched by the usual methods.
Note that some conversions don't actually extract much text, e.g. from image files.
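The on-the-fly approach can be sketched in Python as follows. Every search re-runs a converter on every candidate file and nothing is stored afterwards, which is why it needs no index but gets slow on large PDF collections. The converter table (and using pdftotext from poppler-utils, whose "FILE -" form writes text to stdout) is an assumption for illustration; Nemo takes its actual commands from the search-helper files mentioned earlier in the thread.

```python
import re
import subprocess
from pathlib import Path

# HYPOTHETICAL converter table: suffix -> command, "{path}" is replaced by the file.
CONVERTERS = {".pdf": ["pdftotext", "{path}", "-"]}
TEXT_SUFFIXES = {".txt", ".md", ".csv"}


def extract_text(path: Path) -> str:
    """Convert one file to text on the fly; nothing is cached or indexed."""
    suffix = path.suffix.lower()
    if suffix in TEXT_SUFFIXES:
        return path.read_text(errors="replace")
    cmd = CONVERTERS.get(suffix)
    if cmd is None:
        return ""  # e.g. images: no text worth extracting
    args = [str(path) if a == "{path}" else a for a in cmd]
    try:
        result = subprocess.run(args, capture_output=True, text=True,
                                errors="replace", timeout=60)
    except (OSError, subprocess.TimeoutExpired):
        return ""  # converter not installed, or it took too long
    return result.stdout


def search_contents(paths, pattern: str) -> list:
    """Return the files whose freshly extracted text matches the regex."""
    matcher = re.compile(pattern)
    return [p for p in paths if matcher.search(extract_text(p))]
```

Running the same search twice does all the conversion work twice; that repeated cost is exactly what an index would avoid.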

Indexing would only be of use if you either stored a text version of every file or, more feasibly, stored just keywords (and searched only by keyword).
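For comparison, the keyword-index idea can be sketched as an inverted index: each word maps to the set of files containing it, so a lookup is a dictionary access instead of a scan of every file. This is a generic illustration with made-up names, not how Nemo works (it has no index) and not a description of Recoll's or Tracker's internals. The example also shows the limitation mentioned above: only whole keywords can be looked up.

```python
import re
from collections import defaultdict

WORD = re.compile(r"\w+")


def build_index(docs: dict) -> dict:
    """Map each lower-cased word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in WORD.findall(text.lower()):
            index[word].add(name)
    return dict(index)


def lookup(index: dict, keyword: str) -> set:
    """Whole-keyword lookup only; substrings and regexes can't use this index."""
    return index.get(keyword.lower(), set())


docs = {
    "a.pdf": "Linux Mint 20.2 release",
    "b.pdf": "Ubuntu release notes",
}
index = build_index(docs)
print(sorted(lookup(index, "release")))  # ['a.pdf', 'b.pdf']
print(sorted(lookup(index, "Mint")))     # ['a.pdf']
print(sorted(lookup(index, "20.2")))     # [] because "20.2" was split into "20" and "2"
```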

Indeed, there is an index of filenames, built daily, which lets you find files by partial or full name (the locate command).
My index runs to about 13MB. That's just the full path names of my files, not their contents.
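A filename index can be sketched like this: walk the tree once, keep only the full paths, and answer partial-name queries from that list. It is a simplified stand-in for what locate does (the real database format and its daily update job differ), and the function names are hypothetical. Because only paths are stored, the index stays small compared with the files themselves.

```python
import os


def build_filename_index(root: str) -> list:
    """Walk the tree once and record every full path (names only, no contents)."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)


def find_by_name(index: list, fragment: str) -> list:
    """Partial-name match against the stored full paths."""
    return [p for p in index if fragment in p]


def index_size_bytes(index: list) -> int:
    """Rough size if stored as one UTF-8 path per line."""
    return sum(len(p.encode("utf-8")) + 1 for p in index)
```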

Recursive searching is where you search a directory's files, plus any sub-directories' files, plus any sub-sub-directories' files, and so on.
If you have lots of very large PDFs, then you must expect it to be slow.
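The difference between recursive and non-recursive search comes down to which files get opened. A minimal Python sketch (the function name is hypothetical) that lists the candidate files both ways:

```python
from pathlib import Path


def files_to_search(directory, recursive: bool = True) -> list:
    """List the files a content search would have to open and convert."""
    root = Path(directory)
    # rglob("*") descends into every sub-directory; iterdir() stays at the top level
    candidates = root.rglob("*") if recursive else root.iterdir()
    return sorted(p for p in candidates if p.is_file())
```

With on-the-fly conversion, the search time grows roughly with the number and size of the files in this list, which is why switching off recursion makes a big directory tree much faster to search.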
limotux
Level 4
Posts: 224
Joined: Wed Aug 25, 2010 2:55 pm

Re: File Content Search and Indexing?

Post by limotux »

I see.
Just to be sure I understand, let me put what I got in my own words:
- every single time I run a search, Nemo "converts" every PDF file to searchable text, then searches this extracted text for the word I'm looking for.
If I got that right, I don't understand why it doesn't index the old way, like Recoll does now.

What's the point of doing it this way?

Doesn't Tracker index file content? If it does, is there a frontend for Tracker?

Please correct me if I'm wrong.
Thank you.
axisofevil
Level 4
Posts: 382
Joined: Mon Nov 14, 2011 12:22 pm

Re: File Content Search and Indexing?

Post by axisofevil »

How can you build an index of all text in a file?
Surely the index would be the same size as (or bigger than) the whole file?

Basically, with unstructured data you can't identify key fields.