File Content Search and Indexing?
Forum rules
There are no such things as "stupid" questions. However if you think your question is a bit stupid, then this is the right place for you to post it. Stick to easy to-the-point questions that you feel people can answer fast. For long and complicated questions use the other forums within the support section.
Before you post read how to get help. Topics in this forum are automatically closed 6 months after creation.
Hi guys,
I'm finally back after almost 8 years away. I hope I'm welcome.
Just installed, configured LMDE (made a separate partition for data, just in case)
I hope you can guide me on file content indexing and search. I'd prefer to search through the file manager rather than install special software.
I'm on Cinnamon. A while ago (on Ubuntu) I found an option somewhere to include file content in indexing.
But in Cinnamon, searching the menu for "search" and "index" gives no results.
How can I go about it?
Edit: Honestly, I've seen some solutions (downloading software), but they were a bit old. Hopefully there is something "built in".
Edit 2: I found:
"Mint devs also add improved search features to ‘Nemo’, the Cinnamon file manager. As well as searching file names the file manager is also able to search file contents for matches. It supports regular expressions and recursive folder searches, and can be configured to show favourited files first in results."
How to do that?
here https://www.omgubuntu.co.uk/2021/07/lin ... -whats-new
Last edited by LockBot on Wed Dec 28, 2022 7:16 am, edited 1 time in total.
Reason: Topic automatically closed 6 months after creation. New replies are no longer allowed.
Lenovo G580
Desktop: Cinnamon 4.8.6 - LMDE 4 Debbie
In Love With Linux
-
- Level 16
- Posts: 6054
- Joined: Mon Aug 27, 2012 10:17 pm
Re: File Content Search and Indexing?
limotux wrote: ⤴Fri Jul 16, 2021 12:22 pm "Mint devs also add improved search features to ‘Nemo’, the Cinnamon file manager. As well as searching file names the file manager is also able to search file contents for matches. It supports regular expressions and recursive folder searches, and can be configured to show favourited files first in results."
How to do that?
The linked article tells you no less than 12 times, "Linux Mint 20.2".
Re: File Content Search and Indexing?
Thanks pal
I know it mentions that.
I just thought it had already been added to Nemo, no matter which distro or version.
Thanks.
Just to report back
I tried Nautilus, and it does the job.
Thank you.
Last edited by limotux on Sat Jul 24, 2021 9:57 am, edited 1 time in total.
Lenovo G580
Desktop: Cinnamon 4.8.6 - LMDE 4 Debbie
In Love With Linux
- axisofevil
- Level 4
- Posts: 388
- Joined: Mon Nov 14, 2011 12:22 pm
Re: File Content Search and Indexing?
Updates will go into LMDE4 "shortly".
Re: File Content Search and Indexing?
Wow… great news
I reinstalled the whole thing after I messed up my previous installation.
I was just curious, playing around installing and removing lots of things!
Still, searching for files in the file manager is not OK (can't search text inside PDFs, and can only search “from here”, not “everywhere”, in Dolphin), though I installed Recoll and it’s working fine.
Hopefully the update will let me search inside all files from the file manager and search “everywhere”.
Thanks all
Lenovo G580
Desktop: Cinnamon 4.8.6 - LMDE 4 Debbie
In Love With Linux
- axisofevil
- Level 4
- Posts: 388
- Joined: Mon Nov 14, 2011 12:22 pm
Re: File Content Search and Indexing?
There are various programs which extract text from non-text files - Nemo 5.0 uses these to do text searches.
Even the LMDE4 xreader can search for text in PDFs - but it's not integrated in any way.
- axisofevil
- Level 4
- Posts: 388
- Joined: Mon Nov 14, 2011 12:22 pm
Re: File Content Search and Indexing?
Breaking news!
Cinnamon 5.0.5 - including most of those Mint 20.2 changes - is available for update now.
However, the feature allowing the renaming of multiple files (a program called bulky) was omitted.
clefebvre commented 7 minutes ago:
This is planned and not directly related to nemo. Bulky will be made available this week in LMDE though you will need to install it yourself.
Re: File Content Search and Indexing?
Wow… just to be sure I got it right
After I update my LMDE I will have full text search:
- in pdf (mainly), in zip or rar?
- search “everywhere”?
- available in nemo so I can uninstall dolphin file manager?
That’s a dream come true, as I don’t want KDE; in my latest experience (Kubuntu) it was so unresponsive and… well… I prefer Cinnamon.
Lenovo G580
Desktop: Cinnamon 4.8.6 - LMDE 4 Debbie
In Love With Linux
- axisofevil
- Level 4
- Posts: 388
- Joined: Mon Nov 14, 2011 12:22 pm
Re: File Content Search and Indexing?
Doesn't include EPUB - the format is too variable/complex.
I don't know the full list - but...
Look at the files in
/usr/share/nemo/search-helpers/
Each one contains a program to run and a list of relevant mime-types.
It is possible to specify user written ones in
~/.local/share/nemo/search-helpers/
Note that there is a priority setting which influences the choice of search_helper [there are two choices for PDF -> text].
My current list reads:-
Code:
exif.nemo_search_helper
id3.nemo_search_helper
libreoffice.nemo_search_helper
mso-doc.nemo_search_helper
mso.nemo_search_helper
mso-ppt.nemo_search_helper
mso-xls.nemo_search_helper
pdf2txt.nemo_search_helper
pdftotext.nemo_search_helper
ps2ascii.nemo_search_helper
untex.nemo_search_helper
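To see what each helper actually declares, a short script can print the files from both directories mentioned above. This is only a sketch: the two paths come from the post, the function name `read_helpers` is made up for illustration, and nothing is assumed about the layout inside the files (they are simply printed as text).

```python
from pathlib import Path

# Directories named in the post; the per-user one may not exist.
HELPER_DIRS = [
    Path("/usr/share/nemo/search-helpers"),
    Path.home() / ".local/share/nemo/search-helpers",
]


def read_helpers(directories):
    """Return {file name: file text} for every *.nemo_search_helper found.

    If the same file name appears in several directories, the later
    directory in the list replaces the earlier entry.
    """
    helpers = {}
    for directory in directories:
        directory = Path(directory)
        if not directory.is_dir():
            continue  # skip directories that are missing on this system
        for path in sorted(directory.glob("*.nemo_search_helper")):
            helpers[path.name] = path.read_text(errors="replace")
    return helpers


if __name__ == "__main__":
    for name, text in read_helpers(HELPER_DIRS).items():
        print(f"== {name} ==")
        print(text)
```

Reading the printed files should show which program each helper runs and which mime-types it claims, which is the quickest way to check whether a given format is covered.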
Re: File Content Search and Indexing?
Great update
I just installed and updated
Nemo seems great in searching for a word in a pdf file.
I just noticed it is a bit slow, maybe needs time for indexing.
I'll leave it for a while and report back.
Edit: I double checked all helpers you mentioned, I have the same.
Lenovo G580
Desktop: Cinnamon 4.8.6 - LMDE 4 Debbie
In Love With Linux
Re: File Content Search and Indexing?
I'll appreciate any help.
I left my laptop overnight. I have only about 10 PDF files. A Nemo search for a word inside files brings the results after a long time, as if nothing is indexed.
As I mentioned, I have double-checked that the helper files are there.
I have Recoll installed and it brings results instantly.
What can be done to get Nemo to bring results as instantly as Recoll?
Lenovo G580
Desktop: Cinnamon 4.8.6 - LMDE 4 Debbie
In Love With Linux
Re: File Content Search and Indexing?
limotux wrote: ⤴Sat Jul 24, 2021 1:43 am I'll appreciate any help.
I left my laptop overnight. I have only like 10 pdf files. Nemo search for a word inside files brings the results after long time as if it is not indexed.
As I mentioned I have double checked the helper files are there.
I have Recoll installed and it brings results instantly.
What can be done to get nemo to bring results instant as Recoll?
Any help?!
Lenovo G580
Desktop: Cinnamon 4.8.6 - LMDE 4 Debbie
In Love With Linux
- axisofevil
- Level 4
- Posts: 388
- Joined: Mon Nov 14, 2011 12:22 pm
Re: File Content Search and Indexing?
The default is to search for text strings or by using regular expressions; by default recursively.
Indexing gains its speed at the expense of many read/writes to disk which is great if you do lots of searching by key word.
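As a rough illustration of that trade-off, here is a minimal keyword index in Python. It is a sketch only, not how Recoll or any other particular indexer is implemented: the names `build_index` and `lookup` are invented for the example, and the index lives in memory, whereas a real indexer keeps it on disk (which is where the extra reads and writes come from).

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lower-cased word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        # The expensive part: every document is read in full, once, up front.
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index


def lookup(index, word):
    """Return the sorted names of documents containing the word."""
    # The cheap part: one dictionary lookup, no document is re-read.
    return sorted(index.get(word.lower(), set()))


if __name__ == "__main__":
    docs = {
        "a.txt": "Linux Mint 20.2 release notes",
        "b.txt": "Nemo search helpers for PDF files",
    }
    index = build_index(docs)
    print(lookup(index, "nemo"))  # prints ['b.txt']
```

Building the index costs a full pass over every file, but each later keyword search is a single lookup, so it only pays off when searches are frequent.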
Re: File Content Search and Indexing?
axisofevil wrote: ⤴Sun Jul 25, 2021 7:34 am The default is to search for text strings or by using regular expressions; by default recursively.
Indexing gains its speed at the expense of many read/writes to disk which is great if you do lots of searching by key word.
Welcome back axisofevil
It’s been 2 days and about 10 PDF files. Still it seems it is not indexing! I tried searching for one word only!
What might be wrong? Why is it not working?
Lenovo G580
Desktop: Cinnamon 4.8.6 - LMDE 4 Debbie
In Love With Linux
- axisofevil
- Level 4
- Posts: 388
- Joined: Mon Nov 14, 2011 12:22 pm
Re: File Content Search and Indexing?
There is NO indexing.
If you have 10 PDF files in a directory without too many sub-directory levels or you switch off the recursive search, then it will be very fast.
<edit> I tried it on a 1.6GB directory, with 200 large PDF files, searching for '20.04', got 24 hits - took about 5 minutes!
It seems nemo uses a background thread, so this limits the search to using a single core.
</edit>
I tried it on a smallish PDF directory and it was virtually instant.
If you hover your mouse over a result, it will show the first few lines of the contents...
If you select a result and press the space bar, it will show the first page as a preview, including images.
To get rid of the preview, press the space bar again.
( Actually, this function works under all circumstances, for most file types).
Nemo needs documentation.
Re: File Content Search and Indexing?
Sorry, excuse me for my ignorance. I've been away from linux for long time.
"There is NO indexing": how is that? How is the search inside the file done? Seems a lot has changed in the 10 years I've been away! viewtopic.php?f=180&t=353724
Not too many files or sub-directories. (Edit: that's for now. I have tons of PDFs and documents, and will copy them later when I get everything working.)
So it seems the "recursive" search is the issue (though I don't really understand it; for me a search is a search).
How do I switch off "recursive" search?
Lenovo G580
Desktop: Cinnamon 4.8.6 - LMDE 4 Debbie
In Love With Linux
Re: File Content Search and Indexing?
I believe I found it! The button to the right of the file name field. Right?
Still don’t understand the search with no indexing. Still trying to learn!
Lenovo G580
Desktop: Cinnamon 4.8.6 - LMDE 4 Debbie
In Love With Linux
- axisofevil
- Level 4
- Posts: 388
- Joined: Mon Nov 14, 2011 12:22 pm
Re: File Content Search and Indexing?
Search inside a file is done by nemo running a program to "convert" each file to text, on the fly and this text is searched by the usual methods.
Note that some conversions don't actually extract much text e.g. image files.
Indexing would only be of use if you either stored a text version of all files, or more feasibly just stored keywords (and just searched by keyword).
Indeed, there is an index built of filenames (daily), which allows you to find files by partial or full name (locate command).
My index runs to about 13MB. That's just for the full pathname to my files. Not the content of the files.
Recursive searching is where you search a directory's files + any sub-directories' files + any sub-sub-directories'... etc.
If you have lots of very large PDFs then you must expect it to be slow.
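The convert-then-search approach described above can be sketched in Python roughly as follows. This is not Nemo's actual code: the function names and the `recursive` flag are invented for illustration, and it assumes the `pdftotext` program (from poppler-utils) is installed, called as `pdftotext FILE -` so that the text goes to standard output.

```python
import re
import subprocess
from pathlib import Path


def pdf_to_text(path):
    """Extract text with pdftotext; the '-' argument sends it to stdout."""
    result = subprocess.run(
        ["pdftotext", str(path), "-"],
        capture_output=True, encoding="utf-8", errors="replace", check=False,
    )
    return result.stdout


def search_contents(root, pattern, extract=pdf_to_text, suffix=".pdf",
                    recursive=True):
    """Convert each matching file to text on the fly and regex-search it."""
    regex = re.compile(pattern)
    root = Path(root)
    # rglob descends into sub-directories; glob stays in the top directory.
    if recursive:
        candidates = root.rglob("*" + suffix)
    else:
        candidates = root.glob("*" + suffix)
    hits = []
    for path in sorted(candidates):
        # Every file is converted again on every search: nothing is cached.
        if path.is_file() and regex.search(extract(path)):
            hits.append(path)
    return hits


if __name__ == "__main__":
    for hit in search_contents(".", r"20\.04"):
        print(hit)
```

Because each search re-runs the converter on every candidate file, the time grows with the number and size of the files and with how deep the recursion goes, which matches the behaviour described in this thread.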
Re: File Content Search and Indexing?
I see.
Just to be sure I understand allow me to say what I got in my own words:
- every single time I run a search, Nemo "converts" every PDF file to searchable text, then searches this extracted text for the word I'm looking for.
If I got it right, I don't understand why it doesn't index the old way, like what Recoll is doing now.
What's the point in doing it this way?
Doesn't Tracker do indexing of file content? If it does, is there a frontend for Tracker?
Please correct me if I'm wrong.
Thank you.
Lenovo G580
Desktop: Cinnamon 4.8.6 - LMDE 4 Debbie
In Love With Linux
- axisofevil
- Level 4
- Posts: 388
- Joined: Mon Nov 14, 2011 12:22 pm
Re: File Content Search and Indexing?
How can you build an index of all text in a file?
Surely the index would be the same size (or bigger) as storing the whole file?
Basically, with unstructured data you can't identify key fields.