WGET copies unreadable lines

JosVa
Level 1
Posts: 10
Joined: Wed Jan 24, 2018 4:36 am
Location: Netherlands

WGET copies unreadable lines

Post by JosVa » Sun Jan 28, 2018 8:59 am

I have access to a solar power converter over its secured Wi-Fi network at http://160.190.0.1/home.
From Firefox I can save the page with a mouse click, which produces home.html plus a home_bestanden folder. That saved copy is readable: I can see the time and the Watt/kW amounts and copy/paste these into Calc to create graphs.
Using WGET for the same page stores only home.html, without the folder and with unusable content. The reason I want to use WGET is to perform this action at half-hour intervals and have a program pull out the data automatically.
When I use WGET on randomly chosen Internet sites, it works just like the mouse-click copy/paste and produces frozen copies of those sites.
Any idea where it goes wrong?

These are the first lines I get from WGET; in the whole file, only "home.html" is recognizable:
\8BZX\00home.html\00\ED\EDn\DB8\F2\B7\B8w`\D8b-#\F5G\92\A6\EDƖ\81n\A0A\AFM\B1\ED]\EF\AE\B2D\D9L$R\A5(\BB\DE /r\CFr\EFt\AFp3\A4$˲\9C\B4W,\D0b E\CEg\863CR\A3\BD\97\E7/>\FC\E3\DD

These are the first lines I get from copy/paste Firefox:
<!DOCTYPE html>
<html><head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
<title>Home</title>
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width">
<!--meta name="MobileOptimized" content="320"-->
<link href="Home_bestanden/layout.css" rel="stylesheet" type="text/css">

And the content of the linked Home_bestanden folder is:
dropdown.js zepto.js layout.css pic.png
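The stray \8B near the start of the wget dump is telling: gzip streams begin with the magic bytes 1f 8b, so the file is most likely the page body still gzip-compressed. A quick check and decompression, sketched here against a generated stand-in file so it runs anywhere (replace the stand-in with the real home.html that wget saved):

```shell
# Stand-in for the gzipped file wget saved (swap in the real home.html):
printf '<!DOCTYPE html>\n<html><head><title>Home</title></head></html>\n' \
    | gzip > home.html
file home.html                            # reports "gzip compressed data" if the body is compressed
gunzip -c home.html > home_decoded.html   # decompress without renaming the file
head -n1 home_decoded.html                # -> <!DOCTYPE html>
```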

xenopeek
Level 24
Posts: 22882
Joined: Wed Jul 06, 2011 3:58 am
Location: The Netherlands

Re: WGET copies unreadable lines

Post by xenopeek » Sun Jan 28, 2018 1:51 pm

Try using curl instead.

curl http://160.190.0.1/home -o result.html

JosVa
Level 1
Posts: 10
Joined: Wed Jan 24, 2018 4:36 am
Location: Netherlands

Re: WGET copies unreadable lines

Post by JosVa » Tue Jan 30, 2018 11:02 am

Will give cURL a try, but I think something else is going on too. WGET --debug shows a line mentioning GZIP format. That would explain why the WGET output file contains nothing readable besides the words home and html. I found more examples and questions on the Internet about how to deal with that.

JosVa
Level 1
Posts: 10
Joined: Wed Jan 24, 2018 4:36 am
Location: Netherlands

Re: WGET copies unreadable lines

Post by JosVa » Thu Feb 01, 2018 6:29 am

Got more info but ...... :(
cURL does not work either. For both WGET and cURL I find pages of settings, but so far nothing does the job ...... besides one: WGET --debug.
One debug line tells me that the website does something with gzip. I checked the output file, named index.html, with several unzip tools and found that 7-Zip was able to open it. It contains a compressed file called home.html that I can open and extract to get readable HTML lines, looking almost the same as the copy/paste version from Firefox.
But the next problem to work on is getting the actual data: all I get is an uncolored, empty web page, without the Device Information or the amount of kW produced.
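The manual 7-Zip step can also be done from the shell: gunzip wants a .gz suffix before it will extract in place, so rename first. A stand-in index.html is generated here so the sketch runs anywhere; use wget's real output instead:

```shell
# Stand-in for the gzipped index.html that wget saved:
printf '<html>home</html>\n' | gzip > index.html
mv index.html index.html.gz
gunzip index.html.gz      # leaves a readable index.html behind
cat index.html            # -> <html>home</html>
```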

xenopeek
Level 24
Posts: 22882
Joined: Wed Jul 06, 2011 3:58 am
Location: The Netherlands

Re: WGET copies unreadable lines

Post by xenopeek » Thu Feb 01, 2018 10:14 am

Try adding a user agent, like:
curl -A 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0' http://160.190.0.1/home -o result.html
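The same command shape can be tried offline first against a local file, to separate curl problems from server problems; swap the file:// URL for http://160.190.0.1/home when testing against the converter:

```shell
# Local stand-in page, fetched with a browser user-agent string:
printf '<html>demo</html>\n' > page.html
curl -s -A 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0' \
     -o result.html "file://$PWD/page.html"
cat result.html   # -> <html>demo</html>
```

If the server still sends a gzipped body, curl's `--compressed` option asks for gzip and transparently decodes it before writing the output file.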

JosVa
Level 1
Posts: 10
Joined: Wed Jan 24, 2018 4:36 am
Location: Netherlands

Re: WGET copies unreadable lines

Post by JosVa » Thu Feb 01, 2018 12:16 pm

First of all: thank you very, very much for helping me.
Ran your cURL command, but the output still shows only the page frame, since the folder is not saved the way it is with mouse-click copy/paste from Firefox. Could that be the reason no data is seen, the data being stored in that folder in JavaScript form? On the other hand: I can tell Firefox to store the site in TXT format only and still see the data in the HTML lines.
Below is the output with data. Without it, after <tr style=""> it shows <td></td><td></td><td></td><td class=""></td>
>>>>>>>>>>>>>>>
<td width="150"><span data-locale="sn">SN.</span></td>
<td width="120"><span data-locale="pacw">Pac(W)</span></td>
<td width="150"><span data-locale="etoday">E_Today(KWh)</span></td>
<td width="130"><span data-locale="status">Status</span></td>
<tr style="">
<td>BD36806011760040</td>
<td>158</td>
<td>0.70</td>
<td class="ok"></td>
etc.....
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
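Once a saved copy does contain rows like the fragment above, standard tools can pull the values out. A sketch using a sample file built from that fragment (the sed pattern assumes the simple one-value-per-`<td>` layout shown):

```shell
cat > sample.html <<'EOF'
<tr style="">
<td>BD36806011760040</td>
<td>158</td>
<td>0.70</td>
<td class="ok"></td>
EOF
# Keep only the plain <td>...</td> cell contents, one per line:
sed -n 's|.*<td>\([^<]*\)</td>.*|\1|p' sample.html
# Or joined into one CSV line, ready to append to a log file for Calc:
sed -n 's|.*<td>\([^<]*\)</td>.*|\1|p' sample.html | paste -s -d, -
# -> BD36806011760040,158,0.70
```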
Unfortunately the Chinese developers of the Zeversolar converter do not want us to have direct access to the generated data. They prefer me to connect to their far-away servers, send all the data there, and then see what comes back here.

Think I will have to find a web scraper that I can instruct to run every half hour, saving the Firefox view in timestamped files, and extract the data from there, if necessary by hand. Using Wget or cURL would be more convenient for extracting the data and storing it in sequential lines immediately after downloading. And it is free.......
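The half-hour schedule itself is a job for cron rather than the download tool. A hypothetical crontab entry (via `crontab -e`); the output directory and filename pattern are illustrative, and note that `%` must be escaped inside a crontab line:

```shell
# m h dom mon dow  command
# */30 * * * * curl -s http://160.190.0.1/home -o "$HOME/solar/home-$(date +\%F-\%H\%M).html"
```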

JosVa
Level 1
Posts: 10
Joined: Wed Jan 24, 2018 4:36 am
Location: Netherlands

Re: WGET copies unreadable lines

Post by JosVa » Thu Feb 15, 2018 5:32 am

In the end I discovered that WGET and CURL could perhaps be made to run the JavaScript, but how that is done remains a secret to me. Then I found HTTRACK, which does the trick without complaining and even thanks me for using it.
The command >>httrack 160.190.0.1<< showed after 5 seconds all I wanted to see, in a mirror folder on my disk where a large number of files with all details for further analysis is stored too. I even found that the JavaScript data is stored in a single line in home-2.html :D . Once this runs in a loop, appending that line to a text file, I only need to tell Calc to use 2 values from it and a power conversion graph appears within a few mouse clicks after a whole day of automated logging.
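One iteration of that logging loop, sketched as a function. The httrack call is left commented out (it needs the network), and `mirror/home-2.html` is a hypothetical stand-in path; httrack's real output lands in a per-site subfolder of the mirror directory, so adjust the path to what httrack actually created:

```shell
log_once() {
    # httrack 160.190.0.1 -O mirror >/dev/null   # refresh the mirror (network)
    # Append the data line from home-2.html with a timestamp:
    printf '%s %s\n' "$(date +%F-%H%M)" "$(cat mirror/home-2.html)" >> solarlog.txt
}
mkdir -p mirror
printf 'BD36806011760040 158 0.70' > mirror/home-2.html   # stand-in data line
log_once
tail -n1 solarlog.txt
```

Calling `log_once` from cron every 30 minutes then builds the day's log for Calc.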

Termy
Level 5
Posts: 762
Joined: Mon Sep 04, 2017 8:49 pm
Location: UK

Re: WGET copies unreadable lines

Post by Termy » Sat Feb 17, 2018 10:39 am

Are you using wget version 1.19.2 or newer? They annoyingly (!!!) made wget request compressed content by default, and guess what, it seems to be via gzip. Look in the man page for wget; there's a flag you can use to tell the server not to offer compressed content. Can't remember which one it is, I'm afraid.
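The half-remembered flag is most likely wget's `--compression` option, added in 1.19.2; sending the request header explicitly also works on older builds. Both forms per the GNU wget documentation (verify against `man wget` on your version):

```shell
# wget --compression=none http://160.190.0.1/home -O home.html
# or, version-independent, refuse compressed bodies via the request header:
# wget --header='Accept-Encoding: identity' http://160.190.0.1/home -O home.html
```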
Here to help.

I'm LearnLinux (LL) on YouTube: https://www.youtube.com/channel/UCfp-lN ... naEE6NtDSg
I'm also terminalforlife (TFL) on GitHub: https://github.com/terminalforlife

JosVa
Level 1
Posts: 10
Joined: Wed Jan 24, 2018 4:36 am
Location: Netherlands

Re: WGET copies unreadable lines

Post by JosVa » Fri Feb 23, 2018 4:45 am

Termy, thanks, but I have definitively switched over to Httrack, which by default supplies me with an overload of information. Unzipping the Wget content is no issue, as 7-Zip does the trick in the background after renaming the *.html to *.7z and saving its content as *.html. But Wget doesn't give me the data I seek, because the JavaScript that adds it is never run: I only get an empty, colorless home page, where Httrack gives me the web page as Firefox shows it. The same goes for cURL, which also doesn't run the JavaScript. And by the way: when downloading Httrack you also get a GUI version, called WinHTTrack, that can be used with copy/paste and mouse clicks.
