pet project (python 3) help?

boyo1991
Level 1
Posts: 21
Joined: Mon Jul 30, 2018 7:24 am


Post by boyo1991 » Mon Jul 30, 2018 12:01 pm

Hey everyone, I'm working on a project, a dumb project, but a project nonetheless, just for the fun of it: a console-based web browser. Yes, the code uses recursion; don't pay it any mind, it's how I prefer it, and it's for me in the end (lol). Here's what I'm trying to do next. I'm able to get the source code of pages, and what I want now is something that strips out the HTML tags. Where there are line breaks it should keep them, just no objects or images. The ultimate goal is to turn text boxes into input() calls, and somehow handling hyperlinks too would be absolutely amazing, but I'm not too certain that would work. Maybe something like showing the URL of each hyperlink (which I know would be easy to do).

Well, here's the small amount of code I have set up so far.


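import urllib.request

# currently working on a text-based browser, no frills, no images
# all in console
# python 3

def main():
    # request url from user
    urlGo = input('url:')
    # get url loaded; the with block closes the connection afterwards
    with urllib.request.urlopen(urlGo) as fp:
        # read url
        mybytes = fp.read()
    # decode it (errors="replace" stops non-UTF-8 pages from crashing the script)
    mystr = mybytes.decode("utf8", errors="replace")
    # display it
    print(mystr)
    # ask for the next url by calling main() again (recursion, as intended)
    main()
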
main()
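A minimal sketch of the tag-stripping step, using only the standard library's html.parser module. The names TextExtractor and html_to_text, and the particular tags treated as line breaks or skipped, are illustrative assumptions and not part of the script above. Images and objects disappear on their own because they contain no text, the contents of script and style elements are dropped, and each link's URL is shown in square brackets after its text.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of a page, one block per line (illustrative sketch)."""

    # Tags that start a new line in the console output (an assumed, incomplete list)
    BREAK_TAGS = {'br', 'p', 'div', 'li', 'tr', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6'}
    # Tags whose contents are code or styling, never page text
    SKIP_TAGS = {'script', 'style'}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as &
        self.parts = []
        self.skip_depth = 0
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag in self.BREAK_TAGS:
            self.parts.append('\n')
        elif tag == 'a':
            # remember where the link points; it is printed when the link closes
            self.href = dict(attrs).get('href')
        # img, object, embed, ... carry no text, so ignoring them drops them

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag == 'a' and self.href:
            self.parts.append(' [' + self.href + ']')
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            # newlines and indentation in the HTML source are just spaces to a browser
            self.parts.append(re.sub(r'\s+', ' ', data))

    def text(self):
        # trim every line and drop the empty ones
        lines = (line.strip() for line in ''.join(self.parts).split('\n'))
        return '\n'.join(line for line in lines if line)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == '__main__':
    page = ('<p>Hello <b>world</b> &amp; friends<br>second line</p>'
            '<img src="cat.png"><p>See <a href="https://example.com/">this page</a>.</p>')
    print(html_to_text(page))
```

In the script above, print(html_to_text(mystr)) would take the place of print(mystr). A real parser is used instead of regular expressions because tags can span lines, contain > inside attribute values, and so on.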

xenopeek
Level 24
Posts: 23193
Joined: Wed Jul 06, 2011 3:58 am
Location: The Netherlands

Re: pet project (python 3) help?

Post by xenopeek » Mon Jul 30, 2018 4:30 pm

It seems like a reasonable first approach to just strip out the HTML tags, but other console-based web browsers do parse the HTML tags, and even CSS, to make the page look similar to how it would be displayed in a GUI web browser. Do you know w3m, elinks/links and lynx, for example? You could use these as inspiration and see how they display the web pages you're testing with.

You could skip implementing CSS, but at least parse the HTML tags so that you display bold text as bold, put * before each list item, and so on. I've not used any of these, but the Python standard library module html.parser (its HTMLParser class) or the third-party modules PyQuery and Beautiful Soup sound like they might be of use for this. You might find better answers by looking or posting on a Python forum; there appear to be many threads about HTML parsing in Python on StackOverflow, for example.
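A rough sketch of the bold/bullet idea with the standard library's html.parser. It assumes an ANSI-capable terminal: bold and strong text is wrapped in the escape codes \033[1m (bold on) and \033[0m (reset). Each list item gets a leading "* ". The names ConsoleRenderer and render are hypothetical, and skipping script/style contents or styling headings is left out to keep it short.

```python
import re
from html.parser import HTMLParser

# ANSI escape codes: bold on / reset all attributes (assumes an ANSI-capable terminal)
BOLD, RESET = '\033[1m', '\033[0m'


class ConsoleRenderer(HTMLParser):
    """Turn simple HTML into terminal text: bold stays bold, list items get '* '."""

    def __init__(self):
        super().__init__()
        self.out = []
        # counts nested <b>/<strong> so bold is only reset at the outermost close
        self.bold_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ('b', 'strong'):
            if self.bold_depth == 0:
                self.out.append(BOLD)
            self.bold_depth += 1
        elif tag == 'li':
            self.out.append('\n* ')
        elif tag in ('p', 'br', 'ul', 'ol'):
            self.out.append('\n')

    def handle_endtag(self, tag):
        if tag in ('b', 'strong') and self.bold_depth:
            self.bold_depth -= 1
            if self.bold_depth == 0:
                self.out.append(RESET)

    def handle_data(self, data):
        # collapse source-code whitespace (newlines, indentation) to single spaces
        self.out.append(re.sub(r'\s+', ' ', data))


def render(html):
    renderer = ConsoleRenderer()
    renderer.feed(html)
    renderer.close()
    if renderer.bold_depth:
        # unclosed <b>: switch bold off anyway so the terminal isn't left bold
        renderer.out.append(RESET)
    lines = (line.strip() for line in ''.join(renderer.out).split('\n'))
    return '\n'.join(line for line in lines if line)


if __name__ == '__main__':
    print(render('<p>Some <b>bold</b> text</p><ul><li>One</li><li><strong>Two</strong></li></ul>'))
```

Counting nesting depth, rather than resetting at every closing tag, keeps <b>a <strong>b</strong> c</b> bold all the way to the end.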

MintBean
Level 9
Posts: 2968
Joined: Fri Aug 07, 2015 6:54 am
Location: Blighty

Re: pet project (python 3) help?

Post by MintBean » Mon Jul 30, 2018 5:07 pm

It would be a useful exercise to remove the recursion from your main() and instead put in a proper loop. Loops are a skill you will definitely need to parse HTML.
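One way the loop version might look: the same steps as the original main(), wrapped in a while loop. CPython does not eliminate tail calls, so each recursive main() call in the original keeps its stack frame alive, and after roughly sys.getrecursionlimit() pages (1000 by default) the script would stop with a RecursionError. Quitting on a blank line or Ctrl-D and the error handling are illustrative additions, not something from the thread.

```python
import urllib.request


def main():
    # Same steps as the original main(), but in a loop instead of calling itself.
    while True:
        try:
            urlGo = input('url (blank line to quit): ').strip()
        except EOFError:  # Ctrl-D also quits
            break
        if not urlGo:
            break
        try:
            with urllib.request.urlopen(urlGo) as fp:
                mybytes = fp.read()
        except (OSError, ValueError) as err:
            # URLError/HTTPError are OSError subclasses; ValueError means e.g. no "http://"
            print('could not load', urlGo, '-', err)
            continue  # ask for another URL instead of crashing
        print(mybytes.decode('utf8', errors='replace'))
```

This is a drop-in replacement for the main() function; the existing main() call at the bottom of the script still starts it.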
