pet project (python 3) help?


Post by boyo1991 »

Hey everyone, I am working on a project, a dumb project, but a project nonetheless, for the fun of it. I am trying to make a console-based web browser. Yes, the code has recursion; pay it no mind, it's how I prefer it, and it's for me in the end (lol). Here's what I am trying to do next: I'm able to get the source code from pages, and what I want now is something that strips out the HTML tags but still honors line breaks, with no objects or images. The ultimate goal is to support text boxes as input() calls, and handling hyperlinks somehow would be absolutely amazing, but I am not too certain that would work. Maybe something like showing the URL of the hyperlink (which I know would be easily done).

well, here's the small amount of code I have set up.

Code:

import urllib.request

#currently working on a text based browser, no frills, no images
#all in console.
#python 3

def main():
  #request url from user
  urlGo = input('url:')
  #get url loaded
  fp = urllib.request.urlopen(urlGo)
  #read url
  mybytes = fp.read()
  #decode it
  mystr = mybytes.decode("utf8")
  #display it
  print(mystr)
  main()

main()

Re: pet project (python 3) help?

Post by xenopeek »

It seems like a reasonable first approach to just take out the HTML tags, but other console-based web browsers do parse the HTML tags and even CSS to make the page look similar to how it would be displayed in a GUI web browser. Do you know w3m, elinks/links and lynx, for example? You could use these as inspiration to see how they display the web pages you're testing with.

You could skip implementing CSS, but at least parse the HTML tags so that you display bold text as bold, put a * before each list item, and so on. I've not used any of these, but the Python standard library module html.parser (with its HTMLParser class) or the third-party modules PyQuery and Beautiful Soup sound like they might be of use for this. You might find better answers by looking or posting on a Python forum. There look to be many threads about HTML parsing in Python on StackOverflow, for example.
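
As a rough illustration of the standard-library route, here is a minimal sketch built on html.parser. The names TextExtractor and html_to_text are made up for this example, and the choice of which tags start a new line is an assumption, not a complete list. It drops the tags, starts a new line at block elements and line breaks, puts a * before list items, and shows the URL after each link.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keeps the visible text and drops the tags."""

    # Assumed set of tags that should start a new line; extend as needed.
    BLOCK_TAGS = {"p", "div", "br", "tr", "ul", "ol",
                  "h1", "h2", "h3", "h4", "h5", "h6"}
    # Content inside these tags is never shown.
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []      # pieces of output text, joined at the end
        self.skip_depth = 0  # > 0 while inside <script> or <style>
        self.href = None     # URL of the link currently being read

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "li":
            self.parts.append("\n* ")
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")
        elif tag == "a":
            self.href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "a" and self.href:
            # Show the link target after the link text.
            self.parts.append(" [" + self.href + "]")
            self.href = None
        elif tag in self.BLOCK_TAGS and tag != "br":
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse runs of whitespace, as a browser would.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(html):
    """Return the readable text of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.parts).split("\n")]
    return "\n".join(line for line in lines if line)
```

In the script from the first post, print(mystr) could then become print(html_to_text(mystr)). Bold text, forms, and tables are left out here and would need their own handling.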

Re: pet project (python 3) help?

Post by MintBean »

It would be a useful exercise to remove the recursion from your main() and instead put in a proper loop. Loops are a skill you will definitely need to parse HTML.
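
For what it's worth, a loop version might look something like the sketch below. The fetch helper, the blank-line-to-quit convention, and the read/show/load parameters are assumptions added for this example (the parameters only exist so the loop can be tried without a network connection); the UTF-8 decoding is carried over from the original script.

```python
import urllib.request


def fetch(url):
    """Download a page and return it as text (assumes UTF-8, as in the original script)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf8", errors="replace")


def main(read=input, show=print, load=fetch):
    """Ask for URLs in a loop until the user enters a blank line."""
    while True:
        url = read("url (blank to quit): ").strip()
        if not url:
            break
        try:
            show(load(url))
        except (OSError, ValueError) as err:
            # urllib raises URLError (an OSError) for network problems
            # and ValueError for a malformed URL.
            show("could not load " + url + ": " + str(err))


# main()  # uncomment to start the browser loop
```

Unlike the recursive version, this does not add a stack frame for every page visited, so it will not hit Python's recursion limit after a long browsing session, and a bad URL no longer ends the program.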