Python: sanitizing HTML from a string

Question:

Is there a way to escape all single and double quotes in a string?

For example if I have a string like this:

Hi my name is 'Shelby"

Is there a way to preprocess this to escape that string?

EDIT:

Maybe that wasn't the best approach to the problem. Here's what I'm actually trying to do: I have a tool that analyzes SWF files (namely swftools -> swfdump). Sometimes malicious SWF files contain HTML tags, and I'm outputting these results to a web page. Is there a way to sanitize these HTML tags in Python?

Sample of string:

 (    3 bytes) action: Push Lookup16:443 ("title_txt")
 (    0 bytes) action: GetMember
 (    6 bytes) action: Push Lookup16:444 ("htmlText") Lookup16:445 ("Please check your Log In info.")
 (    0 bytes) action: SetMember
 (   14 bytes) action: Push int:2 int:1 register:1 Lookup:30 ("login_mc")

The part shown as "Please check your Log In info." is actually supposed to include markup such as: font color = '#ff0000'

Asked By: Stupid.Fat.Cat

Answers:

If you're just going for HTML sanitizing, you can try this. It is probably the easiest approach if you want to add more escape types:

def escape(htmlstring):
    escapes = {'"': '&quot;',
               "'": '&#39;',
               '<': '&lt;',
               '>': '&gt;'}
    # This is done first to prevent escaping other escapes.
    htmlstring = htmlstring.replace('&', '&amp;')
    # Replace each remaining special character with its entity.
    for seq, esc in escapes.items():
        htmlstring = htmlstring.replace(seq, esc)
    return htmlstring

This replaces every instance of &, ', ", <, and > with their correct HTML escape codes.
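
To see why the ampersand has to be handled first, here is a small sketch comparing the two orderings. The function names `escape_amp_first` and `escape_amp_last` and the sample string are made up for illustration; they are not part of the answer's code.

```python
# Characters other than '&' and the entities they map to.
OTHER_ESCAPES = (('<', '&lt;'), ('>', '&gt;'), ('"', '&quot;'), ("'", '&#39;'))


def escape_amp_first(text):
    """Escape '&' before the other characters (the safe order)."""
    text = text.replace('&', '&amp;')
    for char, entity in OTHER_ESCAPES:
        text = text.replace(char, entity)
    return text


def escape_amp_last(text):
    """Escape '&' after the other characters; a counter-example only."""
    for char, entity in OTHER_ESCAPES:
        text = text.replace(char, entity)
    # At this point the '&' inside '&lt;', '&quot;' etc. gets escaped again.
    return text.replace('&', '&amp;')


sample = '<b> & "x"'
good = escape_amp_first(sample)
bad = escape_amp_last(sample)
print(good)  # &lt;b&gt; &amp; &quot;x&quot;
print(bad)   # &amp;lt;b&amp;gt; &amp; &amp;quot;x&amp;quot;
```

With the wrong order, the browser would display the literal text `&lt;b&gt;` instead of `<b>`, because the entities themselves were escaped a second time.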

More information on HTML escaping:

Wikipedia HTML Page

Every Escape imaginable

Happy Escaping!

Answered By: Alyssa Haroldsen

If you use a templating engine like Jinja or Genshi, it will do that for you already (in Jinja's case, with autoescaping enabled). All text embedded into the page is properly escaped unless you explicitly tell it not to. When building web pages, it is a good idea to use a templating engine anyway.

Answered By: exhuma

I think the current approach is to use the html module from the standard library (Python 3.2+).

import html
html.escape('Hi my name is \'Shelby"')
Out: 'Hi my name is &#x27;Shelby&quot;'

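
Applied to the swfdump case from the question, a minimal sketch might look like the following. The helper name `render_dump` and the sample lines (including the embedded font tag) are assumptions made for illustration, not real swfdump output.

```python
import html


def render_dump(lines):
    """Escape each dump line and wrap the result in a <pre> block."""
    escaped = [html.escape(line) for line in lines]
    return "<pre>\n" + "\n".join(escaped) + "\n</pre>"


# Hypothetical dump lines; the <font> tag stands in for markup
# that a malicious SWF file might embed in a string constant.
dump_lines = [
    '(    0 bytes) action: GetMember',
    '(    6 bytes) action: Push Lookup16:445 ("<font color=\'#ff0000\'>Please check</font>")',
]

page_fragment = render_dump(dump_lines)
print(page_fragment)
```

Because every line passes through html.escape before it reaches the page, the font tag is shown as text rather than interpreted by the browser.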
Answered By: G M