Python sanitizing html from a string
Question:
Is there a way to escape all quotes and double quotes in a string?
For example if I have a string like this:
Hi my name is 'Shelby"
Is there a way to preprocess this to escape that string?
EDIT:
Maybe that wasn’t the best approach to the problem. Here’s what I’m actually trying to do: I have a tool that analyzes swf files (namely swftools -> swfdump). Sometimes a malicious swf file will contain html tags, and I’m outputting these results to a page. So is there a way to sanitize these html tags in Python?
Sample of string:
( 3 bytes) action: Push Lookup16:443 ("title_txt")
( 0 bytes) action: GetMember
( 6 bytes) action: Push Lookup16:444 ("htmlText") Lookup16:445 ("Please check your Log In info.")
( 0 bytes) action: SetMember
( 14 bytes) action: Push int:2 int:1 register:1 Lookup:30 ("login_mc")
The part that says "Please check your Log In info." is actually supposed to include an html tag: font color = '#ff0000'
Answers:
If you’re just going for HTML sanitizing, you can try this. It is probably the easiest approach if you want to add more escape types:
    def escape(htmlstring):
        escapes = {'"': '&quot;',
                   "'": '&#39;',
                   '<': '&lt;',
                   '>': '&gt;'}
        # This is done first to prevent escaping other escapes.
        htmlstring = htmlstring.replace('&', '&amp;')
        # items() works on Python 2 and 3; iteritems() is Python 2 only.
        for seq, esc in escapes.items():
            htmlstring = htmlstring.replace(seq, esc)
        return htmlstring
This replaces every instance of &, ', ", <, and > with their correct HTML escape codes.
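To see why the ampersand has to be handled first, here is a small sketch that compares the two orderings. The function names escape_amp_first and escape_amp_last are made up for this illustration and are not part of any library; the second one is deliberately wrong.

```python
# Pairs of (character, entity) for everything except the ampersand.
PAIRS = (('<', '&lt;'), ('>', '&gt;'), ('"', '&quot;'), ("'", '&#39;'))


def escape_amp_first(text):
    # Escape '&' before anything else so the entities added below stay intact.
    text = text.replace('&', '&amp;')
    for char, entity in PAIRS:
        text = text.replace(char, entity)
    return text


def escape_amp_last(text):
    # Wrong order, shown only for contrast: the '&' inside each freshly
    # inserted entity gets escaped a second time.
    for char, entity in PAIRS:
        text = text.replace(char, entity)
    return text.replace('&', '&amp;')


sample = '<b>Tom & Jerry</b>'
print(escape_amp_first(sample))  # &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;
print(escape_amp_last(sample))   # &amp;lt;b&amp;gt;Tom &amp; Jerry&amp;lt;/b&amp;gt;
```

With the ampersand escaped last, a browser would display the literal text &lt;b&gt; instead of <b>, so the output is safe but unreadable.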
Happy Escaping!
I think that the current approach is to use the html module.
import html
html.escape('Hi my name is \'Shelby"')
Out: 'Hi my name is &#x27;Shelby&quot;'
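For the swfdump case from the question, something along these lines might work: escape each chunk of tool output with html.escape before putting it on the page. This is only a sketch. The helper name render_dump_as_html, the choice of a pre element, and the sample line (written in the style of the dump above, with a font tag put back in) are assumptions for illustration.

```python
import html


def render_dump_as_html(dump_text):
    """Escape raw tool output and wrap it in <pre> for display on a page."""
    # quote=True (the default) also escapes " and ', which matters if the
    # text ever ends up inside an attribute value.
    return '<pre>' + html.escape(dump_text, quote=True) + '</pre>'


# Hypothetical line in the style of the swfdump sample from the question.
line = '''( 6 bytes) action: Push Lookup16:445 ("<font color = '#ff0000'>Please check your Log In info.</font>")'''
print(render_dump_as_html(line))
```

The font tag then shows up on the page as visible text instead of being interpreted by the browser. Note that html.escape only neutralizes markup; it does not strip tags out of the string.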