How to remove html tags from an html string using regex?

Member

by darion , in category: Third Party Scripts , 21 days ago

How to remove html tags from an html string using regex?


1 answer

Member

by shyann , 20 days ago

@darion 

You can strip HTML tags from a string with a regular expression in any programming language that supports regex. Here is an example in Python:

import re

def remove_html_tags(html):
    # Match "<", then as few characters as possible (.*?), up to the next ">"
    clean = re.compile(r'<.*?>')
    # Replace every match with an empty string
    return clean.sub('', html)

html_string = "<p>This is a <strong>sample</strong> HTML string.</p>"
cleaned_string = remove_html_tags(html_string)
print(cleaned_string)


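This code defines a function remove_html_tags that takes an HTML string and uses the regular expression <.*?> to remove every tag from it. The pattern matches a <, then as few characters as possible (the non-greedy .*?), up to the next >, so each tag is matched on its own. The function then returns the cleaned string.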


When you run this code, the output will be:

This is a sample HTML string.
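
To see why the non-greedy quantifier matters, here is a small comparison of the same substitution with a greedy pattern. This is only an illustrative sketch; the variable names greedy_result and lazy_result are made up for this example.

```python
import re

html_string = "<p>This is a <strong>sample</strong> HTML string.</p>"

# Greedy: .* runs from the first "<" to the LAST ">" in the string,
# so the whole input is treated as one big "tag" and removed.
greedy_result = re.sub(r'<.*>', '', html_string)

# Non-greedy: .*? stops at the NEAREST ">", so each tag is removed separately
# and the text between tags survives.
lazy_result = re.sub(r'<.*?>', '', html_string)

print(repr(greedy_result))  # ''
print(repr(lazy_result))    # 'This is a sample HTML string.'
```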


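Note that using regex to parse HTML is generally not recommended. HTML is not a regular language, and this pattern misses several edge cases: a > inside a quoted attribute value ends the match early, a tag that spans multiple lines is not matched at all because . does not match newlines by default, the contents of <script> and <style> elements are left in the output, and entities such as &amp; are not decoded. For more robust and reliable processing, use a proper HTML parser, such as Python's built-in html.parser module or a library like Beautiful Soup.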
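
As a rough sketch of the parser-based approach, the following uses only the standard library's html.parser module. The class name TagStripper and the function strip_tags are hypothetical names chosen for this example, and skipping the contents of script and style elements is an assumption about what "cleaned" text should mean for your use case.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects the text content of an HTML string (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into text
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when it is not inside a skipped element
        if self.skip_depth == 0:
            self.parts.append(data)


def strip_tags(html):
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


print(strip_tags("<p>This is a <strong>sample</strong> HTML string.</p>"))
# A ">" inside a quoted attribute value does not confuse the parser
print(strip_tags('<a title="a > b">link</a>'))
```

For the second input, the regex version would leave part of the attribute behind, while the parser treats the whole opening tag as a single unit.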