Perl Remove Unwanted HTML

If you want to strip extra markup such as span and div tags and other junk from a block of text, you can use HTML::Restrict like this:


#!/usr/bin/perl
use strict;
use warnings;
use HTML::Restrict;

my $hr = HTML::Restrict->new();
$hr->set_rules({
    # allowed tags (an empty list means no attributes are kept)
    p  => [],
    li => [],
    ul => [],
    h4 => [],
    h3 => [],
    h2 => [],

    # not allowed (everything is stripped by default!)
    # img => [qw( alt / )],
    # h1  => [],
});

while ( my $line = <DATA> ) {
    $line =~ s/&nbsp;/ /g;    # replace non-breaking space entities with a plain space
    $line =~ s/\s+/ /g;       # collapse whitespace runs (tabs and anything else matching \s) to one space
    $line =~ s/^\s+//;        # trim leading spaces
    $line =~ s/\s+$//;        # trim trailing spaces

    print $hr->process($line) . "\n";
}

__DATA__
Paste your HTML below this line
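
The script above hands HTML::Restrict one line at a time, so a tag that is split across two lines never reaches the parser in one piece. A minimal sketch of a whole-input variant follows. It assumes the input fits in memory, passes the rules to the constructor instead of calling set_rules, and uses clean_html as a hypothetical helper name that is not part of HTML::Restrict:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Restrict;

# clean_html() is a hypothetical helper name, not part of HTML::Restrict.
sub clean_html {
    my ($html) = @_;

    # Rules are passed to the constructor here instead of via set_rules().
    my $hr = HTML::Restrict->new(
        rules => {
            p  => [],
            ul => [],
            li => [],
        },
    );

    $html =~ s/&nbsp;/ /g;      # replace non-breaking space entities with a plain space
    $html =~ s/\s+/ /g;         # collapse whitespace runs, including line breaks
    $html =~ s/^\s+|\s+$//g;    # trim both ends

    return $hr->process($html);
}

# Sample input (made up for illustration) with a span tag split across two lines.
my $input = qq{<div class="wrapper"><p>Hello&nbsp;<span\nstyle="color:red">world</span></p></div>\n};

print clean_html($input), "\n";
```

Because whitespace is collapsed before parsing, the whole result comes back as a single line. If you need to keep the original line breaks, collapse only spaces and tabs instead of everything matching \s.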
