Perl Strip Out hrefs

I was importing vendor-supplied text into a shopping cart, but the vendor had embedded links back to all of their own products! I wanted to keep the keywords (the link text) and drop the links themselves. HTML::TokeParser::Simple did the job:


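#!/usr/bin/perl
# Replace each <a href="..."> link with its plain text; keep everything else.
# Anchors without an href, and in-page links (href starting with "#"), are left alone.
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Parse the HTML pasted below __DATA__
my $parser = HTML::TokeParser::Simple->new(\*DATA);
while ( my $token = $parser->get_token ) {
  if ($token->is_start_tag('a')) {
    my $href = $token->get_attr('href');
    if (defined $href and $href !~ /^#/) {
      # Print only the link text. get_trimmed_text returns text only, so
      # markup nested inside the link is dropped and entities are decoded.
      print $parser->get_trimmed_text('/a');
      $parser->get_token; # discard </a>
      next;
    }
  }
  print $token->as_is; # everything else passes through unchanged
}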
__DATA__
paste your html or text with html in it, here 
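
If pasting into __DATA__ is inconvenient, the same loop can be wrapped in a function that takes the HTML as a string and returns the cleaned text. This is only a sketch: the name strip_links and the sample input are made up for illustration, and it assumes your installed HTML::TokeParser::Simple accepts the string => ... form of the constructor.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# strip_links is a hypothetical helper name, not part of the module.
# It returns the input HTML with each external <a href> replaced by its text.
sub strip_links {
    my ($html) = @_;
    my $parser = HTML::TokeParser::Simple->new(string => $html);
    my $out = '';
    while ( my $token = $parser->get_token ) {
        if ($token->is_start_tag('a')) {
            my $href = $token->get_attr('href');
            if (defined $href and $href !~ /^#/) {
                $out .= $parser->get_trimmed_text('/a'); # keep the link text
                $parser->get_token;                      # discard </a>
                next;
            }
        }
        $out .= $token->as_is; # pass everything else through unchanged
    }
    return $out;
}

# Made-up sample input
print strip_links('<p>Buy the <a href="http://example.com/widget">blue widget</a> today.</p>'), "\n";
```

Building a string instead of printing directly makes it easier to run the cleanup over many product descriptions before loading them into the cart.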
