
Perl Strip Out hrefs

I was importing vendor-provided text into a shopping cart, but the vendor had included links back to all of their own products. I wanted the keywords (the link text), but not the links. HTML::TokeParser::Simple did the job:


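#!/usr/bin/perl

# Strip out all hrefs (keeping the link text) and pass everything else through.

use strict;
use warnings;
use HTML::TokeParser::Simple;

# Read the HTML from the __DATA__ section at the bottom of this script.
my $parser = HTML::TokeParser::Simple->new(\*DATA);

while ( my $token = $parser->get_token ) {
  if ($token->is_start_tag('a')) {
    my $href = $token->get_attr('href');
    # Leave named anchors (no href) and in-page links (#...) untouched.
    if (defined $href and $href !~ /^#/) {
      print $parser->get_trimmed_text('/a'); # link text only
      $parser->get_token; # discard </a>
      next;
    }
  }
  print $token->as_is; # everything else is printed unchanged
}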

__DATA__
Paste your HTML, or text containing HTML, here
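
To see what the script above does to a concrete input, here is a minimal sketch of the same loop wrapped in a function that takes a string instead of reading from __DATA__. The function name strip_links and the sample HTML are made up for illustration; only the HTML::TokeParser::Simple calls come from the script above, plus the module's string => constructor argument.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# strip_links is a hypothetical helper name, not part of the module.
# It returns the input HTML with external links replaced by their text.
sub strip_links {
  my ($html) = @_;
  my $parser = HTML::TokeParser::Simple->new(string => $html);
  my $out = '';

  while ( my $token = $parser->get_token ) {
    if ($token->is_start_tag('a')) {
      my $href = $token->get_attr('href');
      # Same rule as the original script: skip in-page (#...) links.
      if (defined $href and $href !~ /^#/) {
        $out .= $parser->get_trimmed_text('/a'); # keep the link text
        $parser->get_token;                      # discard </a>
        next;
      }
    }
    $out .= $token->as_is; # everything else passes through unchanged
  }
  return $out;
}

# Made-up sample input: one external link and one in-page link.
my $sample = '<p>See our <a href="http://example.com/widgets">blue widgets</a>'
           . ' and <a href="#top">back to top</a>.</p>';

print strip_links($sample), "\n";
```

With that sample, the external link should collapse to its text while the in-page link is left alone, giving `<p>See our blue widgets and <a href="#top">back to top</a>.</p>`. One thing to watch for: as far as I can tell from the HTML::TokeParser documentation, get_trimmed_text decodes entities and skips any tags nested inside the link, so a link like `<a href="..."><b>Tom &amp; Jerry</b></a>` would likely come out as plain `Tom & Jerry`.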
