Simple RSS XML Grabber in PERL

Recently I decided to jump on the Web 2.0 bandwagon and pull in some RSS XML feeds of news headlines to add to some of my more popular websites. As usual, I was unable to find a simple script I could install to include the basic text in my pages. So, as usual, I wrote one.

I figure it will be a handy little script for anyone looking to include a text-based XML feed from another website. I have kept it simple; it probably won't work on every feed without some tweaking, but it should be a simple install and fine for most people.

The script requires LWP::UserAgent, which is part of the LWP package and should already be on most servers. I did not use any other modules to parse the XML, though it would probably work better if I had. I know very little about XML, but using a module that calls three more modules that each call another module to parse a simple text file just seemed like a pointless waste of CPU time.

All I wanted was the article title, the link to the full article, and the description of the story. It just doesn't seem that complicated, and I did not want to make it more than it was: a simple fetch and print using LWP.
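A minimal sketch of that fetch-and-print approach (the regexes assume a basic RSS 2.0 layout with title, link, and description inside each item; the actual rss.cgi may differ in its details):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Fetch the raw feed; returns undef on any HTTP failure.
sub fetch_feed {
    my ($feed_url) = @_;
    my $ua = LWP::UserAgent->new( timeout => 10 );
    my $response = $ua->get($feed_url);
    return $response->is_success ? $response->decoded_content : undef;
}

# Pull title, link, and description out of each <item> with plain
# regexes -- no XML parser, just like the approach described above.
sub parse_items {
    my ($xml) = @_;
    my @items;
    while ( $xml =~ m{<item>(.*?)</item>}sg ) {
        my $item = $1;
        my ($title) = $item =~ m{<title>(.*?)</title>}s;
        my ($link)  = $item =~ m{<link>(.*?)</link>}s;
        my ($desc)  = $item =~ m{<description>(.*?)</description>}s;
        push @items, { title => $title, link => $link, description => $desc };
    }
    return @items;
}

# Usage (uncomment and set a real feed url):
# for my $item ( parse_items( fetch_feed('http://example.com/feed.xml') || '' ) ) {
#     print qq{<a href="$item->{link}">$item->{title}</a><br>$item->{description}<br>\n};
# }
```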

There is only one variable to configure: the actual URL of the XML file you want to access. Or use the second version of the script, which lets you define the XML file dynamically from multiple pages.

Download the Perl script rss.cgi

See The Sample Web Page Output - We have installed a sample XML file in this directory, and the script is running to access the file. You can compare the working script to the script you install on your own website; they should look the same if everything is working. You won't want to access our feed, since it will never update, but at least this way we can provide a fully configured working script.

To install the script, save the file rss.txt and rename it to rss.cgi so it can run on your website.

Make sure the first line of the program points to your perl interpreter.
#!/usr/bin/perl is the default on most servers

Upload the file in ASCII (text) mode

chmod the file to 0755 so it is executable

Then just access the script using your web browser.
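From a shell prompt, those steps look something like this (the printf line just creates a stand-in for the downloaded rss.txt so the commands can be run end to end; filenames are examples):

```shell
# create a stand-in rss.cgi (in practice, rename your downloaded rss.txt)
printf '#!/usr/bin/perl\nprint "Content-type: text/html\\n\\n";\n' > rss.cgi
chmod 0755 rss.cgi   # make it executable for the web server
perl -c rss.cgi      # optional: syntax-check against your local perl
```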

Once you have it working, replace the test URL with the actual URL of your desired RSS feed, and you can start displaying updated headlines on your own website.

RSS Fetch and Print - Version 2

A second version of the script is set up to use a query string to define the URL of the XML file. The advantage of this script over the simpler version is that one script can do the work for all the XML files you want to access. You could have hundreds of pages of content, all dynamic, using one simple script.
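A rough sketch of how the query-string handoff can work (the validation pattern here is my own addition for illustration, not necessarily what rss2.cgi does):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Everything after the ? in rss.cgi?http://... lands in QUERY_STRING.
# Accept only plain http(s) urls so the script cannot be pointed at
# arbitrary local resources.
sub feed_url_from_query {
    my ($query) = @_;
    return undef unless defined $query && length $query;
    return $query =~ m{^https?://[\w.\-/?=&%]+$} ? $query : undef;
}

# my $xmlurl = feed_url_from_query( $ENV{'QUERY_STRING'} );
```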

To call the script, use any .shtml page with the include virtual tag. You could do the same thing using JSP, ASP, or PHP, but since I am Perl only, I have no idea how.

<!--#include virtual="rss.cgi?" -->

Just replace the test URL with the actual URL you want to include in your web page. And that is it. Build 100 pages with the URLs of each XML file and you instantly have a huge website with dynamic, up-to-date content.
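For example, two pages could each pull a different feed through the same script (the feed URLs below are placeholders, not real feeds):

```html
<!--#include virtual="rss.cgi?http://news.example.com/world.xml" -->
<!--#include virtual="rss.cgi?http://news.example.com/sports.xml" -->
```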

Download the Perl script rss2.cgi

I have added a block in the script to prevent it from being accessed directly using a query string. That keeps the script from being used as a proxy to fetch outside pages, which is important to prevent abuse of the script and your website.

You can tighten security further by requiring that the accessing page be part of your domain. But I am trying to keep the code as simple as possible so you can expand on it as needed.

You can do that by replacing the line

print "Location: http://$ENV{'HTTP_HOST'}\n\n";

with something containing the exact URL of the script

$scripturl = ""; # the url of the script on your server
# redirect direct requests for the script itself back to the front page
if ( "$ENV{'HTTP_HOST'}$ENV{'REQUEST_URI'}" =~ /\Q$scripturl\E/ ) {
    print "Location: http://$ENV{'HTTP_HOST'}\n\n";
    exit;
}

There is plenty of room to customize this script and make it better. But it is a quick, simple solution, and I have seen nothing as simple in Perl.