I have been using MRTG for about 3 years now, and I'm very impressed with it. I am constantly finding new uses for it. I recently switched all my backend data storage over to RRDTool, and probably will not implement MRTG again without it.
A brief history of how I got started in this...

In the beginning, I was responsible for Enterprise Network Management at a large bank. I still am, but I digress. My first task was to help the router guys make their lives a little easier. An intern was running something like "show interface blah" every 5 minutes and entering the in and out values all freaking day. She had this very impressive spreadsheet, but didn't know how to create a graph. Since I knew Excel, I showed her how to graph all the numbers in this HUGE spreadsheet without knowing what she was actually doing. After that, I asked her what she was doing. She showed me (especially since 5 minutes had passed since I had gotten there) and I thought for a minute. I suggested that I write a Perl script to grab the numbers and mail them to her. So I did that, and it worked pretty well. We had made a huge jump forward: she could do something else productive, and we were getting stats 24 hours a day. The only problem was that someone still had to import the numbers from an e-mail into the spreadsheet. So I started looking on the Internet for something a little better. Enter the Multi Router Traffic Grapher, or MRTG for short.

So MRTG started out as a very simple way to grab stats off my employer's Internet routers and display the results in a graph on our corporate intranet. That way, anyone who wanted to see what was going on with our connection could just go there and see what kind of saturation it was getting. That eliminated the need for my cheesy Perl script, and for someone to import the data into a spreadsheet, create the graph from that data, convert it to HTML, and post it on our corporate intranet.

And it continued...

After I showed it to some of the senior management, they decided they needed to know about something nobody had ever asked about before (bandwidth utilization): the "health" of our Wide Area Network connections.
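For anyone wondering what "bandwidth utilization" boils down to, here is a minimal sketch of the arithmetic, assuming you have two readings of an interface octet counter taken one polling interval apart. This is not MRTG's own code; the function name, the parameters, and the assumption that the counter wraps at most once between readings are mine.

```python
def utilization_percent(prev_octets, curr_octets, interval_s, link_bps, counter_bits=32):
    """Percent of link capacity used between two octet-counter readings."""
    delta = curr_octets - prev_octets
    if delta < 0:
        # Assumption: the counter wrapped exactly once since the last reading.
        delta += 2 ** counter_bits
    bits_per_second = delta * 8 / interval_s
    return 100.0 * bits_per_second / link_bps


# Hypothetical example: a 1.544 Mbit/s link polled every 300 seconds.
print(utilization_percent(0, 28_950_000, 300, 1_544_000))
```

The same calculation is what the intern was effectively doing by hand from the in and out values, just once every 5 minutes and without the wrap handling.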
So I figured, "Eh, no big deal, it's only a couple more targets." Bad idea, because pretty soon we were monitoring every possible connection under the sun.

MRTG - it's not just for routers anymore

Then I stumbled onto Garth William's site and started doing some pretty neat trending and utilization with our Netware and NT servers. I showed it to some people, and they mentioned it to senior management. So I demo'd what I was doing to them. Another big mistake. So I started monitoring more servers. Now I monitor about 20,000 different targets (including the connections mentioned above), all updated every 5 minutes.

The final stages of basic implementation

So here I was, monitoring all this good stuff, but the machine that MRTG was running on was getting really overloaded. I could tell because I'd log in to the box every so often and make a note of the results of my uptime command. (Sound familiar?) It then dawned on me that I wasn't even monitoring my own machine! Argh!! So I went over to sourceforge.net, grabbed net-snmp (previously ucd-snmp), and implemented that on my Red Hat Linux machine. Notice that there is no need for SNMP to be implemented on the box that does the data collection. It would be a good idea, but it is certainly not necessary. Then I started monitoring the box. Let me tell you, ignorance was bliss. I had no idea how busy this machine was until I started looking at the CPU utilization. With the box averaging 40% load, hitting cyclical peaks of about 60% on the hour (more about that later), and peaking at 95%, I knew I needed to do something fast, and I couldn't replace the box or add another one to the farm.

Work smarter, not harder

So I started looking for ways to improve performance. I knew that there were a couple of options, but I really didn't know what to do. I looked at slicing up the run times so that my different MRTG processes were not all running at the same time. Tried it, it helped, but not much.
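Slicing up the run times can be sketched roughly like this, assuming one MRTG process per config file and a 5-minute polling cycle. The config file names and the function are hypothetical, and this is only an illustration of the idea, not the scheduling I actually used.

```python
def stagger_offsets(config_names, cycle_seconds=300):
    """Spread start times evenly across one polling cycle.

    Returns a dict mapping each config name to the number of seconds
    after the start of the cycle at which its process should launch.
    """
    if not config_names:
        return {}
    step = cycle_seconds / len(config_names)
    return {name: int(i * step) for i, name in enumerate(config_names)}


# Hypothetical config files, one MRTG process each.
for name, offset in stagger_offsets(["routers.cfg", "wan.cfg", "servers.cfg"]).items():
    print(f"start {name} at +{offset}s")
```

Spreading the starts flattens the spikes, but every process still does the same total amount of work per cycle, which is consistent with it helping only a little.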
So I thought about asking our web hosting group if I could move the web hosting duties onto one of their more powerful machines. Nope, no can do. Plus, I realized then that it wasn't the hosting that was killing the box, it was all the backend processes. Recreating every HTML page for every target sure does take up a lot of resources. So does redrawing the daily graphs every 5 minutes. So it hit me like a ton of bricks - switch to RRDTool. I started with the monitor that only I knew about, the one that tracked NIC and CPU utilization for my MRTG server. I couldn't really tell if it was going to help, but since MRTG-3 is going to use RRDTool anyway, I figured I'd try it out. Since I was using data that only I cared about, I could really screw it up and not catch any trouble for it. I played with a bunch of frontends, and decided that I'd stick with MRTG to collect the data, RRDTool to store the data, and 14all to display the results. Once I had the basic idea down, I switched all my targets over to RRDTool (still collecting and doing all the old MRTG stuff at the same time), went back and modified my web site to use 14all and the new data, then turned off data collection from the old mrtg/rateup combination. I wrote up a procedure for how I did it. You can find that here.

An ongoing saga

As I had anticipated, the more stuff I think I can do with RRDTool, the more I realize I don't know about RRDTool. I am getting requests from my employer every day to collect more and more data. So much so that I actually have a budget, and I expect to add at least one or two more servers by the end of the year. Pretty soon, I'll try to make some of the stuff I do available, but for right now, my employer is not comfortable displaying actual resources for all to see.

I have also written a cool script that will rank every target in a single config file in order. You can find out all the details about the talker scripts here.
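To give a rough idea of what ranking targets means, here is a minimal sketch, assuming you already have an average in and out rate for each target. It is not the talker scripts themselves; the target names, the data layout, and the choice to rank by combined in plus out traffic are assumptions made for illustration.

```python
def rank_targets(rates, top_n=None):
    """Return (target, total_bps) pairs sorted from busiest to quietest.

    rates maps a target name to an (in_bps, out_bps) tuple.
    Ties are broken alphabetically by target name.
    """
    totals = [(name, in_bps + out_bps) for name, (in_bps, out_bps) in rates.items()]
    totals.sort(key=lambda pair: (-pair[1], pair[0]))
    return totals[:top_n]


# Hypothetical targets with average rates in bits per second.
sample = {
    "router1-serial0": (400_000, 250_000),
    "router2-serial1": (900_000, 100_000),
    "server-nic0": (50_000, 20_000),
}
for rank, (name, total) in enumerate(rank_targets(sample, top_n=2), start=1):
    print(rank, name, total)
```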
If you have any questions about MRTG, RRDTool, or 14all.cgi, please contact me, visit the MRTG site, visit the RRDTool site, or visit the 14all.cgi site.