WebGrapher Frequently Asked Questions

This FAQ is organized by subject. Click a subject below to see the questions related to it, or scroll down to see all frequently asked questions.


Troubleshooting

  1. The crawler went to a completely different Web site than what I wanted, or the crawler gave back unexpected results. What gives?

    Make sure you entered the initial URL (with the -url="url" option) exactly right. You can verify this by typing it into a browser and confirming that you land on the right page; then copy that URL and paste it into the -url= option for the crawler.

    If you enter the wrong link, the HTTP server may return a default page when it can't find the one you asked for, and suddenly you are (unexpectedly) crawling some default domain instead of your own site, which lives under that default domain. For example, instead of typing "http://www.domain.com/mysubdirectory/page.htm", you type "http://www.domain.com/myFLUBdirectory/page.htm". When the crawler sends your flubbed URL to the server, the server may be programmed to return "http://www.domain.com", which means you are now crawling the root domain instead of your directory on that domain. Moral of the story: double-check your URL before starting the crawler.
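
    The browser check above can also be scripted. The following is only a rough sketch, not part of WebGrapher: the class and method names (UrlCheck, finalUrl, landedElsewhere) are made up for illustration, and it assumes both URLs are absolute http URLs. It only catches the case where the server redirects you; a server that silently serves its default page at the address you typed will not be detected this way.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of WebGrapher.
public class UrlCheck {

    // Asks the server for the page and returns the URL we ended up at
    // after any redirects (HttpURLConnection follows them by default).
    static String finalUrl(String requested) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(requested).toURL().openConnection();
        conn.getResponseCode();              // forces the request to be sent
        String served = conn.getURL().toString();
        conn.disconnect();
        return served;
    }

    // True when the served URL has a different host or path than the one requested.
    static boolean landedElsewhere(String requested, String served) {
        URI want = URI.create(requested).normalize();
        URI got = URI.create(served).normalize();
        return !want.getHost().equalsIgnoreCase(got.getHost())
            || !pathOf(want).equals(pathOf(got));
    }

    // Treats an empty path ("http://www.domain.com") the same as "/".
    private static String pathOf(URI u) {
        String p = u.getPath();
        return (p == null || p.length() == 0) ? "/" : p;
    }
}
```

    For example, comparing "http://www.domain.com/myFLUBdirectory/page.htm" with a served URL of "http://www.domain.com" reports that you landed elsewhere, which is the cue to fix the -url value before crawling.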

  2. I know I have some missing jpg's on my site, but the crawler says no errors found. How come?

    You need to set the -i option to make the crawler scan image files such as jpg's, gif's and png's.

  3. I know I have some missing or invalid links on my site, but even with the -i option set the crawler is not reporting them.

    The program is still in the alpha stage, so keep that in mind. Currently it does only minimal checking of JavaScript (and other script) code; mostly it checks your HTML for links in keywords such as href, img src and so on. If you do find something that the crawler missed, take a moment and drop me a line so that I can work on fixing it.
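
    To see why links built inside scripts can slip through, here is a minimal sketch of attribute-based link extraction. It is an illustration only and not WebGrapher's actual code; the class name LinkSketch and the regular expression are assumptions, and a real crawler would need a proper HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not WebGrapher's implementation.
public class LinkSketch {

    // Matches href="..." or src="..." with either single or double quotes.
    private static final Pattern LINK = Pattern.compile(
        "\\b(?:href|src)\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Returns the quoted values of every href and src attribute, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<String>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }
}
```

    Given a page containing an anchor with href="a.htm", an image with src='b.jpg', and a script that assigns location = "c.htm", this sketch finds a.htm and b.jpg but never sees c.htm, because that link only exists as a string inside script code.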

  4. The crawler runs very slowly.

    The crawler is a very IO-bound program. That means its speed depends predominantly on the speed of your Internet connection. With a reliable, fast connection, the crawler should proceed at a reasonable pace. If you include the -i option, then the crawler will appear to run faster since it spends more time on each page scanned; without the -i option, the crawler will be scanning pages all the time. Each page must be served by the HTTP server before it can be scanned, and that takes time. So the answer is that the speed of the crawler depends mostly on the speed of your Internet connection and the speed at which your host (your HTTP server) serves up pages, and to a much lesser extent on the speed of your CPU.

  5. I can't see any links to image files in my graph.

    You need to set the -i option to make the crawler include image files such as jpg's, gif's and png's in your graph.

  6. The display program can't read the dot file produced by the crawler.

    Two possible reasons:

    1. Did you rename the dot file? If you rename the dot file (and don't change anything else in the file), the display program will no longer be able to read it. This is because the name of the dot file and the filename inside the dot file must match. If you really want to rename the file, you need to change the first line of the file as well. The first line of the file should look like:

           digraph filename {

    Just change filename to whatever you're renaming the file to. Alternatively, just make sure to give the crawler the filename that you really want in the first place.

    2. Did you remember to install GraphViz? GraphViz is required by WebGrapher in order to display your graph. You can find the link to download GraphViz on the WebGrapher home page.
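
    The name check from the first reason above can be sketched in a few lines. This is a hypothetical helper, not something shipped with WebGrapher, and it rests on one assumption that the FAQ does not confirm: that the name inside the file is the file name without its extension (for example, mysite.dot containing "digraph mysite {").

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of WebGrapher.
public class DotNameCheck {

    // Matches a first line of the form: digraph filename {
    private static final Pattern HEADER =
        Pattern.compile("^\\s*digraph\\s+(\\S+)\\s*\\{");

    // Returns the graph name from the first line, or null if the line has another shape.
    static String graphName(String firstLine) {
        Matcher m = HEADER.matcher(firstLine);
        return m.find() ? m.group(1) : null;
    }

    // Assumption: the graph name equals the file name minus its extension.
    static boolean namesMatch(String fileName, String firstLine) {
        int dot = fileName.lastIndexOf('.');
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        return base.equals(graphName(firstLine));
    }
}
```

    Under that assumption, namesMatch("mysite.dot", "digraph mysite {") holds, while a file renamed to renamed.dot with the same first line fails the check until the first line is edited to match.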


General Questions

  1. What is Java, and why do I need it to run WebGrapher?

    Java is both a programming language and an environment in which programs run. In some ways it's like Flash. Just as you need to install Flash in order to run Flash programs, you must install Java in order to run Java programs. WebGrapher is a Java program and thus requires that you have Java installed. You must have Java version 1.5 or later.

  2. What is GraphViz, and why do I need it to run WebGrapher?

    GraphViz is a wonderful graphing program created by AT&T. It contains tools and libraries to display both directed and undirected graphs, and has many more features as well. WebGrapher integrates a graphing program called Grappa that uses the GraphViz libraries/tools. In addition, you can use some of GraphViz's tools (such as Dotty) directly on the dot files produced by WebGrapher. So the answer is: (1) you must have GraphViz to use WebGrapher's integrated display program, and (2) you may also use the Dotty tool (part of the standard GraphViz download) to view or manipulate the dot files produced by the crawler.