AutoFocus 4.0 documentation
Adding a Website source
Added by Herko ter Horst, last edited by Herko ter Horst on 2008-02-19

Follow these steps to add a Website source to AutoFocus:

  • Start the Add New Source wizard by pressing CTRL-N, by selecting Add New... from the Sources menu, or by pressing the Add New Source button in the Sources panel's toolbar.
  • Select "Website" and press Next.
  • Specify the location of the starting page of the website, e.g. "http://my.company.com/products/". The text field features an auto-complete list with all locations that you have used before.
    During scanning, AutoFocus loads this page, collects all links in it, loads the linked pages, collects their links, and so on. How many times this process repeats is determined by the number of hops that you specify here. A setting of '1' means that the start page and all pages linked from it are loaded, but not the pages that are only reachable in two steps. 'Unlimited' means that the process repeats until AutoFocus finds no more new pages.
    Following links is always restricted to the current "virtual root", in this example: all pages whose location starts with "http://my.company.com/products/". For example, links to "http://my.company.com/support/" or to other sites are not followed.
    Besides web pages, AutoFocus also scans resources such as images and media files that are explicitly linked to. Resources that are embedded in a web page (i.e., displayed as part of the web page) are not scanned.
    Press Next to continue.
  • Specify the maximum size that the linked resources may have. Any resources larger than this size are ignored. By default this is set to 5 MB, meaning that only resources smaller than 5 MB are accepted.
    This limit serves two purposes. First, it prevents retrieval of very large resources that take a long time to download. Whether this is useful depends on how thoroughly you want to search a website. Note that only resources from which AutoFocus can extract useful information (particularly textual documents) are fully downloaded; linked executables, for example, are never fully downloaded.
    Second, this limit can be used to work around problems with the processing of very large resources. Very large PDF files in particular (on the order of tens of megabytes) can make AutoFocus freeze during indexing.
    Press Next to continue.
  • Next, you are asked to enter a name for the source. The name is shown in the list of Sources as well as alongside search results, etc.
    Press Next to continue.
  • Finally, you see a screen confirming that you have successfully defined a new source. You may optionally start scanning the source right away. Scanning is required before you can search and explore the web pages and other resources of this source. Because an active scanning process blocks the entire user interface, you may decide to switch this option off (e.g. to define more sources first) and scan the sources later by pressing the Refresh button.
    Once you click the Next button, the source will become visible in the list in the Sources Panel and the scanning process may start, depending on the choice you have made.

Note that you can press the Cancel button at any time except in the last step. In that case, no source is added.
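
The scanning rules described in the steps above (number of hops, virtual root, maximum resource size) can be illustrated with a small Python sketch. This is not AutoFocus code. The names LINKS, crawl and accept_resource are hypothetical, the website is modelled as an in-memory dictionary instead of real HTTP requests, and two details are assumptions: the virtual root is taken to be the start location itself, and a resource of exactly the maximum size is treated as accepted.

```python
from collections import deque
from typing import Dict, List, Optional, Set

START = "http://my.company.com/products/"

# Hypothetical in-memory "website": each location maps to the links on that page.
LINKS: Dict[str, List[str]] = {
    START: [
        "http://my.company.com/products/a.html",
        "http://my.company.com/support/",   # outside the virtual root
        "http://other.example.org/",        # another site
    ],
    "http://my.company.com/products/a.html": [
        "http://my.company.com/products/b.html",
    ],
    "http://my.company.com/products/b.html": [],
}


def crawl(start: str, links: Dict[str, List[str]], max_hops: Optional[int]) -> List[str]:
    """Return the locations visited from `start`; max_hops=None means 'Unlimited'."""
    root = start  # assumption: the virtual root is the start location itself
    seen: Set[str] = {start}
    visited: List[str] = []
    queue = deque([(start, 0)])  # (location, hops needed to reach it)
    while queue:
        url, hops = queue.popleft()
        visited.append(url)
        if max_hops is not None and hops >= max_hops:
            continue  # hop limit reached: do not follow links from this page
        for target in links.get(url, []):
            # Only follow links that stay inside the virtual root.
            if target.startswith(root) and target not in seen:
                seen.add(target)
                queue.append((target, hops + 1))
    return visited


def accept_resource(size_bytes: int, max_bytes: int = 5 * 1024 * 1024) -> bool:
    """Ignore resources larger than the limit (assumed default: 5 MB = 5 * 1024 * 1024 bytes)."""
    return size_bytes <= max_bytes


if __name__ == "__main__":
    print(crawl(START, LINKS, max_hops=1))     # start page plus directly linked pages
    print(crawl(START, LINKS, max_hops=None))  # everything reachable inside the virtual root
    print(accept_resource(20 * 1024 * 1024))   # a 20 MB PDF would be ignored
```

With a hop setting of 1, the sketch visits the start page and a.html but not b.html, which is two steps away. The links to /support/ and to the other site are never followed, whatever the hop setting.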
