LookAhead: LexIt Tips and Suggestions

Save Time: If your site's pages are generated dynamically from a database, you probably already have what you need for creating an initial lexicon. For each of your pages simply export:
{the page title} [tab] {the URL for that page}. Then import this data directly into LookAhead (format 2). The same technique applies to any organized data that you might already have for your site, such as customer support topics, a knowledge base index, or FAQ topics.
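
As an illustration of the export described above, here is a minimal Python sketch that turns (title, URL) pairs into tab-separated lines. The function name, the sample pages, and the example.com URLs are hypothetical; how you pull the pairs out of your own database will differ, and the exact file encoding LookAhead expects is not covered here.

```python
def to_lexicon_lines(pages):
    """Return one 'title<TAB>URL' string per (title, url) pair."""
    lines = []
    for title, url in pages:
        # Collapse tabs, newlines, and runs of spaces inside the title,
        # so the only tab on each line is the field separator.
        clean_title = " ".join(title.split())
        clean_url = url.strip()
        if clean_title and clean_url:
            lines.append(f"{clean_title}\t{clean_url}")
    return lines


# Hypothetical sample data standing in for rows read from a database.
pages = [
    ("Polo Shirts", "https://www.example.com/products/polo-shirts"),
    ("Return  Policy\n", "https://www.example.com/help/returns"),
]
lexicon_text = "\n".join(to_lexicon_lines(pages)) + "\n"
```

The resulting `lexicon_text` can then be saved to a .txt file for import.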

1. Get a feel for how LexIt works by first doing small runs

2. Run several Trial crawls (up to 25 pages each)

3. From the LexIt Monitor page download the Words, Words and Count, and Words and URLs .txt files and study them. What you are trying to determine is what terms are the best representation of your site. Using LexIt output you can create a very detailed (granular) index to your site. Think in terms of what users might be looking for.

4. Open the downloaded file(s) in a powerful text editor (e.g. TextPad) or Excel and study the terms in the downloaded file.

5. Identify and edit singular and plural terms to combine.

6. Identify, edit, or delete term variations, for example: different forms of terms that can be combined, upper/lower case, unnecessary verbs or gerunds (especially at the start or end of a term), and misspellings of terms.

7. Add missing terms

8. Identify nonsense or wrong terms extracted and copy to a separate .txt file which you can then use for Skip Words on subsequent crawls.

9. Normalize (make consistent) your terms.

10. Re-run the crawl with 200 and then perhaps 400 Max pages (repeat steps 3-9 and continue to "tune" your lexicon).
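
Parts of steps 5, 6, and 8 can be done with a short script instead of by hand. The Python sketch below is one possible approach, not a LexIt feature: it assumes you have already loaded the Words and Count download into (term, count) pairs, and the function names, sample terms, and the crude "strip a trailing s" rule are all illustrative assumptions. Irregular plurals and misspellings still need manual review.

```python
from collections import defaultdict


def singular_key(term):
    """Crude grouping key: lowercase, then drop a trailing 's' from the last word."""
    words = term.lower().split()
    if not words:
        return ""
    last = words[-1]
    # Leave short words and words ending in 'ss' (e.g. 'glass') alone.
    if len(last) > 3 and last.endswith("s") and not last.endswith("ss"):
        words[-1] = last[:-1]
    return " ".join(words)


def merge_variants(term_counts, skip_words=()):
    """Combine case and singular/plural variants; drop terms listed as skip words."""
    skip = {w.lower() for w in skip_words}
    groups = defaultdict(list)
    for term, count in term_counts:
        if term.lower() in skip:
            continue
        groups[singular_key(term)].append((term, count))

    merged = {}
    for variants in groups.values():
        # Keep the most frequent spelling as the display form.
        best_term = max(variants, key=lambda tc: tc[1])[0]
        merged[best_term] = sum(count for _, count in variants)
    return merged


# Hypothetical sample data.
term_counts = [
    ("Polo Shirt", 12),
    ("polo shirts", 30),
    ("Polo Shirts", 5),
    ("click here", 40),
    ("Fleece", 7),
]
merged = merge_variants(term_counts, skip_words=["click here"])
```

Here the three "polo shirt" variants collapse into a single entry, and "click here" is left out because it is listed as a skip word.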

Advanced--Building your lexicon:
1. Use the Extra field for linking to See Also alternative terms (for a term).

2. Use the Extra field for linking to small images/photos pertaining to the term.

Scenario:
1. Sites like PCConnection and LL Bean have good titles for each page. So when running LexIt, check Titles Only[ ], which yields a cleaner list of terms -- remember, terms are automatically "rotated" so all words in the title, like Polo and Shirt, will be available in LookAhead.

2. Sign up for 10,000 pages, but do a first LexIt run at 500 pages.

3. Download the Terms & URLs file to your desktop. You will need a good text editor (such as TextPad) or a spreadsheet such as Excel.

4. Study the terms in the downloaded file (Titles). For example, are there portions of the Titles (e.g., LL Bean - ) that need to be trimmed out -- are there redundant singular / plural versions of a term that can be combined -- are there misspellings -- are there different forms of a term that can be combined -- this is all part of the "tuning" or editing process that you do using a text editor or spreadsheet.

5. Import the lexicon developed from the 500 pages and Test to see how the terms in the SearchPal™ drop-down look; tweak the trimming rules.

6. "Tweak" the CSS setting and continue to Test.

7. Once you are happy with Test and CSS settings, and the rules and coverage (completeness) of the terms (from the Titles), re-run LexIt, this time going for the full site (e.g., 9,000 pages) -- again, Titles Only.

8. Download the Terms & URLs file from the 9,000 run, re-study the terms and apply the trimming identified in steps 3-4 to create the "full" lexicon (remember, you can re-import at any time, so if you change your lexicon weekly, you can fully re-import weekly so the results in your LookAhead SearchPal™ drop-down stay current with your site content).

9. Re-import the full lexicon and Test until you are satisfied.

10. Make a copy of your page on which you will make the HTML changes (Embed/Markup), and embed the 3 sections of LookAhead HTML code into that page and Test further until you are completely satisfied, then implement the embedded code on "live" pages.
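
The title trimming from steps 4 and 8 can be sketched in Python as follows. This is an illustrative script, not part of LexIt: it assumes each line of the Terms & URLs file has the form term [tab] URL, and the function names, the "LL Bean -" fragment list, and the example.com URLs are hypothetical.

```python
def trim_title(title, fragments):
    """Remove known site-name fragments from the start or end of a title."""
    trimmed = title.strip()
    for fragment in fragments:
        if not fragment:
            continue  # an empty fragment would match everything
        if trimmed.startswith(fragment):
            trimmed = trimmed[len(fragment):]
        if trimmed.endswith(fragment):
            trimmed = trimmed[:-len(fragment)]
        trimmed = trimmed.strip()
    return trimmed


def trim_lexicon(lines, fragments):
    """Apply trim_title to each 'term<TAB>URL' line; drop empty and duplicate entries."""
    result = []
    seen = set()
    for line in lines:
        title, sep, url = line.rstrip("\n").partition("\t")
        if not sep:
            continue  # skip lines that lack a tab separator
        term = trim_title(title, fragments)
        entry = (term, url)
        if term and entry not in seen:
            seen.add(entry)
            result.append(f"{term}\t{url}")
    return result


# Hypothetical sample lines.
lines = [
    "LL Bean - Polo Shirt\thttps://www.example.com/polo",
    "LL Bean - Fleece Jacket\thttps://www.example.com/fleece",
    "LL Bean - \thttps://www.example.com/",
]
trimmed_lines = trim_lexicon(lines, fragments=["LL Bean -"])
```

Keeping the trimming rules in a script means the same rules found on the 500-page run can be reapplied unchanged to the full-site download.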


© 2003-2009 SurfWax Inc. All rights reserved. Patents pending