- Download Sitebulb from the Sitebulb website (Mac and Windows versions are available) and install it.
- Open the Sitebulb app and create an account.
- Email or Slack Shava or SusanL to request a Sitebulb license. The license will be issued to the email address you provided in your Sitebulb account. Once you have confirmation that a license has been issued, click the 'My Account' link in the top right corner and verify that you see an entry under Licenses.
- Click the green 'Activate License' button. When activation succeeds, the machine name appears as shown below.
- To get started, click on the "Projects" link in the top left corner and then click on the green "Start a new Project" button.
- Fill in a project name and the URL for your site. You can use https://software.xsede.org/sitebulbtest/ as the Start URL to get familiar with the tool.
- Set the Device type to 'Desktop'.
- Uncheck the "Crawl Outside of Directory" checkbox.
- Hit the "Save and Continue" button.
- Once saved, you will see additional settings as below. Click on the "Content Search" link in the left menu and click the green "Add Multiple Rules" button.
- Copy the terms from this list and paste them into the text box on the Basic tab, select "HTML and Text" in the Search In box, then click the green "Add Rules" button.
- From the Project Settings page, click the "Crawler Settings" link in the left menu to view the crawler settings. If your machine is powerful, you can increase the Instances of Chrome to a number above 5 to make the crawl go faster.
- Click the "Start Now" button.
- Sitebulb will display a status board as it crawls the website. You can see the URLs it is crawling under the "URL Log". If you are crawling a real site and Sitebulb is visiting areas you don't want crawled, press the "Stop" button. If the crawl is using too much CPU, press the "Pause" button and reduce the number of instances under "Update Settings".
- Once the crawl is complete, you will see an overview page as below, with some overall stats about the site.
- Click the "Content Search" link in the left menu to see the Terminology search results as below. The Overview tab lists each term as a row, along with the number of times it was found on your site. The demo site contains four terms from the Terminology list, such as "Webmaster" and "White Paper", so those are listed first.
- To see the specific URLs where the terms were found, click the "URLs" tab next to Overview. It shows a table in which each row is a page where terms were found and each column is a term found on that page. For example, in the screenshot below, "Webmaster" and "White Paper" were found on subpage.html. The columns are sortable, so if terms were found on more than one page, you can click a column name to sort the list by the number of times that term was found on each page.
- To share the list with others in your group, click the green "Export All Search Data → Export to CSV" button. A dialog like the one below will open so you can save the results to a local file.
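
Once the results are exported, a short script can summarize them before you share the file. The following is a minimal sketch only. It assumes the CSV mirrors the "URLs" tab, with one row per page, a column named `URL`, and one numeric column per term. The actual column names in a Sitebulb export may differ, so adjust `url_column` to match your file. The function name and sample data are hypothetical.

```python
import csv
import io


def rank_pages_by_term_hits(csv_text, url_column="URL"):
    """Return (url, total_hits, terms_found) tuples, highest total first.

    Assumed layout (not confirmed against a real Sitebulb export):
    one row per page, a URL column, and one count column per term.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    ranked = []
    for row in reader:
        url = row[url_column]
        hits = {}
        for term, value in row.items():
            if term == url_column:
                continue
            try:
                count = int(value)
            except (TypeError, ValueError):
                continue  # skip blank or non-numeric cells
            if count > 0:
                hits[term] = count
        ranked.append((url, sum(hits.values()), sorted(hits)))
    # Pages with the most term occurrences come first.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical sample data standing in for an exported file.
sample = """URL,Webmaster,White Paper
https://example.org/index.html,0,0
https://example.org/subpage.html,2,1
"""

for url, total, terms in rank_pages_by_term_hits(sample):
    print(url, total, terms)
```

To run this against a real export, read the saved file with `open(path, newline="", encoding="utf-8").read()` and pass the text to the function.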
Search Google Drive content with a Python script (still in progress)
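
Since the script is not finished, the following is only one possible starting point, not the script itself. It assumes the search would go through the Google Drive API v3 `files.list` endpoint, whose `q` parameter accepts `fullText contains '...'` clauses. The helper name `build_drive_query` is hypothetical, and the sketch only builds the query string. Authentication and the API call itself are left out.

```python
def build_drive_query(terms):
    """Build a Drive API v3 `q` string matching files that contain any term.

    Assumption: the terms come from the same Terminology list used for the
    Sitebulb content search. Backslashes and single quotes are escaped
    because the Drive query syntax delimits string values with single quotes.
    """
    clauses = []
    for term in terms:
        escaped = term.replace("\\", "\\\\").replace("'", "\\'")
        clauses.append(f"fullText contains '{escaped}'")
    return " or ".join(clauses)


print(build_drive_query(["Webmaster", "White Paper"]))
# prints: fullText contains 'Webmaster' or fullText contains 'White Paper'
```

The resulting string would be passed as the `q` argument of a `files.list` request. Multi-word terms such as "White Paper" may need to be wrapped in double quotes inside the clause to match the exact phrase, so check the Drive search documentation before relying on the results.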