This document demonstrates the example use case from the bootcamp requirements.
Search Term: "Software development consultancy finland"
Goal: Analyze how companies describe their values
Expected Output: A table showing each company's values classified as soft or hard
- Start the application: `streamlit run app.py`
- Go to the Search-Based tab
- Enter:
  - Search term: Software development consultancy finland
  - Number of results: 5
- Click "Start Search-Based Crawl"
- Wait for results (typically 2-5 minutes)

Run the built-in example:

```bash
python orchestrator.py
```

This automatically runs the example use case and generates reports.

The same example can also be run from your own Python code:

```python
import asyncio

from orchestrator import CrawlOrchestrator


async def run_example():
    orchestrator = CrawlOrchestrator()

    # Search, crawl, and analyze the top results for the example search term
    results = await orchestrator.run_search_based_crawl(
        search_term="Software development consultancy finland",
        num_results=5,
    )

    if results["success"]:
        print(f"✅ Analyzed {results['num_companies']} companies")
        print(f"📊 Report: {results['reports']['aggregate_excel']}")

        # Print summary
        for analysis in results["analyses"]:
            print(f"\n{analysis.company_name}:")
            print(f"  Soft: {', '.join(analysis.soft_values)}")
            print(f"  Hard: {', '.join(analysis.hard_values)}")


asyncio.run(run_example())
```

The system will:
- Search for "Software development consultancy finland" on Google
- Find approximately 5 relevant company websites
- Crawl each website using AI to navigate to important pages:
  - Homepage
  - About/Company pages
  - Values/Mission pages
  - Culture/Team pages
- Extract values statements from each company
- Classify values as:
  - Soft values: People-oriented (e.g., caring, openness, collaboration, trust)
  - Hard values: Business-oriented (e.g., innovation, efficiency, quality, excellence)
- Generate reports:
  - Excel table with all companies
  - Individual markdown reports per company
  - Summary statistics
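
In this project the soft/hard classification is done by an LLM. Purely as an illustration of the idea, the split can be approximated with a keyword lookup built from the example values listed above. The names `SOFT_KEYWORDS`, `HARD_KEYWORDS`, and `classify_values` are hypothetical and not part of the project's code.

```python
# Simplified stand-in for the LLM-based classifier.
# Assumption: this is NOT the project's real logic, only a keyword lookup.
SOFT_KEYWORDS = {"caring", "openness", "collaboration", "trust"}
HARD_KEYWORDS = {"innovation", "efficiency", "quality", "excellence"}


def classify_values(values):
    """Split a list of value names into soft, hard, and unclassified buckets."""
    result = {"soft": [], "hard": [], "unclassified": []}
    for value in values:
        key = value.strip().lower()
        if key in SOFT_KEYWORDS:
            result["soft"].append(value)
        elif key in HARD_KEYWORDS:
            result["hard"].append(value)
        else:
            # Anything outside the two small keyword sets is left undecided.
            result["unclassified"].append(value)
    return result


if __name__ == "__main__":
    print(classify_values(["Trust", "Innovation", "Quality", "Sustainability"]))
```

A fixed lookup like this cannot handle values it has never seen (here, "Sustainability"), which is one reason to hand the decision to a language model instead.
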
| Company | Website | Soft Values | Hard Values | Orientation | Summary |
|---|---|---|---|---|---|
| Company A | companya.fi | Trust, Collaboration, Openness | Innovation, Quality | Balanced | Emphasizes both team culture and technical excellence |
| Company B | companyb.fi | Caring, Transparency | Efficiency, Results, Excellence | Business-Focused | Performance-driven with strong delivery focus |
| Company C | companyc.fi | Diversity, Well-being, Empathy | Innovation, Scalability | People-Focused | Strong emphasis on employee experience and culture |
After running the example, you might find:
- Common soft values: Trust, collaboration, openness, transparency
- Common hard values: Innovation, quality, excellence, customer focus
- Orientation: Finnish software consultancies often balance culture and performance
- Unique patterns: Focus on Nordic values like transparency and work-life balance
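
To see which values are common across companies, you could count how many companies mention each one. The sketch below uses the illustrative rows from the sample table, not real crawl output, and `count_values` is a hypothetical helper rather than something the project provides.

```python
from collections import Counter

# Illustrative rows copied from the sample table (not real crawl output).
SAMPLE_ANALYSES = [
    {"company": "Company A",
     "soft": ["Trust", "Collaboration", "Openness"],
     "hard": ["Innovation", "Quality"]},
    {"company": "Company B",
     "soft": ["Caring", "Transparency"],
     "hard": ["Efficiency", "Results", "Excellence"]},
    {"company": "Company C",
     "soft": ["Diversity", "Well-being", "Empathy"],
     "hard": ["Innovation", "Scalability"]},
]


def count_values(analyses, key):
    """Count how many companies mention each value under key ('soft' or 'hard')."""
    counts = Counter()
    for analysis in analyses:
        # Use a set so a value repeated within one company is counted once.
        counts.update({value.lower() for value in analysis[key]})
    return counts


if __name__ == "__main__":
    hard_counts = count_values(SAMPLE_ANALYSES, "hard")
    print(hard_counts.most_common(3))
```

With only three sample rows, "innovation" is the only value that appears more than once; a real run with more companies gives a more meaningful ranking.
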
After running the example, check these directories:
- `./reports/`:
  - `aggregate_results_TIMESTAMP.xlsx` - Main results table
  - `aggregate_results_TIMESTAMP.csv` - CSV version
  - `aggregate_summary_TIMESTAMP.md` - Summary report
  - `CompanyName_TIMESTAMP.md` - Individual reports for each company
- `./outputs/`:
  - Temporary files and logs
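
Since each run writes new timestamped files, a small helper can pick out the newest aggregate report. This is a minimal sketch: `latest_report` is a hypothetical function, and it sorts by file modification time because the exact `TIMESTAMP` format is not specified here.

```python
from pathlib import Path


def latest_report(reports_dir="./reports", pattern="aggregate_results_*.xlsx"):
    """Return the most recently modified file matching pattern, or None."""
    directory = Path(reports_dir)
    if not directory.is_dir():
        return None
    candidates = list(directory.glob(pattern))
    if not candidates:
        return None
    # Compare modification times rather than names, since the TIMESTAMP
    # format in the file names is an unknown here.
    return max(candidates, key=lambda path: path.stat().st_mtime)


if __name__ == "__main__":
    report = latest_report()
    print(report if report else "No aggregate report found yet")
```
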
"No results found"
- Check your internet connection
- Try with fewer results (e.g., 3 instead of 5)
- Google might be rate-limiting; wait a few minutes
"API rate limit exceeded"
- You're making too many LLM calls
- Reduce number of results
- Check your API quota
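
If you call the search or LLM APIs from your own scripts, one common way to cope with rate limits is to retry with increasing delays. The sketch below is generic and not taken from the project: `RateLimitError` is a hypothetical placeholder for whatever exception your client library actually raises, and `call_with_backoff` is an illustrative helper.

```python
import time


class RateLimitError(Exception):
    """Hypothetical placeholder for the error your API client raises."""


def call_with_backoff(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(), doubling the wait after each rate-limited attempt."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            # Wait base_delay, then 2x, then 4x, ... before retrying.
            sleep(base_delay * (2 ** attempt))
```

The `sleep` parameter exists only so the waiting can be swapped out in tests; in normal use the default `time.sleep` applies.
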
"Low confidence scores"
- Some companies don't clearly state their values
- The AI infers what it can from available content
- This is expected behavior
"Takes too long"
- Each company takes 1-3 minutes to crawl and analyze
- 5 companies = 5-15 minutes total
- Be patient; AI navigation takes time
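
The estimate above is simple multiplication, assuming companies are processed one after another. A tiny helper (hypothetical, for planning only) makes it easy to check other result counts:

```python
def estimate_minutes(num_companies, min_per_company=1, max_per_company=3):
    """Return (low, high) total minutes, assuming sequential processing."""
    return num_companies * min_per_company, num_companies * max_per_company


if __name__ == "__main__":
    low, high = estimate_minutes(5)
    print(f"5 companies: {low}-{high} minutes")  # prints "5 companies: 5-15 minutes"
```
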
Unlike simple HTML parsers, this system:
- Renders pages like a real browser (JavaScript, dynamic content)
- Reads the page content to understand it
- Uses AI to decide which links are most relevant
- Follows the most promising links (about, values, mission pages)
- Extracts meaningful content while filtering navigation/ads
- Analyzes the aggregated content to find values
This is what makes it "AI-assisted web crawling" rather than simple scraping.
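
To make the link-selection step more concrete, here is a rough sketch that ranks links by keywords instead of asking an AI model. It is a simplified stand-in, not the project's implementation: `KEYWORD_WEIGHTS`, `score_link`, and `pick_links` are hypothetical names, and the weights are arbitrary.

```python
from urllib.parse import urlparse

# Hypothetical keyword weights based on the page types the crawler targets.
# The real system asks an LLM to judge relevance instead.
KEYWORD_WEIGHTS = {
    "values": 3, "mission": 3,
    "about": 2, "company": 2, "culture": 2,
    "team": 1,
}


def score_link(url, anchor_text=""):
    """Score a link by the keywords found in its URL path or anchor text."""
    # Only the path is inspected, so a domain such as "companya.fi"
    # does not accidentally match the "company" keyword.
    haystack = f"{urlparse(url).path} {anchor_text}".lower()
    return sum(w for keyword, w in KEYWORD_WEIGHTS.items() if keyword in haystack)


def pick_links(links, limit=3):
    """Return up to `limit` (url, anchor_text) pairs with a positive score, best first."""
    scored = [(score_link(url, text), url, text) for url, text in links]
    relevant = [item for item in scored if item[0] > 0]
    ranked = sorted(relevant, key=lambda item: item[0], reverse=True)
    return [(url, text) for _, url, text in ranked[:limit]]


if __name__ == "__main__":
    links = [
        ("https://companya.fi/about", "About us"),
        ("https://companya.fi/blog/post-1", "Latest news"),
        ("https://companya.fi/our-values", "Values"),
        ("https://companya.fi/contact", "Contact"),
    ]
    for url, text in pick_links(links):
        print(url, "-", text)
```

A keyword list like this misses pages with unusual names (for example, a values page titled "How we work"), which is the gap the AI-based link selection is meant to close.
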
- Be specific with search terms:
  - Good: "Software development consultancy Helsinki"
  - Bad: "Companies in Finland"
- Start small: Test with 2-3 companies first
- Check the logs: Watch the console for progress updates
- Review individual reports: They contain more detail than the table
- Adjust settings in `.env` if needed:
  - Increase `MAX_CRAWL_DEPTH` for deeper crawling
  - Increase `CRAWL_TIMEOUT` for slow sites
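
If you want to confirm which settings a script actually sees, you can read them from the process environment. This sketch assumes the values from `.env` have already been loaded into the environment; `read_int_setting` is a hypothetical helper, and the fallback values shown are placeholders, not the project's real defaults.

```python
import os


def read_int_setting(name, default, environ=None):
    """Read an integer setting, falling back to default if unset or invalid."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name, "").strip()
    try:
        return int(raw)
    except ValueError:
        # Covers both a missing value (empty string) and non-numeric text.
        return default


if __name__ == "__main__":
    # Placeholder fallbacks; check the project's own configuration for real defaults.
    depth = read_int_setting("MAX_CRAWL_DEPTH", default=2)
    timeout = read_int_setting("CRAWL_TIMEOUT", default=30)
    print(f"MAX_CRAWL_DEPTH={depth}, CRAWL_TIMEOUT={timeout}")
```
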
Your example run is successful if:
- ✅ Finds at least 3 company websites
- ✅ Successfully crawls each website
- ✅ Extracts some values (even if not all are clear)
- ✅ Classifies values into soft/hard categories
- ✅ Generates the results table
- ✅ Creates individual reports
Even if some companies have low confidence scores or unclear values, the system is working correctly—some companies simply don't have clear values statements on their websites.
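
Part of this checklist can be verified in code. The sketch below assumes the result dictionary has the shape used in the Python example earlier in this document (`num_companies`, `reports["aggregate_excel"]`, and `analyses` items with `soft_values` and `hard_values`). `check_run` is a hypothetical helper, not part of the project.

```python
def check_run(results, min_companies=3):
    """Return (label, passed) pairs for the checks that can be automated."""
    analyses = results.get("analyses", [])
    reports = results.get("reports", {})
    has_values = any(
        getattr(a, "soft_values", None) or getattr(a, "hard_values", None)
        for a in analyses
    )
    return [
        (f"Found at least {min_companies} company websites",
         results.get("num_companies", 0) >= min_companies),
        ("Extracted some values", has_values),
        ("Generated the results table", bool(reports.get("aggregate_excel"))),
    ]


def print_checklist(results):
    """Print each automated check with a pass/fail marker."""
    for label, passed in check_run(results):
        print(f"{'✅' if passed else '❌'} {label}")
```

Checks such as "successfully crawls each website" still need a look at the console logs and the individual reports.
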
After running the example:
- Review the results: Open the Excel file in `./reports/`
- Check individual reports: Read the markdown files
- Try your own search: Use different search terms
- Upload a CSV: Test the CSV-based input method
- Experiment: Try different configurations in `.env`
Remember: This is a bootcamp prototype for learning. The goal is to understand AI-powered web crawling and agentic systems, not to build a production-ready tool.