Crawl4AI + RAG - question

Hi everyone,

Wondering if anyone has used Crawl4AI for RAG use cases like Cole shows in this video?

I’d like to do this kind of setup where I use Crawl4AI to regularly check a given list of URLs, see whether the different websites have been updated, and, if so, update a database with that data.

Can you use the multi-URL function in Crawl4AI if each URL is a different website, as opposed to a different page on the same site?


Yes, you certainly can! It really doesn’t make a difference whether the pages you are scraping in parallel are from the same site or not! :smiley:
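
To make it concrete, here’s a minimal sketch of crawling pages from several different sites in one call, assuming a recent crawl4ai where AsyncWebCrawler.arun_many accepts a plain list of URLs (the pricing URLs below are just the placeholders from your post):

```python
import asyncio
from crawl4ai import AsyncWebCrawler

# Placeholder URLs from the post above -- each one lives on a different site.
URLS = [
    "https://newyorktimes.com/pricing",
    "https://wallstreetjournal.com/pricing",
    "https://washingtonpost.com/pricing",
]

async def main():
    async with AsyncWebCrawler() as crawler:
        # arun_many takes the whole list and crawls it concurrently;
        # it returns one CrawlResult per URL.
        results = await crawler.arun_many(urls=URLS)
        for result in results:
            # result.markdown holds the extracted content you would chunk
            # and embed for RAG; here we just report success/failure.
            print(result.url, "ok" if result.success else "failed")

asyncio.run(main())
```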

Thanks, Cole!

So I guess if I wanted to run the agent, and the only change I wanted to make was to crawl not all the pages on the one website (e.g. ai(dot)pydantic(dot)dev) but instead, say, a list of pages on different websites,

e.g.

  • newyorktimes(dot)com/pricing
  • wallstreetjournal(dot)com/pricing
  • washingtonpost(dot)com/pricing
    etc

Then to implement that change I guess I just need to make a small change to the following function, and it should run as you have implemented it, but for the different sources?

def get_pydantic_ai_docs_urls() -> List[str]:

Having read through the code, this seems like the only area that needs to be modified for the use case I mentioned above. Or have I missed something?
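
In other words, swapping in something like this (just a sketch, keeping your function name and assuming the rest of the agent only cares about the list it returns):

```python
from typing import List

def get_pydantic_ai_docs_urls() -> List[str]:
    """Return the pages to crawl -- now a hand-picked list of pages
    spread across different sites instead of one site's sitemap."""
    return [
        "https://newyorktimes.com/pricing",
        "https://wallstreetjournal.com/pricing",
        "https://washingtonpost.com/pricing",
    ]
```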

Thanks again!


That is exactly right! The only place you have to edit is where you get the list of URLs, and that’s it!