Busy making add-ons to Archon. Got the Crawl4AI docs all done, what's next?


Also, I want to make the agentic flow more interesting. It's amazing what Cole has done, but if I can help make it better, I was thinking something like this:

Workflow:

  1. User Request: The process begins with a clear and specific user request for a coding task. This is your initial input.
  2. Reasoner Agent:
  • Input: The User Request.
  • Responsibility: Analyze the user request, understand its requirements, and break it down into smaller, manageable sub-tasks. It acts as a planner and orchestrator.
  • Output: A set of sub-tasks or instructions for the parallel coding agents.
  3. Parallel Coding Agents (Working Concurrently; see the sketch after this list):
  • Prompt Engineering Agent:
    • Input: Sub-task from the Reasoner Agent.
    • Responsibility: Design and refine effective prompts to guide the underlying language model in generating the desired code snippets for its assigned sub-task.
    • Output: Well-crafted prompts.
  • Tool Definition Agent:
    • Input: Sub-task from the Reasoner Agent.
    • Responsibility: Identify and define the necessary tools, libraries, or APIs required to accomplish the assigned coding sub-task. This might involve selecting appropriate tools and specifying how to use them.
    • Output: Definitions and specifications of required tools.
  • Dependencies Agent:
    • Input: Sub-task from the Reasoner Agent.
    • Responsibility: Identify and manage any dependencies (e.g., other code modules, libraries, data sources) needed for the assigned sub-task. This might involve ensuring these dependencies are available and correctly configured.
    • Output: List of dependencies and instructions for managing them.
  • Model Selection Agent:
    • Input: Sub-task from the Reasoner Agent.
    • Responsibility: Choose the most suitable underlying AI model (e.g., a specific large language model with certain capabilities) for the assigned coding sub-task, considering factors like the complexity of the task and the model’s strengths.
    • Output: Selection of the appropriate AI model.
  4. Final Coding Agent:
  • Input: Outputs from all the parallel coding agents (prompts, tool definitions, dependencies, model selection).
  • Responsibility: Take the outputs from the parallel agents and use them to generate the final code. This involves using the engineered prompts with the selected model, utilizing the defined tools, and ensuring all dependencies are handled correctly.
  • Output: The complete generated code.
  5. Human-in-the-Loop Iteration:
  • Input: The generated code from the Final Coding Agent.
  • Responsibility: Present the code to a human for review, testing, and feedback.
  6. Feedback Loop:
  • Input: Feedback from the human reviewer.
  • Responsibility: Incorporate the feedback to improve the generated code. This might involve iterating back to the Final Coding Agent for minor adjustments or even to earlier stages like the Reasoner Agent or Parallel Coding Agents if more significant changes are needed.
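To make the fan-out/fan-in shape concrete, here is a minimal runnable sketch of the parallel stage in Python. Every function here is a placeholder stub (in a real build they would be LLM-backed agents, e.g. pydantic-ai agents), so treat all the names as illustrative rather than actual Archon code:

```python
import asyncio

# Placeholder stubs -- in a real build these would be LLM-backed agents;
# the names are illustrative, not Archon code.

async def reasoner(user_request: str) -> list[str]:
    """Reasoner Agent: break the request into manageable sub-tasks."""
    return [f"sub-task derived from: {user_request}"]

async def prompt_agent(subtask: str) -> str:
    return f"engineered prompt for {subtask!r}"

async def tool_agent(subtask: str) -> str:
    return f"tool definitions for {subtask!r}"

async def deps_agent(subtask: str) -> str:
    return f"dependency list for {subtask!r}"

async def model_agent(subtask: str) -> str:
    return "model chosen for this sub-task"

async def final_coder(prompt: str, tools: str, deps: str, model: str) -> str:
    """Final Coding Agent: integrate the parallel outputs into code."""
    return f"# generated with {model}\n# {tools}; {deps}\n# from {prompt}"

async def run(user_request: str) -> list[str]:
    results = []
    for subtask in await reasoner(user_request):
        # Fan out: the four agents work on the same sub-task concurrently.
        prompt, tools, deps, model = await asyncio.gather(
            prompt_agent(subtask), tool_agent(subtask),
            deps_agent(subtask), model_agent(subtask),
        )
        # Fan in: the Final Coding Agent combines everything.
        results.append(await final_coder(prompt, tools, deps, model))
    return results

print(asyncio.run(run("build me a web-scraping agent")))
```

The human review and feedback loop would wrap around run(), re-invoking either the Final Coding Agent or the Reasoner depending on how big the requested change is.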

Key Principles for Recreation:

  • Modularity: Design each agent as a separate, focused module with a clear responsibility and well-defined inputs and outputs.
  • Parallelism: Ensure the Prompt Engineering, Tool Definition, Dependencies, and Model Selection Agents can operate independently and concurrently to speed up the process.
  • Orchestration: The Reasoner Agent plays a crucial role in orchestrating the workflow by breaking down the initial request and distributing tasks effectively.
  • Integration: The Final Coding Agent needs to be capable of integrating the outputs from the parallel agents to produce a cohesive final result.
  • Human Oversight: The human-in-the-loop aspect is essential for quality control and ensuring the generated code meets the user’s needs.
  • Iterative Refinement: The feedback loop allows for continuous improvement of the generated code based on human input.

issue resolved :smiley:

Just the above flow now. Also, any ideas on what other frameworks to add?

Repo here: GitHub - CCwithAi/Archon: Archon is an AI agent that is able to create other AI agents using an advanced agentic coding workflow and framework knowledge base to unlock a new frontier of automated agents.

Take from the branch clean-langchain-ui.

Cole, feel free to pull it or point me in the direction to PR it (I am still learning GitHub). This version has Crawl4AI and LangChain, with your changes from today.

WARNING! I still need to set up some filtering for the LangChain crawl, so if you use this feature, only use local AI, and make sure you use a powerful model on a good-spec system: minimum 32 GB RAM and 14B+ models. Mine is a 16 GB Ryzen 7 5000 and it struggles a lot with local AI.


This is impressive, thank you so much, much appreciated.



Added LangChain Python, just the long, long wait; there is so much documentation for LangChain. It works great with the chat feature: when I asked in the UI what frameworks it knows, it replied with pydantic.ai, crawl4ai, and langchain python, so it's a solid implementation in this respect.


This is amazing work @info2, nice job!! PR coming for this? :eyes:


Thanks for your kind words, Cole.
If I can add any value at all to this amazing community, then I am happy.
And you and Liam Ottley are the ones who gave me the push to set up on my own and leave my main job, so I already owe you so much. Amazing changes you made today; I had some fun merging your changes with mine. I had to modify a little, but it seems to work even better than before, and the outputs are way better :smiley: I am working on some filtering for the LangChain docs, as it needs more refinement. It works; it is just not ideal to scrape the whole sitemap in this case, as it can overload the DB and cost a fortune with GPT. It's not too bad with local models, but it is power intensive.


I need to sort the LangChain crawler with some filtering so it is not scraping the whole sitemap for LangChain Python; I will have a look at it this week. The repo I am working on is below. It works with Crawl4AI and LangChain with no issues and works very well, just please USE LOCAL AI for LangChain to save the API costing a fortune.

GitHub - CCwithAi/Archon-v5
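For the filtering, the rough idea is to pull the sitemap and keep only URLs under the doc paths you actually want before handing them to the crawler. A minimal sketch; the sitemap URL and allowed prefixes here are examples, not final values:

```python
import requests
from xml.etree import ElementTree

SITEMAP_URL = "https://python.langchain.com/sitemap.xml"  # example sitemap
# Example prefixes -- adjust to the doc sections you actually want indexed.
ALLOWED_PREFIXES = ("https://python.langchain.com/docs/",)

def get_filtered_urls() -> list[str]:
    """Fetch the sitemap and keep only URLs under the allowed doc paths."""
    resp = requests.get(SITEMAP_URL, timeout=30)
    resp.raise_for_status()
    root = ElementTree.fromstring(resp.content)
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    urls = [loc.text for loc in root.findall(".//sm:loc", ns)]
    return [u for u in urls if u and u.startswith(ALLOWED_PREFIXES)]
```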


Made the following changes to the codebase to allow the pydantic_ai_coder agent to access the Crawl4AI and LangChain Python documentation:

  1. Modified agent_tools.py to:
  • Update the retrieve_relevant_documentation_tool function to accept an optional sources parameter that defaults to all three documentation sources

  • Implement logic to query and combine results from multiple documentation sources

  • Add source identification to the results to make it clear which documentation a chunk is from

  2. Updated the list_documentation_pages_tool and get_page_content_tool functions to:
  • Accept an optional source parameter for filtering

  • Include source prefixes in the returned URLs

  • Parse source prefixes from URLs when retrieving content

  3. Updated the agent and tools refiner agents to:
  • Accept the new parameters in their tool implementations

  • Include proper documentation in their docstrings about sources

  • Pass the parameters to the underlying functions

  4. Enhanced the agent prompts to:
  • Explicitly mention all three documentation sources

  • Encourage using all documentation sources when researching

  • Guide agents to use the source parameters appropriately

These changes maintain backward compatibility while adding the capability to retrieve documentation from all three sources. Now the agents can access Pydantic AI, Crawl4AI, and Langchain Python documentation as needed.
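In outline, the updated retrieval tool looks something like the sketch below. This is simplified and hedged: it assumes Archon's existing get_embedding helper and the match_site_pages Supabase RPC, and the source names and result fields are illustrative, so the exact code in the repo differs:

```python
# Simplified sketch -- assumes Archon's existing get_embedding() helper and
# the match_site_pages Supabase RPC; names and fields are illustrative.
DEFAULT_SOURCES = ["pydantic_ai_docs", "crawl4ai_docs", "langchain_python_docs"]

async def retrieve_relevant_documentation_tool(
    supabase, embedding_client, user_query: str,
    sources: list[str] | None = None,
) -> str:
    """RAG lookup across one or more documentation sources.

    `sources` defaults to all three doc sets; pass a subset to filter.
    """
    sources = sources or DEFAULT_SOURCES
    query_embedding = await get_embedding(user_query, embedding_client)
    chunks = []
    for source in sources:
        result = supabase.rpc(
            "match_site_pages",
            {
                "query_embedding": query_embedding,
                "match_count": 4,
                "filter": {"source": source},  # per-doc-set metadata filter
            },
        ).execute()
        for doc in result.data or []:
            # Tag each chunk with its source so the agent can tell the
            # three documentation sets apart.
            chunks.append(f"[{source}] {doc['title']}\n\n{doc['content']}")
    return "\n\n---\n\n".join(chunks) if chunks else "No relevant documentation found."
```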


Beautiful work! Also I’m glad my changes are leading to even better outputs for you!

Thank you for adding the new crawlers to Archon v5, excellent work!! I have tried to use the new crawlers; however, they crash 150-200 URLs into crawling. I've tried using Supabase cloud-based and local, same for LLMs etc. Have you had a similar experience? I'm running Archon in a Docker container.


Hi, I had a similar issue, but I found it was rate limiting from GPT that was causing it; try a different model. Also, if you are on the free plan then you will be rate limited on the server. I will take a look again; did you get any errors?
I had this issue, but then I swapped the model from 4o mini to 4 and it worked fine. I have mine running locally in an env. I will add a Docker test and find a fix. Thanks for letting me know.
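If it is rate limiting, wrapping the embedding/LLM calls in a simple exponential backoff usually stops the crawl dying mid-run. A generic sketch, not tied to any particular client library:

```python
import asyncio
import random

async def with_backoff(call, *args, retries: int = 5, base_delay: float = 2.0):
    """Retry an async API call with exponential backoff plus jitter.

    Helps ride out 429 rate-limit errors instead of crashing the crawl.
    """
    for attempt in range(retries):
        try:
            return await call(*args)
        except Exception as exc:  # narrow this to the client's RateLimitError
            if attempt == retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited ({exc}); retrying in {delay:.1f}s")
            await asyncio.sleep(delay)
```

You would then call each embedding or chat request through it, e.g. `await with_backoff(some_async_call, arg1, arg2)`.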


Did you have the LangChain / Crawl4AI version or the one with just Crawl4AI? Can I also please ask, did you have issues with each framework, or just Crawl4AI? On the Crawl4AI one I added caching to reduce load on the servers and improve response times.
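The caching is roughly this pattern; note the crawl4ai cache API has changed between releases (older versions take bypass_cache on arun, newer ones use a CacheMode run setting), so check your installed version:

```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def crawl_with_cache(urls: list[str]) -> None:
    async with AsyncWebCrawler() as crawler:
        for url in urls:
            # bypass_cache=False lets crawl4ai serve repeat requests from its
            # local cache instead of re-hitting the docs server (newer
            # releases express this via a CacheMode run config instead).
            result = await crawler.arun(url=url, bypass_cache=False)
            print(url, len(result.markdown or ""))

asyncio.run(crawl_with_cache(["https://docs.crawl4ai.com/"]))
```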


I’m working on a Python CLI that will be able to create new crawlers for new docs sites, test the crawler and integrations, etc. The CLI will have a test suite that will make sure everything is working as expected.
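The rough shape I have in mind is argparse subcommands along these lines (all names provisional, nothing built yet):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="crawler-cli",
        description="Scaffold and test doc-site crawlers.")
    sub = parser.add_subparsers(dest="command", required=True)

    new = sub.add_parser("new", help="scaffold a crawler for a docs site")
    new.add_argument("name", help="crawler name, e.g. langgraph")
    new.add_argument("--sitemap", required=True, help="sitemap.xml URL")

    test = sub.add_parser("test", help="run the crawler's test suite")
    test.add_argument("name", help="crawler to test")
    test.add_argument("--limit", type=int, default=10,
                      help="max pages to crawl during the test run")
    return parser

def main() -> None:
    args = build_parser().parse_args()
    if args.command == "new":
        print(f"Scaffolding crawler {args.name!r} from {args.sitemap}")
    elif args.command == "test":
        print(f"Testing crawler {args.name!r} on up to {args.limit} pages")

if __name__ == "__main__":
    main()
```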


So I did some testing in Docker and everything seemed to work fine. I had an issue with the doc count as it was crawling Crawl4AI, but it still indexed correctly. Looking at how I did the code, I could make a few improvements for sure, so I will do that this weekend. I just cleared Pydantic and Crawl4AI and then reindexed; it worked great in Docker for me. I will have another try with LangChain later. The CLI is a great idea, looking forward to seeing it implemented. I have integrated the tools into the existing Pydantic agent, and I am thinking I should create separate agents to keep the code cleaner and more structured?
I have done a screen recording with the logs and the build to Docker, so I will upload it later. Any suggestions would be amazing; I'm always looking to improve and learn from others. Did you get any error logs, squishy, as that would really help me pinpoint what caused the DB crash for you?

I’ll try to get the Docker logs to you later today or tomorrow. I’m hoping the Python CLI will be ready by Monday; I would be grateful for any feedback.


If there is anything I can help with, I would be happy to.


Thanks for testing it out @squishy64, and I love your PR @info2! I'll be testing it more myself soon too.