My view is that we need to look into those when thinking about IDE-related features like code and text search, codebase understanding, and chunked editing.
I do think that there are at least two different use cases:
Exploring a codebase
Like @wonderwhy.er described, the general context for this is the whole project. RAG mechanisms may be appropriate, but also custom agents that help with exploration.
Here’s an example: an agent which, given suitable descriptive capabilities via its prompt, creates a holistic, textual description of the whole repo by scanning it file-by-file / folder-by-folder.
Output (illustrative):
{
  "file": "src/workbench.txt",
  "dependencies": ["src/lib/files", ...],
  "responsibility": "Establishes a container which has controls..."
}
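A minimal sketch of the scanning scaffolding, assuming an in-memory map of file paths to contents. The `describe` function is where the LLM call would go; here a naive stand-in derives dependencies from ES import statements and uses the first comment line as the "responsibility", purely so the structure can be exercised. All names are hypothetical, not an existing API.

```typescript
interface FileSummary {
  file: string;
  dependencies: string[];
  responsibility: string;
}

// In practice this would ask the LLM to describe the file; injected here
// so the scaffolding works without one.
type Describe = (path: string, content: string) => FileSummary;

// files: a flat map of path -> content, standing in for the real FS.
function summarizeRepo(
  files: Record<string, string>,
  describe: Describe,
): FileSummary[] {
  return Object.entries(files).map(([path, content]) => describe(path, content));
}

// Naive stand-in: dependencies from `from "..."` imports, responsibility
// from the first `//` comment line.
const naiveDescribe: Describe = (path, content) => {
  const deps = [...content.matchAll(/from\s+["']([^"']+)["']/g)].map((m) => m[1]);
  const firstComment = content.split("\n").find((l) => l.trim().startsWith("//"));
  return {
    file: path,
    dependencies: deps,
    responsibility: firstComment?.replace(/^\s*\/\/\s*/, "") ?? "(no description)",
  };
};
```

A real agent would replace `naiveDescribe` with a per-file LLM call and probably summarize folder-by-folder to stay within context limits.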
Then, this could be used in order to determine where to enhance certain features:
“As a developer proficient in [detected language], where would you do the following enhancement?”
[prompt]
“Here’s how the project currently looks:”
[Context: the above summary of all files]
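The prompt assembly itself is just string templating over the summaries; a sketch, with illustrative wording and a hypothetical `Summary` shape:

```typescript
interface Summary {
  file: string;
  responsibility: string;
}

// Builds the "where would you do this enhancement?" prompt from the
// per-file summaries produced by the repo-scanning agent.
function buildEnhancementPrompt(
  language: string,
  request: string,
  summaries: Summary[],
): string {
  const overview = summaries
    .map((s) => `- ${s.file}: ${s.responsibility}`)
    .join("\n");
  return [
    `As a developer proficient in ${language}, where would you do the following enhancement?`,
    request,
    "Here's how the project currently looks:",
    overview,
  ].join("\n\n");
}
```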
Making enhancements
If an entry point to the intended change is known (usually, this is the opened file), the context could be determined by the abstract syntax tree (AST). IDEs usually employ a language server in order to interact with the language. This should “easily” output a graph of dependencies. All branches leading to the entry point would then define the context.
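The "branches leading to the entry point" idea can be sketched as reverse reachability over the dependency graph. The graph shape below is an assumption (file → files it imports); a real language server exposes richer data, and one could equally walk forward to what the entry point itself imports:

```typescript
type DepGraph = Record<string, string[]>; // file -> files it imports

// Collects every file on a dependency branch leading to the entry point,
// i.e. the entry point plus its transitive importers.
function contextForEntryPoint(graph: DepGraph, entry: string): Set<string> {
  // Invert the edges: who imports whom.
  const importers: Record<string, string[]> = {};
  for (const [file, deps] of Object.entries(graph)) {
    for (const dep of deps) (importers[dep] ??= []).push(file);
  }
  // Walk backwards from the entry point.
  const context = new Set<string>([entry]);
  const queue = [entry];
  while (queue.length) {
    const current = queue.pop()!;
    for (const imp of importers[current] ?? []) {
      if (!context.has(imp)) {
        context.add(imp);
        queue.push(imp);
      }
    }
  }
  return context;
}
```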
Both techniques could of course be combined in order to be more creative also when making entry-point-based enhancements.
I am quite sure that most if not all commercial products utilize the language server for the context. Some ask/allow the user to define it (a working set of files which defines the context).
They use grep search and read files in chunks of 200 lines to select the correct ones for the task.
AST could allow chunking better than just lines, but it’s also a complication we can do later.
As usual my suggestion is to do it in small steps.
At minimum it will require:
not sending the whole conversation to the AI, but filtering it through some form of search
If we do it for the chat conversation it will allow us to include user requests, AI answers and file contents in such “context creation”
But we will need additional support for files, as we need to include the path to the files in question when feeding those chunks to the LLM so it knows which files to modify
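The path-labelling point can be sketched as simple chunk bookkeeping: split each file into fixed-size line chunks (the 200-line size comes from the discussion above) and carry the path and line range along, so the LLM knows which file, and roughly where, to modify. Names are hypothetical:

```typescript
interface Chunk {
  path: string;
  startLine: number; // 1-based line where this chunk begins
  text: string;
}

// Splits a file into fixed-size line chunks, each labelled with its origin.
function chunkFile(path: string, content: string, size = 200): Chunk[] {
  const lines = content.split("\n");
  const chunks: Chunk[] = [];
  for (let i = 0; i < lines.length; i += size) {
    chunks.push({
      path,
      startLine: i + 1,
      text: lines.slice(i, i + size).join("\n"),
    });
  }
  return chunks;
}

// Renders a chunk with its provenance header for the prompt.
function renderChunk(c: Chunk): string {
  return `// file: ${c.path} (from line ${c.startLine})\n${c.text}`;
}
```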
Maybe as a first iteration what I would do is add “grep search” through the chat and the current state of files, get a list of files and relevant messages from the conversation ranked by importance, and send that to the AI in the style of:
{relevant past chat messages}
{file urls and content of relevant files found through search}
{last user request in what to do with all of that}
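That three-part assembly can be sketched with deliberately naive keyword-overlap scoring (a real version might use embeddings or proper ranking); everything here is illustrative:

```typescript
// Counts how many query terms appear in the text.
function score(text: string, query: string): number {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  const hay = text.toLowerCase();
  return terms.filter((t) => hay.includes(t)).length;
}

// Assembles: relevant past messages, relevant files (with their urls/paths),
// then the last user request, separated for the LLM.
function buildContext(
  request: string,
  messages: string[],
  files: Record<string, string>,
  topN = 3,
): string {
  const topMessages = messages
    .map((m) => ({ m, s: score(m, request) }))
    .filter((x) => x.s > 0)
    .sort((a, b) => b.s - a.s)
    .slice(0, topN)
    .map((x) => x.m);
  const topFiles = Object.entries(files)
    .map(([p, c]) => ({ p, c, s: score(c, request) }))
    .filter((x) => x.s > 0)
    .sort((a, b) => b.s - a.s)
    .slice(0, topN);
  return [
    topMessages.join("\n"),
    topFiles.map((f) => `// ${f.p}\n${f.c}`).join("\n"),
    request,
  ].join("\n---\n");
}
```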
Anthropic says coding software like Replit, Codeium, and Sourcegraph have already started using MCP to build out their AI agents, which can complete tasks on behalf of users. This tool will likely make it easier for other companies and developers to connect an AI system with multiple data sources — something that could become especially helpful as the industry leans into agentic AI.
Sounds nice, but I have experimented with grep search once, and I believe it requires some native bindings and actual file indexing. WebContainer might not have this.
If we want to do this we might need to implement our own version.
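An "own version" for this use case doesn’t need native bindings or an index at all: over an in-memory file map, a pure regex scan is enough as a first pass. A minimal sketch:

```typescript
interface Match {
  path: string;
  line: number; // 1-based
  text: string;
}

// Dependency-free grep over an in-memory path -> content map.
// Note: pass a non-global RegExp; a /g flag would make .test stateful.
function grep(files: Record<string, string>, pattern: RegExp): Match[] {
  const matches: Match[] = [];
  for (const [path, content] of Object.entries(files)) {
    content.split("\n").forEach((text, i) => {
      if (pattern.test(text)) matches.push({ path, line: i + 1, text });
    });
  }
  return matches;
}
```

This is O(total file size) per query with no index, which is likely fine at chat-project scale; indexing could be bolted on later if it proves too slow.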
I believe I have a solution. Is there any way we could talk about this further? It’s a lot to just type on here, but it’s something I’ve been working on for other reasons, and it should reduce tokens sent back and forth a lot.