The first time I crawled the Pydantic docs I got something like 417 chunks with around 7 failures. I've run it about 5 times now and got a different number of chunks each time; the latest run was 350 chunks with 0 errors.
I'm going to assume that when a failure occurs, that data isn't read, which means the chunk_size may be affected and hence the number of chunks. Is that about right, or is there another mechanism at play here?
Yeah, that's right! And I think the errors are probably caused by OpenAI rate limit errors, since each chunk is summarized by the primary model and it all happens very quickly. I'm planning to play around with this and see if I can make it more consistent!
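For what it's worth, one way to keep a rate-limited chunk from being dropped is to retry the summarization call with exponential backoff. This is just a sketch, not the repo's actual code: the `summarize_chunk` helper, the model name, and the prompt are placeholders I made up.

```python
import time

from openai import OpenAI, RateLimitError

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def summarize_chunk(chunk: str, model: str = "gpt-4o-mini", max_retries: int = 5) -> str:
    """Summarize one chunk, backing off exponentially on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": "Summarize the following documentation chunk."},
                    {"role": "user", "content": chunk},
                ],
            )
            return response.choices[0].message.content
        except RateLimitError:
            # Wait 1s, 2s, 4s, ... and retry instead of counting the chunk as a failure
            time.sleep(2 ** attempt)
    raise RuntimeError("Chunk could not be summarized after repeated rate-limit errors")
```

With something like that in place, the chunk count should stop varying between runs, since failed chunks would no longer be silently skipped.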
I was wondering about this too. I thought it was sending too many requests per second and that maybe the Pydantic site was rate limiting them. Even though it would be slightly slower, adding a slight delay between requests is better etiquette.
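Something along these lines is what I had in mind, assuming a plain `requests`-based fetch loop (the `polite_crawl` name and the half-second delay are just illustrative, not what the project actually does):

```python
import time

import requests


def polite_crawl(urls: list[str], delay_seconds: float = 0.5) -> dict[str, str]:
    """Fetch each page with a small pause between requests to avoid hammering the site."""
    pages: dict[str, str] = {}
    for url in urls:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        pages[url] = response.text
        time.sleep(delay_seconds)  # fixed pause between requests; the value is a guess, not a measured limit
    return pages
```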