Crawling over all EDH decks

DerToast
{"ops":[{"insert":"Hi,\nfor a project I'm working on, I would like to crawl as many EDH-Decks as possible (I looked into the forums first and saw that you generally allow this? e.g. https://archidekt.com/forum/thread/3476605). My first instinct, was to just search all decks, sort by last updated and walk backwards, however after 50 pages, no more appear. The next idea would be to filter by commander, sort by last updated and I assume that would probably hit a good portion (though for the top 1000 commanders it would take around 2-3 months of crawling alone, if I leave 2 seconds between each request and assuming all 50 pages have exactly 60 decks).\nIs there any easy way I overlooked to do this better and is a project that needs crawling in such a large scale something that needs extra approval?\n"}]}
Edited 3/2/2025, 4:22:11 AM
{"ops":[{"insert":"We used to allow it, but needed to limit it due people making requests that would result in significant strain on our db, and it ended up slowing down the site for everyone. The reason for the slowdown is frankly just that there are so many more decks on the site now, that ordering and paging the data becomes a significant issue when we need to offset the query by tens of thousands of records (or more). \n\nYou're welcome to use our API as is, but as you noticed, any one request will only allow you to page up to page 50. You could get creative by filtering by commander like you said, and grab decks that way, but even then, the 50 page restriction would still apply. \n\nWhat's the actual end goal here? I might be able to come up with something creative. \n"}]}
DerToast
{"ops":[{"insert":"Thats super nice of you to ask. Its multiple projects really, the big one is a recommender system for commanders. This can probabl, be achieved just by pairing commanders and users. But I also wanted to plan ahead and split commanders into builds, so i can recommend them independently. To do the latter I imagined I would have to do some playing around first. Like using different clustering algorithms to see if they come up with different groups than i would. But this step is rather vague, so I figured I would gather as much data as I can\n"}]}