A more advanced TPB proxy
git clone https://git.mobley.ne02ptzero.me/~louis/tpb-crawler --recursive
cd tpb-crawler/ && mkdir build && cd build && cmake .. && make && sudo make install
One configuration file is used by all the services:
tpb:
  url: https://thepiratebay.org
  categories:
    - 201
database:
  driver: sqlite3
  url: db.sqlite
Since multiple services read from and write to the database, and the database can grow large very fast, one might want to use a client-server database rather than SQLite. For example, with PostgreSQL:
database: driver: postgresql url: "host=localhost user=postgres database=tpb"
One does need to create the database first (e.g. with `createdb tpb`).
The crawler is the service that crawls the original site: it browses by category, fetches new torrents along with some information about them, and adds them to the database. Its usage is straightforward; just launch it.
The torrent resolver takes magnet links from the database and fetches information about seeders, leechers, the size of the torrent, and the files it contains. It then pushes this back into the database, to be consumed by the web service.
The web service is simply a frontend to the information stored in the database.
The crawler is not very CPU-hungry, but it can be slow. I've observed IP bans from the TPB site / proxies when many requests are fired at the same time. To avoid that, I've practically removed all multithreading from the program. One can split the load by deploying multiple crawlers, each crawling different categories.
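As a sketch, two crawler instances could each be given a disjoint set of categories in their respective configuration files (the file names and the particular category IDs below are just an illustration, not part of the project):

```yaml
# crawler-movies.yml — first instance, movie categories only
tpb:
  url: https://thepiratebay.org
  categories:
    - 201
    - 202
database:
  driver: postgresql
  url: "host=localhost user=postgres database=tpb"
```

```yaml
# crawler-audio.yml — second instance, audio categories only
tpb:
  url: https://thepiratebay.org
  categories:
    - 101
    - 102
database:
  driver: postgresql
  url: "host=localhost user=postgres database=tpb"
```

Both instances write to the same database, so the other services see a single merged stream of torrents.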
The resolver is very CPU-hungry: it can easily take all the CPUs of a big server hostage. I haven't written a way to distribute the load yet, so I advise deploying this service on a dedicated server.
The web service can be scaled indefinitely with a load-balancer in front of it.
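For example, a minimal nginx configuration could round-robin requests across several web instances (the addresses and port below are assumptions; substitute whatever your web instances actually listen on):

```nginx
# Distribute requests across two identical web instances.
upstream tpb_web {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}

server {
    listen 80;

    location / {
        proxy_pass http://tpb_web;
    }
}
```

Since the web service is stateless (all state lives in the database), instances can be added or removed behind the balancer without coordination.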
One can create a PostgreSQL cluster to handle more load and introduce a bit of redundancy.