The Pirate Bay advanced proxy
fd7da3bd — Louis Solofrizzo 5 months ago
.gitignore: Update
45d04813 — Louis Solofrizzo 5 months ago
Readme: Add mailing lists and todo links
4284c6ab — Louis Solofrizzo 5 months ago
torrent-resolver: Add torrent resolver in tree

The Pirate Bay - Crawler

A more advanced TPB proxy

Clone

git clone https://git.mobley.ne02ptzero.me/~louis/tpb-crawler --recursive

Compile and Install

cd tpb-crawler/ && mkdir build && cd build && cmake .. && make && sudo make install

Usage

Configuration

One configuration file is used by all the services:

tpb:
  url: https://thepiratebay.org
  categories:
    - 201
database:
  driver: sqlite3
  url: db.sqlite

Database

Since multiple services read from and write to the database, and the database can grow large very quickly, one might want to use a client-server database rather than SQLite. For example, with PostgreSQL:

database:
  driver: postgresql
  url: "host=localhost user=postgres database=tpb"

The database itself must be created before the services are started.
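As a rough sketch of that step, assuming the PostgreSQL client tools are installed and the server accepts connections with the host, user, and database name from the example configuration above, the database could be created with `createdb`. The variable names are only illustrative and are not part of the project.

```shell
#!/bin/sh
# Values mirror the example configuration: host=localhost user=postgres database=tpb
DB_HOST=localhost
DB_USER=postgres
DB_NAME=tpb

if command -v createdb >/dev/null 2>&1; then
    # -w: never prompt for a password; fail instead if one is required
    if createdb -w -h "$DB_HOST" -U "$DB_USER" "$DB_NAME"; then
        echo "database $DB_NAME created"
    else
        echo "could not create $DB_NAME (server unreachable, bad credentials, or it already exists)" >&2
    fi
else
    echo "createdb not found: install the PostgreSQL client tools first" >&2
fi
```

If the server requires a password, it can be supplied through the usual PostgreSQL mechanisms such as a `~/.pgpass` file.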

crawler

The crawler is the service that crawls the original site. It browses by category, fetches new torrents along with some information about them, and adds them to the database. Usage is fairly straightforward: just launch it.

torrent-resolver

The torrent resolver reads magnet links from the database and retrieves some information about each torrent: seeders, leechers, the size of the torrent, and the files in it. It then pushes this information to the database, where it is consumed by the web service.

web

The web service is just a frontend to the information stored in the database.

Deployment and Scaling

crawler

The crawler is not very CPU hungry, but it can be slow. I've observed some IP bans from the TPB site / proxies when many requests are fired at the same time. To avoid that, I've practically removed all multithreading from the program. One can split the load by deploying multiple crawlers that each crawl different categories.
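One way to prepare such a split, sketched here under assumptions, is to generate one configuration file per category and give each crawler instance its own file. Only category 201 appears in this README; 205 and 207 are placeholder IDs, and the output directory and file names are hypothetical. How a crawler is pointed at a given configuration file is not described here, so that step is left out.

```shell
#!/bin/sh
# Placeholder category list: 201 is from the README example, the others are illustrative.
CATEGORIES="201 205 207"
# Hypothetical output directory for the generated files.
OUT_DIR=./crawler-configs

mkdir -p "$OUT_DIR"

for category in $CATEGORIES; do
    # Each file lists a single category, so each crawler only browses its own share.
    cat > "$OUT_DIR/crawler-$category.yml" <<EOF
tpb:
  url: https://thepiratebay.org
  categories:
    - $category
database:
  driver: postgresql
  url: "host=localhost user=postgres database=tpb"
EOF
done

# List the generated files.
ls "$OUT_DIR"
```

All generated files point at the same database, so the other services see the combined results of every crawler.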

torrent-resolver

The resolver is very CPU hungry. It can easily take all the CPUs of a big server hostage. I haven't coded a way to distribute the load yet, so I advise deploying this software on a dedicated server.

web

The web service can be scaled indefinitely with a load balancer in front of it.

database

One can create a PostgreSQL database cluster in order to handle more load and introduce a bit of redundancy.

Contributing

If you want to contribute, please send your patches to ~louis/tpb-crawler@lists.mobley.ne02ptzero.me. You can look at git-send-email.io if you need to set up your mail client.

Bug Report

Please report bugs at todo/~louis/tpb-crawler. You can also send an email to ~louis/tpb-crawler@todo.mobley.ne02ptzero.me.