

kart


Art crawler - Live Version


About the project

This collection of binaries crawls art around the internet, mainly from subreddits. The results are stored in S3, where you can then sort them and expose them in any way you want.

Getting Started

Prerequisites

For the worker, you'll need two libraries, which you can install via pip3:

  • imgurpython
  • gfycat
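Assuming pip3 is available on your system, both libraries can be installed in one step (package names as listed above):

```shell
pip3 install imgurpython gfycat
```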

In order to sort the images, two external programs are needed:

  • feh
  • vlc

Please refer to your distro package manager on how to install these two.
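On a Debian-based distro, for example, that would be (package names may differ on other distros):

```shell
sudo apt install feh vlc
```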

You will need the following API keys:

  • reddit
  • imgur
  • gfycat
  • S3

Please refer to the documentation of each of those services on how to get a key. You are not required to use Amazon S3; any S3-compatible provider will work too.

Installation

Compilation

In order to compile, you will need the following programs:

  • golang
  • cmake

Clone the repository:

git clone https://git.mobley.ne02ptzero.me/~louis/kart

cd into the repository and create the build directory:

cd kart && mkdir -p build && cd build && cmake ..

Compile everything:

make

Usage

Configuration

One configuration file is used by all the services:

reddit:
  id: XXX
  secret: XXX
  user_agent: linux:XXX:0.0.1 (by /u/User)
  username: User
  password: Password
  subreddits:
    - pixelart
    - VoxelArt

imgur:
  id: XXX
  secret: XXX

gfycat:
  id: XXX
  secret: XXX

S3:
  endpoint: s3.fr-par.scw.cloud
  region: fr-par
  id: XXX
  secret: XXX
  bucket: my-bucket
  prefix: non-sorted/
  good_prefix: good/

download:
  directory: /tmp/
  database: /var/db/kart.db.sqlite

scripts:
  directory: /usr/local/sbin/

web:
  port: 8080
  views: /usr/share/kart/views
  static: /usr/share/kart/static
  database: /var/db/kart-web.db.sqlite
  desc: Some description

Worker

The worker is a program that crawls the subreddits indefinitely. You can run it with:

kart-worker -conf /etc/kart-conf.yml

It will fetch the posts of the subreddits, look for images, GIFs, or videos, and upload them to S3 if they are new.
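Since the worker runs indefinitely and retries on error, you will probably want it supervised. A minimal systemd unit sketch; the install path and unit name are assumptions, adjust them to your setup:

```ini
[Unit]
Description=kart worker
After=network-online.target

[Service]
ExecStart=/usr/local/bin/kart-worker -conf /etc/kart-conf.yml
Restart=always

[Install]
WantedBy=multi-user.target
```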

Sorter

The sorter is the program meant to be run by a human in order to sort the images from the worker.

kart-sorter -conf /etc/kart-conf.yml

It will launch a 'sort session' where the user is asked to answer Yes or No to each image the worker has crawled. Accepted content is moved to another prefix in the S3 bucket, ready to be consumed.
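The same move between prefixes can also be done manually with any S3-compatible client, for example with the AWS CLI against the endpoint from the config above (bucket name and object key are placeholders):

```shell
aws s3 mv \
    --endpoint-url https://s3.fr-par.scw.cloud \
    s3://my-bucket/non-sorted/image.png \
    s3://my-bucket/good/image.png
```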

Web

The web binary is the web server that I use to display the images. It is launched with:

kart-web -conf /etc/kart-conf.yml

It is meant to run behind a reverse proxy. Here's the one I use:

server {
    server_name  kart.ne02ptzero.me;

    location / {
        proxy_pass             http://127.0.0.1:8080/;
    }

    location /cdn {
        rewrite                /cdn/(.*) /good/$1  break;
        proxy_pass             https://art.s3.fr-par.scw.cloud/;
    }
}

Contributing

If you want to contribute, please send your patches to ~louis/kart@lists.mobley.ne02ptzero.me. You can look at git-send-email.io if you need to set up your mail client.

Bug Report

Please report your bugs at todo/~louis/kart. You can also send an email to ~louis/kart@todo.mobley.ne02ptzero.me.