reddit_stock_analyzer/README.md
2025-08-26 15:16:53 +02:00


RSTAT — Reddit Stock Analyzer

Scan Reddit for stock ticker mentions, score sentiment, enrich with price/market cap, and explore the results in a clean web dashboard. Automate shareable images and post them to Reddit.

Highlights

  • CLI + Web UI: Collect data with rstat, browse it with rstat-dashboard.
  • Smart ticker parsing: Prefer $TSLA/$AAPL “golden” matches; fall back to filtered ALL-CAPS words.
  • Sentiment: VADER (NLTK) scores for titles and comments; “deep dive” averages per post.
  • Storage: Local SQLite database reddit_stocks.db with de-duped mentions and post analytics.
  • Enrichment: Yahoo Finance market cap + latest close fetched in batch and on-demand.
  • Images: Export polished daily/weekly summary PNGs for subreddits or “overall”.
  • Automation: Optional cron job plus one-command posting to Reddit with OAuth refresh tokens.

Repository layout

.
├── Dockerfile                     # Multi-stage build (Tailwind -> Python + gunicorn)
├── docker-compose.yml             # Prod (nginx + varnish optional) + dashboard
├── docker-compose-dev.yml         # Dev compose (local nginx)
├── requirements.txt               # Python deps
├── setup.py                       # Installs console scripts
├── subreddits.json                # Default subreddits list
├── reddit_stocks.db               # SQLite database (generated/updated by CLI)
├── export_image.py                # Generate shareable PNGs (Playwright)
├── post_to_reddit.py              # Post latest PNG to Reddit
├── get_refresh_token.py           # One-time OAuth2 refresh token helper
├── fetch_close_price.py           # Utility for closing price (yfinance)
├── fetch_market_cap.py            # Utility for market cap (yfinance)
├── rstat_tool/
│   ├── main.py                    # CLI entry (rstat)
│   ├── dashboard.py               # Flask app entry (rstat-dashboard)
│   ├── database.py                # SQLite schema + queries
│   ├── ticker_extractor.py        # Ticker parsing + blacklist
│   ├── sentiment_analyzer.py      # VADER sentiment
│   ├── cleanup.py                 # Cleanup utilities (rstat-cleanup)
│   ├── flair_finder.py            # Fetch subreddit flair IDs (rstat-flairs)
│   ├── logger_setup.py            # Logging
│   └── setup_nltk.py              # One-time VADER download
├── templates/                     # Jinja2 templates (Tailwind 4 styling)
└── static/                        # Favicon + generated CSS (style.css)

Requirements

  • Python 3.10+ (Docker image uses Python 3.13-slim)
  • Reddit API app (script type) for read + submit
  • For optional image export: Playwright browsers
  • For UI development (optional): Node 18+ to rebuild Tailwind CSS

Setup

  1. Clone and enter the repo
git clone <your-repo>
cd reddit_stock_analyzer
  2. Create and activate a virtualenv
  • bash/zsh:
    python3 -m venv .venv
    source .venv/bin/activate
    
  • fish:
    python3 -m venv .venv
    source .venv/bin/activate.fish
    
  3. Install Python dependencies and commands
pip install -r requirements.txt
pip install -e .
  4. Configure environment

Create a .env file in the repo root with your Reddit app credentials:

REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
REDDIT_USER_AGENT=python:rstat:v1.0 (by u/yourname)

Optional (after OAuth step below):

REDDIT_REFRESH_TOKEN=your_refresh_token
  5. One-time NLTK setup
python rstat_tool/setup_nltk.py
  6. Configure subreddits (optional)

Edit subreddits.json to your liking. It ships with a sane default list.
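
For reference, the file is a JSON object of the form {"subreddits": [ ... ]}. A minimal sketch of reading it is shown below; load_subreddits is a hypothetical helper written for this README, not the project's actual loader, and the clean-up rules (stripping "r/", case-insensitive de-duplication) are assumptions.

```python
import json
from pathlib import Path
from typing import List


def load_subreddits(config_path: str = "subreddits.json") -> List[str]:
    """Return the subreddit names listed under the "subreddits" key."""
    data = json.loads(Path(config_path).read_text(encoding="utf-8"))
    names = data.get("subreddits") if isinstance(data, dict) else None
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        raise ValueError('expected {"subreddits": ["name", ...]}')

    # Assumed clean-up: drop blanks and a leading "r/", then de-duplicate
    # case-insensitively while keeping the original order.
    seen, cleaned = set(), []
    for name in names:
        name = name.strip()
        if name.startswith("r/"):
            name = name[2:]
        if name and name.lower() not in seen:
            seen.add(name.lower())
            cleaned.append(name)
    return cleaned
```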

CLI usage (rstat)

The rstat command collects Reddit data and updates the database. Credentials are read from .env.
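
If a scan fails on authentication, a quick check like the one below can confirm that the three required settings are present. This is only a sketch: missing_settings is a hypothetical helper, and it assumes the values from .env have already been loaded into the process environment (how the project loads them is not shown here).

```python
import os
from typing import List, Mapping

# The three names come from the .env example in the Setup section.
REQUIRED = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required setting names that are unset or blank."""
    return [name for name in REQUIRED if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    print("All set." if not missing else "Missing: " + ", ".join(missing))
```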

Common flags (see rstat --help):

  • --config FILE Use a JSON file with {"subreddits": [ ... ]} (default: subreddits.json)
  • --subreddit NAME Scan a single subreddit instead of the config
  • --days N Only scan posts from the last N days (default 1)
  • --posts N Max posts per subreddit to check (default 200)
  • --comments N Max comments per post to scan (default 100)
  • --no-financials Skip Yahoo Finance during the scan (faster)
  • --update-top-tickers Update financials for tickers that are currently top daily/weekly
  • --update-financials-only [TICKER] Update market cap/close for all tickers, or for a single ticker
  • --stdout Log to the console as well as the log file
  • --debug Enable verbose logging
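
To make the flag semantics concrete, here is a rough argparse sketch that mirrors the flags and defaults listed above. It is an illustration only and may differ from the real parser in rstat_tool/main.py; in particular, the "*" sentinel for "all tickers" is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="rstat")
    parser.add_argument("--config", default="subreddits.json", metavar="FILE")
    parser.add_argument("--subreddit", metavar="NAME")
    parser.add_argument("--days", type=int, default=1, metavar="N")
    parser.add_argument("--posts", type=int, default=200, metavar="N")
    parser.add_argument("--comments", type=int, default=100, metavar="N")
    parser.add_argument("--no-financials", action="store_true")
    parser.add_argument("--update-top-tickers", action="store_true")
    # nargs="?" makes the ticker optional:
    #   flag absent        -> None (no financials-only run)
    #   bare flag          -> "*"  (assumed sentinel meaning "all tickers")
    #   flag with a value  -> that ticker symbol
    parser.add_argument(
        "--update-financials-only",
        nargs="?",
        const="*",
        default=None,
        metavar="TICKER",
    )
    parser.add_argument("--stdout", action="store_true")
    parser.add_argument("--debug", action="store_true")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```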

Examples:

# Scan configured subs for last 24h, including financials
rstat --days 1

# Target a single subreddit for the past week, scan more comments
rstat --subreddit wallstreetbets --days 7 --comments 250

# Skip financials during scan, then update only top tickers
rstat --no-financials
rstat --update-top-tickers

# Update financials for all tickers in DB
rstat --update-financials-only

# Update a single ticker (case-insensitive)
rstat --update-financials-only TSLA

How mentions are detected:

  • If a post contains any $TICKER (e.g., $TSLA) anywhere, we use “golden-only” mode: only $-prefixed tickers are considered.
  • Otherwise, we fall back to filtered ALL-CAPS words of 2 to 5 letters, excluding a large blacklist to avoid false positives.
  • If the post title contains tickers, every comment in the thread is attributed to those tickers; otherwise, each comment is scanned for its own mentions.
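
The rules above can be sketched roughly as follows. This is not the code in rstat_tool/ticker_extractor.py: the regular expressions, the assumption that the fallback covers words of 2 to 5 capital letters, and the tiny stand-in blacklist are all illustrative.

```python
import re
from typing import Set

# Hypothetical stand-in for the project's much larger blacklist.
COMMON_WORDS_BLACKLIST = {"CEO", "YOLO", "USA", "THE", "FOR", "IMO"}

GOLDEN_RE = re.compile(r"\$([A-Z]{1,5})\b")  # $TSLA, $AAPL, ...
CAPS_RE = re.compile(r"\b[A-Z]{2,5}\b")      # assumed 2-5 letter fallback


def extract_tickers(text: str) -> Set[str]:
    """Golden-only mode if any $TICKER is present, else filtered ALL-CAPS."""
    golden = set(GOLDEN_RE.findall(text))
    if golden:
        return golden
    return {w for w in CAPS_RE.findall(text) if w not in COMMON_WORDS_BLACKLIST}


def tickers_for_comment(title: str, comment: str) -> Set[str]:
    """Title tickers win; otherwise the comment is scanned on its own."""
    return extract_tickers(title) or extract_tickers(comment)
```

For example, extract_tickers("Buying $TSLA, CEO says AAPL is fine") yields only TSLA, because the $-prefixed match switches the whole text to golden-only mode.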

Web dashboard (rstat-dashboard)

Start the dashboard and open http://127.0.0.1:5000

rstat-dashboard

Features:

  • Overall top 10 (daily/weekly) across all subs
  • Per-subreddit dashboards (daily/weekly)
  • Deep Dive pages listing posts analyzed for a ticker
  • Shareable image-friendly views (UI hides nav when ?image=true)

The dashboard reads from reddit_stocks.db. Run rstat first so you have data.

Image export (export_image.py)

Exports a high-res PNG of the dashboard views via Playwright. Note: the script currently uses https://rstat.net as its base URL.

# Overall daily image
python export_image.py --overall

# Subreddit daily image
python export_image.py --subreddit wallstreetbets

# Weekly view
python export_image.py --subreddit wallstreetbets --weekly

Output files are saved into the images/ folder, e.g. overall_summary_daily_1700000000.png.
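
Because the file name ends in a Unix timestamp, the newest export for a given view can be found without relying on file modification times. A small sketch, assuming every export follows the `<prefix>_<timestamp>.png` shape of the example above (newest_image is a hypothetical helper, not part of the repo):

```python
import re
from pathlib import Path
from typing import Optional

STAMP_RE = re.compile(r"_(\d+)\.png$")


def newest_image(folder: str, prefix: str) -> Optional[Path]:
    """Return the PNG under `folder` with the largest trailing timestamp."""
    candidates = []
    for path in Path(folder).glob(f"{prefix}_*.png"):
        match = STAMP_RE.search(path.name)
        if match:  # skip files without a numeric suffix
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


if __name__ == "__main__":
    print(newest_image("images", "overall_summary_daily"))
```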

Tip: If you want to export from a local dashboard instead of rstat.net, edit base_url in export_image.py.

Post images to Reddit (post_to_reddit.py)

One-time OAuth2 step to obtain a refresh token:

  1. In your Reddit app settings, set the redirect URI to exactly http://localhost:5000 (matches the script).
  2. Run:
python get_refresh_token.py

Follow the on-screen steps: open the generated URL, allow, copy the redirected URL, paste back. Add the printed token to .env as REDDIT_REFRESH_TOKEN.
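
For context, the redirected URL carries the authorization code as a query parameter, alongside the state value sent in the original request. The sketch below shows how such a URL can be picked apart with the standard library; extract_auth_code is a hypothetical helper and is not necessarily how get_refresh_token.py does it.

```python
from urllib.parse import parse_qs, urlsplit


def extract_auth_code(redirected_url: str, expected_state: str) -> str:
    """Pull the one-time `code` out of the pasted redirect URL."""
    params = parse_qs(urlsplit(redirected_url).query)
    if "error" in params:
        raise ValueError(f"authorization failed: {params['error'][0]}")
    # The state must match what was sent, otherwise the response
    # does not belong to this authorization attempt.
    if params.get("state", [""])[0] != expected_state:
        raise ValueError("state mismatch; restart the authorization flow")
    code = params.get("code")
    if not code:
        raise ValueError("no code parameter found in the URL")
    return code[0]
```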

Now you can post:

# Post the most recent overall image to r/rstat
python post_to_reddit.py

# Post the most recent daily image for a subreddit
python post_to_reddit.py --subreddit wallstreetbets

# Post weekly image for a subreddit
python post_to_reddit.py --subreddit wallstreetbets --weekly

# Choose a target subreddit and (optionally) a flair ID
python post_to_reddit.py --subreddit wallstreetbets --target-subreddit rstat --flair-id <ID>

Need a flair ID? Use the helper:

rstat-flairs wallstreetbets

Cleanup utilities (rstat-cleanup)

Remove blacklisted “ticker” rows and/or purge data for subreddits no longer in your config.

# Show help
rstat-cleanup --help

# Remove tickers that are in the internal COMMON_WORDS_BLACKLIST
rstat-cleanup --tickers

# Remove any subreddit data not in subreddits.json
rstat-cleanup --subreddits

# Use a custom config file
rstat-cleanup --subreddits my_subs.json

# Run both tasks
rstat-cleanup --all
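
Conceptually, the --tickers task boils down to two deletes against the tickers and mentions tables. The following is a sketch of that idea only; purge_blacklisted_tickers is a hypothetical function, and the real rstat_tool/cleanup.py may work differently.

```python
import sqlite3
from typing import Set


def purge_blacklisted_tickers(conn: sqlite3.Connection, blacklist: Set[str]) -> int:
    """Delete blacklisted symbols and their mentions; return tickers removed."""
    if not blacklist:
        return 0
    symbols = sorted(blacklist)
    marks = ",".join("?" * len(symbols))  # one placeholder per symbol
    with conn:  # single transaction: both deletes commit or neither does
        conn.execute(
            "DELETE FROM mentions WHERE ticker_id IN "
            f"(SELECT id FROM tickers WHERE symbol IN ({marks}))",
            symbols,
        )
        cur = conn.execute(
            f"DELETE FROM tickers WHERE symbol IN ({marks})", symbols
        )
    return cur.rowcount
```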

Automation (cron)

An example run_daily_job.sh is provided. Update BASE_DIR and make it executable:

chmod +x run_daily_job.sh

Add a cron entry (example 22:00 daily):

0 22 * * * /absolute/path/to/reddit_stock_analyzer/run_daily_job.sh >> /absolute/path/to/reddit_stock_analyzer/cron.log 2>&1

Docker

Builds a Tailwind CSS layer, then a Python runtime with gunicorn. The compose files include optional nginx and varnish.

Quick start for the dashboard only (uses your host reddit_stocks.db):

docker compose up -d rstat-dashboard

Notes:

  • The rstat-dashboard container mounts ./reddit_stocks.db read-only. Populate it by running rstat on the host (or add a separate CLI container).
  • Prod compose includes nginx (and optional certbot/varnish) configs under config/.

Data model (SQLite)

  • tickers(id, symbol UNIQUE, market_cap, closing_price, last_updated)
  • subreddits(id, name UNIQUE)
  • mentions(id, ticker_id, subreddit_id, post_id, comment_id NULLABLE, mention_type, mention_sentiment, mention_timestamp, UNIQUE(ticker_id, post_id, comment_id))
  • posts(id, post_id UNIQUE, title, post_url, subreddit_id, post_timestamp, comment_count, avg_comment_sentiment)

Uniqueness prevents duplicates across post/comment granularity. Cleanup helpers remove blacklisted “tickers” and stale subreddits.
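
As an illustration of how the uniqueness constraint behaves, here is a trimmed-down sketch using the standard sqlite3 module. Column types are assumptions (the list above gives only column names), the posts table is omitted, and add_mention is a hypothetical helper rather than a function from rstat_tool/database.py.

```python
import sqlite3

# Column types below are assumed for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tickers (
    id INTEGER PRIMARY KEY,
    symbol TEXT UNIQUE NOT NULL,
    market_cap REAL,
    closing_price REAL,
    last_updated INTEGER
);
CREATE TABLE IF NOT EXISTS subreddits (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS mentions (
    id INTEGER PRIMARY KEY,
    ticker_id INTEGER NOT NULL REFERENCES tickers(id),
    subreddit_id INTEGER NOT NULL REFERENCES subreddits(id),
    post_id TEXT NOT NULL,
    comment_id TEXT,
    mention_type TEXT NOT NULL,
    mention_sentiment REAL,
    mention_timestamp INTEGER NOT NULL,
    UNIQUE (ticker_id, post_id, comment_id)
);
"""


def add_mention(conn, ticker_id, subreddit_id, post_id, comment_id,
                mention_type, sentiment, timestamp) -> bool:
    """Insert a mention; return False if the UNIQUE constraint skipped it."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO mentions (ticker_id, subreddit_id, post_id, "
        "comment_id, mention_type, mention_sentiment, mention_timestamp) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (ticker_id, subreddit_id, post_id, comment_id,
         mention_type, sentiment, timestamp),
    )
    return cur.rowcount == 1
```

One SQLite detail worth keeping in mind: NULLs count as distinct values in a UNIQUE constraint, so in this sketch two rows with the same ticker and post but a NULL comment_id are both accepted by the constraint alone.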

UI and Tailwind

The CSS (static/css/style.css) is generated from static/css/input.css using Tailwind 4 during Docker build. If you want to tweak UI locally:

npm install
npx tailwindcss -i ./static/css/input.css -o ./static/css/style.css --minify

Troubleshooting

  • Missing VADER: Run python rstat_tool/setup_nltk.py once (in your venv).
  • Playwright errors: Run playwright install once; make sure the system libraries the browsers depend on are installed on your OS.
  • yfinance returns None: Retry later; some tickers or regions can be spotty. The app tolerates missing financials.
  • Flair required: If posting fails with flair errors, fetch a valid flair ID and pass --flair-id.
  • Empty dashboards: Make sure rstat ran recently and .env is set; check rstat.log.
  • DB locked: If you write to the database (for example by running rstat) while the dashboard is reading it, wait and retry, or stop the server; SQLite locks are short-lived.

Safety and notes

  • Do not commit .env or your database if it contains sensitive data.
  • This project is for research/entertainment. Not investment advice.

Made with Python, Flask, NLTK, Playwright, and Tailwind.