reddit_stock_analyzer/README.md
2025-08-26 15:15:36 +02:00

rstat - Reddit Stock Analyzer

A powerful, installable command-line tool and web dashboard to scan Reddit for stock ticker mentions, perform sentiment analysis, generate insightful reports, and create shareable summary images.

Key Features

  • Dual-Interface: Use a flexible command-line tool (rstat) for data collection and a simple web dashboard (rstat-dashboard) for data visualization.
  • Flexible Data Scraping:
    • Scan subreddits from a config file or target a single subreddit on the fly.
    • Configure the time window to scan posts from the last 24 hours (for daily cron jobs) or back-fill data from several past days (e.g., last 7 days).
    • Fetches from /new to capture the most recent discussions.
  • Deep Analysis & Storage:
    • Scans both post titles and comments, differentiating between the two.
    • Performs a "deep dive" analysis on posts to calculate the average sentiment of the entire comment section.
    • Persists all data in a local SQLite database (reddit_stocks.db) to track trends over time.
  • Rich Data Enrichment:
    • Calculates sentiment (Bullish, Bearish, Neutral) for every mention using NLTK.
    • Fetches and stores daily closing prices and market capitalization from Yahoo Finance.
  • Interactive Web Dashboard:
    • View Top 10 tickers across all subreddits or on a per-subreddit basis.
    • Click any ticker to get a "Deep Dive" page, showing every post it was mentioned in.
  • Shareable Summary Images:
    • Generate clean, dark-mode summary images for both daily and weekly sentiment for any subreddit, perfect for sharing.
  • High-Quality Data:
    • Uses a configurable blacklist and smart filtering to reduce false positives.
    • Automatically cleans the database of invalid tickers if the blacklist is updated.
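
As a rough illustration of the sentiment labels and the "deep dive" average described above, the sketch below maps a VADER compound score to Bullish, Bearish, or Neutral and averages the scores of a post's comments. The function names and the ±0.05 cut-offs are assumptions (±0.05 is the conventional VADER threshold, not necessarily what rstat uses). In practice the compound score would come from NLTK's `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`.

```python
from statistics import mean

# Assumed cut-offs: +/-0.05 is the conventional VADER threshold,
# not necessarily the value rstat uses.
BULLISH_THRESHOLD = 0.05
BEARISH_THRESHOLD = -0.05


def label_sentiment(compound: float) -> str:
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= BULLISH_THRESHOLD:
        return "Bullish"
    if compound <= BEARISH_THRESHOLD:
        return "Bearish"
    return "Neutral"


def average_comment_sentiment(scores: list[float]) -> float:
    """Average the compound scores of a post's comments (0.0 if there are none)."""
    return mean(scores) if scores else 0.0
```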

Project Structure

reddit_stock_analyzer/
├── .env                  # Your secret API keys
├── requirements.txt      # Project dependencies
├── setup.py              # Installation script for the tool
├── subreddits.json       # Default list of subreddits to scan
├── templates/            # HTML templates for the web dashboard
│   ├── base.html
│   ├── index.html
│   ├── subreddit.html
│   ├── deep_dive.html
│   ├── image_view.html
│   └── weekly_image_view.html
└── rstat_tool/           # The main source code package
    ├── __init__.py
    ├── main.py           # Scraper entry point and CLI logic
    ├── dashboard.py      # Web dashboard entry point (Flask app)
    ├── database.py       # All SQLite database functions
    └── ...

Setup and Installation

Follow these steps to set up the project on your local machine.

1. Prerequisites

  • Python 3.10+
  • Git

2. Clone the Repository

git clone <your-repository-url>
cd reddit_stock_analyzer

3. Set Up a Python Virtual Environment

It is highly recommended to use a virtual environment to manage dependencies.

On macOS / Linux:

python3 -m venv .venv
source .venv/bin/activate

On Windows:

python -m venv .venv
.\.venv\Scripts\activate

4. Install Dependencies

pip install -r requirements.txt

5. Configure Reddit API Credentials

  1. Go to the Reddit Apps preferences page and create a new "script" app.

  2. Create a file named .env in the root of the project directory.

  3. Add your credentials to the .env file like this:

    REDDIT_CLIENT_ID=your_client_id_from_reddit
    REDDIT_CLIENT_SECRET=your_client_secret_from_reddit
    REDDIT_USER_AGENT=A custom user agent string (e.g., python:rstat:v1.2)
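
A quick way to catch a missing credential before a scan is to check the three variables up front. The helper below is a hypothetical sketch, not part of rstat; it assumes the values from .env have already been loaded into the process environment (how rstat loads them is not described here).

```python
import os

REQUIRED_KEYS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def missing_reddit_settings(env=None):
    """Return the required keys that are unset or blank in the given mapping."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not str(env.get(key, "")).strip()]
```

Calling `missing_reddit_settings()` with no argument checks the current environment; an empty list means all three values are present.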
    

6. Set Up NLTK

Run the included setup script once to download the required vader_lexicon for sentiment analysis.

python rstat_tool/setup_nltk.py

7. Set Up Playwright

Run Playwright's install routine to download the browsers it needs. If it reports missing system dependencies, follow the on-screen instructions to install them.

playwright install

8. Build and Install the Commands

Install the tool in "editable" mode. This creates the rstat and rstat-dashboard commands in your virtual environment and links them to your source code.

pip install -e .

The installation is now complete.


Usage

The tool is split into two commands: one for gathering data and one for viewing it.

1. The Scraper (rstat)

This is the command-line tool you will use to populate the database. It is highly flexible.

Common Commands:

  • Run a daily scan (for cron jobs): Scans subreddits from subreddits.json for posts in the last 24 hours.

    rstat --config subreddits.json --days 1
    
  • Scan a single subreddit: Ignores the config file and scans just one subreddit.

    rstat --subreddit wallstreetbets --days 1
    
  • Back-fill data for last week: Scans a specific subreddit for all new posts in the last 7 days.

    rstat --subreddit Tollbugatabets --days 7
    
  • Get help and see all options:

    rstat --help
    

2. The Web Dashboard (rstat-dashboard)

This command starts a local web server to let you explore the data you've collected.

RSTAT — Reddit Stock Analyzer

Scan Reddit for stock ticker mentions, score sentiment, enrich with price/market cap, and explore the results in a clean web dashboard. Automate shareable images and post them to Reddit.

Highlights

  • CLI + Web UI: Collect data with rstat, browse it with rstat-dashboard.
  • Smart ticker parsing: Prefer $TSLA/$AAPL “golden” matches; fall back to filtered ALL-CAPS words.
  • Sentiment: VADER (NLTK) scores for titles and comments; “deep dive” averages per post.
  • Storage: Local SQLite database reddit_stocks.db with de-duped mentions and post analytics.
  • Enrichment: Yahoo Finance market cap + latest close fetched in batch and on-demand.
  • Images: Export polished daily/weekly summary PNGs for subreddits or “overall”.
  • Automation: Optional cron job plus one-command posting to Reddit with OAuth refresh tokens.

Repository layout

.
├── Dockerfile                     # Multi-stage build (Tailwind -> Python + gunicorn)
├── docker-compose.yml             # Prod (nginx + varnish optional) + dashboard
├── docker-compose-dev.yml         # Dev compose (local nginx)
├── requirements.txt               # Python deps
├── setup.py                       # Installs console scripts
├── subreddits.json                # Default subreddits list
├── reddit_stocks.db               # SQLite database (generated/updated by CLI)
├── export_image.py                # Generate shareable PNGs (Playwright)
├── post_to_reddit.py              # Post latest PNG to Reddit
├── get_refresh_token.py           # One-time OAuth2 refresh token helper
├── fetch_close_price.py           # Utility for closing price (yfinance)
├── fetch_market_cap.py            # Utility for market cap (yfinance)
├── rstat_tool/
│   ├── main.py                    # CLI entry (rstat)
│   ├── dashboard.py               # Flask app entry (rstat-dashboard)
│   ├── database.py                # SQLite schema + queries
│   ├── ticker_extractor.py        # Ticker parsing + blacklist
│   ├── sentiment_analyzer.py      # VADER sentiment
│   ├── cleanup.py                 # Cleanup utilities (rstat-cleanup)
│   ├── flair_finder.py            # Fetch subreddit flair IDs (rstat-flairs)
│   ├── logger_setup.py            # Logging
│   └── setup_nltk.py              # One-time VADER download
├── templates/                     # Jinja2 templates (Tailwind 4 styling)
└── static/                        # Favicon + generated CSS (style.css)

Requirements

  • Python 3.10+ (Docker image uses Python 3.13-slim)
  • Reddit API app (script type) for read + submit
  • For optional image export: Playwright browsers
  • For UI development (optional): Node 18+ to rebuild Tailwind CSS

Setup

  1. Clone and enter the repo
    git clone <your-repo>
    cd reddit_stock_analyzer
  2. Create and activate a virtualenv
  • bash/zsh:
    python3 -m venv .venv
    source .venv/bin/activate
    
  • fish:
    python3 -m venv .venv
    source .venv/bin/activate.fish
    
  3. Install Python dependencies and commands
    pip install -r requirements.txt
    pip install -e .
  4. Configure environment

Create a .env file in the repo root with your Reddit app credentials:

REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
REDDIT_USER_AGENT=python:rstat:v1.0 (by u/yourname)

Optional (after OAuth step below):

REDDIT_REFRESH_TOKEN=your_refresh_token
  5. One-time NLTK setup
    python rstat_tool/setup_nltk.py
  6. Configure subreddits (optional)

Edit subreddits.json to your liking. It ships with a sane default list.
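
To sanity-check an edited config, a minimal sketch like the following reads the file and returns the subreddit names. It assumes the {"subreddits": [ ... ]} shape listed for the --config flag; the function name and the de-duplication step are illustrative choices, not rstat's own loader.

```python
import json
from pathlib import Path


def load_subreddits(path="subreddits.json"):
    """Read subreddit names from a JSON file shaped like {"subreddits": [...]}."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    names = data.get("subreddits", [])
    if not isinstance(names, list):
        raise ValueError('"subreddits" must be a list of names')
    # Drop blanks and case-insensitive duplicates while keeping the original order.
    seen, result = set(), []
    for name in names:
        cleaned = str(name).strip()
        if cleaned and cleaned.lower() not in seen:
            seen.add(cleaned.lower())
            result.append(cleaned)
    return result
```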

CLI usage (rstat)

The rstat command collects Reddit data and updates the database. Credentials are read from .env.

Common flags (see rstat --help):

  • --config FILE Use a JSON file with {"subreddits": [ ... ]} (default: subreddits.json)
  • --subreddit NAME Scan a single subreddit instead of the config
  • --days N Only scan posts from the last N days (default 1)
  • --posts N Max posts per subreddit to check (default 200)
  • --comments N Max comments per post to scan (default 100)
  • --no-financials Skip Yahoo Finance during the scan (faster)
  • --update-top-tickers Update financials for tickers that are currently top daily/weekly
  • --update-financials-only [TICKER] Update market cap/close for all tickers, or for a single ticker if one is given
  • --stdout Log to console as well as file; --debug for verbose
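
For readers who want to see how these flags and defaults fit together, here is an argparse sketch that mirrors the list above. It is not rstat's actual parser; the `ALL_TICKERS` sentinel used for a bare --update-financials-only is a hypothetical choice.

```python
import argparse

# Hypothetical sentinel meaning "update every ticker" ("*" cannot clash with a symbol).
ALL_TICKERS = "*"


def build_parser():
    parser = argparse.ArgumentParser(prog="rstat")
    parser.add_argument("--config", default="subreddits.json", metavar="FILE")
    parser.add_argument("--subreddit", metavar="NAME")
    parser.add_argument("--days", type=int, default=1, metavar="N")
    parser.add_argument("--posts", type=int, default=200, metavar="N")
    parser.add_argument("--comments", type=int, default=100, metavar="N")
    parser.add_argument("--no-financials", action="store_true")
    parser.add_argument("--update-top-tickers", action="store_true")
    # Flag absent -> None, bare flag -> ALL_TICKERS, flag with a value -> that ticker.
    parser.add_argument("--update-financials-only", nargs="?",
                        const=ALL_TICKERS, metavar="TICKER")
    parser.add_argument("--stdout", action="store_true")
    parser.add_argument("--debug", action="store_true")
    return parser
```

With `nargs="?"`, the same flag covers both "update everything" and "update one ticker", which matches the optional [TICKER] argument in the list.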

Examples:

# Scan configured subs for last 24h, including financials
rstat --days 1

# Target a single subreddit for the past week, scan more comments
rstat --subreddit wallstreetbets --days 7 --comments 250

# Skip financials during scan, then update only top tickers
rstat --no-financials
rstat --update-top-tickers

# Update financials for all tickers in DB
rstat --update-financials-only

# Update a single ticker (case-insensitive)
rstat --update-financials-only TSLA

How mentions are detected:

  • If a post contains any $TICKER (e.g., $TSLA) anywhere, we use “golden-only” mode: only $-prefixed tickers are considered.
  • Otherwise, we fall back to filtered ALL-CAPS words of 2-5 letters, excluding a large blacklist to avoid false positives.
  • If the title contains tickers, every comment in the thread is attributed to those tickers; otherwise, each comment is scanned for its own mentions.
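
The two detection modes can be sketched as follows. This is a simplified illustration, not the code in ticker_extractor.py: the regular expressions, the tiny `BLACKLIST`, and the assumptions that golden tickers are upper-case and that the fallback targets words of 2 to 5 capital letters are all stand-ins.

```python
import re

# Tiny stand-in for the project's much larger blacklist (hypothetical contents).
BLACKLIST = {"YOLO", "CEO", "USA", "THE", "DD"}

# "$TSLA"-style mentions: a dollar sign followed by 1 to 5 capital letters.
GOLDEN_RE = re.compile(r"\$([A-Z]{1,5})\b")
# Fallback: standalone words of 2 to 5 capital letters.
CAPS_RE = re.compile(r"\b[A-Z]{2,5}\b")


def extract_tickers(text: str) -> set[str]:
    """Return $-prefixed tickers if any exist, else filtered ALL-CAPS words."""
    golden = set(GOLDEN_RE.findall(text))
    if golden:
        # Golden-only mode: plain ALL-CAPS words are ignored entirely.
        return golden
    return {word for word in CAPS_RE.findall(text) if word not in BLACKLIST}
```

For example, `extract_tickers("Buying $TSLA today, AAPL is overpriced")` yields only TSLA, because the presence of one $-prefixed ticker switches off the ALL-CAPS fallback.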

Web dashboard (rstat-dashboard)

Start the dashboard and open http://127.0.0.1:5000

rstat-dashboard

Features:

  • Overall top 10 (daily/weekly) across all subs
  • Per-subreddit dashboards (daily/weekly)
  • Deep Dive pages listing posts analyzed for a ticker
  • Shareable image-friendly views (UI hides nav when ?image=true)

The dashboard reads from reddit_stocks.db. Run rstat first so you have data.

Image export (export_image.py)

Exports a high-res PNG of the dashboard views via Playwright. Note: the script currently uses https://rstat.net as its base URL.

# Overall daily image
python export_image.py --overall

# Subreddit daily image
python export_image.py --subreddit wallstreetbets

# Weekly view
python export_image.py --subreddit wallstreetbets --weekly

Output files are saved into the images/ folder, e.g. overall_summary_daily_1700000000.png.
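
If you need to pick the newest export programmatically, one option is to sort by the number embedded in the file name. The sketch below assumes every export ends in `_<unix timestamp>.png`, as in the example file name; the helper and its `prefix` parameter are hypothetical and are not taken from export_image.py or post_to_reddit.py.

```python
import re
from pathlib import Path

# Assumed naming: <anything>_<unix timestamp>.png, as in the example file name.
STAMP_RE = re.compile(r"_(\d+)\.png$")


def latest_image(folder="images", prefix=""):
    """Return the PNG in `folder` with the largest embedded timestamp, or None."""
    best = None
    for path in Path(folder).glob(f"{prefix}*.png"):
        match = STAMP_RE.search(path.name)
        if match is None:
            continue  # skip PNGs that do not follow the assumed naming
        stamp = int(match.group(1))
        if best is None or stamp > best[0]:
            best = (stamp, path)
    return best[1] if best else None
```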

Tip: If you want to export from a local dashboard instead of rstat.net, edit base_url in export_image.py.

Post images to Reddit (post_to_reddit.py)

One-time OAuth2 step to obtain a refresh token:

  1. In your Reddit app settings, set the redirect URI to exactly http://localhost:5000 (matches the script).
  2. Run:
python get_refresh_token.py

Follow the on-screen steps: open the generated URL, allow, copy the redirected URL, paste back. Add the printed token to .env as REDDIT_REFRESH_TOKEN.

Now you can post:

# Post the most recent overall image to r/rstat
python post_to_reddit.py

# Post the most recent daily image for a subreddit
python post_to_reddit.py --subreddit wallstreetbets

# Post weekly image for a subreddit
python post_to_reddit.py --subreddit wallstreetbets --weekly

# Choose a target subreddit and (optionally) a flair ID
python post_to_reddit.py --subreddit wallstreetbets --target-subreddit rstat --flair-id <ID>

Need a flair ID? Use the helper:

rstat-flairs wallstreetbets

Cleanup utilities (rstat-cleanup)

Remove blacklisted “ticker” rows and/or purge data for subreddits no longer in your config.

# Show help
rstat-cleanup --help

# Remove tickers that are in the internal COMMON_WORDS_BLACKLIST
rstat-cleanup --tickers

# Remove any subreddit data not in subreddits.json
rstat-cleanup --subreddits

# Use a custom config file
rstat-cleanup --subreddits my_subs.json

# Run both tasks
rstat-cleanup --all

Automation (cron)

An example run_daily_job.sh is provided. Update BASE_DIR and make it executable:

chmod +x run_daily_job.sh

Add a cron entry (example 22:00 daily):

0 22 * * * /absolute/path/to/reddit_stock_analyzer/run_daily_job.sh >> /absolute/path/to/reddit_stock_analyzer/cron.log 2>&1

Docker

The Dockerfile is a multi-stage build: it first generates the Tailwind CSS, then builds a Python runtime that serves the dashboard with gunicorn. The compose files include optional nginx and varnish.

Quick start for the dashboard only (uses your host reddit_stocks.db):

docker compose up -d rstat-dashboard

Notes:

  • The rstat-dashboard container mounts ./reddit_stocks.db read-only. Populate it by running rstat on the host (or add a separate CLI container).
  • Prod compose includes nginx (and optional certbot/varnish) configs under config/.

Data model (SQLite)

  • tickers(id, symbol UNIQUE, market_cap, closing_price, last_updated)
  • subreddits(id, name UNIQUE)
  • mentions(id, ticker_id, subreddit_id, post_id, comment_id NULLABLE, mention_type, mention_sentiment, mention_timestamp, UNIQUE(ticker_id, post_id, comment_id))
  • posts(id, post_id UNIQUE, title, post_url, subreddit_id, post_timestamp, comment_count, avg_comment_sentiment)

The UNIQUE constraint on mentions is there to keep the same ticker from being recorded twice for the same post or comment. Cleanup helpers remove blacklisted “tickers” and stale subreddits.
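
The de-duplication can be tried out in an in-memory database. The sketch below recreates three of the tables from the column lists above; the column types, the "comment"/"post" values for mention_type, and the `add_mention` helper are assumptions, and the posts table is left out for brevity.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tickers (
    id INTEGER PRIMARY KEY,
    symbol TEXT UNIQUE,
    market_cap INTEGER,
    closing_price REAL,
    last_updated INTEGER
);
CREATE TABLE subreddits (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE mentions (
    id INTEGER PRIMARY KEY,
    ticker_id INTEGER REFERENCES tickers(id),
    subreddit_id INTEGER REFERENCES subreddits(id),
    post_id TEXT,
    comment_id TEXT,
    mention_type TEXT,
    mention_sentiment REAL,
    mention_timestamp INTEGER,
    UNIQUE (ticker_id, post_id, comment_id)
);
"""


def add_mention(conn, ticker_id, subreddit_id, post_id, comment_id, sentiment, ts):
    """Insert a mention; a repeat of the same (ticker, post, comment) is ignored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO mentions (ticker_id, subreddit_id, post_id, comment_id,"
        " mention_type, mention_sentiment, mention_timestamp) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (ticker_id, subreddit_id, post_id, comment_id,
         "comment" if comment_id else "post", sentiment, ts),
    )
    return cur.rowcount == 1  # True only when a new row was actually written


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO tickers (symbol) VALUES ('TSLA')")
conn.execute("INSERT INTO subreddits (name) VALUES ('wallstreetbets')")
first = add_mention(conn, 1, 1, "p1", "c1", 0.4, 1700000000)   # inserted
second = add_mention(conn, 1, 1, "p1", "c1", 0.4, 1700000000)  # ignored as a duplicate
total = conn.execute("SELECT COUNT(*) FROM mentions").fetchone()[0]
```

One caveat worth knowing: SQLite treats NULLs as distinct inside a UNIQUE constraint, so two rows with a NULL comment_id for the same ticker and post would both be accepted by this constraint alone.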

UI and Tailwind

The CSS (static/css/style.css) is generated from static/css/input.css using Tailwind 4 during Docker build. If you want to tweak UI locally:

npm install
npx tailwindcss -i ./static/css/input.css -o ./static/css/style.css --minify

Troubleshooting

  • Missing VADER: Run python rstat_tool/setup_nltk.py once (in your venv).
  • Playwright errors: Run playwright install once; ensure lib dependencies are present on your OS.
  • yfinance returns None: Retry later; some tickers or regions can be spotty. The app tolerates missing financials.
  • Flair required: If posting fails with flair errors, fetch a valid flair ID and pass --flair-id.
  • Empty dashboards: Make sure rstat ran recently and .env is set; check rstat.log.
  • DB locked: If you write to the database while the dashboard is reading it, wait a moment and retry, or stop the server; SQLite locks are short-lived.
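
When inspecting the database by hand while the dashboard or a scan is running, opening it read-only with a lock timeout avoids accidental writes and rides out brief locks. This is a generic sqlite3 sketch, not part of rstat; the helper name and the 5-second timeout are arbitrary.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path="reddit_stocks.db", timeout=5.0):
    """Open the database read-only, waiting up to `timeout` seconds on a lock."""
    # Build a file: URI so that SQLite honours the mode=ro parameter.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True, timeout=timeout)
```

Any write attempted through this connection raises `sqlite3.OperationalError`, so exploratory queries cannot modify the data.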

Safety and notes

  • Do not commit .env or your database if it contains sensitive data.
  • This project is for research/entertainment. Not investment advice.

Made with Python, Flask, NLTK, Playwright, and Tailwind.