# RSTAT — Reddit Stock Analyzer

Scan Reddit for stock ticker mentions, score sentiment, enrich with price/market cap, and explore the results in a clean web dashboard. Automate shareable images and post them to Reddit.

## Highlights

- CLI + Web UI: Collect data with `rstat`, browse it with `rstat-dashboard`.
- Smart ticker parsing: Prefer $TSLA/$AAPL “golden” matches; fall back to filtered ALL-CAPS words (see the sketch below).
- Sentiment: VADER (NLTK) scores for titles and comments; “deep dive” averages per post.
- Storage: Local SQLite database `reddit_stocks.db` with de-duped mentions and post analytics.
- Enrichment: Yahoo Finance market cap + latest close fetched in batch and on-demand.
- Images: Export polished daily/weekly summary PNGs for subreddits or “overall”.
- Automation: Optional cron job plus one-command posting to Reddit with OAuth refresh tokens.
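To make the ticker-parsing bullet concrete, the two modes can be approximated as follows. This is an illustrative sketch only: the tiny blacklist is a stand-in, and the real parsing plus the full `COMMON_WORDS_BLACKLIST` live in `rstat_tool/ticker_extractor.py` (the detection rules are spelled out under “CLI usage” below).

```python
# Illustrative sketch of the two detection modes; the real parsing and the
# full COMMON_WORDS_BLACKLIST live in rstat_tool/ticker_extractor.py.
import re

COMMON_WORDS_BLACKLIST = {"CEO", "YOLO", "DD", "USA", "ETF"}  # tiny stand-in subset

def extract_tickers(text: str) -> set[str]:
    # "Golden" mode: if any $TICKER is present, trust only $-prefixed symbols.
    golden = set(re.findall(r"\$([A-Z]{1,5})\b", text))
    if golden:
        return golden
    # Fallback: ALL-CAPS 2-5 letter words, minus the blacklist of common words.
    candidates = set(re.findall(r"\b[A-Z]{2,5}\b", text))
    return candidates - COMMON_WORDS_BLACKLIST

print(extract_tickers("Loaded up on $TSLA and some SPY"))    # {'TSLA'} (golden-only)
print(extract_tickers("CEO said NVDA margins look strong"))  # {'NVDA'}
```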
## Repository layout

```
.
├── Dockerfile                 # Multi-stage build (Tailwind -> Python + gunicorn)
├── docker-compose.yml         # Prod (nginx + varnish optional) + dashboard
├── docker-compose-dev.yml     # Dev compose (local nginx)
├── requirements.txt           # Python deps
├── setup.py                   # Installs console scripts
├── subreddits.json            # Default subreddits list
├── reddit_stocks.db           # SQLite database (generated/updated by CLI)
├── export_image.py            # Generate shareable PNGs (Playwright)
├── post_to_reddit.py          # Post latest PNG to Reddit
├── get_refresh_token.py       # One-time OAuth2 refresh token helper
├── fetch_close_price.py       # Utility for closing price (yfinance)
├── fetch_market_cap.py        # Utility for market cap (yfinance)
├── rstat_tool/
│   ├── main.py                # CLI entry (rstat)
│   ├── dashboard.py           # Flask app entry (rstat-dashboard)
│   ├── database.py            # SQLite schema + queries
│   ├── ticker_extractor.py    # Ticker parsing + blacklist
│   ├── sentiment_analyzer.py  # VADER sentiment
│   ├── cleanup.py             # Cleanup utilities (rstat-cleanup)
│   ├── flair_finder.py        # Fetch subreddit flair IDs (rstat-flairs)
│   ├── logger_setup.py        # Logging
│   └── setup_nltk.py          # One-time VADER download
├── templates/                 # Jinja2 templates (Tailwind 4 styling)
└── static/                    # Favicon + generated CSS (style.css)
```

## Requirements

- Python 3.10+ (Docker image uses Python 3.13-slim)
- Reddit API app (script type) for read + submit
- For optional image export: Playwright browsers (`playwright install` once)
- For UI development (optional): Node 18+ to rebuild Tailwind CSS

## Setup

1) Clone and enter the repo

```bash
git clone <repository-url>
cd reddit_stock_analyzer
```

2) Create and activate a virtualenv

- bash/zsh:

```bash
python3 -m venv .venv
source .venv/bin/activate
```

- fish:

```fish
python3 -m venv .venv
source .venv/bin/activate.fish
```

- Windows (PowerShell): `python -m venv .venv`, then `.\.venv\Scripts\activate`

3) Install Python dependencies and commands

```bash
pip install -r requirements.txt
pip install -e .
```

4) Configure environment

Create a `.env` file in the repo root with your Reddit app credentials:

```
REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
REDDIT_USER_AGENT=python:rstat:v1.0 (by u/yourname)
```

Optional (after the OAuth step below):

```
REDDIT_REFRESH_TOKEN=your_refresh_token
```

5) One-time NLTK setup (a verification sketch follows these steps)

```bash
python rstat_tool/setup_nltk.py
```

6) Configure subreddits (optional)

Edit `subreddits.json` to your liking. It ships with a sane default list.
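Once setup is done, you can confirm the VADER download worked and see roughly how mentions end up labeled Bullish, Bearish, or Neutral. This is a minimal sketch assuming the conventional ±0.05 compound-score cutoffs; the actual thresholds in `rstat_tool/sentiment_analyzer.py` may differ.

```python
# Minimal check that the VADER lexicon is installed, plus a sketch of how a
# mention could be labeled. The +/-0.05 cutoffs are the conventional VADER
# thresholds, not necessarily the ones used in rstat_tool/sentiment_analyzer.py.
from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()  # raises LookupError if vader_lexicon is missing

def label_sentiment(text: str) -> str:
    compound = analyzer.polarity_scores(text)["compound"]  # ranges from -1.0 to 1.0
    if compound >= 0.05:
        return "Bullish"
    if compound <= -0.05:
        return "Bearish"
    return "Neutral"

print(label_sentiment("Calls are printing, this rally is great"))  # most likely "Bullish"
```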
## CLI usage (rstat)

The `rstat` command collects Reddit data and updates the database. Credentials are read from `.env`.

Common flags (see `rstat --help`):

- `--config FILE` Use a JSON file with `{"subreddits": [ ... ]}` (default: `subreddits.json`)
- `--subreddit NAME` Scan a single subreddit instead of the config
- `--days N` Only scan posts from the last N days (default 1)
- `--posts N` Max posts per subreddit to check (default 200)
- `--comments N` Max comments per post to scan (default 100)
- `--no-financials` Skip Yahoo Finance during the scan (faster)
- `--update-top-tickers` Update financials for tickers that are currently top daily/weekly
- `--update-financials-only [TICKER]` Update all or a single ticker's market cap/close
- `--stdout` Log to console as well as file; `--debug` for verbose output

Examples:

```bash
# Scan configured subs for last 24h, including financials
rstat --days 1

# Target a single subreddit for the past week, scan more comments
rstat --subreddit wallstreetbets --days 7 --comments 250

# Skip financials during scan, then update only top tickers
rstat --no-financials
rstat --update-top-tickers

# Update financials for all tickers in DB
rstat --update-financials-only

# Update a single ticker (case-insensitive)
rstat --update-financials-only TSLA
```

How mentions are detected:

- If a post contains any $TICKER (e.g., `$TSLA`) anywhere, we use “golden-only” mode: only $-prefixed tickers are considered.
- Otherwise, we fall back to filtered ALL-CAPS 2–5 letter words, excluding a large blacklist to avoid false positives.
- If tickers are found in the title, every comment in the thread is attributed to them; otherwise, comments are scanned directly for their own mentions.

## Web dashboard (rstat-dashboard)

Start the dashboard and open http://127.0.0.1:5000

```bash
rstat-dashboard
```

Features:

- Overall top 10 (daily/weekly) across all subs
- Per-subreddit dashboards (daily/weekly)
- Deep Dive pages listing posts analyzed for a ticker
- Shareable image-friendly views (UI hides nav when `?image=true`)

The dashboard reads from `reddit_stocks.db`. Run `rstat` first so you have data.

## Image export (export_image.py)

Exports a high-res PNG of the dashboard views via Playwright. Note: the script currently uses `https://rstat.net` as its base URL.

```bash
# Overall daily image
python export_image.py --overall

# Subreddit daily image
python export_image.py --subreddit wallstreetbets

# Weekly view
python export_image.py --subreddit wallstreetbets --weekly
```

Output files are saved into the `images/` folder, e.g. `overall_summary_daily_1700000000.png`.

Tip: If you want to export from a local dashboard instead of rstat.net, edit `base_url` in `export_image.py`.

## Post images to Reddit (post_to_reddit.py)

One-time OAuth2 step to obtain a refresh token:

1) In your Reddit app settings, set the redirect URI to exactly `http://localhost:5000` (matches the script).
2) Run:

```bash
python get_refresh_token.py
```

Follow the on-screen steps: open the generated URL, allow access, copy the redirected URL, and paste it back. Add the printed token to `.env` as `REDDIT_REFRESH_TOKEN`.

Now you can post:

```bash
# Post the most recent overall image to r/rstat
python post_to_reddit.py

# Post the most recent daily image for a subreddit
python post_to_reddit.py --subreddit wallstreetbets

# Post weekly image for a subreddit
python post_to_reddit.py --subreddit wallstreetbets --weekly

# Choose a target subreddit and (optionally) a flair ID
python post_to_reddit.py --subreddit wallstreetbets --target-subreddit rstat --flair-id
```

Need a flair ID? Use the helper:

```bash
rstat-flairs wallstreetbets
```
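Under the hood, the posting step boils down to a refresh-token PRAW submission. The sketch below is illustrative only: it assumes `python-dotenv` is available and uses a placeholder title plus the example filename from above, while `post_to_reddit.py` itself selects the newest PNG and handles titles and flair.

```python
# Minimal sketch of a refresh-token submission with PRAW. The title and image
# path are placeholders; post_to_reddit.py picks the latest PNG and builds the
# title and flair on top of this.
import os

import praw
from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # pull the REDDIT_* values from .env

reddit = praw.Reddit(
    client_id=os.environ["REDDIT_CLIENT_ID"],
    client_secret=os.environ["REDDIT_CLIENT_SECRET"],
    refresh_token=os.environ["REDDIT_REFRESH_TOKEN"],
    user_agent=os.environ["REDDIT_USER_AGENT"],
)

# Submit a PNG as an image post to r/rstat.
reddit.subreddit("rstat").submit_image(
    title="Daily sentiment summary",
    image_path="images/overall_summary_daily_1700000000.png",
)
```

If the target subreddit requires flair, pass `flair_id=...` to `submit_image`, mirroring the `--flair-id` option above.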
## Cleanup utilities (rstat-cleanup)

Remove blacklisted “ticker” rows and/or purge data for subreddits no longer in your config.

```bash
# Show help
rstat-cleanup --help

# Remove tickers that are in the internal COMMON_WORDS_BLACKLIST
rstat-cleanup --tickers

# Remove any subreddit data not in subreddits.json
rstat-cleanup --subreddits

# Use a custom config file
rstat-cleanup --subreddits my_subs.json

# Run both tasks
rstat-cleanup --all
```

## Automation (cron)

An example `run_daily_job.sh` is provided. Update `BASE_DIR` and make it executable:

```bash
chmod +x run_daily_job.sh
```

Add a cron entry (example: 22:00 daily):

```
0 22 * * * /absolute/path/to/reddit_stock_analyzer/run_daily_job.sh >> /absolute/path/to/reddit_stock_analyzer/cron.log 2>&1
```

## Docker

Builds a Tailwind CSS layer, then a Python runtime with gunicorn. The compose files include optional nginx and varnish.

Quick start for the dashboard only (uses your host `reddit_stocks.db`):

```bash
docker compose up -d rstat-dashboard
```

Notes:

- The `rstat-dashboard` container mounts `./reddit_stocks.db` read-only. Populate it by running `rstat` on the host (or add a separate CLI container).
- Prod compose includes nginx (and optional certbot/varnish) configs under `config/`.

## Data model (SQLite)

- `tickers(id, symbol UNIQUE, market_cap, closing_price, last_updated)`
- `subreddits(id, name UNIQUE)`
- `mentions(id, ticker_id, subreddit_id, post_id, comment_id NULLABLE, mention_type, mention_sentiment, mention_timestamp, UNIQUE(ticker_id, post_id, comment_id))`
- `posts(id, post_id UNIQUE, title, post_url, subreddit_id, post_timestamp, comment_count, avg_comment_sentiment)`

Uniqueness prevents duplicates across post/comment granularity. Cleanup helpers remove blacklisted “tickers” and stale subreddits.

## UI and Tailwind

The CSS (`static/css/style.css`) is generated from `static/css/input.css` using Tailwind 4 during Docker build. If you want to tweak the UI locally:

```bash
npm install
npx tailwindcss -i ./static/css/input.css -o ./static/css/style.css --minify
```

## Troubleshooting

- Missing VADER: Run `python rstat_tool/setup_nltk.py` once (in your venv).
- Playwright errors: Run `playwright install` once; ensure the required system libraries are present on your OS.
- yfinance returns None: Retry later; some tickers or regions can be spotty. The app tolerates missing financials.
- Flair required: If posting fails with flair errors, fetch a valid flair ID and pass `--flair-id`.
- Empty dashboards: Make sure `rstat` ran recently and `.env` is set; check `rstat.log`.
- DB locked: If you edit while the dashboard is reading, wait or stop the server; SQLite locks are short-lived.

## Safety and notes

- Do not commit `.env` or your database if it contains sensitive data.
- This project is for research/entertainment. Not investment advice.

---

Made with Python, Flask, NLTK, Playwright, and Tailwind.