<div align="center">

# RSTAT — Reddit Stock Analyzer

Scan Reddit for stock ticker mentions, score sentiment, enrich with price/market cap, and explore the results in a clean web dashboard. Automate shareable images and post them to Reddit.

</div>

## Highlights

- CLI + Web UI: Collect data with `rstat`, browse it with `rstat-dashboard`.
- Smart ticker parsing: Prefer $TSLA/$AAPL “golden” matches; fall back to filtered ALL-CAPS words.
- Sentiment: VADER (NLTK) scores for titles and comments; “deep dive” averages per post (see the sketch after this list).
- Storage: Local SQLite database `reddit_stocks.db` with de-duped mentions and post analytics.
- Enrichment: Yahoo Finance market cap + latest close fetched in batch and on demand.
- Images: Export polished daily/weekly summary PNGs for subreddits or “overall”.
- Automation: Optional cron job plus one-command posting to Reddit with OAuth refresh tokens.
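
The sentiment labels boil down to VADER's compound score. A minimal sketch of that classification step, assuming the conventional ±0.05 cutoffs (illustrative only; `rstat_tool/sentiment_analyzer.py` may use different thresholds):

```python
# Minimal sketch of VADER-based sentiment labeling (illustrative thresholds).
# Assumes vader_lexicon has been downloaded via rstat_tool/setup_nltk.py.
from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def label_sentiment(text: str) -> str:
    """Map VADER's compound score to Bullish / Bearish / Neutral."""
    compound = analyzer.polarity_scores(text)["compound"]
    if compound >= 0.05:
        return "Bullish"
    if compound <= -0.05:
        return "Bearish"
    return "Neutral"

print(label_sentiment("Great earnings, I love this stock"))  # Bullish
```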

## Repository layout

```
.
├── Dockerfile               # Multi-stage build (Tailwind -> Python + gunicorn)
├── docker-compose.yml       # Prod (nginx + varnish optional) + dashboard
├── docker-compose-dev.yml   # Dev compose (local nginx)
├── requirements.txt         # Python deps
├── setup.py                 # Installs console scripts
├── subreddits.json          # Default subreddits list
├── reddit_stocks.db         # SQLite database (generated/updated by CLI)
├── export_image.py          # Generate shareable PNGs (Playwright)
├── post_to_reddit.py        # Post latest PNG to Reddit
├── get_refresh_token.py     # One-time OAuth2 refresh token helper
├── fetch_close_price.py     # Utility for closing price (yfinance)
├── fetch_market_cap.py      # Utility for market cap (yfinance)
├── rstat_tool/
│   ├── main.py              # CLI entry (rstat)
│   ├── dashboard.py         # Flask app entry (rstat-dashboard)
│   ├── database.py          # SQLite schema + queries
│   ├── ticker_extractor.py  # Ticker parsing + blacklist
│   ├── sentiment_analyzer.py # VADER sentiment
│   ├── cleanup.py           # Cleanup utilities (rstat-cleanup)
│   ├── flair_finder.py      # Fetch subreddit flair IDs (rstat-flairs)
│   ├── logger_setup.py      # Logging
│   └── setup_nltk.py        # One-time VADER download
├── templates/               # Jinja2 templates (Tailwind 4 styling)
└── static/                  # Favicon + generated CSS (style.css)
```

## Requirements

- Python 3.10+ (Docker image uses Python 3.13-slim)
- Reddit API app (script type) for read + submit
- For optional image export: Playwright browsers
- For UI development (optional): Node 18+ to rebuild Tailwind CSS

## Setup

1) Clone and enter the repo

```bash
git clone <your-repo>
cd reddit_stock_analyzer
```

2) Create and activate a virtualenv

- bash/zsh:
```bash
python3 -m venv .venv
source .venv/bin/activate
```
- fish:
```fish
python3 -m venv .venv
source .venv/bin/activate.fish
```

3) Install Python dependencies and commands

```bash
pip install -r requirements.txt
pip install -e .
```

4) Configure environment

Create a `.env` file in the repo root with your Reddit app credentials:

```
REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
REDDIT_USER_AGENT=python:rstat:v1.0 (by u/yourname)
```

Optional (after the OAuth step below):

```
REDDIT_REFRESH_TOKEN=your_refresh_token
```
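
For reference, these variables are typically consumed with `python-dotenv` and PRAW. A minimal sketch under that assumption (neither library is named in this README; the actual initialization in `rstat_tool/main.py` may differ):

```python
# Minimal sketch: load .env credentials and build a Reddit client.
# Assumes python-dotenv and praw; rstat_tool/main.py may do this differently.
import os

from dotenv import load_dotenv
import praw

load_dotenv()  # reads the REDDIT_* values from .env

reddit = praw.Reddit(
    client_id=os.environ["REDDIT_CLIENT_ID"],
    client_secret=os.environ["REDDIT_CLIENT_SECRET"],
    user_agent=os.environ["REDDIT_USER_AGENT"],
    refresh_token=os.getenv("REDDIT_REFRESH_TOKEN"),  # only needed for posting
)

for submission in reddit.subreddit("wallstreetbets").new(limit=5):
    print(submission.title)
```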

5) One-time NLTK setup

```bash
python rstat_tool/setup_nltk.py
```

6) Configure subreddits (optional)

Edit `subreddits.json` to your liking. It ships with a sane default list.

## CLI usage (rstat)

The `rstat` command collects Reddit data and updates the database. Credentials are read from `.env`.

Common flags (see `rstat --help`):

- `--config FILE`  Use a JSON file with `{"subreddits": [ ... ]}` (default: `subreddits.json`)
- `--subreddit NAME`  Scan a single subreddit instead of the config
- `--days N`  Only scan posts from the last N days (default 1)
- `--posts N`  Max posts per subreddit to check (default 200)
- `--comments N`  Max comments per post to scan (default 100)
- `--no-financials`  Skip Yahoo Finance during the scan (faster)
- `--update-top-tickers`  Update financials for the tickers that are currently top daily/weekly
- `--update-financials-only [TICKER]`  Update all or a single ticker’s market cap/close
- `--stdout`  Log to the console as well as the log file; `--debug` for verbose output

Examples:

```bash
# Scan configured subs for last 24h, including financials
rstat --days 1

# Target a single subreddit for the past week, scan more comments
rstat --subreddit wallstreetbets --days 7 --comments 250

# Skip financials during scan, then update only top tickers
rstat --no-financials
rstat --update-top-tickers

# Update financials for all tickers in DB
rstat --update-financials-only

# Update a single ticker (case-insensitive)
rstat --update-financials-only TSLA
```

How mentions are detected:

- If a post contains any $TICKER (e.g., `$TSLA`) anywhere, “golden-only” mode applies: only $-prefixed tickers are counted.
- Otherwise, the extractor falls back to filtered ALL-CAPS 2–5 letter words, excluding a large blacklist to reduce false positives.
- If the title contains tickers, every comment in the thread is attributed to those title tickers; otherwise, each comment is scanned directly for its own mentions (see the sketch after this list).
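
A minimal sketch of that precedence, with an illustrative regex and a tiny stand-in blacklist (the real rules live in `rstat_tool/ticker_extractor.py` and are more extensive):

```python
# Minimal sketch of "$-prefixed first, filtered ALL-CAPS fallback" extraction.
# Illustrative only; rstat_tool/ticker_extractor.py holds the real logic.
import re

COMMON_WORDS_BLACKLIST = {"YOLO", "CEO", "USA", "IMO", "ETF"}  # tiny sample

GOLDEN_RE = re.compile(r"\$([A-Z]{1,5})\b")    # $TSLA, $AAPL, ...
FALLBACK_RE = re.compile(r"\b([A-Z]{2,5})\b")  # bare ALL-CAPS 2-5 letter words

def extract_tickers(text: str) -> set[str]:
    golden = set(GOLDEN_RE.findall(text))
    if golden:
        # Golden-only mode: $-prefixed symbols are trusted, nothing else counts.
        return golden
    # Fallback: ALL-CAPS words minus common-word false positives.
    return set(FALLBACK_RE.findall(text)) - COMMON_WORDS_BLACKLIST

print(sorted(extract_tickers("Bought $TSLA and some AAPL today")))  # ['TSLA']
print(sorted(extract_tickers("GME and AMC to the moon, YOLO")))     # ['AMC', 'GME']
```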

## Web dashboard (rstat-dashboard)

Start the dashboard and open http://127.0.0.1:5000

```bash
rstat-dashboard
```

Features:

- Overall top 10 (daily/weekly) across all subs
- Per-subreddit dashboards (daily/weekly)
- Deep Dive pages listing posts analyzed for a ticker
- Shareable image-friendly views (UI hides nav when `?image=true`)

The dashboard reads from `reddit_stocks.db`. Run `rstat` first so you have data.

## Image export (export_image.py)

Exports a high-res PNG of the dashboard views via Playwright. Note: the script currently uses `https://rstat.net` as its base URL.

```bash
# Overall daily image
python export_image.py --overall

# Subreddit daily image
python export_image.py --subreddit wallstreetbets

# Weekly view
python export_image.py --subreddit wallstreetbets --weekly
```

Output files are saved into the `images/` folder, e.g. `overall_summary_daily_1700000000.png`.

Tip: If you want to export from a local dashboard instead of rstat.net, edit `base_url` in `export_image.py`.
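
For orientation, the export is essentially a headless-browser screenshot of an image-friendly dashboard view. A minimal sketch of that flow with Playwright's sync API (URL, viewport, and file name are illustrative, not the script's actual values):

```python
# Minimal sketch of a Playwright screenshot export (illustrative values;
# export_image.py uses its own base URL, routes, viewport, and naming).
import time
from pathlib import Path

from playwright.sync_api import sync_playwright

base_url = "http://127.0.0.1:5000"   # export_image.py currently points at https://rstat.net
url = f"{base_url}/?image=true"      # the image-friendly view hides the navigation

Path("images").mkdir(exist_ok=True)
out_path = f"images/overall_summary_daily_{int(time.time())}.png"

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page(viewport={"width": 1200, "height": 1350})
    page.goto(url, wait_until="networkidle")
    page.screenshot(path=out_path, full_page=True)
    browser.close()

print(f"Saved {out_path}")
```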

## Post images to Reddit (post_to_reddit.py)

One-time OAuth2 step to obtain a refresh token:

1) In your Reddit app settings, set the redirect URI to exactly `http://localhost:5000` (this is what the script expects).
2) Run:

```bash
python get_refresh_token.py
```

Follow the on-screen steps: open the generated URL, approve the app, copy the URL you are redirected to, and paste it back into the script. Add the printed token to `.env` as `REDDIT_REFRESH_TOKEN`.
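
Under the hood this is the standard OAuth2 authorization-code flow. A minimal sketch with PRAW (an assumption; `get_refresh_token.py` handles this for you and may differ in detail):

```python
# Minimal sketch of obtaining a refresh token with PRAW (illustrative).
import os

from dotenv import load_dotenv
import praw

load_dotenv()

reddit = praw.Reddit(
    client_id=os.environ["REDDIT_CLIENT_ID"],
    client_secret=os.environ["REDDIT_CLIENT_SECRET"],
    user_agent=os.environ["REDDIT_USER_AGENT"],
    redirect_uri="http://localhost:5000",  # must match the app's redirect URI exactly
)

# 1) Open this URL in a browser and click "Allow".
print(reddit.auth.url(scopes=["identity", "submit", "flair"], state="rstat", duration="permanent"))

# 2) Paste the `code` parameter from the URL you were redirected to.
code = input("code: ").strip()
print("REDDIT_REFRESH_TOKEN =", reddit.auth.authorize(code))
```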

Now you can post:

```bash
# Post the most recent overall image to r/rstat
python post_to_reddit.py

# Post the most recent daily image for a subreddit
python post_to_reddit.py --subreddit wallstreetbets

# Post weekly image for a subreddit
python post_to_reddit.py --subreddit wallstreetbets --weekly

# Choose a target subreddit and (optionally) a flair ID
python post_to_reddit.py --subreddit wallstreetbets --target-subreddit rstat --flair-id <ID>
```
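
The post itself is a single image submission. A minimal sketch with PRAW (an assumption; `post_to_reddit.py` picks the newest PNG and builds its own title):

```python
# Minimal sketch of an image post with PRAW (illustrative; post_to_reddit.py
# handles file discovery, titles, and flair for you).
import praw

reddit = praw.Reddit(
    client_id="...",
    client_secret="...",
    user_agent="...",
    refresh_token="...",  # the token from the OAuth step above
)

reddit.subreddit("rstat").submit_image(
    title="Daily sentiment summary for r/wallstreetbets",
    image_path="images/overall_summary_daily_1700000000.png",
    flair_id=None,  # pass a flair ID here if the target subreddit requires one
)
```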

Need a flair ID? Use the helper:

```bash
rstat-flairs wallstreetbets
```

## Cleanup utilities (rstat-cleanup)

Remove blacklisted “ticker” rows and/or purge data for subreddits no longer in your config.

```bash
# Show help
rstat-cleanup --help

# Remove tickers that are in the internal COMMON_WORDS_BLACKLIST
rstat-cleanup --tickers

# Remove any subreddit data not in subreddits.json
rstat-cleanup --subreddits

# Use a custom config file
rstat-cleanup --subreddits my_subs.json

# Run both tasks
rstat-cleanup --all
```

## Automation (cron)

An example `run_daily_job.sh` is provided. Update `BASE_DIR` inside it and make it executable:

```bash
chmod +x run_daily_job.sh
```

Add a cron entry (for example, daily at 22:00):

```
0 22 * * * /absolute/path/to/reddit_stock_analyzer/run_daily_job.sh >> /absolute/path/to/reddit_stock_analyzer/cron.log 2>&1
```

## Docker

The image builds a Tailwind CSS layer, then a Python runtime that serves the dashboard with gunicorn. The compose files include optional nginx and varnish services.

Quick start for the dashboard only (uses your host `reddit_stocks.db`):

```bash
docker compose up -d rstat-dashboard
```

Notes:

- The `rstat-dashboard` container mounts `./reddit_stocks.db` read-only. Populate it by running `rstat` on the host (or add a separate CLI container).
- The prod compose file includes nginx (and optional certbot/varnish) configs under `config/`.

## Data model (SQLite)

- `tickers(id, symbol UNIQUE, market_cap, closing_price, last_updated)`
- `subreddits(id, name UNIQUE)`
- `mentions(id, ticker_id, subreddit_id, post_id, comment_id NULLABLE, mention_type, mention_sentiment, mention_timestamp, UNIQUE(ticker_id, post_id, comment_id))`
- `posts(id, post_id UNIQUE, title, post_url, subreddit_id, post_timestamp, comment_count, avg_comment_sentiment)`

The `UNIQUE(ticker_id, post_id, comment_id)` constraint prevents duplicate mentions at both post and comment granularity. Cleanup helpers remove blacklisted “tickers” and stale subreddits.
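
A minimal sketch of how that constraint de-dupes repeat scans (illustrative; the real schema and inserts live in `rstat_tool/database.py`):

```python
# Minimal sketch of mention de-duplication (illustrative schema subset;
# rstat_tool/database.py defines the real tables and queries).
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for reddit_stocks.db
conn.execute("""
    CREATE TABLE mentions (
        id INTEGER PRIMARY KEY,
        ticker_id INTEGER, post_id TEXT, comment_id TEXT,
        mention_type TEXT, mention_sentiment REAL, mention_timestamp INTEGER,
        UNIQUE(ticker_id, post_id, comment_id)
    )
""")

sql = """INSERT OR IGNORE INTO mentions
         (ticker_id, post_id, comment_id, mention_type, mention_sentiment, mention_timestamp)
         VALUES (?, ?, ?, ?, ?, ?)"""
row = (1, "abc123", "def456", "comment", 0.6, 1700000000)

conn.execute(sql, row)
conn.execute(sql, row)  # re-scanning the same comment is a no-op

print(conn.execute("SELECT COUNT(*) FROM mentions").fetchone()[0])  # prints 1
```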

## UI and Tailwind

The CSS (`static/css/style.css`) is generated from `static/css/input.css` with Tailwind 4 during the Docker build. To tweak the UI locally:

```bash
npm install
npx tailwindcss -i ./static/css/input.css -o ./static/css/style.css --minify
```

## Troubleshooting

- Missing VADER: Run `python rstat_tool/setup_nltk.py` once (inside your venv).
- Playwright errors: Run `playwright install` once; make sure its system dependencies are present on your OS.
- yfinance returns None: Retry later; some tickers or regions can be spotty. The app tolerates missing financials (see the sketch after this list).
- Flair required: If posting fails with a flair error, fetch a valid flair ID with `rstat-flairs` and pass `--flair-id`.
- Empty dashboards: Make sure `rstat` ran recently and `.env` is set; check `rstat.log`.
- DB locked: If you write to the database while the dashboard is reading, wait or stop the server; SQLite locks are short-lived.
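
For reference, the financial enrichment amounts to two yfinance lookups per ticker. A minimal sketch (illustrative; `fetch_market_cap.py` and `fetch_close_price.py` are the project's actual utilities and may differ):

```python
# Minimal sketch of the yfinance lookups behind market cap and closing price
# (illustrative; fetch_market_cap.py / fetch_close_price.py may differ).
import yfinance as yf

def fetch_financials(symbol: str):
    """Return (market_cap, closing_price); either may be None if Yahoo has no data."""
    ticker = yf.Ticker(symbol)
    market_cap = ticker.info.get("marketCap")
    history = ticker.history(period="1d")
    closing_price = float(history["Close"].iloc[-1]) if not history.empty else None
    return market_cap, closing_price

print(fetch_financials("TSLA"))
```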

## Safety and notes

- Do not commit `.env` or your database if it contains sensitive data.
- This project is for research/entertainment. Not investment advice.

---

Made with Python, Flask, NLTK, Playwright, and Tailwind.