reddit_stock_analyzer/README.md
2025-08-26 15:16:53 +02:00

<div align="center">
# RSTAT — Reddit Stock Analyzer
Scan Reddit for stock ticker mentions, score sentiment, enrich with price/market cap, and explore the results in a clean web dashboard. Automate shareable images and post them to Reddit.
</div>
## Highlights
- CLI + Web UI: Collect data with `rstat`, browse it with `rstat-dashboard`.
- Smart ticker parsing: Prefer $TSLA/$AAPL “golden” matches; fall back to filtered ALL-CAPS words.
- Sentiment: VADER (NLTK) scores for titles and comments; “deep dive” averages per post.
- Storage: Local SQLite database `reddit_stocks.db` with de-duped mentions and post analytics.
- Enrichment: Yahoo Finance market cap + latest close fetched in batch and on-demand.
- Images: Export polished daily/weekly summary PNGs for subreddits or “overall”.
- Automation: Optional cron job plus one-command posting to Reddit with OAuth refresh tokens.
## Repository layout
```
.
├── Dockerfile # Multi-stage build (Tailwind -> Python + gunicorn)
├── docker-compose.yml # Prod (nginx + varnish optional) + dashboard
├── docker-compose-dev.yml # Dev compose (local nginx)
├── requirements.txt # Python deps
├── setup.py # Installs console scripts
├── subreddits.json # Default subreddits list
├── reddit_stocks.db # SQLite database (generated/updated by CLI)
├── export_image.py # Generate shareable PNGs (Playwright)
├── post_to_reddit.py # Post latest PNG to Reddit
├── get_refresh_token.py # One-time OAuth2 refresh token helper
├── fetch_close_price.py # Utility for closing price (yfinance)
├── fetch_market_cap.py # Utility for market cap (yfinance)
├── rstat_tool/
│ ├── main.py # CLI entry (rstat)
│ ├── dashboard.py # Flask app entry (rstat-dashboard)
│ ├── database.py # SQLite schema + queries
│ ├── ticker_extractor.py # Ticker parsing + blacklist
│ ├── sentiment_analyzer.py # VADER sentiment
│ ├── cleanup.py # Cleanup utilities (rstat-cleanup)
│ ├── flair_finder.py # Fetch subreddit flair IDs (rstat-flairs)
│ ├── logger_setup.py # Logging
│ └── setup_nltk.py # One-time VADER download
├── templates/ # Jinja2 templates (Tailwind 4 styling)
└── static/ # Favicon + generated CSS (style.css)
```
## Requirements
- Python 3.10+ (Docker image uses Python 3.13-slim)
- Reddit API app (script type) for read + submit
- For optional image export: Playwright browsers
- For UI development (optional): Node 18+ to rebuild Tailwind CSS
## Setup
1) Clone and enter the repo
```bash
git clone <your-repo>
cd reddit_stock_analyzer
```
2) Create and activate a virtualenv
- bash/zsh:
```bash
python3 -m venv .venv
source .venv/bin/activate
```
- fish:
```fish
python3 -m venv .venv
source .venv/bin/activate.fish
```
3) Install Python dependencies and commands
```bash
pip install -r requirements.txt
pip install -e .
```
4) Configure environment
Create a `.env` file in the repo root with your Reddit app credentials:
```
REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
REDDIT_USER_AGENT=python:rstat:v1.0 (by u/yourname)
```
Optional (after OAuth step below):
```
REDDIT_REFRESH_TOKEN=your_refresh_token
```
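To check the file before a first run, the sketch below parses `KEY=value` lines and reports which of the three required names are missing. The helper names (`parse_env`, `missing_settings`) are illustrative and not part of rstat, and how the project itself loads `.env` is not shown here.

```python
# Illustrative helpers, not part of rstat: report required settings missing from .env text.
REQUIRED = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def parse_env(text):
    """Parse simple KEY=value lines; blank lines and '#' comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip()
    return values


def missing_settings(text):
    """Return the required names that are absent or empty."""
    values = parse_env(text)
    return [name for name in REQUIRED if not values.get(name)]


sample = "REDDIT_CLIENT_ID=abc\nREDDIT_USER_AGENT=python:rstat:v1.0 (by u/yourname)\n"
print(missing_settings(sample))  # prints ['REDDIT_CLIENT_SECRET']
```

To check a real file, pass its contents, for example `missing_settings(open(".env").read())`.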
5) One-time NLTK setup
```bash
python rstat_tool/setup_nltk.py
```
6) Configure subreddits (optional)
Edit `subreddits.json` to your liking. It ships with a sane default list.
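If you edit the file by hand, a quick sanity check can catch typos early. The sketch below assumes the `{"subreddits": [ ... ]}` shape that `--config` expects; `load_subreddits` is a hypothetical helper, and the case-insensitive de-duplication is a design choice of this example, not documented rstat behavior.

```python
import json


def load_subreddits(text):
    """Validate config text and return subreddit names without duplicates."""
    data = json.loads(text)
    names = data.get("subreddits") if isinstance(data, dict) else None
    if not isinstance(names, list) or not all(
        isinstance(n, str) and n.strip() for n in names
    ):
        raise ValueError('expected {"subreddits": ["name", ...]}')
    seen, result = set(), []
    for name in names:
        key = name.strip().lower()  # compare names case-insensitively
        if key not in seen:
            seen.add(key)
            result.append(name.strip())
    return result


print(load_subreddits('{"subreddits": ["wallstreetbets", "stocks", "Stocks"]}'))
# prints ['wallstreetbets', 'stocks']
```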
## CLI usage (rstat)
The `rstat` command collects Reddit data and updates the database. Credentials are read from `.env`.
Common flags (see `rstat --help`):
- `--config FILE` Use a JSON file with `{"subreddits": [ ... ]}` (default: `subreddits.json`)
- `--subreddit NAME` Scan a single subreddit instead of the config
- `--days N` Only scan posts from the last N days (default 1)
- `--posts N` Max posts per subreddit to check (default 200)
- `--comments N` Max comments per post to scan (default 100)
- `--no-financials` Skip Yahoo Finance during the scan (faster)
- `--update-top-tickers` Update financials for tickers that are currently top daily/weekly
- `--update-financials-only [TICKER]` Update market cap/close for all tickers or for a single ticker
- `--stdout` Log to console as well as file; `--debug` for verbose
Examples:
```bash
# Scan configured subs for last 24h, including financials
rstat --days 1
# Target a single subreddit for the past week, scan more comments
rstat --subreddit wallstreetbets --days 7 --comments 250
# Skip financials during scan, then update only top tickers
rstat --no-financials
rstat --update-top-tickers
# Update financials for all tickers in DB
rstat --update-financials-only
# Update a single ticker (case-insensitive)
rstat --update-financials-only TSLA
```
How mentions are detected:
- If a post contains any $TICKER (e.g., `$TSLA`) anywhere, we use “golden-only” mode: only $-prefixed tickers are considered.
- Otherwise, we fall back to filtered ALL-CAPS words of 2-5 letters, excluding a large blacklist to avoid false positives.
- If the title contains tickers, every comment in the thread is attributed to them; otherwise, each comment is scanned for its own mentions.
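The rules above can be sketched roughly as follows. This is a simplified illustration, not the code in `ticker_extractor.py`: the regular expressions, the tiny `BLACKLIST`, and the function names are assumptions, as is the reading that golden tickers are 1 to 5 uppercase letters after `$` and fallback words are 2 to 5 letters.

```python
import re

# Hypothetical stand-in for the project's much larger blacklist.
BLACKLIST = {"CEO", "USA", "YOLO", "THE", "FOR", "ALL"}

GOLDEN = re.compile(r"\$([A-Z]{1,5})\b")   # $TSLA, $AAPL, ...
CAPS_WORD = re.compile(r"\b[A-Z]{2,5}\b")  # bare ALL-CAPS words


def extract_tickers(text):
    """Golden-only mode if any $TICKER is present, else filtered ALL-CAPS words."""
    golden = set(GOLDEN.findall(text))
    if golden:
        return golden
    return {word for word in CAPS_WORD.findall(text) if word not in BLACKLIST}


def tickers_for_comment(title, comment):
    """Attribute a comment to the title's tickers, or scan the comment itself."""
    title_tickers = extract_tickers(title)
    return title_tickers if title_tickers else extract_tickers(comment)


print(extract_tickers("$TSLA beats AAPL"))  # prints {'TSLA'}
print(tickers_for_comment("What are you buying?", "Loading up on PLTR"))  # prints {'PLTR'}
```

In the first call `AAPL` is ignored because a `$`-prefixed ticker switches the text to golden-only mode.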
## Web dashboard (rstat-dashboard)
Start the dashboard and open http://127.0.0.1:5000
```bash
rstat-dashboard
```
Features:
- Overall top 10 (daily/weekly) across all subs
- Per-subreddit dashboards (daily/weekly)
- Deep Dive pages listing posts analyzed for a ticker
- Shareable image-friendly views (UI hides nav when `?image=true`)
The dashboard reads from `reddit_stocks.db`. Run `rstat` first so you have data.
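For ad-hoc checks outside the dashboard, a "top tickers" query can be run against the same tables directly. The sketch below is one possible query, not necessarily the one in `database.py`; it assumes `mention_timestamp` holds Unix epoch seconds, and `top_tickers` and `demo_connection` are hypothetical names. The demo uses a trimmed-down in-memory schema so it runs without a real database.

```python
import sqlite3
import time


def top_tickers(conn, days=1, limit=10, now=None):
    """Return (symbol, mention count, average sentiment) for the last `days` days."""
    now = int(time.time()) if now is None else now
    cutoff = now - days * 86400  # assumes timestamps are Unix epoch seconds
    rows = conn.execute(
        """
        SELECT t.symbol, COUNT(*) AS mentions, AVG(m.mention_sentiment)
        FROM mentions AS m
        JOIN tickers AS t ON t.id = m.ticker_id
        WHERE m.mention_timestamp >= ?
        GROUP BY t.symbol
        ORDER BY mentions DESC, t.symbol ASC
        LIMIT ?
        """,
        (cutoff, limit),
    )
    return rows.fetchall()


def demo_connection():
    """Build an in-memory database with only the columns this query needs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tickers (id INTEGER PRIMARY KEY, symbol TEXT UNIQUE);
        CREATE TABLE mentions (
            id INTEGER PRIMARY KEY,
            ticker_id INTEGER,
            mention_sentiment REAL,
            mention_timestamp INTEGER
        );
        INSERT INTO tickers (id, symbol) VALUES (1, 'TSLA'), (2, 'AAPL');
        """
    )
    return conn
```

Against the real file, replace `demo_connection()` with `sqlite3.connect("reddit_stocks.db")` and pass `days=7` for the weekly view.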
## Image export (export_image.py)
Exports a high-res PNG of the dashboard views via Playwright. Note: the script currently uses `https://rstat.net` as its base URL.
```bash
# Overall daily image
python export_image.py --overall
# Subreddit daily image
python export_image.py --subreddit wallstreetbets
# Weekly view
python export_image.py --subreddit wallstreetbets --weekly
```
Output files are saved into the `images/` folder, e.g. `overall_summary_daily_1700000000.png`.
Tip: If you want to export from a local dashboard instead of rstat.net, edit `base_url` in `export_image.py`.
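The example file name suggests a `<scope>_summary_<daily|weekly>_<unix timestamp>.png` pattern. Assuming that pattern holds (it is inferred from the single example above, not confirmed from `export_image.py`), the newest image for a given view could be located like this; `latest_image` and `latest_image_in` are hypothetical helpers.

```python
import re
from pathlib import Path

# Pattern inferred from one example file name; treat it as an assumption.
NAME = re.compile(r"^(?P<scope>.+)_summary_(?P<period>daily|weekly)_(?P<ts>\d+)\.png$")


def latest_image(names, scope="overall", period="daily"):
    """Return the matching file name with the largest timestamp, or None."""
    best = None
    for name in names:
        match = NAME.match(name)
        if match and match["scope"] == scope and match["period"] == period:
            timestamp = int(match["ts"])
            if best is None or timestamp > best[0]:
                best = (timestamp, name)
    return best[1] if best else None


def latest_image_in(folder="images", **kwargs):
    """Apply latest_image to the PNG files found in a folder."""
    return latest_image((path.name for path in Path(folder).glob("*.png")), **kwargs)
```

For example, `latest_image_in("images", scope="wallstreetbets", period="weekly")` would return the newest weekly image for that subreddit, or `None` if there is none.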
## Post images to Reddit (post_to_reddit.py)
One-time OAuth2 step to obtain a refresh token:
1) In your Reddit app settings, set the redirect URI to exactly `http://localhost:5000` (matches the script).
2) Run:
```bash
python get_refresh_token.py
```
Follow the on-screen steps: open the generated URL, approve access, copy the URL you are redirected to, and paste it back into the terminal. Add the printed token to `.env` as `REDDIT_REFRESH_TOKEN`.
Now you can post:
```bash
# Post the most recent overall image to r/rstat
python post_to_reddit.py
# Post the most recent daily image for a subreddit
python post_to_reddit.py --subreddit wallstreetbets
# Post weekly image for a subreddit
python post_to_reddit.py --subreddit wallstreetbets --weekly
# Choose a target subreddit and (optionally) a flair ID
python post_to_reddit.py --subreddit wallstreetbets --target-subreddit rstat --flair-id <ID>
```
Need a flair ID? Use the helper:
```bash
rstat-flairs wallstreetbets
```
## Cleanup utilities (rstat-cleanup)
Remove blacklisted “ticker” rows and/or purge data for subreddits no longer in your config.
```bash
# Show help
rstat-cleanup --help
# Remove tickers that are in the internal COMMON_WORDS_BLACKLIST
rstat-cleanup --tickers
# Remove any subreddit data not in subreddits.json
rstat-cleanup --subreddits
# Use a custom config file
rstat-cleanup --subreddits my_subs.json
# Run both tasks
rstat-cleanup --all
```
## Automation (cron)
An example `run_daily_job.sh` is provided. Update `BASE_DIR` and make it executable:
```bash
chmod +x run_daily_job.sh
```
Add a cron entry (example 22:00 daily):
```
0 22 * * * /absolute/path/to/reddit_stock_analyzer/run_daily_job.sh >> /absolute/path/to/reddit_stock_analyzer/cron.log 2>&1
```
## Docker
The Dockerfile is a multi-stage build: a Tailwind CSS stage first, then a Python runtime served by gunicorn. The compose files include optional nginx and varnish.
Quick start for the dashboard only (uses your host `reddit_stocks.db`):
```bash
docker compose up -d rstat-dashboard
```
Notes:
- The `rstat-dashboard` container mounts `./reddit_stocks.db` read-only. Populate it by running `rstat` on the host (or add a separate CLI container).
- Prod compose includes nginx (and optional certbot/varnish) configs under `config/`.
## Data model (SQLite)
- `tickers(id, symbol UNIQUE, market_cap, closing_price, last_updated)`
- `subreddits(id, name UNIQUE)`
- `mentions(id, ticker_id, subreddit_id, post_id, comment_id NULLABLE, mention_type, mention_sentiment, mention_timestamp, UNIQUE(ticker_id, post_id, comment_id))`
- `posts(id, post_id UNIQUE, title, post_url, subreddit_id, post_timestamp, comment_count, avg_comment_sentiment)`
The `UNIQUE(ticker_id, post_id, comment_id)` constraint on `mentions` keeps the same ticker from being recorded twice for the same post and comment. Cleanup helpers remove blacklisted “tickers” and stale subreddits.
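To see how a uniqueness constraint like the one on `mentions` de-duplicates rows, here is a small in-memory sketch. The trimmed table, the column types, and the use of `INSERT OR IGNORE` are assumptions for illustration; the actual insert logic lives in `database.py` and may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE mentions (
        id INTEGER PRIMARY KEY,
        ticker_id INTEGER NOT NULL,
        post_id TEXT NOT NULL,
        comment_id TEXT,  -- NULL for a mention found in the post itself
        UNIQUE (ticker_id, post_id, comment_id)
    )
    """
)


def record_mention(conn, ticker_id, post_id, comment_id):
    """Insert a mention; return True if a row was added, False if it was a duplicate."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO mentions (ticker_id, post_id, comment_id) VALUES (?, ?, ?)",
        (ticker_id, post_id, comment_id),
    )
    return cursor.rowcount == 1


print(record_mention(conn, 1, "p1", "c1"))  # prints True
print(record_mention(conn, 1, "p1", "c1"))  # prints False (duplicate ignored)
```

One SQLite detail to keep in mind: `NULL` values count as distinct in `UNIQUE` constraints, so rows with a `NULL` `comment_id` are not de-duplicated by the constraint alone and need a separate check.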
## UI and Tailwind
The CSS (`static/css/style.css`) is generated from `static/css/input.css` using Tailwind 4 during Docker build. If you want to tweak UI locally:
```bash
npm install
npx tailwindcss -i ./static/css/input.css -o ./static/css/style.css --minify
```
## Troubleshooting
- Missing VADER: Run `python rstat_tool/setup_nltk.py` once (in your venv).
- Playwright errors: Run `playwright install` once; ensure lib dependencies are present on your OS.
- yfinance returns None: Retry later; some tickers or regions can be spotty. The app tolerates missing financials.
- Flair required: If posting fails with flair errors, fetch a valid flair ID and pass `--flair-id`.
- Empty dashboards: Make sure `rstat` ran recently and `.env` is set; check `rstat.log`.
- DB locked: If you write to the database (for example, by running `rstat`) while the dashboard is reading it, wait a moment or stop the server; SQLite locks are short-lived.
## Safety and notes
- Do not commit `.env` or your database if it contains sensitive data.
- This project is for research/entertainment. Not investment advice.
---
Made with Python, Flask, NLTK, Playwright, and Tailwind.