Update README.

2025-08-26 15:15:36 +02:00
parent 35577770dd
commit 8238ca5352

393
README.md

@@ -149,188 +149,311 @@ This is the command-line tool you will use to populate the database. It is highl
This command starts a local web server to let you explore the data you've collected.
**How to Run:** <div align="center">
1. Make sure you have run the `rstat` scraper at least once to populate the database.
2. Start the web server: # RSTAT — Reddit Stock Analyzer
```bash
rstat-dashboard Scan Reddit for stock ticker mentions, score sentiment, enrich with price/market cap, and explore the results in a clean web dashboard. Automate shareable images and post them to Reddit.
</div>
## Highlights
- CLI + Web UI: Collect data with `rstat`, browse it with `rstat-dashboard`.
- Smart ticker parsing: Prefer $TSLA/$AAPL “golden” matches; fall back to filtered ALL-CAPS words.
- Sentiment: VADER (NLTK) scores for titles and comments; “deep dive” averages per post.
- Storage: Local SQLite database `reddit_stocks.db` with de-duped mentions and post analytics.
- Enrichment: Yahoo Finance market cap + latest close fetched in batch and on-demand.
- Images: Export polished daily/weekly summary PNGs for subreddits or “overall”.
- Automation: Optional cron job plus one-command posting to Reddit with OAuth refresh tokens.
## Repository layout
``` ```
3. Open your web browser and navigate to **http://127.0.0.1:5000**. .
├── Dockerfile # Multi-stage build (Tailwind -> Python + gunicorn)
**Dashboard Features:** ├── docker-compose.yml # Prod (nginx + varnish optional) + dashboard
* **Main Page:** Shows the Top 10 most mentioned tickers across all scanned subreddits. ├── docker-compose-dev.yml # Dev compose (local nginx)
* **Subreddit Pages:** Click any subreddit in the navigation bar to see a dashboard specific to that community. ├── requirements.txt # Python deps
* **Deep Dive:** In any table, click on a ticker's symbol to see a detailed breakdown of every post it was mentioned in. ├── setup.py # Installs console scripts
* **Shareable Images:** On a subreddit's page, click "(View Daily Image)" or "(View Weekly Image)" to generate a polished, shareable summary card. ├── subreddits.json # Default subreddits list
├── reddit_stocks.db # SQLite database (generated/updated by CLI)
├── export_image.py # Generate shareable PNGs (Playwright)
### 3. Exporting Shareable Images (`.png`) ├── post_to_reddit.py # Post latest PNG to Reddit
├── get_refresh_token.py # One-time OAuth2 refresh token helper
In addition to viewing the dashboards in a browser, the project includes a powerful script to programmatically save the 'image views' as static `.png` files. This is ideal for automation, scheduled tasks (cron jobs), or sharing the results on social media platforms like your `r/rstat` subreddit. ├── fetch_close_price.py # Utility for closing price (yfinance)
├── fetch_market_cap.py # Utility for market cap (yfinance)
#### One-Time Setup ├── rstat_tool/
│ ├── main.py # CLI entry (rstat)
The image exporter uses the Playwright library to control a headless browser. Before using it for the first time, you must install the necessary browser runtimes with this command: │ ├── dashboard.py # Flask app entry (rstat-dashboard)
│ ├── database.py # SQLite schema + queries
```bash │ ├── ticker_extractor.py # Ticker parsing + blacklist
playwright install │ ├── sentiment_analyzer.py # VADER sentiment
│ ├── cleanup.py # Cleanup utilities (rstat-cleanup)
│ ├── flair_finder.py # Fetch subreddit flair IDs (rstat-flairs)
│ ├── logger_setup.py # Logging
│ └── setup_nltk.py # One-time VADER download
├── templates/ # Jinja2 templates (Tailwind 4 styling)
└── static/ # Favicon + generated CSS (style.css)
``` ```
#### Usage Workflow ## Requirements
The exporter works by taking a high-quality screenshot of the live web page. Therefore, the process requires two steps running in two separate terminals. - Python 3.10+ (Docker image uses Python 3.13-slim)
- Reddit API app (script type) for read + submit
- For optional image export: Playwright browsers
- For UI development (optional): Node 18+ to rebuild Tailwind CSS
**Step 1: Start the Web Dashboard** ## Setup
The web server must be running for the exporter to have a page to screenshot. Open a terminal and run: 1) Clone and enter the repo
```bash
git clone <your-repo>
cd reddit_stock_analyzer
```
2) Create and activate a virtualenv
- bash/zsh:
```bash
python3 -m venv .venv
source .venv/bin/activate
```
- fish:
```fish
python3 -m venv .venv
source .venv/bin/activate.fish
```
3) Install Python dependencies and commands
```bash
pip install -r requirements.txt
pip install -e .
```
4) Configure environment
Create a `.env` file in the repo root with your Reddit app credentials:
```
REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
REDDIT_USER_AGENT=python:rstat:v1.0 (by u/yourname)
```
Optional (after OAuth step below):
```
REDDIT_REFRESH_TOKEN=your_refresh_token
```
5) One-time NLTK setup
```bash
python rstat_tool/setup_nltk.py
```
6) Configure subreddits (optional)
Edit `subreddits.json` to your liking. It ships with a sane default list.
## CLI usage (rstat)
The `rstat` command collects Reddit data and updates the database. Credentials are read from `.env`.
Common flags (see `rstat --help`):
- `--config FILE` Use a JSON file with `{"subreddits": [ ... ]}` (default: `subreddits.json`)
- `--subreddit NAME` Scan a single subreddit instead of the config
- `--days N` Only scan posts from the last N days (default 1)
- `--posts N` Max posts per subreddit to check (default 200)
- `--comments N` Max comments per post to scan (default 100)
- `--no-financials` Skip Yahoo Finance during the scan (faster)
- `--update-top-tickers` Update financials for tickers that are currently top daily/weekly
- `--update-financials-only [TICKER]` Update market cap/close for all tickers, or for a single ticker
- `--stdout` Log to console as well as file; `--debug` for verbose
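As an illustration of the `--config` file format described above, here is a minimal sketch of reading such a file. The function name `load_subreddits` and the whitespace normalization are assumptions made for this example; they are not taken from the `rstat` source.

```python
import json
from pathlib import Path


def load_subreddits(config_path: str = "subreddits.json") -> list[str]:
    """Read a config of the form {"subreddits": [...]} and return the names."""
    data = json.loads(Path(config_path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError(f"{config_path}: expected a JSON object")
    names = data.get("subreddits", [])
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        raise ValueError(f"{config_path}: 'subreddits' must be a list of strings")
    # Assumption for this sketch: trim whitespace and drop empty entries.
    return [n.strip() for n in names if n.strip()]
```

Failing early on a malformed file keeps a typo in the config from silently producing an empty scan.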
Examples:
```bash
# Scan configured subs for last 24h, including financials
rstat --days 1
# Target a single subreddit for the past week, scan more comments
rstat --subreddit wallstreetbets --days 7 --comments 250
# Skip financials during scan, then update only top tickers
rstat --no-financials
rstat --update-top-tickers
# Update financials for all tickers in DB
rstat --update-financials-only
# Update a single ticker (case-insensitive)
rstat --update-financials-only TSLA
```
How mentions are detected:
- If a post contains any $TICKER (e.g., `$TSLA`) anywhere, we use “golden-only” mode: only $-prefixed tickers are considered.
- Otherwise, we fall back to filtered ALL-CAPS 2-5 letter words, excluding a large blacklist to avoid false positives.
- If the title contains tickers, all comments in the thread are attributed to them; otherwise, we scan each comment directly for mentions.
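The two detection modes can be sketched roughly as follows. This is an illustration only, not the code in `ticker_extractor.py`: the regular expressions, the tiny `BLACKLIST`, and the 2 to 5 letter range are assumptions chosen for the example.

```python
import re

# Hypothetical stand-in for the project's much larger blacklist.
BLACKLIST = {"YOLO", "CEO", "USA", "THE", "FOR"}

# "$TSLA"-style mentions (assumed: 1 to 5 uppercase letters after the $).
GOLDEN_RE = re.compile(r"\$([A-Z]{1,5})\b")
# Bare ALL-CAPS words (assumed: 2 to 5 letters).
CAPS_RE = re.compile(r"\b[A-Z]{2,5}\b")


def extract_tickers(text: str) -> set[str]:
    """Return $-prefixed tickers if any exist, else filtered ALL-CAPS words."""
    golden = set(GOLDEN_RE.findall(text))
    if golden:
        # "Golden-only" mode: bare ALL-CAPS words are ignored entirely.
        return golden
    return {word for word in CAPS_RE.findall(text) if word not in BLACKLIST}


print(extract_tickers("Buying $TSLA today, AAPL looks weak"))  # {'TSLA'}
print(sorted(extract_tickers("AAPL and NVDA to the moon, YOLO")))  # ['AAPL', 'NVDA']
```

In the first call `AAPL` is dropped because a `$`-prefixed ticker is present; in the second, no golden match exists, so the ALL-CAPS fallback applies and the blacklist removes `YOLO`.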
## Web dashboard (rstat-dashboard)
Start the dashboard and open http://127.0.0.1:5000
```bash
rstat-dashboard
```
Leave this terminal running.
**Step 2: Run the Export Script** Features:
Open a **second terminal** in the same project directory. You can now run the `export_image.py` script with the desired arguments. - Overall top 10 (daily/weekly) across all subs
- Per-subreddit dashboards (daily/weekly)
- Deep Dive pages listing posts analyzed for a ticker
- Shareable image-friendly views (UI hides nav when `?image=true`)
**Examples:** The dashboard reads from `reddit_stocks.db`. Run `rstat` first so you have data.
## Image export (export_image.py)
Exports a high-res PNG of the dashboard views via Playwright. Note: the script currently uses `https://rstat.net` as its base URL.
* To export the **daily** summary image for `r/wallstreetbets`:
```bash
python export_image.py wallstreetbets
```
* To export the **weekly** summary image for `r/wallstreetbets`:
```bash
python export_image.py wallstreetbets --weekly
```
* To export the **overall** summary image (across all subreddits):
```bash
# Overall daily image
python export_image.py --overall
# Subreddit daily image
python export_image.py --subreddit wallstreetbets
# Weekly view
python export_image.py --subreddit wallstreetbets --weekly
```
#### Output
Output files are saved into the `images/` folder, e.g. `overall_summary_daily_1700000000.png`.
After running a command, a new `.png` file (e.g., `wallstreetbets_daily_1690000000.png`) will be saved in the `images/` directory in the project root.
Tip: If you want to export from a local dashboard instead of rstat.net, edit `base_url` in `export_image.py`.
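Because the example filenames end in a Unix timestamp, the newest export for a given view can be found by comparing those suffixes. The sketch below assumes a `<prefix>_<timestamp>.png` naming scheme inferred from the examples; `latest_image` is a hypothetical helper, not a function from `export_image.py` or `post_to_reddit.py`.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming scheme, inferred from the example filenames:
# <prefix>_<unix_timestamp>.png
NAME_RE = re.compile(r"^(?P<prefix>.+)_(?P<ts>\d+)\.png$")


def latest_image(images_dir: Path, prefix: str) -> Optional[Path]:
    """Return the PNG with the given prefix and the largest timestamp, if any."""
    best: Optional[Path] = None
    best_ts = -1
    for path in images_dir.glob("*.png"):
        match = NAME_RE.match(path.name)
        if match and match["prefix"] == prefix and int(match["ts"]) > best_ts:
            best, best_ts = path, int(match["ts"])
    return best
```

For example, `latest_image(Path("images"), "overall_summary_daily")` would return the most recent overall daily export, or `None` if the folder holds no matching file.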
## Post images to Reddit (post_to_reddit.py)
## 4. Full Automation: Posting to Reddit via Cron Job One-time OAuth2 step to obtain a refresh token:
The final piece of the project is a script that automates the entire pipeline: scraping data, generating an image, and posting it to a target subreddit like `r/rstat`. This is designed to be run via a scheduled task or cron job. 1) In your Reddit app settings, set the redirect URI to exactly `http://localhost:5000` (matches the script).
2) Run:
### Prerequisites: One-Time Account Authorization (OAuth2)
To post on your behalf, the script needs to be authorized with your Reddit account. This is done securely using OAuth2 and a `refresh_token`, which is compatible with 2-Factor Authentication (2FA). This is a **one-time setup process**.
**Step 1: Get Your Refresh Token**
1. First, ensure the "redirect uri" in your [Reddit App settings](https://www.reddit.com/prefs/apps) is set to **exactly** `http://localhost:8080`.
2. Run the temporary helper script included in the project:
```bash
python get_refresh_token.py
```
3. The script will print a unique URL. Copy this URL and paste it into your web browser.
4. Log in to the Reddit account you want to post from and click **"Allow"** when prompted.
5. You'll be redirected to a `localhost:8080` page that says "This site can't be reached". **This is normal and expected.**
6. Copy the **full URL** from your browser's address bar. It will look something like `http://localhost:8080/?state=...&code=...`.
7. Paste this full URL back into the terminal where the script is waiting and press Enter.
8. The script will output your unique **refresh token**.
**Step 2: Update Your `.env` File** Follow the on-screen steps: open the generated URL, allow, copy the redirected URL, paste back. Add the printed token to `.env` as `REDDIT_REFRESH_TOKEN`.
1. Open your `.env` file. Now you can post:
2. Add a new line and paste your refresh token into it.
3. Ensure your file now contains the following (your username and password are no longer needed):
```
REDDIT_CLIENT_ID=your_client_id_from_reddit
REDDIT_CLIENT_SECRET=your_client_secret_from_reddit
REDDIT_USER_AGENT=A custom user agent string (e.g., python:rstat:v1.2)
REDDIT_REFRESH_TOKEN=the_long_refresh_token_string_you_just_copied
```
You can now safely delete the `get_refresh_token.py` script. Your application is now authorized to post on your behalf indefinitely.
### The `post_to_reddit.py` Script
This is the standalone script that finds the most recently generated image and posts it to Reddit using your new authorization.
**Manual Usage:**
```bash
# Post the most recent overall image to r/rstat
python post_to_reddit.py
# Post the most recent daily image for a subreddit
python post_to_reddit.py --subreddit wallstreetbets
# Post weekly image for a subreddit
python post_to_reddit.py --subreddit wallstreetbets --weekly
# Choose a target subreddit and (optionally) a flair ID
python post_to_reddit.py --subreddit wallstreetbets --target-subreddit rstat --flair-id <ID>
```
Need a flair ID? Use the helper:
```bash
rstat-flairs wallstreetbets
```
### Setting Up the Cron Job
To run the entire pipeline automatically every day, you can use a simple shell script controlled by `cron`.
**Step 1: Create a Job Script**
Create a file named `run_daily_job.sh` in the root of your project directory.
**`run_daily_job.sh`:**
```bash
#!/bin/bash
# CRITICAL: Navigate to the project directory using an absolute path.
# Replace '/path/to/your/project/reddit_stock_analyzer' with your actual path.
cd /path/to/your/project/reddit_stock_analyzer
# CRITICAL: Activate the virtual environment using an absolute path.
source /path/to/your/project/reddit_stock_analyzer/.venv/bin/activate
echo "--- Starting RSTAT Daily Job on $(date) ---"
# 1. Scrape data from the last 24 hours.
echo "Step 1: Scraping new data..."
rstat --days 1
# 2. Start the dashboard in the background.
echo "Step 2: Starting dashboard in background..."
rstat-dashboard &
DASHBOARD_PID=$!
sleep 10
# 3. Export the overall summary image.
echo "Step 3: Exporting overall summary image..."
python export_image.py --overall
# 4. Post the image to r/rstat.
echo "Step 4: Posting image to Reddit..."
python post_to_reddit.py --target-subreddit rstat
# 5. Clean up by stopping the dashboard server.
echo "Step 5: Stopping dashboard server..."
kill $DASHBOARD_PID
echo "--- RSTAT Daily Job Complete ---"
```
**Before proceeding, you must edit the two absolute paths at the top of this script to match your system.**
**Step 2: Make the Script Executable** ## Cleanup utilities (rstat-cleanup)
Remove blacklisted “ticker” rows and/or purge data for subreddits no longer in your config.
```bash
# Show help
rstat-cleanup --help
# Remove tickers that are in the internal COMMON_WORDS_BLACKLIST
rstat-cleanup --tickers
# Remove any subreddit data not in subreddits.json
rstat-cleanup --subreddits
# Use a custom config file
rstat-cleanup --subreddits my_subs.json
# Run both tasks
rstat-cleanup --all
```
## Automation (cron)
An example `run_daily_job.sh` is provided. Update `BASE_DIR` and make it executable:
```bash
chmod +x run_daily_job.sh
```
**Step 3: Schedule the Cron Job** Add a cron entry (example 22:00 daily):
1. Run `crontab -e` to open your crontab editor.
2. Add the following line to run the script every day at 10:00 PM and log its output:
```
0 22 * * * /absolute/path/to/reddit_stock_analyzer/run_daily_job.sh >> /absolute/path/to/reddit_stock_analyzer/cron.log 2>&1
```
Your project is now fully and securely automated. ## Docker
Builds a Tailwind CSS layer, then a Python runtime with gunicorn. The compose files include optional nginx and varnish.
Quick start for the dashboard only (uses your host `reddit_stocks.db`):
```bash
docker compose up -d rstat-dashboard
```
Notes:
- The `rstat-dashboard` container mounts `./reddit_stocks.db` read-only. Populate it by running `rstat` on the host (or add a separate CLI container).
- Prod compose includes nginx (and optional certbot/varnish) configs under `config/`.
## Data model (SQLite)
- `tickers(id, symbol UNIQUE, market_cap, closing_price, last_updated)`
- `subreddits(id, name UNIQUE)`
- `mentions(id, ticker_id, subreddit_id, post_id, comment_id NULLABLE, mention_type, mention_sentiment, mention_timestamp, UNIQUE(ticker_id, post_id, comment_id))`
- `posts(id, post_id UNIQUE, title, post_url, subreddit_id, post_timestamp, comment_count, avg_comment_sentiment)`
Uniqueness prevents duplicates across post/comment granularity. Cleanup helpers remove blacklisted “tickers” and stale subreddits.
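To illustrate how a composite `UNIQUE` constraint de-duplicates mentions, here is a small self-contained sketch using an in-memory database. The table is a reduced version of `mentions` with only the columns needed for the example, and `record_mention` / `count_mentions` are hypothetical helpers, not functions from `database.py`.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE mentions (
        id INTEGER PRIMARY KEY,
        ticker_id INTEGER NOT NULL,
        post_id TEXT NOT NULL,
        comment_id TEXT,
        UNIQUE (ticker_id, post_id, comment_id)
    )
    """
)


def record_mention(ticker_id: int, post_id: str, comment_id: Optional[str]) -> None:
    """Insert a mention; rows that violate the UNIQUE constraint are skipped."""
    conn.execute(
        "INSERT OR IGNORE INTO mentions (ticker_id, post_id, comment_id) "
        "VALUES (?, ?, ?)",
        (ticker_id, post_id, comment_id),
    )


def count_mentions() -> int:
    """Return the number of rows currently stored."""
    return conn.execute("SELECT COUNT(*) FROM mentions").fetchone()[0]
```

Calling `record_mention(1, "p1", "c1")` twice leaves a single row. One caveat worth knowing: SQLite treats `NULL` values as distinct in `UNIQUE` constraints, so two rows with the same ticker and post but a `NULL` `comment_id` would both be stored by this sketch; how the project guards against that case is not shown here.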
## UI and Tailwind
The CSS (`static/css/style.css`) is generated from `static/css/input.css` using Tailwind 4 during Docker build. If you want to tweak UI locally:
```bash
npm install
npx tailwindcss -i ./static/css/input.css -o ./static/css/style.css --minify
```
## Troubleshooting
- Missing VADER: Run `python rstat_tool/setup_nltk.py` once (in your venv).
- Playwright errors: Run `playwright install` once; ensure lib dependencies are present on your OS.
- yfinance returns None: Retry later; some tickers or regions can be spotty. The app tolerates missing financials.
- Flair required: If posting fails with flair errors, fetch a valid flair ID and pass `--flair-id`.
- Empty dashboards: Make sure `rstat` ran recently and `.env` is set; check `rstat.log`.
- DB locked: If you edit while the dashboard is reading, wait or stop the server; SQLite locks are short-lived.
## Safety and notes
- Do not commit `.env` or your database if it contains sensitive data.
- This project is for research/entertainment. Not investment advice.
---
Made with Python, Flask, NLTK, Playwright, and Tailwind.