diff --git a/README.md b/README.md
index 75cd8b1..b3b7b7e 100644
--- a/README.md
+++ b/README.md
@@ -149,188 +149,311 @@ This is the command-line tool you will use to populate the database. It is highl
 This command starts a local web server to let you explore the data you've collected.
-**How to Run:**
-1. Make sure you have run the `rstat` scraper at least once to populate the database.
-2. Start the web server:
- ```bash
- rstat-dashboard
- ```
-3. Open your web browser and navigate to **http://127.0.0.1:5000**.
+
-**Dashboard Features:**
-* **Main Page:** Shows the Top 10 most mentioned tickers across all scanned subreddits.
-* **Subreddit Pages:** Click any subreddit in the navigation bar to see a dashboard specific to that community.
-* **Deep Dive:** In any table, click on a ticker's symbol to see a detailed breakdown of every post it was mentioned in.
-* **Shareable Images:** On a subreddit's page, click "(View Daily Image)" or "(View Weekly Image)" to generate a polished, shareable summary card.
+# RSTAT — Reddit Stock Analyzer
+Scan Reddit for stock ticker mentions, score sentiment, enrich with price/market cap, and explore the results in a clean web dashboard. Automate shareable images and post them to Reddit.
-### 3. Exporting Shareable Images (`.png`)
+
-In addition to viewing the dashboards in a browser, the project includes a powerful script to programmatically save the 'image views' as static `.png` files. This is ideal for automation, scheduled tasks (cron jobs), or sharing the results on social media platforms like your `r/rstat` subreddit.
+## Highlights
-#### One-Time Setup
+- CLI + Web UI: Collect data with `rstat`, browse it with `rstat-dashboard`.
+- Smart ticker parsing: Prefer $TSLA/$AAPL “golden” matches; fall back to filtered ALL-CAPS words.
+- Sentiment: VADER (NLTK) scores for titles and comments; “deep dive” averages per post.
+- Storage: Local SQLite database `reddit_stocks.db` with de-duped mentions and post analytics.
+- Enrichment: Yahoo Finance market cap + latest close fetched in batch and on-demand.
+- Images: Export polished daily/weekly summary PNGs for subreddits or “overall”.
+- Automation: Optional cron job plus one-command posting to Reddit with OAuth refresh tokens.
-The image exporter uses the Playwright library to control a headless browser. Before using it for the first time, you must install the necessary browser runtimes with this command:
+## Repository layout
-```bash
-playwright install
+```
+.
+├── Dockerfile # Multi-stage build (Tailwind -> Python + gunicorn)
+├── docker-compose.yml # Prod (nginx + varnish optional) + dashboard
+├── docker-compose-dev.yml # Dev compose (local nginx)
+├── requirements.txt # Python deps
+├── setup.py # Installs console scripts
+├── subreddits.json # Default subreddits list
+├── reddit_stocks.db # SQLite database (generated/updated by CLI)
+├── export_image.py # Generate shareable PNGs (Playwright)
+├── post_to_reddit.py # Post latest PNG to Reddit
+├── get_refresh_token.py # One-time OAuth2 refresh token helper
+├── fetch_close_price.py # Utility for closing price (yfinance)
+├── fetch_market_cap.py # Utility for market cap (yfinance)
+├── rstat_tool/
+│ ├── main.py # CLI entry (rstat)
+│ ├── dashboard.py # Flask app entry (rstat-dashboard)
+│ ├── database.py # SQLite schema + queries
+│ ├── ticker_extractor.py # Ticker parsing + blacklist
+│ ├── sentiment_analyzer.py # VADER sentiment
+│ ├── cleanup.py # Cleanup utilities (rstat-cleanup)
+│ ├── flair_finder.py # Fetch subreddit flair IDs (rstat-flairs)
+│ ├── logger_setup.py # Logging
+│ └── setup_nltk.py # One-time VADER download
+├── templates/ # Jinja2 templates (Tailwind 4 styling)
+└── static/ # Favicon + generated CSS (style.css)
 ```
-#### Usage Workflow
+## Requirements
-The exporter works by taking a high-quality screenshot of the live web page. Therefore, the process requires two steps running in two separate terminals.
+- Python 3.10+ (Docker image uses Python 3.13-slim)
+- Reddit API app (script type) for read + submit
+- For optional image export: Playwright browsers
+- For UI development (optional): Node 18+ to rebuild Tailwind CSS
-**Step 1: Start the Web Dashboard**
+## Setup
-The web server must be running for the exporter to have a page to screenshot. Open a terminal and run:
+1) Clone and enter the repo
+
+```bash
+git clone
+cd reddit_stock_analyzer
+```
+
+2) Create and activate a virtualenv
+
+- bash/zsh:
+ ```bash
+ python3 -m venv .venv
+ source .venv/bin/activate
+ ```
+- fish:
+ ```fish
+ python3 -m venv .venv
+ source .venv/bin/activate.fish
+ ```
+
+3) Install Python dependencies and commands
+
+```bash
+pip install -r requirements.txt
+pip install -e .
+```
+
+4) Configure environment
+
+Create a `.env` file in the repo root with your Reddit app credentials:
+
+```
+REDDIT_CLIENT_ID=your_client_id
+REDDIT_CLIENT_SECRET=your_client_secret
+REDDIT_USER_AGENT=python:rstat:v1.0 (by u/yourname)
+```
+
+Optional (after OAuth step below):
+
+```
+REDDIT_REFRESH_TOKEN=your_refresh_token
+```
+
+5) One-time NLTK setup
+
+```bash
+python rstat_tool/setup_nltk.py
+```
+
+6) Configure subreddits (optional)
+
+Edit `subreddits.json` to your liking. It ships with a sane default list.
+
+## CLI usage (rstat)
+
+The `rstat` command collects Reddit data and updates the database. Credentials are read from `.env`.
+
+Common flags (see `rstat --help`):
+
+- `--config FILE` Use a JSON file with `{"subreddits": [ ... ]}` (default: `subreddits.json`)
+- `--subreddit NAME` Scan a single subreddit instead of the config
+- `--days N` Only scan posts from the last N days (default 1)
+- `--posts N` Max posts per subreddit to check (default 200)
+- `--comments N` Max comments per post to scan (default 100)
+- `--no-financials` Skip Yahoo Finance during the scan (faster)
+- `--update-top-tickers` Update financials for tickers that are currently top daily/weekly
+- `--update-financials-only [TICKER]` Update all or a single ticker’s market cap/close
+- `--stdout` Log to console as well as file; `--debug` for verbose
+
+Examples:
+
+```bash
+# Scan configured subs for last 24h, including financials
+rstat --days 1
+
+# Target a single subreddit for the past week, scan more comments
+rstat --subreddit wallstreetbets --days 7 --comments 250
+
+# Skip financials during scan, then update only top tickers
+rstat --no-financials
+rstat --update-top-tickers
+
+# Update financials for all tickers in DB
+rstat --update-financials-only
+
+# Update a single ticker (case-insensitive)
+rstat --update-financials-only TSLA
+```
+
+How mentions are detected:
+
+- If a post contains any $TICKER (e.g., `$TSLA`) anywhere, we use “golden-only” mode: only $-prefixed tickers are considered.
+- Otherwise, we fall back to filtered ALL-CAPS 2–5 letter words, excluding a large blacklist to avoid false positives.
+- Title tickers attribute all comments in the thread; otherwise, we scan comments directly for mentions.
+
+## Web dashboard (rstat-dashboard)
+
+Start the dashboard and open http://127.0.0.1:5000
 ```bash
 rstat-dashboard
 ```
-Leave this terminal running.
-**Step 2: Run the Export Script**
+Features:
-Open a **second terminal** in the same project directory. You can now run the `export_image.py` script with the desired arguments.
+- Overall top 10 (daily/weekly) across all subs
+- Per-subreddit dashboards (daily/weekly)
+- Deep Dive pages listing posts analyzed for a ticker
+- Shareable image-friendly views (UI hides nav when `?image=true`)
-**Examples:**
+The dashboard reads from `reddit_stocks.db`. Run `rstat` first so you have data.
-* To export the **daily** summary image for `r/wallstreetbets`:
- ```bash
- python export_image.py wallstreetbets
- ```
+## Image export (export_image.py)
-* To export the **weekly** summary image for `r/wallstreetbets`:
- ```bash
- python export_image.py wallstreetbets --weekly
- ```
+Exports a high-res PNG of the dashboard views via Playwright. Note: the script currently uses `https://rstat.net` as its base URL.
-* To export the **overall** summary image (across all subreddits):
- ```bash
- python export_image.py --overall
- ```
-
-#### Output
-
-After running a command, a new `.png` file (e.g., `wallstreetbets_daily_1690000000.png`) will be saved in the images-directory in the root directory of the project.
-
-
-## 4. Full Automation: Posting to Reddit via Cron Job
-
-The final piece of the project is a script that automates the entire pipeline: scraping data, generating an image, and posting it to a target subreddit like `r/rstat`. This is designed to be run via a scheduled task or cron job.
-
-### Prerequisites: One-Time Account Authorization (OAuth2)
-
-To post on your behalf, the script needs to be authorized with your Reddit account. This is done securely using OAuth2 and a `refresh_token`, which is compatible with 2-Factor Authentication (2FA). This is a **one-time setup process**.
-
-**Step 1: Get Your Refresh Token**
-
-1. First, ensure the "redirect uri" in your [Reddit App settings](https://www.reddit.com/prefs/apps) is set to **exactly** `http://localhost:8080`.
-2. Run the temporary helper script included in the project:
- ```bash
- python get_refresh_token.py
- ```
-3. The script will print a unique URL. Copy this URL and paste it into your web browser.
-4. Log in to the Reddit account you want to post from and click **"Allow"** when prompted.
-5. You'll be redirected to a `localhost:8080` page that says "This site can’t be reached". **This is normal and expected.**
-6. Copy the **full URL** from your browser's address bar. It will look something like `http://localhost:8080/?state=...&code=...`.
-7. Paste this full URL back into the terminal where the script is waiting and press Enter.
-8. The script will output your unique **refresh token**.
-
-**Step 2: Update Your `.env` File**
-
-1. Open your `.env` file.
-2. Add a new line and paste your refresh token into it.
-3. Ensure your file now contains the following (your username and password are no longer needed):
- ```
- REDDIT_CLIENT_ID=your_client_id_from_reddit
- REDDIT_CLIENT_SECRET=your_client_secret_from_reddit
- REDDIT_USER_AGENT=A custom user agent string (e.g., python:rstat:v1.2)
- REDDIT_REFRESH_TOKEN=the_long_refresh_token_string_you_just_copied
- ```
-You can now safely delete the `get_refresh_token.py` script. Your application is now authorized to post on your behalf indefinitely.
-
-### The `post_to_reddit.py` Script
-
-This is the standalone script that finds the most recently generated image and posts it to Reddit using your new authorization.
-
-**Manual Usage:**
-
-* **Post the latest OVERALL summary image to `r/rstat`:**
- ```bash
- python post_to_reddit.py
- ```
-
-* **Post the latest DAILY image for a specific subreddit:**
- ```bash
- python post_to_reddit.py --subreddit wallstreetbets
- ```
-
-* **Post the latest WEEKLY image for a specific subreddit:**
- ```bash
- python post_to_reddit.py --subreddit wallstreetbets --weekly
- ```
-
-### Setting Up the Cron Job
-
-To run the entire pipeline automatically every day, you can use a simple shell script controlled by `cron`.
-
-**Step 1: Create a Job Script**
-
-Create a file named `run_daily_job.sh` in the root of your project directory.
-
-**`run_daily_job.sh`:**
 ```bash
-#!/bin/bash
-
-# CRITICAL: Navigate to the project directory using an absolute path.
-# Replace '/path/to/your/project/reddit_stock_analyzer' with your actual path.
-cd /path/to/your/project/reddit_stock_analyzer
-
-# CRITICAL: Activate the virtual environment using an absolute path.
-source /path/to/your/project/reddit_stock_analyzer/.venv/bin/activate
-
-echo "--- Starting RSTAT Daily Job on $(date) ---"
-
-# 1. Scrape data from the last 24 hours.
-echo "Step 1: Scraping new data..."
-rstat --days 1
-
-# 2. Start the dashboard in the background.
-echo "Step 2: Starting dashboard in background..."
-rstat-dashboard &
-DASHBOARD_PID=$!
-sleep 10
-
-# 3. Export the overall summary image.
-echo "Step 3: Exporting overall summary image..."
+# Overall daily image
 python export_image.py --overall
-# 4. Post the image to r/rstat.
-echo "Step 4: Posting image to Reddit..."
-python post_to_reddit.py --target-subreddit rstat
+# Subreddit daily image
+python export_image.py --subreddit wallstreetbets
-# 5. Clean up by stopping the dashboard server.
-echo "Step 5: Stopping dashboard server..."
-kill $DASHBOARD_PID
-
-echo "--- RSTAT Daily Job Complete ---"
+# Weekly view
+python export_image.py --subreddit wallstreetbets --weekly
 ```
-**Before proceeding, you must edit the two absolute paths at the top of this script to match your system.**
-**Step 2: Make the Script Executable**
+Output files are saved into the `images/` folder, e.g. `overall_summary_daily_1700000000.png`.
+
+Tip: If you want to export from a local dashboard instead of rstat.net, edit `base_url` in `export_image.py`.
+
+## Post images to Reddit (post_to_reddit.py)
+
+One-time OAuth2 step to obtain a refresh token:
+
+1) In your Reddit app settings, set the redirect URI to exactly `http://localhost:5000` (matches the script).
+2) Run:
+
+```bash
+python get_refresh_token.py
+```
+
+Follow the on-screen steps: open the generated URL, allow, copy the redirected URL, paste back. Add the printed token to `.env` as `REDDIT_REFRESH_TOKEN`.
+
+Now you can post:
+
+```bash
+# Post the most recent overall image to r/rstat
+python post_to_reddit.py
+
+# Post the most recent daily image for a subreddit
+python post_to_reddit.py --subreddit wallstreetbets
+
+# Post weekly image for a subreddit
+python post_to_reddit.py --subreddit wallstreetbets --weekly
+
+# Choose a target subreddit and (optionally) a flair ID
+python post_to_reddit.py --subreddit wallstreetbets --target-subreddit rstat --flair-id
+```
+
+Need a flair ID? Use the helper:
+
+```bash
+rstat-flairs wallstreetbets
+```
+
+## Cleanup utilities (rstat-cleanup)
+
+Remove blacklisted “ticker” rows and/or purge data for subreddits no longer in your config.
+
+```bash
+# Show help
+rstat-cleanup --help
+
+# Remove tickers that are in the internal COMMON_WORDS_BLACKLIST
+rstat-cleanup --tickers
+
+# Remove any subreddit data not in subreddits.json
+rstat-cleanup --subreddits
+
+# Use a custom config file
+rstat-cleanup --subreddits my_subs.json
+
+# Run both tasks
+rstat-cleanup --all
+```
+
+## Automation (cron)
+
+An example `run_daily_job.sh` is provided. Update `BASE_DIR` and make it executable:
 ```bash
 chmod +x run_daily_job.sh
 ```
-**Step 3: Schedule the Cron Job**
+Add a cron entry (example 22:00 daily):
-1. Run `crontab -e` to open your crontab editor.
-2. Add the following line to run the script every day at 10:00 PM and log its output:
+```
+0 22 * * * /absolute/path/to/reddit_stock_analyzer/run_daily_job.sh >> /absolute/path/to/reddit_stock_analyzer/cron.log 2>&1
+```
- ```
- 0 22 * * * /path/to/your/project/reddit_stock_analyzer/run_daily_job.sh >> /path/to/your/project/reddit_stock_analyzer/cron.log 2>&1
- ```
+## Docker
-Your project is now fully and securely automated.
\ No newline at end of file
+Builds a Tailwind CSS layer, then a Python runtime with gunicorn. The compose files include optional nginx and varnish.
+
+Quick start for the dashboard only (uses your host `reddit_stocks.db`):
+
+```bash
+docker compose up -d rstat-dashboard
+```
+
+Notes:
+
+- The `rstat-dashboard` container mounts `./reddit_stocks.db` read-only. Populate it by running `rstat` on the host (or add a separate CLI container).
+- Prod compose includes nginx (and optional certbot/varnish) configs under `config/`.
+
+## Data model (SQLite)
+
+- `tickers(id, symbol UNIQUE, market_cap, closing_price, last_updated)`
+- `subreddits(id, name UNIQUE)`
+- `mentions(id, ticker_id, subreddit_id, post_id, comment_id NULLABLE, mention_type, mention_sentiment, mention_timestamp, UNIQUE(ticker_id, post_id, comment_id))`
+- `posts(id, post_id UNIQUE, title, post_url, subreddit_id, post_timestamp, comment_count, avg_comment_sentiment)`
+
+Uniqueness prevents duplicates across post/comment granularity. Cleanup helpers remove blacklisted “tickers” and stale subreddits.
+
+## UI and Tailwind
+
+The CSS (`static/css/style.css`) is generated from `static/css/input.css` using Tailwind 4 during Docker build. If you want to tweak UI locally:
+
+```bash
+npm install
+npx tailwindcss -i ./static/css/input.css -o ./static/css/style.css --minify
+```
+
+## Troubleshooting
+
+- Missing VADER: Run `python rstat_tool/setup_nltk.py` once (in your venv).
+- Playwright errors: Run `playwright install` once; ensure lib dependencies are present on your OS.
+- yfinance returns None: Retry later; some tickers or regions can be spotty. The app tolerates missing financials.
+- Flair required: If posting fails with flair errors, fetch a valid flair ID and pass `--flair-id`.
+- Empty dashboards: Make sure `rstat` ran recently and `.env` is set; check `rstat.log`.
+- DB locked: If you edit while the dashboard is reading, wait or stop the server; SQLite locks are short-lived.
+
+## Safety and notes
+
+- Do not commit `.env` or your database if it contains sensitive data.
+- This project is for research/entertainment. Not investment advice.
+
+---
+
+Made with Python, Flask, NLTK, Playwright, and Tailwind.
\ No newline at end of file