# rstat - Reddit Stock Analyzer

A powerful, installable command-line tool and web dashboard to scan Reddit for stock ticker mentions, perform sentiment analysis, generate insightful reports, and create shareable summary images.

## Key Features

* **Dual Interface:** Use a flexible command-line tool (`rstat`) for data collection and a simple web dashboard (`rstat-dashboard`) for data visualization.
* **Flexible Data Scraping:**
  * Scan subreddits from a config file or target a single subreddit on the fly.
  * Configure the time window to scan posts from the last 24 hours (for daily cron jobs) or back-fill data from several past days (e.g., the last 7 days).
  * Fetches from `/new` to capture the most recent discussions.
* **Deep Analysis & Storage:**
  * Scans both post titles and comments, differentiating between the two.
  * Performs a "deep dive" analysis on posts to calculate the average sentiment of the entire comment section.
  * Persists all data in a local SQLite database (`reddit_stocks.db`) to track trends over time.
* **Rich Data Enrichment:**
  * Calculates sentiment (Bullish, Bearish, Neutral) for every mention using NLTK.
  * Fetches and stores daily closing prices and market capitalization from Yahoo Finance.
* **Interactive Web Dashboard:**
  * View Top 10 tickers across all subreddits or on a per-subreddit basis.
  * Click any ticker to get a "Deep Dive" page showing every post it was mentioned in.
* **Shareable Summary Images:**
  * Generate clean, dark-mode summary images of daily and weekly sentiment for any subreddit, perfect for sharing.
* **High-Quality Data:**
  * Uses a configurable blacklist and smart filtering to reduce false positives.
  * Automatically cleans the database of invalid tickers when the blacklist is updated.
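The sentiment labels above come from NLTK's VADER analyzer, which reduces a piece of text to a single compound score in [-1, 1]. The mapping below is a minimal sketch, assuming VADER's conventional ±0.05 cutoffs; the tool's actual thresholds may differ.

```python
# Sketch of mapping a VADER compound score to the labels used by rstat.
# In the scraper, the score would come from NLTK, roughly:
#   from nltk.sentiment.vader import SentimentIntensityAnalyzer
#   score = SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
# The +/-0.05 cutoffs below are VADER's conventional defaults, assumed here.

def label_sentiment(compound: float) -> str:
    """Classify a compound score in [-1.0, 1.0] as a mention label."""
    if compound >= 0.05:
        return "Bullish"
    if compound <= -0.05:
        return "Bearish"
    return "Neutral"
```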
## Project Structure

```
reddit_stock_analyzer/
├── .env                  # Your secret API keys
├── requirements.txt      # Project dependencies
├── setup.py              # Installation script for the tool
├── subreddits.json       # Default list of subreddits to scan
├── templates/            # HTML templates for the web dashboard
│   ├── base.html
│   ├── index.html
│   ├── subreddit.html
│   ├── deep_dive.html
│   ├── image_view.html
│   └── weekly_image_view.html
└── rstat_tool/           # The main source code package
    ├── __init__.py
    ├── main.py           # Scraper entry point and CLI logic
    ├── dashboard.py      # Web dashboard entry point (Flask app)
    ├── database.py       # All SQLite database functions
    └── ...
```

## Setup and Installation

Follow these steps to set up the project on your local machine.

### 1. Prerequisites

* Python 3.7+
* Git

### 2. Clone the Repository

```bash
git clone <repository-url>
cd reddit_stock_analyzer
```

### 3. Set Up a Python Virtual Environment

It is highly recommended to use a virtual environment to manage dependencies.

**On macOS / Linux:**

```bash
python3 -m venv .venv
source .venv/bin/activate
```

**On Windows:**

```bash
python -m venv .venv
.\.venv\Scripts\activate
```

### 4. Install Dependencies

```bash
pip install -r requirements.txt
```

### 5. Configure Reddit API Credentials

1. Go to the [Reddit Apps preferences page](https://www.reddit.com/prefs/apps) and create a new "script" app.
2. Create a file named `.env` in the root of the project directory.
3. Add your credentials to the `.env` file like this:

   ```
   REDDIT_CLIENT_ID=your_client_id_from_reddit
   REDDIT_CLIENT_SECRET=your_client_secret_from_reddit
   REDDIT_USER_AGENT=A custom user agent string (e.g., python:rstat:v1.2)
   ```

### 6. Set Up NLTK

Run the included setup script **once** to download the required `vader_lexicon` for sentiment analysis.

```bash
python rstat_tool/setup_nltk.py
```

### 7. Set Up Playwright

Run the install routine for Playwright. You might need to install some system dependencies; follow the on-screen instructions if that's the case.
```bash
playwright install
```

### 8. Build and Install the Commands

Install the tool in "editable" mode. This creates the `rstat` and `rstat-dashboard` commands in your virtual environment and links them to your source code.

```bash
pip install -e .
```

The installation is now complete.

---

## Usage

The tool is split into two commands: one for gathering data and one for viewing it.

### 1. The Scraper (`rstat`)

This is the command-line tool you will use to populate the database. It is highly flexible.

**Common Commands:**

* **Run a daily scan (for cron jobs):** Scans subreddits from `subreddits.json` for posts in the last 24 hours.

  ```bash
  rstat --config subreddits.json --days 1
  ```

* **Scan a single subreddit:** Ignores the config file and scans just one subreddit.

  ```bash
  rstat --subreddit wallstreetbets --days 1
  ```

* **Back-fill data for the last week:** Scans a specific subreddit for all new posts in the last 7 days.

  ```bash
  rstat --subreddit Tollbugatabets --days 7
  ```

* **Get help and see all options:**

  ```bash
  rstat --help
  ```

### 2. The Web Dashboard (`rstat-dashboard`)

This command starts a local web server to let you explore the data you've collected.

**How to Run:**

1. Make sure you have run the `rstat` scraper at least once to populate the database.
2. Start the web server:

   ```bash
   rstat-dashboard
   ```

3. Open your web browser and navigate to **http://127.0.0.1:5000**.

**Dashboard Features:**

* **Main Page:** Shows the Top 10 most mentioned tickers across all scanned subreddits.
* **Subreddit Pages:** Click any subreddit in the navigation bar to see a dashboard specific to that community.
* **Deep Dive:** In any table, click a ticker's symbol to see a detailed breakdown of every post it was mentioned in.
* **Shareable Images:** On a subreddit's page, click "(View Daily Image)" or "(View Weekly Image)" to generate a polished, shareable summary card.
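The Top 10 view boils down to counting mentions per ticker in the SQLite database. Here is a hypothetical sketch of that query against an in-memory database; the real schema lives in `rstat_tool/database.py`, and the `mentions` table and column names below are assumptions for illustration only.

```python
import sqlite3

def top_tickers(conn: sqlite3.Connection, limit: int = 10) -> list:
    """Return (ticker, mention_count) pairs, most-mentioned first."""
    rows = conn.execute(
        "SELECT ticker, COUNT(*) AS n FROM mentions "
        "GROUP BY ticker ORDER BY n DESC LIMIT ?",
        (limit,),
    )
    return list(rows)

# Demo against an in-memory database with the assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mentions (ticker TEXT, sentiment TEXT)")
conn.executemany(
    "INSERT INTO mentions VALUES (?, ?)",
    [("GME", "Bullish"), ("GME", "Neutral"), ("TSLA", "Bearish")],
)
print(top_tickers(conn))  # [('GME', 2), ('TSLA', 1)]
```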
### 3. Exporting Shareable Images (`.png`)

In addition to viewing the dashboards in a browser, the project includes a powerful script to programmatically save the "image views" as static `.png` files. This is ideal for automation, scheduled tasks (cron jobs), or sharing the results on social media platforms like your `r/rstat` subreddit.

#### One-Time Setup

The image exporter uses the Playwright library to control a headless browser. Before using it for the first time, you must install the necessary browser runtimes with this command (if you haven't already done so during setup):

```bash
playwright install
```

#### Usage Workflow

The exporter works by taking a high-quality screenshot of the live web page, so the process requires two steps running in two separate terminals.

**Step 1: Start the Web Dashboard**

The web server must be running for the exporter to have a page to screenshot. Open a terminal and run:

```bash
rstat-dashboard
```

Leave this terminal running.

**Step 2: Run the Export Script**

Open a **second terminal** in the same project directory. You can now run the `export_image.py` script with the desired arguments.

**Examples:**

* To export the **daily** summary image for `r/wallstreetbets`:

  ```bash
  python export_image.py wallstreetbets
  ```

* To export the **weekly** summary image for `r/wallstreetbets`:

  ```bash
  python export_image.py wallstreetbets --weekly
  ```

* To export the **overall** summary image (across all subreddits):

  ```bash
  python export_image.py --overall
  ```

#### Output

After running a command, a new `.png` file (e.g., `wallstreetbets_daily_1690000000.png`) will be saved in the `images` directory in the root of the project.

## 4. Full Automation: Posting to Reddit via Cron Job

The final piece of the project is a script that automates the entire pipeline: scraping data, generating an image, and posting it to a target subreddit like `r/rstat`. It is designed to be run via a scheduled task or cron job.
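For reference, the screenshot step at the heart of `export_image.py` can be sketched with Playwright's sync API. This is not the script's actual code: the dashboard URL, the `images/` output directory, and the `output_name` helper are assumptions based on the output format shown above.

```python
import time
from pathlib import Path

def output_name(subreddit: str, weekly: bool = False) -> str:
    """Build a filename in the style shown above, e.g. wallstreetbets_daily_1690000000.png."""
    period = "weekly" if weekly else "daily"
    return f"{subreddit}_{period}_{int(time.time())}.png"

def export_png(url: str, out_path: Path) -> None:
    """Screenshot a live dashboard page with a headless Chromium."""
    # Deferred import: requires `pip install playwright` and `playwright install`.
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)  # the dashboard must already be running (Step 1)
        page.screenshot(path=str(out_path), full_page=True)
        browser.close()

if __name__ == "__main__":
    out = Path("images") / output_name("wallstreetbets")
    # Hypothetical route -- the real script navigates to the actual image-view URL.
    export_png("http://127.0.0.1:5000/", out)
```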
### Prerequisites: One-Time Account Authorization (OAuth2)

To post on your behalf, the script needs to be authorized with your Reddit account. This is done securely using OAuth2 and a `refresh_token`, which is compatible with 2-Factor Authentication (2FA). This is a **one-time setup process**.

**Step 1: Get Your Refresh Token**

1. First, ensure the "redirect uri" in your [Reddit App settings](https://www.reddit.com/prefs/apps) is set to **exactly** `http://localhost:8080`.
2. Run the temporary helper script included in the project:

   ```bash
   python get_refresh_token.py
   ```

3. The script will print a unique URL. Copy this URL and paste it into your web browser.
4. Log in to the Reddit account you want to post from and click **"Allow"** when prompted.
5. You'll be redirected to a `localhost:8080` page that says "This site can't be reached". **This is normal and expected.**
6. Copy the **full URL** from your browser's address bar. It will look something like `http://localhost:8080/?state=...&code=...`.
7. Paste this full URL back into the terminal where the script is waiting and press Enter.
8. The script will output your unique **refresh token**.

**Step 2: Update Your `.env` File**

1. Open your `.env` file.
2. Add a new line and paste your refresh token into it.
3. Ensure your file now contains the following (your username and password are no longer needed):

   ```
   REDDIT_CLIENT_ID=your_client_id_from_reddit
   REDDIT_CLIENT_SECRET=your_client_secret_from_reddit
   REDDIT_USER_AGENT=A custom user agent string (e.g., python:rstat:v1.2)
   REDDIT_REFRESH_TOKEN=the_long_refresh_token_string_you_just_copied
   ```

You can now safely delete the `get_refresh_token.py` script. Your application is now authorized to post on your behalf indefinitely.

### The `post_to_reddit.py` Script

This is the standalone script that finds the most recently generated image and posts it to Reddit using your new authorization.
**Manual Usage:**

* **Post the latest OVERALL summary image to `r/rstat`:**

  ```bash
  python post_to_reddit.py
  ```

* **Post the latest DAILY image for a specific subreddit:**

  ```bash
  python post_to_reddit.py --subreddit wallstreetbets
  ```

* **Post the latest WEEKLY image for a specific subreddit:**

  ```bash
  python post_to_reddit.py --subreddit wallstreetbets --weekly
  ```

### Setting Up the Cron Job

To run the entire pipeline automatically every day, you can use a simple shell script controlled by `cron`.

**Step 1: Create a Job Script**

Create a file named `run_daily_job.sh` in the root of your project directory.

**`run_daily_job.sh`:**

```bash
#!/bin/bash

# CRITICAL: Navigate to the project directory using an absolute path.
# Replace '/path/to/your/project/reddit_stock_analyzer' with your actual path.
cd /path/to/your/project/reddit_stock_analyzer

# CRITICAL: Activate the virtual environment using an absolute path.
source /path/to/your/project/reddit_stock_analyzer/.venv/bin/activate

echo "--- Starting RSTAT Daily Job on $(date) ---"

# 1. Scrape data from the last 24 hours.
echo "Step 1: Scraping new data..."
rstat --days 1

# 2. Start the dashboard in the background.
echo "Step 2: Starting dashboard in background..."
rstat-dashboard &
DASHBOARD_PID=$!
sleep 10

# 3. Export the overall summary image.
echo "Step 3: Exporting overall summary image..."
python export_image.py --overall

# 4. Post the image to r/rstat.
echo "Step 4: Posting image to Reddit..."
python post_to_reddit.py --target-subreddit rstat

# 5. Clean up by stopping the dashboard server.
echo "Step 5: Stopping dashboard server..."
kill $DASHBOARD_PID

echo "--- RSTAT Daily Job Complete ---"
```

**Before proceeding, you must edit the two absolute paths at the top of this script to match your system.**

**Step 2: Make the Script Executable**

```bash
chmod +x run_daily_job.sh
```

**Step 3: Schedule the Cron Job**

1. Run `crontab -e` to open your crontab editor.
2. Add the following line to run the script every day at 10:00 PM and log its output:

   ```
   0 22 * * * /path/to/your/project/reddit_stock_analyzer/run_daily_job.sh >> /path/to/your/project/reddit_stock_analyzer/cron.log 2>&1
   ```

Your project is now fully and securely automated.
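On systems without cron (or if you prefer Python over shell for the scheduled job), the same pipeline can be sketched with the standard library. The sequencing mirrors `run_daily_job.sh` above; the `cron_entry` helper simply rebuilds the crontab line from Step 3 and is included for illustration.

```python
import subprocess
import sys
import time

def cron_entry(script: str, log: str, hour: int = 22, minute: int = 0) -> str:
    """Render a crontab line in the style shown in Step 3."""
    return f"{minute} {hour} * * * {script} >> {log} 2>&1"

def run_pipeline() -> None:
    """Scrape, export, and post, mirroring run_daily_job.sh."""
    subprocess.run(["rstat", "--days", "1"], check=True)
    dashboard = subprocess.Popen(["rstat-dashboard"])  # background server
    try:
        time.sleep(10)  # give Flask a moment to start
        subprocess.run([sys.executable, "export_image.py", "--overall"], check=True)
        subprocess.run(
            [sys.executable, "post_to_reddit.py", "--target-subreddit", "rstat"],
            check=True,
        )
    finally:
        dashboard.terminate()  # always stop the dashboard, even on failure

if __name__ == "__main__":
    run_pipeline()
```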