# rstat - Reddit Stock Analyzer
A powerful, installable command-line tool and web dashboard to scan Reddit for stock ticker mentions, perform sentiment analysis, generate insightful reports, and create shareable summary images.
## Key Features
* **Dual-Interface:** Use a flexible command-line tool (`rstat`) for data collection and a simple web dashboard (`rstat-dashboard`) for data visualization.
* **Flexible Data Scraping:**
* Scan subreddits from a config file or target a single subreddit on the fly.
* Configure the time window to scan posts from the last 24 hours (for daily cron jobs) or back-fill data from several past days (e.g., last 7 days).
* Fetches from `/new` to capture the most recent discussions.
* **Deep Analysis & Storage:**
* Scans both post titles and comments, differentiating between the two.
* Performs a "deep dive" analysis on posts to calculate the average sentiment of the entire comment section.
* Persists all data in a local SQLite database (`reddit_stocks.db`) to track trends over time.
* **Rich Data Enrichment:**
* Calculates sentiment (Bullish, Bearish, Neutral) for every mention using NLTK.
* Fetches and stores daily closing prices and market capitalization from Yahoo Finance.
* **Interactive Web Dashboard:**
* View Top 10 tickers across all subreddits or on a per-subreddit basis.
* Click any ticker to get a "Deep Dive" page, showing every post it was mentioned in.
* **Shareable Summary Images:**
* Generate clean, dark-mode summary images for both daily and weekly sentiment for any subreddit, perfect for sharing.
* **High-Quality Data:**
* Uses a configurable blacklist and smart filtering to reduce false positives.
* Automatically cleans the database of invalid tickers if the blacklist is updated.
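The Bullish/Bearish/Neutral labeling described above can be sketched as a simple thresholding of a VADER-style compound score. The ±0.05 cutoffs below are VADER's conventional defaults, not necessarily the exact thresholds rstat uses:

```python
def label_sentiment(compound: float) -> str:
    """Map a VADER-style compound score (-1.0 .. 1.0) to a label.

    The +/-0.05 thresholds are VADER's customary defaults; the tool's
    actual cutoffs may differ.
    """
    if compound >= 0.05:
        return "Bullish"
    if compound <= -0.05:
        return "Bearish"
    return "Neutral"
```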
## Project Structure
```
reddit_stock_analyzer/
├── .env                 # Your secret API keys
├── requirements.txt     # Project dependencies
├── setup.py             # Installation script for the tool
├── subreddits.json      # Default list of subreddits to scan
├── templates/           # HTML templates for the web dashboard
│   ├── base.html
│   ├── index.html
│   ├── subreddit.html
│   ├── deep_dive.html
│   ├── image_view.html
│   └── weekly_image_view.html
└── rstat_tool/          # The main source code package
    ├── __init__.py
    ├── main.py          # Scraper entry point and CLI logic
    ├── dashboard.py     # Web dashboard entry point (Flask app)
    ├── database.py      # All SQLite database functions
    └── ...
```
## Setup and Installation
Follow these steps to set up the project on your local machine.
### 1. Prerequisites
* Python 3.7+
* Git
### 2. Clone the Repository
```bash
git clone <your-repository-url>
cd reddit_stock_analyzer
```
### 3. Set Up a Python Virtual Environment
It is highly recommended to use a virtual environment to manage dependencies.
**On macOS / Linux:**
```bash
python3 -m venv .venv
source .venv/bin/activate
```
**On Windows:**
```bash
python -m venv .venv
.\.venv\Scripts\activate
```
### 4. Install Dependencies
```bash
pip install -r requirements.txt
```
### 5. Configure Reddit API Credentials
1. Go to the [Reddit Apps preferences page](https://www.reddit.com/prefs/apps) and create a new "script" app.
2. Create a file named `.env` in the root of the project directory.
3. Add your credentials to the `.env` file like this:
```
REDDIT_CLIENT_ID=your_client_id_from_reddit
REDDIT_CLIENT_SECRET=your_client_secret_from_reddit
REDDIT_USER_AGENT=A custom user agent string (e.g., python:rstat:v1.2)
```
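For reference, loading a file like this can be done with just the standard library. Real projects typically use the `python-dotenv` package instead, which handles quoting and edge cases more robustly; this is only a minimal sketch:

```python
import os


def load_env(path=".env"):
    """Minimal .env loader: KEY=value lines, '#' comments ignored.

    Uses setdefault so variables already set in the environment win.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
```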
### 6. Set Up NLTK
Run the included setup script **once** to download the required `vader_lexicon` for sentiment analysis.
```bash
python rstat_tool/setup_nltk.py
```
### 7. Set Up Playwright
Run the Playwright install routine. It may prompt you to install additional system dependencies; follow the on-screen instructions if so.
```bash
playwright install
```
### 8. Build and Install the Commands
Install the tool in "editable" mode. This creates the `rstat` and `rstat-dashboard` commands in your virtual environment and links them to your source code.
```bash
pip install -e .
```
The installation is now complete.
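For reference, the console-script wiring in `setup.py` presumably looks something like the sketch below; the exact module paths and entry-point function names are assumptions inferred from the project layout above:

```python
from setuptools import setup, find_packages

setup(
    name="rstat",
    packages=find_packages(),
    # console_scripts entry points are what make the `rstat` and
    # `rstat-dashboard` commands appear on your PATH after `pip install -e .`
    entry_points={
        "console_scripts": [
            "rstat=rstat_tool.main:main",
            "rstat-dashboard=rstat_tool.dashboard:main",
        ],
    },
)
```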
---
## Usage
The tool is split into two commands: one for gathering data and one for viewing it.
### 1. The Scraper (`rstat`)
This is the command-line tool you will use to populate the database. It is highly flexible.
**Common Commands:**
* **Run a daily scan (for cron jobs):** Scans subreddits from `subreddits.json` for posts in the last 24 hours.
```bash
rstat --config subreddits.json --days 1
```
* **Scan a single subreddit:** Ignores the config file and scans just one subreddit.
```bash
rstat --subreddit wallstreetbets --days 1
```
* **Back-fill data for last week:** Scans a specific subreddit for all new posts in the last 7 days.
```bash
rstat --subreddit Tollbugatabets --days 7
```
* **Get help and see all options:**
```bash
rstat --help
```
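An `argparse` setup covering the flags shown in these examples might look like this. The option names come from the commands above; the defaults and help strings are assumptions:

```python
import argparse


def build_parser():
    """CLI sketch matching the flags used in the examples above."""
    p = argparse.ArgumentParser(prog="rstat")
    p.add_argument("--config", default="subreddits.json",
                   help="JSON file listing the subreddits to scan")
    p.add_argument("--subreddit",
                   help="scan a single subreddit instead of the config file")
    p.add_argument("--days", type=int, default=1,
                   help="look-back window in days")
    return p
```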
### 2. The Web Dashboard (`rstat-dashboard`)
This command starts a local web server to let you explore the data you've collected.
**How to Run:**
1. Make sure you have run the `rstat` scraper at least once to populate the database.
2. Start the web server:
```bash
rstat-dashboard
```
3. Open your web browser and navigate to **http://127.0.0.1:5000**.
**Dashboard Features:**
* **Main Page:** Shows the Top 10 most mentioned tickers across all scanned subreddits.
* **Subreddit Pages:** Click any subreddit in the navigation bar to see a dashboard specific to that community.
* **Deep Dive:** In any table, click on a ticker's symbol to see a detailed breakdown of every post it was mentioned in.
* **Shareable Images:** On a subreddit's page, click "(View Daily Image)" or "(View Weekly Image)" to generate a polished, shareable summary card.
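Under the hood, the "Top 10" view boils down to a grouped count over the SQLite database. The table and column names below are assumptions based on the feature list, not the tool's actual schema:

```python
import sqlite3

# Hypothetical schema: a `mentions` table with one row per ticker mention.
TOP_TICKERS_SQL = """
    SELECT ticker, COUNT(*) AS mentions
    FROM mentions
    GROUP BY ticker
    ORDER BY mentions DESC
    LIMIT 10
"""


def top_tickers(db_path="reddit_stocks.db"):
    """Return (ticker, mention_count) pairs, most-mentioned first."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute(TOP_TICKERS_SQL).fetchall()
```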
### 3. Exporting Shareable Images (`.png`)
In addition to viewing the dashboards in a browser, the project includes a script to programmatically save the image views as static `.png` files. This is ideal for automation, scheduled tasks (cron jobs), or sharing the results on Reddit (e.g., in your `r/rstat` subreddit) or other social media platforms.
#### One-Time Setup
The image exporter uses the Playwright library to control a headless browser. If you haven't already installed the browser runtimes during setup (Step 7), run `playwright install` once before using it.
#### Usage Workflow
The exporter works by taking a high-quality screenshot of the live web page. Therefore, the process requires two steps running in two separate terminals.
**Step 1: Start the Web Dashboard**
The web server must be running for the exporter to have a page to screenshot. Open a terminal and run:
```bash
rstat-dashboard
```
Leave this terminal running.
**Step 2: Run the Export Script**
Open a **second terminal** in the same project directory. You can now run the `export_image.py` script with the desired arguments.
**Examples:**
* To export the **daily** summary image for `r/wallstreetbets`:
```bash
python export_image.py wallstreetbets
```
* To export the **weekly** summary image for `r/wallstreetbets`:
```bash
python export_image.py wallstreetbets --weekly
```
* To export the **overall** summary image (across all subreddits):
```bash
python export_image.py --overall
```
#### Output
After running a command, a new `.png` file (e.g., `wallstreetbets_daily_1690000000.png`) will be saved in the `images` directory at the root of the project.
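The naming pattern in the example above suggests `<subreddit>_<period>_<unix-timestamp>.png`. A sketch of how such a filename could be built (the exact format is inferred from the example, so treat it as an assumption):

```python
import time


def image_filename(subreddit="overall", weekly=False, ts=None):
    """Build a filename like 'wallstreetbets_daily_1690000000.png'."""
    ts = int(ts if ts is not None else time.time())
    period = "weekly" if weekly else "daily"
    return f"{subreddit}_{period}_{ts}.png"
```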
### 4. Full Automation: Posting to Reddit via Cron Job
The final piece of the project is a script that automates the entire pipeline: scraping data, generating an image, and posting it to a target subreddit like `r/rstat`. This is designed to be run via a scheduled task or cron job.
### Prerequisites: One-Time Account Authorization (OAuth2)
To post on your behalf, the script needs to be authorized with your Reddit account. This is done securely using OAuth2 and a `refresh_token`, which is compatible with 2-Factor Authentication (2FA). This is a **one-time setup process**.
**Step 1: Get Your Refresh Token**
1. First, ensure the "redirect uri" in your [Reddit App settings](https://www.reddit.com/prefs/apps) is set to **exactly** `http://localhost:8080`.
2. Run the temporary helper script included in the project:
```bash
python get_refresh_token.py
```
3. The script will print a unique URL. Copy this URL and paste it into your web browser.
4. Log in to the Reddit account you want to post from and click **"Allow"** when prompted.
5. You'll be redirected to a `localhost:8080` page that says "This site can't be reached". **This is normal and expected.**
6. Copy the **full URL** from your browser's address bar. It will look something like `http://localhost:8080/?state=...&code=...`.
7. Paste this full URL back into the terminal where the script is waiting and press Enter.
8. The script will output your unique **refresh token**.
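If you're curious, pulling the one-time authorization `code` out of the pasted redirect URL (step 6 above) takes only the standard library. This is a sketch of that parsing step, not necessarily how `get_refresh_token.py` implements it:

```python
from urllib.parse import urlparse, parse_qs


def extract_code(redirect_url: str) -> str:
    """Extract the 'code' query parameter from the OAuth2 redirect URL."""
    query = parse_qs(urlparse(redirect_url).query)
    return query["code"][0]
```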
**Step 2: Update Your `.env` File**
1. Open your `.env` file.
2. Add a new line and paste your refresh token into it.
3. Ensure your file now contains the following (your username and password are no longer needed):
```
REDDIT_CLIENT_ID=your_client_id_from_reddit
REDDIT_CLIENT_SECRET=your_client_secret_from_reddit
REDDIT_USER_AGENT=A custom user agent string (e.g., python:rstat:v1.2)
REDDIT_REFRESH_TOKEN=the_long_refresh_token_string_you_just_copied
```
You can now safely delete the `get_refresh_token.py` script. Your application is now authorized to post on your behalf indefinitely.
### The `post_to_reddit.py` Script
This is the standalone script that finds the most recently generated image and posts it to Reddit using your new authorization.
**Manual Usage:**
* **Post the latest OVERALL summary image to `r/rstat`:**
```bash
python post_to_reddit.py
```
* **Post the latest DAILY image for a specific subreddit:**
```bash
python post_to_reddit.py --subreddit wallstreetbets
```
* **Post the latest WEEKLY image for a specific subreddit:**
```bash
python post_to_reddit.py --subreddit wallstreetbets --weekly
```
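The "find the most recently generated image" step the script performs can be sketched with the standard library alone; the directory name and glob pattern here are assumptions based on the export section above:

```python
from pathlib import Path


def latest_image(directory="images", pattern="*.png"):
    """Return the most recently modified matching file, or None if none exist."""
    files = sorted(Path(directory).glob(pattern),
                   key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None
```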
### Setting Up the Cron Job
To run the entire pipeline automatically every day, you can use a simple shell script controlled by `cron`.
**Step 1: Create a Job Script**
Create a file named `run_daily_job.sh` in the root of your project directory.
**`run_daily_job.sh`:**
```bash
#!/bin/bash
# CRITICAL: Navigate to the project directory using an absolute path.
# Replace '/path/to/your/project/reddit_stock_analyzer' with your actual path.
cd /path/to/your/project/reddit_stock_analyzer
# CRITICAL: Activate the virtual environment using an absolute path.
source /path/to/your/project/reddit_stock_analyzer/.venv/bin/activate
echo "--- Starting RSTAT Daily Job on $(date) ---"
# 1. Scrape data from the last 24 hours.
echo "Step 1: Scraping new data..."
rstat --days 1
# 2. Start the dashboard in the background.
echo "Step 2: Starting dashboard in background..."
rstat-dashboard &
DASHBOARD_PID=$!
sleep 10
# 3. Export the overall summary image.
echo "Step 3: Exporting overall summary image..."
python export_image.py --overall
# 4. Post the image to r/rstat.
echo "Step 4: Posting image to Reddit..."
python post_to_reddit.py --target-subreddit rstat
# 5. Clean up by stopping the dashboard server.
echo "Step 5: Stopping dashboard server..."
kill $DASHBOARD_PID
echo "--- RSTAT Daily Job Complete ---"
```
**Before proceeding, you must edit the two absolute paths at the top of this script to match your system.**
**Step 2: Make the Script Executable**
```bash
chmod +x run_daily_job.sh
```
**Step 3: Schedule the Cron Job**
1. Run `crontab -e` to open your crontab editor.
2. Add the following line to run the script every day at 10:00 PM and log its output:
```
0 22 * * * /path/to/your/project/reddit_stock_analyzer/run_daily_job.sh >> /path/to/your/project/reddit_stock_analyzer/cron.log 2>&1
```
Your project is now fully and securely automated.