# rstat - Reddit Stock Analyzer

A powerful, installable command-line tool and web dashboard to scan Reddit for stock ticker mentions, perform sentiment analysis, generate insightful reports, and create shareable summary images.

## Key Features

* **Dual-Interface:** Use a flexible command-line tool (`rstat`) for data collection and a simple web dashboard (`rstat-dashboard`) for data visualization.
* **Flexible Data Scraping:**
  * Scan subreddits from a config file or target a single subreddit on the fly.
  * Configure the time window to scan posts from the last 24 hours (for daily cron jobs) or back-fill data from several past days (e.g., last 7 days).
  * Fetches from `/new` to capture the most recent discussions.
* **Deep Analysis & Storage:**
  * Scans both post titles and comments, differentiating between the two.
  * Performs a "deep dive" analysis on posts to calculate the average sentiment of the entire comment section.
  * Persists all data in a local SQLite database (`reddit_stocks.db`) to track trends over time.
* **Rich Data Enrichment:**
  * Calculates sentiment (Bullish, Bearish, Neutral) for every mention using NLTK.
  * Fetches and stores daily closing prices and market capitalization from Yahoo Finance.
* **Interactive Web Dashboard:**
  * View Top 10 tickers across all subreddits or on a per-subreddit basis.
  * Click any ticker to get a "Deep Dive" page, showing every post it was mentioned in.
* **Shareable Summary Images:**
  * Generate clean, dark-mode summary images for both daily and weekly sentiment for any subreddit, perfect for sharing.
* **High-Quality Data:**
  * Uses a configurable blacklist and smart filtering to reduce false positives.
  * Automatically cleans the database of invalid tickers if the blacklist is updated.

## Project Structure

```
reddit_stock_analyzer/
├── .env                 # Your secret API keys
├── requirements.txt     # Project dependencies
├── setup.py             # Installation script for the tool
├── subreddits.json      # Default list of subreddits to scan
├── templates/           # HTML templates for the web dashboard
│   ├── base.html
│   ├── index.html
│   ├── subreddit.html
│   ├── deep_dive.html
│   ├── image_view.html
│   └── weekly_image_view.html
└── rstat_tool/          # The main source code package
    ├── __init__.py
    ├── main.py          # Scraper entry point and CLI logic
    ├── dashboard.py     # Web dashboard entry point (Flask app)
    ├── database.py      # All SQLite database functions
    └── ...
```
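For reference, `subreddits.json` holds the default list of subreddits to scan. Its exact schema is not spelled out here, so treat the following as an illustrative example (assuming a single `subreddits` array) and adjust it to match the file that ships with the project:

```json
{
  "subreddits": [
    "wallstreetbets",
    "stocks",
    "investing"
  ]
}
```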
## Setup and Installation

Follow these steps to set up the project on your local machine.

### 1. Prerequisites

* Python 3.7+
* Git

### 2. Clone the Repository

```bash
git clone <your-repository-url>
cd reddit_stock_analyzer
```

### 3. Set Up a Python Virtual Environment

It is highly recommended to use a virtual environment to manage dependencies.

**On macOS / Linux:**

```bash
python3 -m venv .venv
source .venv/bin/activate
```

**On Windows:**

```bash
python -m venv .venv
.\.venv\Scripts\activate
```

### 4. Install Dependencies

```bash
pip install -r requirements.txt
```

### 5. Configure Reddit API Credentials

1. Go to the [Reddit Apps preferences page](https://www.reddit.com/prefs/apps) and create a new "script" app.
2. Create a file named `.env` in the root of the project directory.
3. Add your credentials to the `.env` file like this:

```
REDDIT_CLIENT_ID=your_client_id_from_reddit
REDDIT_CLIENT_SECRET=your_client_secret_from_reddit
REDDIT_USER_AGENT=A custom user agent string (e.g., python:rstat:v1.2)
```
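For context, here is a minimal sketch of how a script can pick these values up. It assumes `python-dotenv` and `praw` are among the project's dependencies and is illustrative rather than the project's actual loading code:

```python
import os

from dotenv import load_dotenv  # assumption: python-dotenv is installed
import praw                     # assumption: praw is installed

# Read the REDDIT_* values from the .env file into the process environment.
load_dotenv()

reddit = praw.Reddit(
    client_id=os.getenv("REDDIT_CLIENT_ID"),
    client_secret=os.getenv("REDDIT_CLIENT_SECRET"),
    user_agent=os.getenv("REDDIT_USER_AGENT"),
)

# Sanity check: read-only access works with just these three values.
print(reddit.read_only)
```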
### 6. Set Up NLTK

Run the included setup script **once** to download the required `vader_lexicon` for sentiment analysis.

```bash
python rstat_tool/setup_nltk.py
```
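The lexicon powers NLTK's VADER sentiment analyzer, which is what produces the Bullish/Bearish/Neutral labels mentioned above. A rough sketch of that classification (the exact thresholds the project uses are an assumption here):

```python
from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def label_sentiment(text: str) -> str:
    # VADER's compound score ranges from -1 (most negative) to +1 (most positive).
    compound = analyzer.polarity_scores(text)["compound"]
    if compound >= 0.05:      # cutoff is illustrative, not necessarily the project's value
        return "Bullish"
    if compound <= -0.05:
        return "Bearish"
    return "Neutral"

print(label_sentiment("NVDA to the moon, earnings were incredible"))
```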
### 7. Build and Install the Commands

Install the tool in "editable" mode. This creates the `rstat` and `rstat-dashboard` commands in your virtual environment and links them to your source code.

```bash
pip install -e .
```
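To confirm the commands landed on your PATH, you can ask the scraper for its help text:

```bash
rstat --help
```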
The installation is now complete.

---

## Usage

The tool is split into two commands: one for gathering data and one for viewing it.

### 1. The Scraper (`rstat`)

This is the command-line tool you will use to populate the database. It is highly flexible.

**Common Commands:**

* **Run a daily scan (for cron jobs):** Scans subreddits from `subreddits.json` for posts in the last 24 hours.

```bash
rstat --config subreddits.json --days 1
```

* **Scan a single subreddit:** Ignores the config file and scans just one subreddit.

```bash
rstat --subreddit wallstreetbets --days 1
```

* **Back-fill data for the last week:** Scans a specific subreddit for all new posts in the last 7 days.

```bash
rstat --subreddit Tollbugatabets --days 7
```

* **Get help and see all options:**

```bash
rstat --help
```
### 2. The Web Dashboard (`rstat-dashboard`)

This command starts a local web server to let you explore the data you've collected.

**How to Run:**

1. Make sure you have run the `rstat` scraper at least once to populate the database.
2. Start the web server:

```bash
rstat-dashboard
```

3. Open your web browser and navigate to **http://127.0.0.1:5000**.

**Dashboard Features:**

* **Main Page:** Shows the Top 10 most mentioned tickers across all scanned subreddits.
* **Subreddit Pages:** Click any subreddit in the navigation bar to see a dashboard specific to that community.
* **Deep Dive:** In any table, click on a ticker's symbol to see a detailed breakdown of every post it was mentioned in.
* **Shareable Images:** On a subreddit's page, click "(View Daily Image)" or "(View Weekly Image)" to generate a polished, shareable summary card.

### 3. Exporting Shareable Images (`.png`)

In addition to viewing the dashboards in a browser, the project includes a powerful script to programmatically save the "image views" as static `.png` files. This is ideal for automation, scheduled tasks (cron jobs), or sharing the results on social media platforms like your `r/rstat` subreddit.

#### One-Time Setup

The image exporter uses the Playwright library to control a headless browser. Before using it for the first time, you must install the necessary browser runtimes with this command:

```bash
playwright install
```
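For context, the core of such an exporter is typically only a few lines of Playwright. The sketch below is illustrative only; the real `export_image.py` handles arguments and file naming itself, and the URL shown is just the dashboard's root page rather than a specific image view:

```python
from playwright.sync_api import sync_playwright

# Illustrative only: the actual image-view route served by rstat-dashboard may differ.
URL = "http://127.0.0.1:5000/"
OUTPUT = "summary.png"

with sync_playwright() as p:
    browser = p.chromium.launch()   # requires `playwright install` to have been run once
    page = browser.new_page()
    page.goto(URL)
    page.screenshot(path=OUTPUT, full_page=True)
    browser.close()

print(f"Saved {OUTPUT}")
```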
#### Usage Workflow

The exporter works by taking a high-quality screenshot of the live web page. Therefore, the process requires two steps running in two separate terminals.

**Step 1: Start the Web Dashboard**

The web server must be running for the exporter to have a page to screenshot. Open a terminal and run:

```bash
rstat-dashboard
```

Leave this terminal running.

**Step 2: Run the Export Script**

Open a **second terminal** in the same project directory. You can now run the `export_image.py` script with the desired arguments.

**Examples:**

* To export the **daily** summary image for `r/wallstreetbets`:

```bash
python export_image.py wallstreetbets
```

* To export the **weekly** summary image for `r/wallstreetbets`:

```bash
python export_image.py wallstreetbets --weekly
```

* To export the **overall** summary image (across all subreddits):

```bash
python export_image.py --overall
```
#### Output

After running a command, a new `.png` file (e.g., `wallstreetbets_daily_1690000000.png`) will be saved in the images directory in the project root.

## 4. Full Automation: Posting to Reddit via Cron Job

The final piece of the project is a script that automates the entire process: scraping data, generating an image, and posting it to a target subreddit like `r/rstat`. This is designed to be run via a scheduled task or cron job.

### Prerequisites for Posting

The posting script needs to log in to your Reddit account. You must add your Reddit username and password to your `.env` file.

**Add these two lines to your `.env` file:**

```
REDDIT_USERNAME=YourRedditUsername
REDDIT_PASSWORD=YourRedditPassword
```

*(For security, it's recommended to use a dedicated bot account for this, not your personal account.)*
### The `post_to_reddit.py` Script

This is a standalone script located in the project's root directory that finds the most recently generated image and posts it to Reddit.

**Manual Usage:**

You can run this script manually from your terminal. This is great for testing or one-off posts.

* **Post the latest OVERALL summary image to `r/rstat`:**

```bash
python post_to_reddit.py
```

* **Post the latest DAILY image for a specific subreddit:**

```bash
python post_to_reddit.py --subreddit wallstreetbets
```

* **Post the latest WEEKLY image for a specific subreddit:**

```bash
python post_to_reddit.py --subreddit wallstreetbets --weekly
```

* **Post to a different target subreddit (e.g., a test subreddit):**

```bash
python post_to_reddit.py --target-subreddit MyTestSub
```
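For reference, posting an image with a script-type Reddit app boils down to authenticating with the credentials from `.env` and calling PRAW's image-submission API. The sketch below is illustrative rather than the project's actual script; it assumes `praw` and `python-dotenv` are installed and skips the "find the newest image" logic:

```python
import os

from dotenv import load_dotenv
import praw

load_dotenv()

# Script-app authentication: username and password are required for posting.
reddit = praw.Reddit(
    client_id=os.getenv("REDDIT_CLIENT_ID"),
    client_secret=os.getenv("REDDIT_CLIENT_SECRET"),
    username=os.getenv("REDDIT_USERNAME"),
    password=os.getenv("REDDIT_PASSWORD"),
    user_agent=os.getenv("REDDIT_USER_AGENT"),
)

# Submit a previously exported summary image as an image post.
subreddit = reddit.subreddit("rstat")
subreddit.submit_image(
    title="Daily sentiment summary",
    image_path="wallstreetbets_daily_1690000000.png",
)
```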
### Setting Up the Cron Job for Full Automation

To run the entire pipeline automatically every day, you can use a simple shell script controlled by `cron`.

**Step 1: Create a Job Script**

Create a file named `run_daily_job.sh` in the root of your project directory. This script will run all the necessary commands in the correct order.

**`run_daily_job.sh`:**
```bash
#!/bin/bash

# CRITICAL: Navigate to the project directory using an absolute path.
# Replace '/path/to/your/project/reddit_stock_analyzer' with your actual path.
cd /path/to/your/project/reddit_stock_analyzer

# CRITICAL: Activate the virtual environment using an absolute path.
source /path/to/your/project/reddit_stock_analyzer/.venv/bin/activate

echo "--- Starting RSTAT Daily Job on $(date) ---"

# 1. Scrape data from the last 24 hours for all subreddits in the config.
echo "Step 1: Scraping new data..."
rstat --config subreddits.json --days 1

# 2. Start the dashboard in the background so the exporter can access it.
echo "Step 2: Starting dashboard in background..."
rstat-dashboard &
DASHBOARD_PID=$!

# Give the server a moment to start up.
sleep 10

# 3. Export the overall summary image.
echo "Step 3: Exporting overall summary image..."
python export_image.py --overall

# 4. Post the newly created overall summary image to r/rstat.
echo "Step 4: Posting image to Reddit..."
python post_to_reddit.py --target-subreddit rstat

# 5. Clean up by stopping the background dashboard server.
echo "Step 5: Stopping dashboard server..."
kill $DASHBOARD_PID

echo "--- RSTAT Daily Job Complete ---"
```

**Before proceeding, you must edit the two absolute paths at the top of this script to match your system.**
**Step 2: Make the Script Executable**

In your terminal, run the following command:

```bash
chmod +x run_daily_job.sh
```
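Before scheduling it, it's worth running the script once by hand from the project root to confirm that the absolute paths and the virtual environment are set up correctly:

```bash
./run_daily_job.sh
```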
**Step 3: Schedule the Cron Job**

1. Open your crontab editor by running `crontab -e`.
2. Add a new line to the file to schedule the job. For example, to run the script **every day at 10:00 PM**, add the following line:

```
0 22 * * * /path/to/your/project/reddit_stock_analyzer/run_daily_job.sh >> /path/to/your/project/reddit_stock_analyzer/cron.log 2>&1
```

* `0 22 * * *` means at minute 0 of hour 22, every day of the month, every month, every day of the week.
* `>> /path/to/your/.../cron.log 2>&1` is highly recommended. It redirects all output (both standard output and errors) from the script into a log file, so you can check whether the job ran successfully.

Your project is now fully automated to scrape, analyze, visualize, and post data every day.