# rstat - Reddit Stock Analyzer
A powerful, installable command-line tool and web dashboard to scan Reddit for stock ticker mentions, perform sentiment analysis, generate insightful reports, and create shareable summary images.
## Key Features

- **Dual-Interface:** Use a flexible command-line tool (`rstat`) for data collection and a simple web dashboard (`rstat-dashboard`) for data visualization.
- **Flexible Data Scraping:**
  - Scan subreddits from a config file or target a single subreddit on the fly.
  - Configure the time window to scan posts from the last 24 hours (for daily cron jobs) or back-fill data from several past days (e.g., the last 7 days).
  - Fetches from `/new` to capture the most recent discussions.
- **Deep Analysis & Storage:**
  - Scans both post titles and comments, differentiating between the two.
  - Performs a "deep dive" analysis on posts to calculate the average sentiment of the entire comment section.
  - Persists all data in a local SQLite database (`reddit_stocks.db`) to track trends over time.
- **Rich Data Enrichment** (sketched below):
  - Calculates sentiment (Bullish, Bearish, Neutral) for every mention using NLTK.
  - Fetches and stores daily closing prices and market capitalization from Yahoo Finance.
- **Interactive Web Dashboard:**
  - View the Top 10 tickers across all subreddits or on a per-subreddit basis.
  - Click any ticker to get a "Deep Dive" page showing every post it was mentioned in.
- **Shareable Summary Images:**
  - Generate clean, dark-mode daily and weekly summary images for any subreddit, perfect for sharing.
- **High-Quality Data:**
  - Uses a configurable blacklist and smart filtering to reduce false positives.
  - Automatically cleans invalid tickers from the database when the blacklist is updated.
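To make the enrichment step concrete, here is a minimal, illustrative sketch of how a single mention might be scored and priced. The function name, thresholds, and return shape are assumptions for illustration only, not the project's actual code:

```python
import yfinance as yf
from nltk.sentiment.vader import SentimentIntensityAnalyzer


def enrich_mention(text: str, ticker: str) -> dict:
    """Score one mention with NLTK's VADER and attach basic Yahoo Finance data."""
    score = SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
    # Thresholds are illustrative; the project may use different cut-offs.
    label = "Bullish" if score > 0.05 else "Bearish" if score < -0.05 else "Neutral"

    tkr = yf.Ticker(ticker)
    close = tkr.history(period="1d")["Close"].iloc[-1]  # latest daily close
    market_cap = tkr.info.get("marketCap")               # may be None for some symbols

    return {"ticker": ticker, "sentiment": label, "close": float(close), "market_cap": market_cap}
```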
## Project Structure

```
reddit_stock_analyzer/
├── .env                  # Your secret API keys
├── requirements.txt      # Project dependencies
├── setup.py              # Installation script for the tool
├── subreddits.json       # Default list of subreddits to scan
├── templates/            # HTML templates for the web dashboard
│   ├── base.html
│   ├── index.html
│   ├── subreddit.html
│   ├── deep_dive.html
│   ├── image_view.html
│   └── weekly_image_view.html
└── rstat_tool/           # The main source code package
    ├── __init__.py
    ├── main.py           # Scraper entry point and CLI logic
    ├── dashboard.py      # Web dashboard entry point (Flask app)
    ├── database.py       # All SQLite database functions
    └── ...
```
## Setup and Installation
Follow these steps to set up the project on your local machine.
### 1. Prerequisites

- Python 3.7+
- Git
### 2. Clone the Repository

```bash
git clone <your-repository-url>
cd reddit_stock_analyzer
```
### 3. Set Up a Python Virtual Environment

It is highly recommended to use a virtual environment to manage dependencies.

**On macOS / Linux:**

```bash
python3 -m venv .venv
source .venv/bin/activate
```

**On Windows:**

```bash
python -m venv .venv
.\.venv\Scripts\activate
```
### 4. Install Dependencies

```bash
pip install -r requirements.txt
```
### 5. Configure Reddit API Credentials

- Go to the Reddit Apps preferences page and create a new "script" app.
- Create a file named `.env` in the root of the project directory.
- Add your credentials to the `.env` file like this:

```
REDDIT_CLIENT_ID=your_client_id_from_reddit
REDDIT_CLIENT_SECRET=your_client_secret_from_reddit
REDDIT_USER_AGENT=A custom user agent string (e.g., python:rstat:v1.2)
```
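For context, a common way a scraper like this reads those values is with `python-dotenv` and PRAW. This is a hedged sketch of that pattern, not necessarily the exact code in `rstat_tool/main.py`:

```python
import os

import praw
from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the current working directory

reddit = praw.Reddit(
    client_id=os.getenv("REDDIT_CLIENT_ID"),
    client_secret=os.getenv("REDDIT_CLIENT_SECRET"),
    user_agent=os.getenv("REDDIT_USER_AGENT"),
)

# Read-only access is enough for scraping public posts and comments.
print("Read-only session:", reddit.read_only)
```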
### 6. Set Up NLTK

Run the included setup script once to download the `vader_lexicon` required for sentiment analysis.

```bash
python rstat_tool/setup_nltk.py
```
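Conceptually, the setup script is a thin wrapper around NLTK's downloader; it boils down to something like the following (the real file may add error handling):

```python
import nltk

# One-time download of the VADER lexicon used by the sentiment analyzer.
nltk.download("vader_lexicon")
```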
### 7. Build and Install the Commands

Install the tool in "editable" mode. This creates the `rstat` and `rstat-dashboard` commands in your virtual environment and links them to your source code.

```bash
pip install -e .
```

The installation is now complete.
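The two commands come from console-script entry points declared in `setup.py`. The exact module paths below are assumptions, but the mechanism looks roughly like this:

```python
from setuptools import find_packages, setup

setup(
    name="rstat",
    packages=find_packages(),
    entry_points={
        "console_scripts": [
            # Hypothetical targets; the real entry functions may be named differently.
            "rstat=rstat_tool.main:main",
            "rstat-dashboard=rstat_tool.dashboard:main",
        ],
    },
)
```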
## Usage
The tool is split into two commands: one for gathering data and one for viewing it.
### 1. The Scraper (`rstat`)

This is the command-line tool you will use to populate the database. It is highly flexible.

**Common Commands:**

- **Run a daily scan (for cron jobs):** Scans the subreddits listed in `subreddits.json` for posts from the last 24 hours.

  ```bash
  rstat --config subreddits.json --days 1
  ```

- **Scan a single subreddit:** Ignores the config file and scans just one subreddit.

  ```bash
  rstat --subreddit wallstreetbets --days 1
  ```

- **Back-fill data for the last week:** Scans a specific subreddit for all new posts from the last 7 days.

  ```bash
  rstat --subreddit Tollbugatabets --days 7
  ```

- **Get help and see all options:**

  ```bash
  rstat --help
  ```
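If you are curious how these flags fit together, the CLI can be thought of as roughly the following argparse layout. This is illustrative only; the defaults and help text are assumptions:

```python
import argparse

parser = argparse.ArgumentParser(prog="rstat", description="Scan Reddit for stock ticker mentions.")
parser.add_argument("--config", default="subreddits.json", help="JSON file listing subreddits to scan")
parser.add_argument("--subreddit", help="Scan a single subreddit instead of the config file")
parser.add_argument("--days", type=int, default=1, help="How many days of posts to fetch")
args = parser.parse_args()
```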
### 2. The Web Dashboard (`rstat-dashboard`)

This command starts a local web server that lets you explore the data you've collected.

**How to Run:**

1. Make sure you have run the `rstat` scraper at least once to populate the database.
2. Start the web server:

   ```bash
   rstat-dashboard
   ```

3. Open your web browser and navigate to http://127.0.0.1:5000.

**Dashboard Features:**

- **Main Page:** Shows the Top 10 most mentioned tickers across all scanned subreddits.
- **Subreddit Pages:** Click any subreddit in the navigation bar to see a dashboard specific to that community.
- **Deep Dive:** In any table, click a ticker's symbol to see a detailed breakdown of every post it was mentioned in.
- **Shareable Images:** On a subreddit's page, click "(View Daily Image)" or "(View Weekly Image)" to generate a polished, shareable summary card.
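Under the hood the dashboard is a small Flask app reading straight from `reddit_stocks.db`. The table and column names below are hypothetical, but the index view is conceptually something like this:

```python
import sqlite3

from flask import Flask, render_template

app = Flask(__name__)


@app.route("/")
def index():
    # Hypothetical schema: one row per ticker mention.
    conn = sqlite3.connect("reddit_stocks.db")
    rows = conn.execute(
        "SELECT ticker, COUNT(*) AS mentions FROM mentions "
        "GROUP BY ticker ORDER BY mentions DESC LIMIT 10"
    ).fetchall()
    conn.close()
    return render_template("index.html", top_tickers=rows)


if __name__ == "__main__":
    app.run(port=5000)
```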
### 3. Exporting Shareable Images (`.png`)

In addition to viewing the dashboards in a browser, the project includes a powerful script to programmatically save the "image views" as static `.png` files. This is ideal for automation, scheduled tasks (cron jobs), or sharing the results on social media platforms such as your r/rstat subreddit.
#### One-Time Setup

The image exporter uses the Playwright library to control a headless browser. Before using it for the first time, you must install the necessary browser runtimes with this command:

```bash
playwright install
```
#### Usage Workflow

The exporter works by taking a high-quality screenshot of the live web page, so the process requires two steps running in two separate terminals.

**Step 1: Start the Web Dashboard**

The web server must be running for the exporter to have a page to screenshot. Open a terminal and run:

```bash
rstat-dashboard
```

Leave this terminal running.

**Step 2: Run the Export Script**

Open a second terminal in the same project directory. You can now run the `export_image.py` script with the desired arguments.
**Examples:**

- To export the daily summary image for r/wallstreetbets:

  ```bash
  python export_image.py wallstreetbets
  ```

- To export the weekly summary image for r/wallstreetbets:

  ```bash
  python export_image.py wallstreetbets --weekly
  ```

- To export the overall summary image (across all subreddits):

  ```bash
  python export_image.py --overall
  ```
#### Output

After running a command, a new `.png` file (e.g., `wallstreetbets_daily_1690000000.png`) will be saved in the `images` directory in the root of the project.
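For reference, the core of such an exporter is only a few lines of Playwright. This is a minimal sketch that assumes the dashboard is reachable at http://127.0.0.1:5000 and screenshots the whole page; the real script targets the specific image views and uses its own file-naming scheme:

```python
import time
from pathlib import Path

from playwright.sync_api import sync_playwright


def export_page(url: str, out_dir: str = "images") -> Path:
    """Open the running dashboard in a headless browser and save a PNG screenshot."""
    Path(out_dir).mkdir(exist_ok=True)
    out_file = Path(out_dir) / f"summary_{int(time.time())}.png"
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": 1200, "height": 800})
        page.goto(url, wait_until="networkidle")
        page.screenshot(path=str(out_file), full_page=True)
        browser.close()
    return out_file


if __name__ == "__main__":
    print(export_page("http://127.0.0.1:5000/"))
```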
### 4. Full Automation: Posting to Reddit via Cron Job

The final piece of the project is a script that automates the entire pipeline: scraping data, generating an image, and posting it to a target subreddit such as r/rstat. This is designed to be run via a scheduled task or cron job.

#### Prerequisites: One-Time Account Authorization (OAuth2)

To post on your behalf, the script needs to be authorized with your Reddit account. This is done securely using OAuth2 and a `refresh_token`, which is compatible with 2-Factor Authentication (2FA). This is a one-time setup process.
**Step 1: Get Your Refresh Token**

- First, ensure the "redirect uri" in your Reddit App settings is set to exactly `http://localhost:8080`.
- Run the temporary helper script included in the project:

  ```bash
  python get_refresh_token.py
  ```

- The script will print a unique URL. Copy this URL and paste it into your web browser.
- Log in to the Reddit account you want to post from and click "Allow" when prompted.
- You'll be redirected to a `localhost:8080` page that says "This site can’t be reached". This is normal and expected.
- Copy the full URL from your browser's address bar. It will look something like `http://localhost:8080/?state=...&code=...`.
- Paste this full URL back into the terminal where the script is waiting and press Enter.
- The script will output your unique refresh token.
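If you want to know what the helper script is doing behind the scenes, it follows PRAW's standard code-flow authorization. A minimal sketch of that flow is below; the scopes and prompts are assumptions, not necessarily what `get_refresh_token.py` uses:

```python
import os
import random

import praw
from dotenv import load_dotenv

load_dotenv()

reddit = praw.Reddit(
    client_id=os.getenv("REDDIT_CLIENT_ID"),
    client_secret=os.getenv("REDDIT_CLIENT_SECRET"),
    redirect_uri="http://localhost:8080",
    user_agent=os.getenv("REDDIT_USER_AGENT"),
)

# Build the authorization URL the user opens in a browser.
state = str(random.randint(0, 65000))
print(reddit.auth.url(scopes=["identity", "submit"], state=state, duration="permanent"))

# Paste the redirect URL back in and exchange the code for a refresh token.
redirected = input("Paste the full redirect URL here: ")
code = redirected.split("code=")[1].split("#")[0]
print("Refresh token:", reddit.auth.authorize(code))
```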
**Step 2: Update Your `.env` File**

- Open your `.env` file.
- Add a new line and paste your refresh token into it.
- Ensure your file now contains the following (your username and password are no longer needed):

```
REDDIT_CLIENT_ID=your_client_id_from_reddit
REDDIT_CLIENT_SECRET=your_client_secret_from_reddit
REDDIT_USER_AGENT=A custom user agent string (e.g., python:rstat:v1.2)
REDDIT_REFRESH_TOKEN=the_long_refresh_token_string_you_just_copied
```

You can now safely delete the `get_refresh_token.py` script. Your application is now authorized to post on your behalf indefinitely.
#### The `post_to_reddit.py` Script

This standalone script finds the most recently generated image and posts it to Reddit using your new authorization.
**Manual Usage:**

- Post the latest OVERALL summary image to r/rstat:

  ```bash
  python post_to_reddit.py
  ```

- Post the latest DAILY image for a specific subreddit:

  ```bash
  python post_to_reddit.py --subreddit wallstreetbets
  ```

- Post the latest WEEKLY image for a specific subreddit:

  ```bash
  python post_to_reddit.py --subreddit wallstreetbets --weekly
  ```
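Internally this boils down to authenticating PRAW with the refresh token and calling `submit_image`. Here is a hedged sketch; the file-selection logic and post title are assumptions rather than the script's actual behavior:

```python
import glob
import os

import praw
from dotenv import load_dotenv

load_dotenv()

reddit = praw.Reddit(
    client_id=os.getenv("REDDIT_CLIENT_ID"),
    client_secret=os.getenv("REDDIT_CLIENT_SECRET"),
    refresh_token=os.getenv("REDDIT_REFRESH_TOKEN"),
    user_agent=os.getenv("REDDIT_USER_AGENT"),
)

# Pick the newest generated image and submit it as an image post.
latest = max(glob.glob("images/*.png"), key=os.path.getmtime)
reddit.subreddit("rstat").submit_image(title="Daily Reddit stock sentiment summary", image_path=latest)
```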
#### Setting Up the Cron Job

To run the entire pipeline automatically every day, you can use a simple shell script controlled by `cron`.
**Step 1: Create a Job Script**

Create a file named `run_daily_job.sh` in the root of your project directory.

`run_daily_job.sh`:
```bash
#!/bin/bash

# CRITICAL: Navigate to the project directory using an absolute path.
# Replace '/path/to/your/project/reddit_stock_analyzer' with your actual path.
cd /path/to/your/project/reddit_stock_analyzer

# CRITICAL: Activate the virtual environment using an absolute path.
source /path/to/your/project/reddit_stock_analyzer/.venv/bin/activate

echo "--- Starting RSTAT Daily Job on $(date) ---"

# 1. Scrape data from the last 24 hours.
echo "Step 1: Scraping new data..."
rstat --days 1

# 2. Start the dashboard in the background.
echo "Step 2: Starting dashboard in background..."
rstat-dashboard &
DASHBOARD_PID=$!
sleep 10

# 3. Export the overall summary image.
echo "Step 3: Exporting overall summary image..."
python export_image.py --overall

# 4. Post the image to r/rstat.
echo "Step 4: Posting image to Reddit..."
python post_to_reddit.py --target-subreddit rstat

# 5. Clean up by stopping the dashboard server.
echo "Step 5: Stopping dashboard server..."
kill $DASHBOARD_PID

echo "--- RSTAT Daily Job Complete ---"
```
Before proceeding, you must edit the two absolute paths at the top of this script to match your system.
**Step 2: Make the Script Executable**

```bash
chmod +x run_daily_job.sh
```

**Step 3: Schedule the Cron Job**

1. Run `crontab -e` to open your crontab editor.
2. Add the following line to run the script every day at 10:00 PM and log its output:

   ```
   0 22 * * * /path/to/your/project/reddit_stock_analyzer/run_daily_job.sh >> /path/to/your/project/reddit_stock_analyzer/cron.log 2>&1
   ```
Your project is now fully and securely automated.