rstat - Reddit Stock Analyzer Tool
A powerful, installable command-line tool to scan Reddit for stock ticker mentions, perform sentiment analysis, and generate insightful summary reports.
Key Features
- Persistent Storage: Scraped data is stored in a local SQLite database (reddit_stocks.db), so you can track trends over time.
- Deep Scanning: Analyzes both post titles and comments from a user-defined list of subreddits.
- Sentiment Analysis: Uses NLTK's VADER engine to calculate a sentiment score (Bullish, Bearish, or Neutral) for each mention.
- Financial Data: Enriches ticker data by fetching market capitalization from Yahoo Finance, with intelligent caching to minimize API calls.
- Data Quality: Utilizes a configurable blacklist and smart filtering to ignore common words and reduce false positives (e.g., "YOLO", "CEO", "A").
- Automatic Cleanup: Purges entries for newly blacklisted tickers from the database on the next run after you update the ticker blacklist.
- Installable Command: Packaged with setuptools, allowing you to install the tool and run it from anywhere on your system using the rstat command.
- Flexible Reporting: The final report can be customized using command-line arguments to control the number of results shown.
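To illustrate how persistent storage and the summary report could fit together, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and the function names (open_db, add_mention, top_tickers) are assumptions made for illustration; the actual schema in rstat_tool/database.py may differ.

```python
import sqlite3


def open_db(path="reddit_stocks.db"):
    """Open (or create) the database and ensure the mentions table exists."""
    conn = sqlite3.connect(path)
    # Hypothetical schema: one row per ticker mention.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS mentions (
            id           INTEGER PRIMARY KEY,
            ticker       TEXT NOT NULL,
            subreddit    TEXT NOT NULL,
            sentiment    TEXT NOT NULL,    -- 'Bullish', 'Bearish' or 'Neutral'
            mentioned_at INTEGER NOT NULL  -- Unix timestamp
        )
        """
    )
    return conn


def add_mention(conn, ticker, subreddit, sentiment, mentioned_at):
    """Store a single mention."""
    conn.execute(
        "INSERT INTO mentions (ticker, subreddit, sentiment, mentioned_at) "
        "VALUES (?, ?, ?, ?)",
        (ticker, subreddit, sentiment, mentioned_at),
    )
    conn.commit()


def top_tickers(conn, limit=20):
    """Return (ticker, mentions, bullish, bearish, neutral), most mentioned first."""
    return conn.execute(
        """
        SELECT ticker,
               COUNT(*) AS mentions,
               SUM(sentiment = 'Bullish'),
               SUM(sentiment = 'Bearish'),
               SUM(sentiment = 'Neutral')
        FROM mentions
        GROUP BY ticker
        ORDER BY mentions DESC, ticker ASC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
```

Because rows accumulate across runs, the same query can be restricted by mentioned_at to compare one period against another.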
Project Structure
reddit_stock_analyzer/
├── .env # Your secret API keys
├── requirements.txt # Project dependencies
├── setup.py # Installation script for the tool
├── subreddits.json # Configuration for which subreddits to scan
├── rstat_tool/ # The main source code package
│ ├── __init__.py
│ ├── main.py # Main entry point and CLI logic
│ ├── database.py # All SQLite database functions
│ ├── sentiment_analyzer.py
│ ├── setup_nltk.py # One-time NLTK setup script
│ └── ticker_extractor.py
└── ...
Setup and Installation
Follow these steps to set up the project on your local machine.
1. Prerequisites
- Python 3.7+
- Git
2. Clone the Repository
git clone <your-repository-url>
cd reddit_stock_analyzer
3. Set Up a Python Virtual Environment
It is highly recommended to use a virtual environment to manage dependencies.
On macOS / Linux:
python3 -m venv .venv
source .venv/bin/activate
On Windows:
python -m venv .venv
.\.venv\Scripts\activate
4. Install Dependencies
pip install -r requirements.txt
5. Configure Reddit API Credentials
The tool needs API access to read data from Reddit.
- Go to the Reddit Apps preferences page and create a new "script" app.
- Create a file named .env in the root of the project directory.
- Add your credentials to the .env file like this:
REDDIT_CLIENT_ID=your_client_id_from_reddit
REDDIT_CLIENT_SECRET=your_client_secret_from_reddit
REDDIT_USER_AGENT=A custom user agent string (e.g., python:rstat:v1.0)
IMPORTANT: Never commit your .env file to version control.
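A quick way to confirm the three variables are visible to Python is a small check like the one below. This is only a sketch: it assumes the values from .env have already been loaded into the process environment (how the tool does that is not described here), and missing_credentials is a hypothetical helper, not part of rstat.

```python
import os

# Variable names taken from the .env example above.
REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing Reddit credentials:", ", ".join(missing))
    else:
        print("All Reddit credentials are set.")
```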
6. Set Up NLTK
Run the included setup script once to download the required vader_lexicon for sentiment analysis.
python rstat_tool/setup_nltk.py
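Once the lexicon is downloaded, mapping a VADER score to a Bullish, Bearish, or Neutral label might look like the sketch below. The 0.05 cutoff is the threshold conventionally used with VADER's compound score; whether rstat uses the same value is an assumption, and classify and score_text are illustrative names.

```python
def classify(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "Bullish"
    if compound <= -threshold:
        return "Bearish"
    return "Neutral"


def score_text(text):
    """Score a post title or comment (requires nltk and the vader_lexicon)."""
    # Imported lazily so classify() works even without NLTK installed.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    compound = analyzer.polarity_scores(text)["compound"]
    return classify(compound)
```

In a real scan, the analyzer would normally be created once and reused rather than rebuilt for every comment.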
7. Build and Install the rstat Command
Install the tool in "editable" mode. This creates the rstat command in your virtual environment and links it to your source code, so any changes you make to the code take effect immediately without reinstalling.
pip install -e .
The installation is now complete.
Configuration
Subreddits
Modify the subreddits.json file to define which communities the tool should scan.
{
"subreddits": [
"wallstreetbets",
"stocks",
"investing",
"options"
]
}
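Reading this file takes only a few lines. The sketch below shows one possible approach with light validation; parse_subreddits and load_subreddits are hypothetical helpers, and the de-duplication step is an added convenience rather than documented rstat behavior.

```python
import json


def parse_subreddits(text):
    """Parse the JSON text of subreddits.json into a clean list of names."""
    config = json.loads(text)
    names = config.get("subreddits") if isinstance(config, dict) else None
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        raise ValueError('expected {"subreddits": ["name", ...]}')

    seen = set()
    result = []
    for name in names:
        name = name.strip()
        # Skip blanks and case-insensitive duplicates, keeping the original order.
        if name and name.lower() not in seen:
            seen.add(name.lower())
            result.append(name)
    return result


def load_subreddits(path):
    """Read and parse a subreddits.json file from disk."""
    with open(path, encoding="utf-8") as f:
        return parse_subreddits(f.read())
```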
Ticker Blacklist (Advanced)
To improve data quality, you can add common words that are mistaken for tickers to the COMMON_WORDS_BLACKLIST set inside the rstat_tool/ticker_extractor.py file. The tool will automatically clean the database of these tickers on the next run.
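To show why the blacklist matters, here is a minimal sketch of blacklist-based filtering. Only the name COMMON_WORDS_BLACKLIST comes from the project; the regular expression, the sample entries, and extract_tickers are assumptions, and the real extractor may apply additional checks.

```python
import re

# Illustrative subset; the real set lives in rstat_tool/ticker_extractor.py.
COMMON_WORDS_BLACKLIST = {"YOLO", "CEO", "A"}

# Assumed pattern: a standalone run of 1 to 5 uppercase letters.
TICKER_PATTERN = re.compile(r"\b[A-Z]{1,5}\b")


def extract_tickers(text):
    """Return the set of ticker-like symbols in text, minus blacklisted words."""
    candidates = set(TICKER_PATTERN.findall(text))
    return candidates - COMMON_WORDS_BLACKLIST


if __name__ == "__main__":
    print(extract_tickers("YOLO on TSLA, the CEO of $AMD agrees"))
```

Without the blacklist, YOLO and CEO in the sample sentence would be counted as tickers alongside TSLA and AMD.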
Usage
Once installed, you can run the tool from any directory using the rstat command.
Basic Usage
Run an analysis using the default settings (scans 25 posts, 100 comments/post, shows top 20 tickers).
rstat subreddits.json
Advanced Usage with Arguments
Use command-line arguments to control the scan and the report.
# Scan only 10 posts, 50 comments per post, and show a report of the top 5 tickers
rstat subreddits.json --posts 10 --comments 50 --limit 5
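For readers curious how such options are typically wired up, the sketch below defines the same flags and defaults with argparse. The flag names and default values come from this README; the positional argument name (config), the help strings, and build_parser are assumptions, and the actual code in rstat_tool/main.py may be organized differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented rstat options."""
    parser = argparse.ArgumentParser(
        prog="rstat",
        description="Scan Reddit for stock ticker mentions.",
    )
    parser.add_argument("config", help="path to the subreddits JSON file")
    parser.add_argument("--posts", type=int, default=25,
                        help="number of posts to scan per subreddit")
    parser.add_argument("--comments", type=int, default=100,
                        help="number of comments to scan per post")
    parser.add_argument("--limit", type=int, default=20,
                        help="number of tickers to show in the report")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```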
Getting Help
To see all available commands and their descriptions:
rstat --help
Example Output
--- Top 5 Tickers by Mention Count ---
Ticker | Mentions | Bullish | Bearish | Neutral | Market Cap
---------------------------------------------------------------------------
TSLA | 183 | 95 | 48 | 40 | $580.45B
NVDA | 155 | 110 | 15 | 30 | $1.15T
AAPL | 98 | 50 | 21 | 27 | $2.78T
SPY | 76 | 30 | 35 | 11 | N/A
AMD | 62 | 45 | 8 | 9 | $175.12B
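The Market Cap column abbreviates large values and falls back to N/A when no figure is available, as in the SPY row. A formatter along these lines would produce that style; format_market_cap is a hypothetical helper, and the tool's exact rounding rules are an assumption.

```python
def format_market_cap(value):
    """Format a market cap in dollars as e.g. $580.45B, or N/A if unknown."""
    if value is None or value <= 0:
        return "N/A"
    # Try the largest unit first: trillions, then billions, then millions.
    for divisor, suffix in ((1e12, "T"), (1e9, "B"), (1e6, "M")):
        if value >= divisor:
            return f"${value / divisor:.2f}{suffix}"
    return f"${value:,.0f}"
```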