2025-07-21 15:46:41 +02:00


# rstat - Reddit Stock Analyzer Tool
An installable command-line tool that scans Reddit for stock ticker mentions, runs sentiment analysis on each mention, and generates summary reports.
## Key Features
* **Persistent Storage:** Scraped data is stored in a local SQLite database (`reddit_stocks.db`), so you can track trends over time.
* **Deep Scanning:** Analyzes both post titles and comments from a user-defined list of subreddits.
* **Sentiment Analysis:** Uses NLTK's VADER engine to score the sentiment of each mention and classify it as Bullish, Bearish, or Neutral.
* **Financial Data:** Enriches ticker data with market capitalization fetched from Yahoo Finance, caching the results to minimize API calls.
* **Data Quality:** Uses a configurable blacklist and filtering to ignore common words that look like tickers (e.g., "YOLO", "CEO", "A") and reduce false positives.
* **Automatic Cleanup:** Purges stored mentions of newly blacklisted tickers from the database on the next run.
* **Installable Command:** Packaged with `setuptools`, so you can install the tool and run it from any directory using the `rstat` command.
* **Flexible Reporting:** Command-line arguments control how many posts and comments are scanned and how many tickers the report shows.
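
The storage, reporting, and cleanup features can be pictured with a minimal `sqlite3` sketch. The `mentions` table, its columns, and the helper functions below are hypothetical and are not taken from `rstat_tool/database.py`; they only illustrate how per-mention rows could be aggregated into the report's Bullish/Bearish/Neutral counts and how blacklisted tickers could be purged.

```python
import sqlite3

# Hypothetical schema: one row per ticker mention with its sentiment label.
# The real rstat_tool/database.py may use different tables and columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS mentions (
    id INTEGER PRIMARY KEY,
    ticker TEXT NOT NULL,
    subreddit TEXT NOT NULL,
    sentiment TEXT NOT NULL CHECK (sentiment IN ('Bullish', 'Bearish', 'Neutral'))
)
"""


def init_db(path: str = "reddit_stocks.db") -> sqlite3.Connection:
    """Open (or create) the database file and make sure the table exists."""
    conn = sqlite3.connect(path)
    with conn:
        conn.execute(SCHEMA)
    return conn


def add_mention(conn: sqlite3.Connection, ticker: str, subreddit: str, sentiment: str) -> None:
    """Record one mention; the 'with' block commits the transaction."""
    with conn:
        conn.execute(
            "INSERT INTO mentions (ticker, subreddit, sentiment) VALUES (?, ?, ?)",
            (ticker, subreddit, sentiment),
        )


def top_tickers(conn: sqlite3.Connection, limit: int = 20) -> list:
    """Return (ticker, mentions, bullish, bearish, neutral) rows, most-mentioned first."""
    return conn.execute(
        """
        SELECT ticker,
               COUNT(*) AS mentions,
               SUM(sentiment = 'Bullish'),
               SUM(sentiment = 'Bearish'),
               SUM(sentiment = 'Neutral')
        FROM mentions
        GROUP BY ticker
        ORDER BY mentions DESC, ticker
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


def purge_blacklisted(conn: sqlite3.Connection, blacklist: set) -> int:
    """Delete mentions of blacklisted tickers and return the number of rows removed."""
    if not blacklist:
        return 0
    placeholders = ", ".join("?" for _ in blacklist)
    with conn:
        cur = conn.execute(
            f"DELETE FROM mentions WHERE ticker IN ({placeholders})",
            tuple(blacklist),
        )
    return cur.rowcount
```

All values are passed as query parameters rather than formatted into the SQL; only the number of `?` placeholders is built dynamically.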
## Project Structure
```
reddit_stock_analyzer/
├── .env # Your secret API keys
├── requirements.txt # Project dependencies
├── setup.py # Installation script for the tool
├── subreddits.json # Configuration for which subreddits to scan
├── rstat_tool/ # The main source code package
│ ├── __init__.py
│ ├── main.py # Main entry point and CLI logic
│ ├── database.py # All SQLite database functions
│ ├── sentiment_analyzer.py
│ ├── setup_nltk.py # One-time NLTK setup script
│ └── ticker_extractor.py
└── ...
```
## Setup and Installation
Follow these steps to set up the project on your local machine.
### 1. Prerequisites
* Python 3.7+
* Git
### 2. Clone the Repository
```bash
git clone <your-repository-url>
cd reddit_stock_analyzer
```
### 3. Set Up a Python Virtual Environment
It is highly recommended to use a virtual environment to manage dependencies.

**On macOS / Linux:**
```bash
python3 -m venv .venv
source .venv/bin/activate
```
**On Windows:**
```powershell
python -m venv .venv
.\.venv\Scripts\activate
```
### 4. Install Dependencies
```bash
pip install -r requirements.txt
```
### 5. Configure Reddit API Credentials
The tool needs API access to read data from Reddit.
1. Go to the [Reddit Apps preferences page](https://www.reddit.com/prefs/apps) and create a new "script" app.
2. Create a file named `.env` in the root of the project directory.
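3. Add your credentials to the `.env` file like this:
   ```
   REDDIT_CLIENT_ID=your_client_id_from_reddit
   REDDIT_CLIENT_SECRET=your_client_secret_from_reddit
   REDDIT_USER_AGENT=python:rstat:v1.0 (by /u/your_reddit_username)
   ```
   Reddit asks for a descriptive user agent; the example follows its recommended `<platform>:<app ID>:<version> (by /u/<username>)` format.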
**IMPORTANT:** Never commit your `.env` file to version control.
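
Before a scan starts, the three variables can be checked so that a missing value fails with a clear message instead of an authentication error from Reddit. This sketch assumes the `.env` file has already been loaded into the process environment (for example with `python-dotenv`'s `load_dotenv()`; how the tool actually reads it is an assumption), and `load_reddit_credentials` is a hypothetical helper name.

```python
import os

REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def load_reddit_credentials(environ=os.environ) -> dict:
    """Return the Reddit credentials, raising if any variable is missing or empty.

    Hypothetical helper: assumes the .env file was already loaded into the
    environment (e.g. by python-dotenv's load_dotenv()).
    """
    missing = [name for name in REQUIRED_VARS if not environ.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing Reddit credentials in .env: " + ", ".join(missing))
    return {
        "client_id": environ["REDDIT_CLIENT_ID"],
        "client_secret": environ["REDDIT_CLIENT_SECRET"],
        "user_agent": environ["REDDIT_USER_AGENT"],
    }
```

The dictionary keys match the keyword arguments of PRAW's `praw.Reddit(client_id=..., client_secret=..., user_agent=...)`, so if the tool uses PRAW (not stated in this README) the result could be passed as `praw.Reddit(**load_reddit_credentials())`.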
### 6. Set Up NLTK
Run the included setup script **once** to download the required `vader_lexicon` for sentiment analysis.
```bash
python rstat_tool/setup_nltk.py
```
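Once the lexicon is installed, VADER's `SentimentIntensityAnalyzer.polarity_scores()` returns a dictionary whose `compound` entry lies between -1 and 1. Below is a minimal sketch of turning that score into the report's Bullish/Bearish/Neutral labels. The ±0.05 cut-offs are the thresholds commonly suggested by VADER's authors and are an assumption here, not necessarily the values used in `rstat_tool/sentiment_analyzer.py`; the function names are hypothetical.

```python
# Assumed thresholds (the conventional VADER cut-offs); the tool may use others.
BULLISH_THRESHOLD = 0.05
BEARISH_THRESHOLD = -0.05


def label_from_compound(compound: float) -> str:
    """Map a VADER compound score (-1..1) to a report label."""
    if compound >= BULLISH_THRESHOLD:
        return "Bullish"
    if compound <= BEARISH_THRESHOLD:
        return "Bearish"
    return "Neutral"


def label_text(text: str) -> str:
    """Score a post title or comment with VADER and label it.

    Requires nltk plus the vader_lexicon downloaded by setup_nltk.py. The import
    is local so label_from_compound can be used without nltk installed. Building
    the analyzer loads the lexicon, so real code would create it once and reuse it.
    """
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    scores = SentimentIntensityAnalyzer().polarity_scores(text)
    return label_from_compound(scores["compound"])
```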
### 7. Build and Install the `rstat` Command
Install the tool in "editable" mode. This creates the `rstat` command in your virtual environment and links it to your source code. Any changes you make to the code will be immediately available.
```bash
pip install -e .
```
The installation is now complete.

---
## Configuration
### Subreddits
Modify the `subreddits.json` file to define which communities the tool should scan.
```json
{
  "subreddits": [
    "wallstreetbets",
    "stocks",
    "investing",
    "options"
  ]
}
```
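A minimal sketch of reading this file and validating its shape before a scan. The only assumption about the format is the `"subreddits"` list shown above; `load_subreddits` is a hypothetical name, and the whitespace trimming and de-duplication are illustrative conveniences rather than documented behavior.

```python
import json


def load_subreddits(path: str) -> list:
    """Read the config file and return its subreddit names, de-duplicated in order."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)

    names = config.get("subreddits") if isinstance(config, dict) else None
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        raise ValueError(f'{path} must contain a "subreddits" list of strings')

    result = []
    for name in names:
        cleaned = name.strip()  # tolerate stray whitespace around names
        if cleaned and cleaned not in result:  # skip blanks and duplicates
            result.append(cleaned)
    return result
```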
### Ticker Blacklist (Advanced)
To improve data quality, you can add common words that are mistaken for tickers to the `COMMON_WORDS_BLACKLIST` set inside the `rstat_tool/ticker_extractor.py` file. The tool will automatically clean the database of these tickers on the next run.
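
A minimal sketch of blacklist-based extraction, assuming a ticker is written either as a `$`-prefixed cashtag or as a standalone run of 1 to 5 uppercase letters. The regular expression, the `extract_tickers` name, and the sample blacklist entries are illustrative and are not copied from `rstat_tool/ticker_extractor.py`.

```python
import re

# Illustrative subset; the real COMMON_WORDS_BLACKLIST lives in ticker_extractor.py.
COMMON_WORDS_BLACKLIST = {"A", "I", "CEO", "YOLO", "DD", "IMO"}

# Optional "$" followed by 1-5 uppercase letters that are not part of a longer
# word, so "Apple" and "GOOGLE" do not match.
TICKER_PATTERN = re.compile(r"(?<![A-Za-z0-9])\$?([A-Z]{1,5})(?![A-Za-z0-9])")


def extract_tickers(text: str, blacklist=COMMON_WORDS_BLACKLIST) -> list:
    """Return candidate tickers in order of first appearance, minus blacklisted words."""
    found = []
    for match in TICKER_PATTERN.finditer(text):
        symbol = match.group(1)
        if symbol not in blacklist and symbol not in found:
            found.append(symbol)
    return found
```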

---
## Usage
Once installed, the `rstat` command is available from any directory while the virtual environment is active. Pass the path to your `subreddits.json` file as the first argument.
### Basic Usage
Run an analysis using the default settings (scans 25 posts, 100 comments/post, shows top 20 tickers).
```bash
rstat subreddits.json
```
### Advanced Usage with Arguments
Use command-line arguments to control the scan and the report.
```bash
# Scan only 10 posts, 50 comments per post, and show a report of the top 5 tickers
rstat subreddits.json --posts 10 --comments 50 --limit 5
```
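These flags map naturally onto Python's `argparse`. Below is a sketch of a parser that reproduces the documented defaults (25 posts, 100 comments per post, top 20 tickers). It is not the actual code in `rstat_tool/main.py`; the positional argument name `config_file`, the `build_parser` function, and the help texts are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented CLI: a config file plus three optional limits."""
    parser = argparse.ArgumentParser(
        prog="rstat",
        description="Scan Reddit for stock ticker mentions and report sentiment.",
    )
    parser.add_argument("config_file", help="path to the subreddits JSON file")
    parser.add_argument("--posts", type=int, default=25,
                        help="number of posts to scan (default: 25)")
    parser.add_argument("--comments", type=int, default=100,
                        help="number of comments to scan per post (default: 100)")
    parser.add_argument("--limit", type=int, default=20,
                        help="number of tickers to show in the report (default: 20)")
    return parser
```

With this parser, `rstat subreddits.json` yields `posts=25, comments=100, limit=20`, and the advanced example above yields `posts=10, comments=50, limit=5`.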
### Getting Help
To see all available options and their descriptions:
```bash
rstat --help
```
### Example Output
```
--- Top 5 Tickers by Mention Count ---
Ticker | Mentions | Bullish | Bearish | Neutral | Market Cap
---------------------------------------------------------------------------
TSLA | 183 | 95 | 48 | 40 | $580.45B
NVDA | 155 | 110 | 15 | 30 | $1.15T
AAPL | 98 | 50 | 21 | 27 | $2.78T
SPY | 76 | 30 | 35 | 11 | N/A
AMD | 62 | 45 | 8 | 9 | $175.12B
```
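The Market Cap column abbreviates large values with `B` (billions) and `T` (trillions) suffixes and shows `N/A` when no value is available, as in the `SPY` row. A minimal formatter reproducing that style follows. The function name and the two-decimal rounding are inferred from the sample output, and the `M` suffix and plain-dollar fallback for smaller values are illustrative extensions not shown in the README.

```python
def format_market_cap(value) -> str:
    """Format a market cap in dollars, e.g. '$580.45B' or '$1.15T'; None -> 'N/A'."""
    if value is None or value <= 0:
        return "N/A"
    # Check the largest unit first so trillions are not printed as thousands of billions.
    for threshold, suffix in ((1e12, "T"), (1e9, "B"), (1e6, "M")):
        if value >= threshold:
            return f"${value / threshold:.2f}{suffix}"
    return f"${value:,.0f}"
```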