rstat - Reddit Stock Analyzer Tool

A powerful, installable command-line tool to scan Reddit for stock ticker mentions, perform sentiment analysis, and generate insightful summary reports.

Key Features

  • Persistent Storage: Scraped data is stored in a local SQLite database (reddit_stocks.db), so you can track trends over time.
  • Deep Scanning: Analyzes both post titles and comments from a user-defined list of subreddits.
  • Sentiment Analysis: Uses NLTK's VADER engine to score each mention and classify it as Bullish, Bearish, or Neutral.
  • Financial Data: Enriches ticker data by fetching market capitalization from Yahoo Finance, with intelligent caching to minimize API calls.
  • Data Quality: Uses a configurable blacklist and filtering to skip common words that look like tickers (e.g., "YOLO", "CEO", "A"), which reduces false positives.
  • Automatic Cleanup: When you update the ticker blacklist, the next run purges entries for the newly blacklisted tickers from the database.
  • Installable Command: Packaged with setuptools, allowing you to install the tool and run it from anywhere on your system using the rstat command.
  • Flexible Reporting: The final report can be customized using command-line arguments to control the number of results shown.
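
To illustrate how the persistent-storage and cleanup features could fit together, here is a minimal sketch using Python's built-in sqlite3 module. The table name mentions, its columns, and the helper names are hypothetical; the tool's real schema lives in rstat_tool/database.py and may differ.

```python
import sqlite3


def init_db(conn):
    """Create a hypothetical 'mentions' table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS mentions (
               id INTEGER PRIMARY KEY,
               ticker TEXT NOT NULL,
               subreddit TEXT NOT NULL,
               sentiment REAL NOT NULL
           )"""
    )


def add_mention(conn, ticker, subreddit, sentiment):
    """Store one ticker mention with its sentiment score."""
    conn.execute(
        "INSERT INTO mentions (ticker, subreddit, sentiment) VALUES (?, ?, ?)",
        (ticker, subreddit, sentiment),
    )


def purge_blacklisted(conn, blacklist):
    """Delete rows whose ticker is blacklisted; return the number removed."""
    if not blacklist:
        return 0
    # One '?' placeholder per blacklisted word keeps the query parameterized.
    placeholders = ",".join("?" for _ in blacklist)
    cur = conn.execute(
        f"DELETE FROM mentions WHERE ticker IN ({placeholders})",
        sorted(blacklist),
    )
    conn.commit()
    return cur.rowcount
```

Running purge_blacklisted at startup with the current blacklist would remove mentions recorded before a word was blacklisted, which matches the cleanup behavior described above.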

Project Structure

reddit_stock_analyzer/
├── .env                  # Your secret API keys
├── requirements.txt      # Project dependencies
├── setup.py              # Installation script for the tool
├── subreddits.json       # Configuration for which subreddits to scan
├── rstat_tool/           # The main source code package
│   ├── __init__.py
│   ├── main.py           # Main entry point and CLI logic
│   ├── database.py       # All SQLite database functions
│   ├── sentiment_analyzer.py
│   ├── setup_nltk.py     # One-time NLTK setup script
│   └── ticker_extractor.py
└── ...

Setup and Installation

Follow these steps to set up the project on your local machine.

1. Prerequisites

  • Python 3.7+
  • Git

2. Clone the Repository

git clone <your-repository-url>
cd reddit_stock_analyzer

3. Set Up a Python Virtual Environment

It is highly recommended to use a virtual environment to manage dependencies.

On macOS / Linux:

python3 -m venv .venv
source .venv/bin/activate

On Windows:

python -m venv .venv
.\.venv\Scripts\activate

4. Install Dependencies

pip install -r requirements.txt

5. Configure Reddit API Credentials

The tool needs API access to read data from Reddit.

  1. Go to the Reddit Apps preferences page and create a new "script" app.

  2. Create a file named .env in the root of the project directory.

  3. Add your credentials to the .env file like this:

    REDDIT_CLIENT_ID=your_client_id_from_reddit
    REDDIT_CLIENT_SECRET=your_client_secret_from_reddit
    REDDIT_USER_AGENT=A custom user agent string (e.g., python:rstat:v1.0)
    

    IMPORTANT: Never commit your .env file to version control.
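
A quick way to confirm that the three values are visible to Python is a small check like the one below. It is a sketch that uses only the standard library; the helper name missing_credentials is hypothetical, and it assumes the variables have already been loaded into the process environment (how the tool itself reads .env is not stated here).

```python
import os

# The three variable names come from the .env example above.
REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing Reddit credentials:", ", ".join(missing))
    else:
        print("All Reddit credentials are set.")
```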

6. Set Up NLTK

Run the included setup script once to download the required vader_lexicon for sentiment analysis.

python rstat_tool/setup_nltk.py
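
Once the lexicon is downloaded, VADER can score text. The sketch below shows one plausible way to turn a VADER compound score into the Bullish, Bearish, and Neutral labels used in the report. The function names and the 0.05 cutoff are assumptions (0.05 is the conventional VADER threshold); the logic in rstat_tool/sentiment_analyzer.py may differ.

```python
def classify(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a label.

    The +/-0.05 cutoff is the conventional VADER threshold, assumed here.
    """
    if compound >= threshold:
        return "Bullish"
    if compound <= -threshold:
        return "Bearish"
    return "Neutral"


def score_text(text):
    """Return the VADER compound score for a piece of text.

    Requires nltk and the vader_lexicon resource; imported lazily so that
    classify() can be used without nltk installed.
    """
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return analyzer.polarity_scores(text)["compound"]
```

With nltk set up, classify(score_text("some comment text")) would yield one of the three labels.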

7. Build and Install the rstat Command

Install the tool in "editable" mode. This creates the rstat command in your virtual environment and links it to your source code. Any changes you make to the code will be immediately available.

pip install -e .

The installation is now complete.


Configuration

Subreddits

Modify the subreddits.json file to define which communities the tool should scan.

{
  "subreddits": [
    "wallstreetbets",
    "stocks",
    "investing",
    "options"
  ]
}
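
For reference, a file in this shape can be read and validated with a few lines of standard-library Python. This is only a sketch: the function names parse_subreddits and load_subreddits are hypothetical, and the tool's own loader may apply different checks.

```python
import json


def parse_subreddits(raw_json):
    """Parse the JSON text and return the list of subreddit names."""
    data = json.loads(raw_json)
    names = data.get("subreddits") if isinstance(data, dict) else None
    if not isinstance(names, list) or not names:
        raise ValueError('expected a non-empty "subreddits" list')
    cleaned = []
    for name in names:
        # Reject non-string or blank entries instead of silently skipping them.
        if not isinstance(name, str) or not name.strip():
            raise ValueError(f"invalid subreddit entry: {name!r}")
        cleaned.append(name.strip())
    return cleaned


def load_subreddits(path="subreddits.json"):
    """Read the configuration file and return the subreddit names."""
    with open(path, encoding="utf-8") as fh:
        return parse_subreddits(fh.read())
```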

Ticker Blacklist (Advanced)

To improve data quality, you can add common words that are mistaken for tickers to the COMMON_WORDS_BLACKLIST set inside the rstat_tool/ticker_extractor.py file. The tool will automatically clean the database of these tickers on the next run.
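
The sketch below shows how a blacklist of this kind can filter ticker candidates. Only the name COMMON_WORDS_BLACKLIST comes from the project; the regular expression, the extract_tickers helper, and the three-word blacklist are illustrative assumptions, not the contents of ticker_extractor.py.

```python
import re

# Tiny illustrative subset; the real set in ticker_extractor.py is larger.
COMMON_WORDS_BLACKLIST = {"YOLO", "CEO", "A"}

# One to five capital letters, optionally prefixed with "$", that are not
# part of a longer word or number.
TICKER_PATTERN = re.compile(r"(?<![A-Za-z0-9$])\$?([A-Z]{1,5})(?![A-Za-z0-9])")


def extract_tickers(text, blacklist=COMMON_WORDS_BLACKLIST):
    """Return the set of ticker-like tokens in text, minus blacklisted words."""
    return {m for m in TICKER_PATTERN.findall(text) if m not in blacklist}


print(sorted(extract_tickers("YOLO on $TSLA, the CEO of NVDA said A lot")))
```

Adding a word to the set is enough to drop it from future results, since every candidate is checked against the blacklist before it is counted.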


Usage

Once installed, you can run the tool from any directory using the rstat command.

Basic Usage

Run an analysis using the default settings (scans 25 posts, 100 comments/post, shows top 20 tickers).

rstat subreddits.json

Advanced Usage with Arguments

Use command-line arguments to control the scan and the report.

# Scan only 10 posts, 50 comments per post, and show a report of the top 5 tickers
rstat subreddits.json --posts 10 --comments 50 --limit 5
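
A command line like this maps naturally onto argparse. The following is a sketch of how such a parser might be declared, using the flag names and default values stated in this README; the help strings and the build_parser name are assumptions, and the real definitions in rstat_tool/main.py may differ.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented rstat options."""
    parser = argparse.ArgumentParser(
        prog="rstat",
        description="Scan Reddit for stock ticker mentions.",
    )
    parser.add_argument("config", help="path to the JSON file listing subreddits")
    parser.add_argument("--posts", type=int, default=25,
                        help="posts to scan per subreddit (default: 25)")
    parser.add_argument("--comments", type=int, default=100,
                        help="comments to scan per post (default: 100)")
    parser.add_argument("--limit", type=int, default=20,
                        help="tickers to show in the report (default: 20)")
    return parser


args = build_parser().parse_args(
    ["subreddits.json", "--posts", "10", "--comments", "50", "--limit", "5"]
)
print(args.config, args.posts, args.comments, args.limit)
```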

Getting Help

To see all available options and their descriptions:

rstat --help

Example Output

--- Top 5 Tickers by Mention Count ---
Ticker   | Mentions | Bullish  | Bearish  | Neutral  | Market Cap
---------------------------------------------------------------------------
TSLA     | 183      | 95       | 48       | 40       | $580.45B
NVDA     | 155      | 110      | 15       | 30       | $1.15T
AAPL     | 98       | 50       | 21       | 27       | $2.78T
SPY      | 76       | 30       | 35       | 11       | N/A
AMD      | 62       | 45       | 8        | 9        | $175.12B
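
The Market Cap column uses abbreviated values, with N/A where no figure is available. A formatter along the following lines would produce that style; the function name and the exact rounding rules are assumptions rather than the tool's actual code.

```python
def format_market_cap(value):
    """Format a market cap in dollars as $X.XXT / $X.XXB / $X.XXM, or N/A."""
    if value is None or value <= 0:
        return "N/A"
    # Try the largest unit first so 1.15e12 becomes $1.15T, not $1150.00B.
    for divisor, suffix in ((1e12, "T"), (1e9, "B"), (1e6, "M")):
        if value >= divisor:
            return f"${value / divisor:.2f}{suffix}"
    return f"${value:,.0f}"


for cap in (580.45e9, 1.15e12, None):
    print(format_market_cap(cap))
```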