# rstat - Reddit Stock Analyzer Tool

A powerful, installable command-line tool to scan Reddit for stock ticker mentions, perform sentiment analysis, and generate insightful summary reports.

## Key Features

* **Persistent Storage:** Scraped data is stored in a local SQLite database (`reddit_stocks.db`), so you can track trends over time.
* **Deep Scanning:** Analyzes both post titles and comments from a user-defined list of subreddits.
* **Sentiment Analysis:** Uses NLTK's VADER engine to calculate a sentiment score (Bullish, Bearish, or Neutral) for each mention.
* **Financial Data:** Enriches ticker data by fetching market capitalization from Yahoo Finance, with intelligent caching to minimize API calls.
* **Data Quality:** Utilizes a configurable blacklist and smart filtering to ignore common words and reduce false positives (e.g., "YOLO", "CEO", "A").
* **Automatic Cleanup:** Automatically purges old, invalid data from the database if you update the ticker blacklist.
* **Installable Command:** Packaged with `setuptools`, allowing you to install the tool and run it from anywhere on your system using the `rstat` command.
* **Flexible Reporting:** The final report can be customized using command-line arguments to control the number of results shown.

## Project Structure

```
reddit_stock_analyzer/
├── .env                    # Your secret API keys
├── requirements.txt        # Project dependencies
├── setup.py                # Installation script for the tool
├── subreddits.json         # Configuration for which subreddits to scan
├── rstat_tool/             # The main source code package
│   ├── __init__.py
│   ├── main.py             # Main entry point and CLI logic
│   ├── database.py         # All SQLite database functions
│   ├── sentiment_analyzer.py
│   ├── setup_nltk.py       # One-time NLTK setup script
│   └── ticker_extractor.py
└── ...
```

## Setup and Installation

Follow these steps to set up the project on your local machine.

### 1. Prerequisites

* Python 3.7+
* Git

### 2. Clone the Repository

```bash
git clone
cd reddit_stock_analyzer
```

### 3. Set Up a Python Virtual Environment

It is highly recommended to use a virtual environment to manage dependencies.

**On macOS / Linux:**

```bash
python3 -m venv .venv
source .venv/bin/activate
```

**On Windows:**

```bash
python -m venv .venv
.\.venv\Scripts\activate
```

### 4. Install Dependencies

```bash
pip install -r requirements.txt
```

### 5. Configure Reddit API Credentials

The tool needs API access to read data from Reddit.

1. Go to the [Reddit Apps preferences page](https://www.reddit.com/prefs/apps) and create a new "script" app.
2. Create a file named `.env` in the root of the project directory.
3. Add your credentials to the `.env` file like this:

    ```
    REDDIT_CLIENT_ID=your_client_id_from_reddit
    REDDIT_CLIENT_SECRET=your_client_secret_from_reddit
    REDDIT_USER_AGENT=A custom user agent string (e.g., python:rstat:v1.0)
    ```

**IMPORTANT:** Never commit your `.env` file to version control.

### 6. Set Up NLTK

Run the included setup script **once** to download the required `vader_lexicon` for sentiment analysis.

```bash
python rstat_tool/setup_nltk.py
```

### 7. Build and Install the `rstat` Command

Install the tool in "editable" mode. This creates the `rstat` command in your virtual environment and links it to your source code. Any changes you make to the code will be immediately available.

```bash
pip install -e .
```

The installation is now complete.

---

## Configuration

### Subreddits

Modify the `subreddits.json` file to define which communities the tool should scan.

```json
{
  "subreddits": [
    "wallstreetbets",
    "stocks",
    "investing",
    "options"
  ]
}
```

### Ticker Blacklist (Advanced)

To improve data quality, you can add common words that are mistaken for tickers to the `COMMON_WORDS_BLACKLIST` set inside the `rstat_tool/ticker_extractor.py` file. The tool will automatically clean the database of these tickers on the next run.

---

## Usage

Once installed, you can run the tool from any directory using the `rstat` command.

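For orientation, a command installed through `setuptools` usually maps to a Python function via a `console_scripts` entry point. The sketch below shows how such an entry function might parse the `rstat` options with `argparse`. The names `build_parser` and `main` are illustrative assumptions, not the project's actual code; the flags and their defaults (25 posts, 100 comments per post, top 20 tickers) are taken from the usage examples in this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented `rstat` options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="rstat",
        description="Scan Reddit for stock ticker mentions and summarize sentiment.",
    )
    # Positional argument: the subreddit configuration file, e.g. subreddits.json
    parser.add_argument("config", help="Path to the subreddit configuration file")
    # Defaults below follow the values stated in this README
    parser.add_argument("--posts", type=int, default=25,
                        help="Number of posts to scan")
    parser.add_argument("--comments", type=int, default=100,
                        help="Number of comments to scan per post")
    parser.add_argument("--limit", type=int, default=20,
                        help="Number of tickers to show in the report")
    return parser


def main(argv=None) -> int:
    """Assumed entry function; the real tool would run the scan and print the report."""
    args = build_parser().parse_args(argv)
    print(f"Config: {args.config}, posts: {args.posts}, "
          f"comments/post: {args.comments}, report limit: {args.limit}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

Passing `argv` explicitly keeps the function easy to exercise from tests, while `parse_args(None)` falls back to `sys.argv` when the command is run from a shell.
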
### Basic Usage

Run an analysis using the default settings (scans 25 posts, 100 comments/post, shows top 20 tickers).

```bash
rstat subreddits.json
```

### Advanced Usage with Arguments

Use command-line arguments to control the scan and the report.

```bash
# Scan only 10 posts, 50 comments per post, and show a report of the top 5 tickers
rstat subreddits.json --posts 10 --comments 50 --limit 5
```

### Getting Help

To see all available commands and their descriptions:

```bash
rstat --help
```

### Example Output

```
--- Top 5 Tickers by Mention Count ---
Ticker     | Mentions | Bullish | Bearish | Neutral | Market Cap
---------------------------------------------------------------------------
TSLA       | 183      | 95      | 48      | 40      | $580.45B
NVDA       | 155      | 110     | 15      | 30      | $1.15T
AAPL       | 98       | 50      | 21      | 27      | $2.78T
SPY        | 76       | 30      | 35      | 11      | N/A
AMD        | 62       | 45      | 8       | 9       | $175.12B
```
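
The report rows in the example output above could be produced along these lines. This is a minimal sketch, not the tool's actual reporting code: `format_market_cap` and `format_row` are hypothetical helpers, the column widths are chosen for illustration, and `None` is assumed to stand for a market cap that could not be fetched (shown as `N/A`).

```python
from typing import Optional


def format_market_cap(value: Optional[float]) -> str:
    """Abbreviate a market cap in dollars, or return 'N/A' when it is unknown."""
    if value is None:
        return "N/A"
    # Try the largest unit first: trillions, then billions, then millions
    for divisor, suffix in ((1e12, "T"), (1e9, "B"), (1e6, "M")):
        if value >= divisor:
            return f"${value / divisor:.2f}{suffix}"
    return f"${value:,.0f}"


def format_row(ticker: str, mentions: int, bullish: int, bearish: int,
               neutral: int, market_cap: Optional[float]) -> str:
    """Render one left-aligned, pipe-separated report row (widths are assumptions)."""
    return (f"{ticker:<10} | {mentions:<8} | {bullish:<7} | {bearish:<7} | "
            f"{neutral:<7} | {format_market_cap(market_cap)}")


if __name__ == "__main__":
    print(format_row("TSLA", 183, 95, 48, 40, 580_450_000_000))
    print(format_row("SPY", 76, 30, 35, 11, None))
```
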