diff --git a/README.md b/README.md
index 3b377f6..606c27b 100644
--- a/README.md
+++ b/README.md
@@ -1,17 +1,29 @@
-# rstat - Reddit Stock Analyzer Tool
+# rstat - Reddit Stock Analyzer
 
-A powerful, installable command-line tool to scan Reddit for stock ticker mentions, perform sentiment analysis, and generate insightful summary reports.
+A powerful, installable command-line tool and web dashboard to scan Reddit for stock ticker mentions, perform sentiment analysis, generate insightful reports, and create shareable summary images.
 
 ## Key Features
 
-* **Persistent Storage:** Scraped data is stored in a local SQLite database (`reddit_stocks.db`), so you can track trends over time.
-* **Deep Scanning:** Analyzes both post titles and comments from a user-defined list of subreddits.
-* **Sentiment Analysis:** Uses NLTK's VADER engine to calculate a sentiment score (Bullish, Bearish, or Neutral) for each mention.
-* **Financial Data:** Enriches ticker data by fetching market capitalization from Yahoo Finance, with intelligent caching to minimize API calls.
-* **Data Quality:** Utilizes a configurable blacklist and smart filtering to ignore common words and reduce false positives (e.g., "YOLO", "CEO", "A").
-* **Automatic Cleanup:** Automatically purges old, invalid data from the database if you update the ticker blacklist.
-* **Installable Command:** Packaged with `setuptools`, allowing you to install the tool and run it from anywhere on your system using the `rstat` command.
-* **Flexible Reporting:** The final report can be customized using command-line arguments to control the number of results shown.
+* **Dual-Interface:** Use a flexible command-line tool (`rstat`) for data collection and a simple web dashboard (`rstat-dashboard`) for data visualization.
+* **Flexible Data Scraping:**
+    * Scan subreddits from a config file or target a single subreddit on the fly.
+    * Configure the time window to scan posts from the last 24 hours (for daily cron jobs) or back-fill data from several past days (e.g., last 7 days).
+    * Fetches from `/new` to capture the most recent discussions.
+* **Deep Analysis & Storage:**
+    * Scans both post titles and comments, differentiating between the two.
+    * Performs a "deep dive" analysis on posts to calculate the average sentiment of the entire comment section.
+    * Persists all data in a local SQLite database (`reddit_stocks.db`) to track trends over time.
+* **Rich Data Enrichment:**
+    * Calculates sentiment (Bullish, Bearish, Neutral) for every mention using NLTK.
+    * Fetches and stores daily closing prices and market capitalization from Yahoo Finance.
+* **Interactive Web Dashboard:**
+    * View Top 10 tickers across all subreddits or on a per-subreddit basis.
+    * Click any ticker to get a "Deep Dive" page, showing every post it was mentioned in.
+* **Shareable Summary Images:**
+    * Generate clean, dark-mode summary images for both daily and weekly sentiment for any subreddit, perfect for sharing.
+* **High-Quality Data:**
+    * Uses a configurable blacklist and smart filtering to reduce false positives.
+    * Automatically cleans the database of invalid tickers if the blacklist is updated.
 
 ## Project Structure
 
@@ -20,15 +32,20 @@ reddit_stock_analyzer/
 ├── .env                    # Your secret API keys
 ├── requirements.txt        # Project dependencies
 ├── setup.py                # Installation script for the tool
-├── subreddits.json         # Configuration for which subreddits to scan
-├── rstat_tool/             # The main source code package
-│   ├── __init__.py
-│   ├── main.py             # Main entry point and CLI logic
-│   ├── database.py         # All SQLite database functions
-│   ├── sentiment_analyzer.py
-│   ├── setup_nltk.py       # One-time NLTK setup script
-│   └── ticker_extractor.py
-└── ...
+├── subreddits.json         # Default list of subreddits to scan
+├── templates/              # HTML templates for the web dashboard
+│   ├── base.html
+│   ├── index.html
+│   ├── subreddit.html
+│   ├── deep_dive.html
+│   ├── image_view.html
+│   └── weekly_image_view.html
+└── rstat_tool/             # The main source code package
+    ├── __init__.py
+    ├── main.py             # Scraper entry point and CLI logic
+    ├── dashboard.py        # Web dashboard entry point (Flask app)
+    ├── database.py         # All SQLite database functions
+    └── ...
 ```
 
 ## Setup and Installation
@@ -66,8 +83,6 @@ pip install -r requirements.txt
 ```
 
 ### 5. Configure Reddit API Credentials
-The tool needs API access to read data from Reddit.
-
 1. Go to the [Reddit Apps preferences page](https://www.reddit.com/prefs/apps) and create a new "script" app.
 2. Create a file named `.env` in the root of the project directory.
 3. Add your credentials to the `.env` file like this:
@@ -75,9 +90,8 @@ The tool needs API access to read data from Reddit.
     ```
     REDDIT_CLIENT_ID=your_client_id_from_reddit
     REDDIT_CLIENT_SECRET=your_client_secret_from_reddit
-    REDDIT_USER_AGENT=A custom user agent string (e.g., python:rstat:v1.0)
+    REDDIT_USER_AGENT=A custom user agent string (e.g., python:rstat:v1.2)
     ```
-    **IMPORTANT:** Never commit your `.env` file to version control.
 
 ### 6. Set Up NLTK
 Run the included setup script **once** to download the required `vader_lexicon` for sentiment analysis.
@@ -85,8 +99,8 @@ Run the included setup script **once** to download the required `vader_lexicon`
 python rstat_tool/setup_nltk.py
 ```
 
-### 7. Build and Install the `rstat` Command
-Install the tool in "editable" mode. This creates the `rstat` command in your virtual environment and links it to your source code. Any changes you make to the code will be immediately available.
+### 7. Build and Install the Commands
+Install the tool in "editable" mode. This creates the `rstat` and `rstat-dashboard` commands in your virtual environment and links them to your source code.
 
 ```bash
 pip install -e .
@@ -95,60 +109,50 @@ The installation is now complete.
 
 ---
 
-## Configuration
-
-### Subreddits
-Modify the `subreddits.json` file to define which communities the tool should scan.
-```json
-{
-  "subreddits": [
-    "wallstreetbets",
-    "stocks",
-    "investing",
-    "options"
-  ]
-}
-```
-
-### Ticker Blacklist (Advanced)
-To improve data quality, you can add common words that are mistaken for tickers to the `COMMON_WORDS_BLACKLIST` set inside the `rstat_tool/ticker_extractor.py` file. The tool will automatically clean the database of these tickers on the next run.
-
----
-
 ## Usage
 
-Once installed, you can run the tool from any directory using the `rstat` command.
+The tool is split into two commands: one for gathering data and one for viewing it.
 
-### Basic Usage
-Run an analysis using the default settings (scans 25 posts, 100 comments/post, shows top 20 tickers).
+### 1. The Scraper (`rstat`)
 
-```bash
-rstat subreddits.json
-```
+This is the command-line tool you will use to populate the database. It is highly flexible.
 
-### Advanced Usage with Arguments
-Use command-line arguments to control the scan and the report.
+**Common Commands:**
 
-```bash
-# Scan only 10 posts, 50 comments per post, and show a report of the top 5 tickers
-rstat subreddits.json --posts 10 --comments 50 --limit 5
-```
+* **Run a daily scan (for cron jobs):** Scans subreddits from `subreddits.json` for posts in the last 24 hours.
+  ```bash
+  rstat --config subreddits.json --days 1
+  ```
 
-### Getting Help
-To see all available commands and their descriptions:
-```bash
-rstat --help
-```
+* **Scan a single subreddit:** Ignores the config file and scans just one subreddit.
+  ```bash
+  rstat --subreddit wallstreetbets --days 1
+  ```
 
-### Example Output
+* **Back-fill data for last week:** Scans a specific subreddit for all new posts in the last 7 days.
+  ```bash
+  rstat --subreddit Tollbugatabets --days 7
+  ```
 
-```
---- Top 5 Tickers by Mention Count ---
-Ticker | Mentions | Bullish | Bearish | Neutral | Market Cap
---------------------------------------------------------------------------
-TSLA | 183 | 95 | 48 | 40 | $580.45B
-NVDA | 155 | 110 | 15 | 30 | $1.15T
-AAPL | 98 | 50 | 21 | 27 | $2.78T
-SPY | 76 | 30 | 35 | 11 | N/A
-AMD | 62 | 45 | 8 | 9 | $175.12B
-```
\ No newline at end of file
+* **Get help and see all options:**
+  ```bash
+  rstat --help
+  ```
+
+### 2. The Web Dashboard (`rstat-dashboard`)
+
+This command starts a local web server to let you explore the data you've collected.
+
+**How to Run:**
+1. Make sure you have run the `rstat` scraper at least once to populate the database.
+2. Start the web server:
+   ```bash
+   rstat-dashboard
+   ```
+3. Open your web browser and navigate to **http://127.0.0.1:5000**.
+
+**Dashboard Features:**
+* **Main Page:** Shows the Top 10 most mentioned tickers across all scanned subreddits.
+* **Subreddit Pages:** Click any subreddit in the navigation bar to see a dashboard specific to that community.
+* **Deep Dive:** In any table, click on a ticker's symbol to see a detailed breakdown of every post it was mentioned in.
+* **Shareable Images:** On a subreddit's page, click "(View Daily Image)" or "(View Weekly Image)" to generate a polished, shareable summary card.
\ No newline at end of file
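
The scraper commands added in the Usage hunk rely on three options: `--config`, `--subreddit`, and `--days`. The following is a minimal sketch of how such options could be parsed with `argparse`. It is not taken from `rstat_tool/main.py`: the function names `build_parser` and `resolve_subreddits`, the default values, and the rule that `--subreddit` overrides `--config` are assumptions made for illustration. The `"subreddits"` key matches the `subreddits.json` layout shown in the removed Configuration section.

```python
import argparse
import json
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the README diff."""
    parser = argparse.ArgumentParser(prog="rstat")
    parser.add_argument(
        "--config",
        default="subreddits.json",  # assumed default, not confirmed by the diff
        help="JSON file containing a 'subreddits' list",
    )
    parser.add_argument(
        "--subreddit",
        help="scan only this subreddit and ignore the config file",
    )
    parser.add_argument(
        "--days",
        type=int,
        default=1,  # assumed default, not confirmed by the diff
        help="how many past days of posts to scan",
    )
    return parser


def resolve_subreddits(args: argparse.Namespace) -> list[str]:
    """Return the subreddits to scan; --subreddit takes precedence (assumption)."""
    if args.subreddit:
        return [args.subreddit]
    data = json.loads(Path(args.config).read_text(encoding="utf-8"))
    return list(data["subreddits"])


# Equivalent to running: rstat --subreddit wallstreetbets --days 7
args = build_parser().parse_args(["--subreddit", "wallstreetbets", "--days", "7"])
print(resolve_subreddits(args), args.days)  # prints: ['wallstreetbets'] 7
```

Letting the single-subreddit flag short-circuit the config lookup means the back-fill command works even when no `subreddits.json` is present, which fits the "ignores the config file" wording in the diff.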