Update doc.

2025-07-21 23:08:55 +02:00
parent d375f4ef38
commit 728fe43571

README.md

@@ -1,17 +1,29 @@
-# rstat - Reddit Stock Analyzer Tool
+# rstat - Reddit Stock Analyzer
-A powerful, installable command-line tool to scan Reddit for stock ticker mentions, perform sentiment analysis, and generate insightful summary reports.
+A powerful, installable command-line tool and web dashboard to scan Reddit for stock ticker mentions, perform sentiment analysis, generate insightful reports, and create shareable summary images.
 ## Key Features
-* **Persistent Storage:** Scraped data is stored in a local SQLite database (`reddit_stocks.db`), so you can track trends over time.
-* **Deep Scanning:** Analyzes both post titles and comments from a user-defined list of subreddits.
-* **Sentiment Analysis:** Uses NLTK's VADER engine to calculate a sentiment score (Bullish, Bearish, or Neutral) for each mention.
-* **Financial Data:** Enriches ticker data by fetching market capitalization from Yahoo Finance, with intelligent caching to minimize API calls.
-* **Data Quality:** Utilizes a configurable blacklist and smart filtering to ignore common words and reduce false positives (e.g., "YOLO", "CEO", "A").
-* **Automatic Cleanup:** Automatically purges old, invalid data from the database if you update the ticker blacklist.
-* **Installable Command:** Packaged with `setuptools`, allowing you to install the tool and run it from anywhere on your system using the `rstat` command.
-* **Flexible Reporting:** The final report can be customized using command-line arguments to control the number of results shown.
+* **Dual-Interface:** Use a flexible command-line tool (`rstat`) for data collection and a simple web dashboard (`rstat-dashboard`) for data visualization.
+* **Flexible Data Scraping:**
+  * Scan subreddits from a config file or target a single subreddit on the fly.
+  * Configure the time window to scan posts from the last 24 hours (for daily cron jobs) or back-fill data from several past days (e.g., the last 7 days).
+  * Fetches from `/new` to capture the most recent discussions.
+* **Deep Analysis & Storage:**
+  * Scans both post titles and comments, differentiating between the two.
+  * Performs a "deep dive" analysis on posts to calculate the average sentiment of the entire comment section.
+  * Persists all data in a local SQLite database (`reddit_stocks.db`) to track trends over time.
+* **Rich Data Enrichment:**
+  * Calculates sentiment (Bullish, Bearish, Neutral) for every mention using NLTK.
+  * Fetches and stores daily closing prices and market capitalization from Yahoo Finance.
+* **Interactive Web Dashboard:**
+  * View Top 10 tickers across all subreddits or on a per-subreddit basis.
+  * Click any ticker to get a "Deep Dive" page showing every post it was mentioned in.
+* **Shareable Summary Images:**
+  * Generate clean, dark-mode summary images of daily and weekly sentiment for any subreddit, perfect for sharing.
+* **High-Quality Data:**
+  * Uses a configurable blacklist and smart filtering to reduce false positives.
+  * Automatically cleans the database of invalid tickers if the blacklist is updated.
 ## Project Structure
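The updated feature list above says every mention is classified as Bullish, Bearish, or Neutral using NLTK. As a rough illustration only (the project's `rstat_tool/sentiment_analyzer.py` is not shown in this diff, so the function name and the ±0.05 thresholds below are assumptions), classification with NLTK's VADER analyzer could look like this:

```python
# Illustrative sketch only: the real sentiment_analyzer.py is not in this diff,
# so the function name and the +/-0.05 thresholds are assumptions.
from nltk.sentiment.vader import SentimentIntensityAnalyzer

_analyzer = SentimentIntensityAnalyzer()  # requires the vader_lexicon (see "Set Up NLTK" below)

def classify_sentiment(text: str) -> str:
    """Return 'Bullish', 'Bearish', or 'Neutral' based on VADER's compound score."""
    compound = _analyzer.polarity_scores(text)["compound"]
    if compound >= 0.05:
        return "Bullish"
    if compound <= -0.05:
        return "Bearish"
    return "Neutral"

print(classify_sentiment("NVDA to the moon, calls are printing"))  # likely "Bullish"
```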
@@ -20,14 +32,19 @@ reddit_stock_analyzer/
 ├── .env                   # Your secret API keys
 ├── requirements.txt       # Project dependencies
 ├── setup.py               # Installation script for the tool
-├── subreddits.json        # Configuration for which subreddits to scan
+├── subreddits.json        # Default list of subreddits to scan
-├── rstat_tool/            # The main source code package
-│   ├── __init__.py
-│   ├── main.py            # Main entry point and CLI logic
-│   ├── database.py        # All SQLite database functions
-│   ├── sentiment_analyzer.py
-│   ├── setup_nltk.py      # One-time NLTK setup script
-│   └── ticker_extractor.py
+├── templates/             # HTML templates for the web dashboard
+│   ├── base.html
+│   ├── index.html
+│   ├── subreddit.html
+│   ├── deep_dive.html
+│   ├── image_view.html
+│   └── weekly_image_view.html
+└── rstat_tool/            # The main source code package
+    ├── __init__.py
+    ├── main.py            # Scraper entry point and CLI logic
+    ├── dashboard.py       # Web dashboard entry point (Flask app)
+    ├── database.py        # All SQLite database functions
     └── ...
 ```
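The "Rich Data Enrichment" feature above mentions fetching daily closing prices and market capitalization from Yahoo Finance. A minimal sketch, assuming the `yfinance` package is the client used; the project's actual fetch-and-cache logic is not part of this diff:

```python
# Sketch only: assumes the project uses the yfinance package; the actual module,
# function names, and caching strategy are not visible in this diff.
import yfinance as yf

def fetch_enrichment(ticker: str) -> dict:
    """Fetch the most recent close price and market cap for one ticker."""
    t = yf.Ticker(ticker)
    hist = t.history(period="1d")                 # most recent daily bar
    close = float(hist["Close"].iloc[-1]) if not hist.empty else None
    market_cap = t.info.get("marketCap")          # may be None for ETFs like SPY
    return {"ticker": ticker, "close": close, "market_cap": market_cap}

print(fetch_enrichment("AAPL"))
```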
@@ -66,8 +83,6 @@ pip install -r requirements.txt
 ```
 ### 5. Configure Reddit API Credentials
-The tool needs API access to read data from Reddit.
 1. Go to the [Reddit Apps preferences page](https://www.reddit.com/prefs/apps) and create a new "script" app.
 2. Create a file named `.env` in the root of the project directory.
 3. Add your credentials to the `.env` file like this:
@@ -75,9 +90,8 @@ The tool needs API access to read data from Reddit.
 ```
 REDDIT_CLIENT_ID=your_client_id_from_reddit
 REDDIT_CLIENT_SECRET=your_client_secret_from_reddit
-REDDIT_USER_AGENT=A custom user agent string (e.g., python:rstat:v1.0)
+REDDIT_USER_AGENT=A custom user agent string (e.g., python:rstat:v1.2)
 ```
-**IMPORTANT:** Never commit your `.env` file to version control.
 ### 6. Set Up NLTK
 Run the included setup script **once** to download the required `vader_lexicon` for sentiment analysis.
@@ -85,8 +99,8 @@ Run the included setup script **once** to download the required `vader_lexicon`
 python rstat_tool/setup_nltk.py
 ```
-### 7. Build and Install the `rstat` Command
+### 7. Build and Install the Commands
-Install the tool in "editable" mode. This creates the `rstat` command in your virtual environment and links it to your source code. Any changes you make to the code will be immediately available.
+Install the tool in "editable" mode. This creates the `rstat` and `rstat-dashboard` commands in your virtual environment and links them to your source code.
 ```bash
 pip install -e .
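The updated section above says `pip install -e .` now creates both the `rstat` and `rstat-dashboard` commands. With `setuptools` this is typically done through `console_scripts` entry points in `setup.py`; the sketch below is a guess at that wiring, since `setup.py` itself is not shown in this diff and the `main` function targets are assumed names:

```python
# setup.py sketch only: the real file is not in this diff, and the
# rstat_tool.main:main / rstat_tool.dashboard:main targets are assumed names.
from setuptools import setup, find_packages

setup(
    name="rstat",
    packages=find_packages(),
    entry_points={
        "console_scripts": [
            "rstat = rstat_tool.main:main",                 # scraper CLI
            "rstat-dashboard = rstat_tool.dashboard:main",  # Flask dashboard launcher
        ]
    },
)
```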
@@ -95,60 +109,50 @@ The installation is now complete.
 ---
-## Configuration
-### Subreddits
-Modify the `subreddits.json` file to define which communities the tool should scan.
-```json
-{
-  "subreddits": [
-    "wallstreetbets",
-    "stocks",
-    "investing",
-    "options"
-  ]
-}
-```
-### Ticker Blacklist (Advanced)
-To improve data quality, you can add common words that are mistaken for tickers to the `COMMON_WORDS_BLACKLIST` set inside the `rstat_tool/ticker_extractor.py` file. The tool will automatically clean the database of these tickers on the next run.
----
 ## Usage
-Once installed, you can run the tool from any directory using the `rstat` command.
-### Basic Usage
-Run an analysis using the default settings (scans 25 posts, 100 comments/post, shows top 20 tickers).
-```bash
-rstat subreddits.json
-```
-### Advanced Usage with Arguments
-Use command-line arguments to control the scan and the report.
-```bash
-# Scan only 10 posts, 50 comments per post, and show a report of the top 5 tickers
-rstat subreddits.json --posts 10 --comments 50 --limit 5
-```
-### Getting Help
-To see all available commands and their descriptions:
+The tool is split into two commands: one for gathering data and one for viewing it.
+### 1. The Scraper (`rstat`)
+This is the command-line tool you will use to populate the database. It is highly flexible.
+**Common Commands:**
+* **Run a daily scan (for cron jobs):** Scans subreddits from `subreddits.json` for posts in the last 24 hours.
+```bash
+rstat --config subreddits.json --days 1
+```
+* **Scan a single subreddit:** Ignores the config file and scans just one subreddit.
+```bash
+rstat --subreddit wallstreetbets --days 1
+```
+* **Back-fill data for last week:** Scans a specific subreddit for all new posts in the last 7 days.
+```bash
+rstat --subreddit Tollbugatabets --days 7
+```
+* **Get help and see all options:**
 ```bash
 rstat --help
 ```
-### Example Output
-```
---- Top 5 Tickers by Mention Count ---
-Ticker | Mentions | Bullish | Bearish | Neutral | Market Cap
----------------------------------------------------------------------------
-TSLA   | 183      | 95      | 48      | 40      | $580.45B
-NVDA   | 155      | 110     | 15      | 30      | $1.15T
-AAPL   | 98       | 50      | 21      | 27      | $2.78T
-SPY    | 76       | 30      | 35      | 11      | N/A
-AMD    | 62       | 45      | 8       | 9       | $175.12B
-```
+### 2. The Web Dashboard (`rstat-dashboard`)
+This command starts a local web server to let you explore the data you've collected.
+**How to Run:**
+1. Make sure you have run the `rstat` scraper at least once to populate the database.
+2. Start the web server:
+```bash
+rstat-dashboard
+```
+3. Open your web browser and navigate to **http://127.0.0.1:5000**.
+**Dashboard Features:**
+* **Main Page:** Shows the Top 10 most mentioned tickers across all scanned subreddits.
+* **Subreddit Pages:** Click any subreddit in the navigation bar to see a dashboard specific to that community.
+* **Deep Dive:** In any table, click on a ticker's symbol to see a detailed breakdown of every post it was mentioned in.
+* **Shareable Images:** On a subreddit's page, click "(View Daily Image)" or "(View Weekly Image)" to generate a polished, shareable summary card.
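For readers unfamiliar with how a small Flask dashboard like `rstat-dashboard` is usually wired to a SQLite database, here is a minimal sketch. The route, SQL, table schema, and template variables below are illustrative assumptions, not the project's actual code; `dashboard.py` and `database.py` are not shown in this diff, and only the `index.html` template name comes from the project structure above.

```python
# Illustrative sketch only: rstat's real dashboard.py/database.py are not in this diff,
# so the table schema, query, and template variables here are assumptions.
import sqlite3
from flask import Flask, render_template

app = Flask(__name__)
DB_PATH = "reddit_stocks.db"

@app.route("/")
def index():
    # Top 10 most mentioned tickers across all scanned subreddits (assumed schema).
    conn = sqlite3.connect(DB_PATH)
    try:
        rows = conn.execute(
            "SELECT ticker, COUNT(*) AS mentions "
            "FROM mentions GROUP BY ticker ORDER BY mentions DESC LIMIT 10"
        ).fetchall()
    finally:
        conn.close()
    return render_template("index.html", top_tickers=rows)

def main():
    # The rstat-dashboard command serves on http://127.0.0.1:5000 by default.
    app.run(host="127.0.0.1", port=5000)

if __name__ == "__main__":
    main()
```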