Update doc.
134
README.md
@@ -1,17 +1,29 @@
|
|||||||
# rstat - Reddit Stock Analyzer Tool
|
# rstat - Reddit Stock Analyzer
|
||||||
|
|
||||||
A powerful, installable command-line tool to scan Reddit for stock ticker mentions, perform sentiment analysis, and generate insightful summary reports.
|
A powerful, installable command-line tool and web dashboard to scan Reddit for stock ticker mentions, perform sentiment analysis, generate insightful reports, and create shareable summary images.
|
||||||
|
|
||||||
## Key Features
|
## Key Features
|
||||||
|
|
||||||
* **Persistent Storage:** Scraped data is stored in a local SQLite database (`reddit_stocks.db`), so you can track trends over time.
|
* **Dual-Interface:** Use a flexible command-line tool (`rstat`) for data collection and a simple web dashboard (`rstat-dashboard`) for data visualization.
|
||||||
* **Deep Scanning:** Analyzes both post titles and comments from a user-defined list of subreddits.
|
* **Flexible Data Scraping:**
|
||||||
* **Sentiment Analysis:** Uses NLTK's VADER engine to calculate a sentiment score (Bullish, Bearish, or Neutral) for each mention.
|
* Scan subreddits from a config file or target a single subreddit on the fly.
|
||||||
* **Financial Data:** Enriches ticker data by fetching market capitalization from Yahoo Finance, with intelligent caching to minimize API calls.
|
* Configure the time window to scan posts from the last 24 hours (for daily cron jobs) or back-fill data from several past days (e.g., last 7 days).
|
||||||
* **Data Quality:** Utilizes a configurable blacklist and smart filtering to ignore common words and reduce false positives (e.g., "YOLO", "CEO", "A").
|
* Fetches from `/new` to capture the most recent discussions.
|
||||||
* **Automatic Cleanup:** Automatically purges old, invalid data from the database if you update the ticker blacklist.
|
* **Deep Analysis & Storage:**
|
||||||
* **Installable Command:** Packaged with `setuptools`, allowing you to install the tool and run it from anywhere on your system using the `rstat` command.
|
* Scans both post titles and comments, differentiating between the two.
|
||||||
* **Flexible Reporting:** The final report can be customized using command-line arguments to control the number of results shown.
|
* Performs a "deep dive" analysis on posts to calculate the average sentiment of the entire comment section.
|
||||||
|
* Persists all data in a local SQLite database (`reddit_stocks.db`) to track trends over time.
|
||||||
|
* **Rich Data Enrichment:**
|
||||||
|
* Calculates sentiment (Bullish, Bearish, Neutral) for every mention using NLTK.
|
||||||
|
* Fetches and stores daily closing prices and market capitalization from Yahoo Finance.
|
||||||
|
* **Interactive Web Dashboard:**
|
||||||
|
* View Top 10 tickers across all subreddits or on a per-subreddit basis.
|
||||||
|
* Click any ticker to get a "Deep Dive" page, showing every post it was mentioned in.
|
||||||
|
* **Shareable Summary Images:**
|
||||||
|
* Generate clean, dark-mode summary images for both daily and weekly sentiment for any subreddit, perfect for sharing.
|
||||||
|
* **High-Quality Data:**
|
||||||
|
* Uses a configurable blacklist and smart filtering to reduce false positives.
|
||||||
|
* Automatically cleans the database of invalid tickers if the blacklist is updated.
|
||||||
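The Bullish/Bearish/Neutral labels in the feature list can be illustrated with a small sketch. VADER produces a `compound` score between -1 and 1. The ±0.05 cut-offs below are the thresholds commonly used with VADER, not values confirmed by this README, and `classify_sentiment` is a hypothetical helper name.

```python
def classify_sentiment(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER-style compound score in [-1, 1] to a sentiment label.

    The default threshold is an assumption (the conventional VADER cut-off),
    not a value taken from the rstat source.
    """
    if compound >= threshold:
        return "Bullish"
    if compound <= -threshold:
        return "Bearish"
    return "Neutral"
```

Keeping the threshold as a parameter makes it easy to widen the Neutral band if short comments produce too many weak signals.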
|
|
||||||
## Project Structure
|
## Project Structure
|
||||||
|
|
||||||
@@ -20,14 +32,19 @@ reddit_stock_analyzer/
|
|||||||
├── .env # Your secret API keys
|
├── .env # Your secret API keys
|
||||||
├── requirements.txt # Project dependencies
|
├── requirements.txt # Project dependencies
|
||||||
├── setup.py # Installation script for the tool
|
├── setup.py # Installation script for the tool
|
||||||
├── subreddits.json # Configuration for which subreddits to scan
|
├── subreddits.json # Default list of subreddits to scan
|
||||||
├── rstat_tool/ # The main source code package
|
├── templates/ # HTML templates for the web dashboard
|
||||||
│ ├── __init__.py
|
│ ├── base.html
|
||||||
│ ├── main.py # Main entry point and CLI logic
|
│ ├── index.html
|
||||||
│ ├── database.py # All SQLite database functions
|
│ ├── subreddit.html
|
||||||
│ ├── sentiment_analyzer.py
|
│ ├── deep_dive.html
|
||||||
│ ├── setup_nltk.py # One-time NLTK setup script
|
│ ├── image_view.html
|
||||||
│ └── ticker_extractor.py
|
│ └── weekly_image_view.html
|
||||||
|
└── rstat_tool/ # The main source code package
|
||||||
|
├── __init__.py
|
||||||
|
├── main.py # Scraper entry point and CLI logic
|
||||||
|
├── dashboard.py # Web dashboard entry point (Flask app)
|
||||||
|
├── database.py # All SQLite database functions
|
||||||
└── ...
|
└── ...
|
||||||
```
|
```
|
||||||
|
|
||||||
@@ -66,8 +83,6 @@ pip install -r requirements.txt
|
|||||||
```
|
```
|
||||||
|
|
||||||
### 5. Configure Reddit API Credentials
|
### 5. Configure Reddit API Credentials
|
||||||
The tool needs API access to read data from Reddit.
|
|
||||||
|
|
||||||
1. Go to the [Reddit Apps preferences page](https://www.reddit.com/prefs/apps) and create a new "script" app.
|
1. Go to the [Reddit Apps preferences page](https://www.reddit.com/prefs/apps) and create a new "script" app.
|
||||||
2. Create a file named `.env` in the root of the project directory.
|
2. Create a file named `.env` in the root of the project directory.
|
||||||
3. Add your credentials to the `.env` file like this:
|
3. Add your credentials to the `.env` file like this:
|
||||||
@@ -75,9 +90,8 @@ The tool needs API access to read data from Reddit.
|
|||||||
```
|
```
|
||||||
REDDIT_CLIENT_ID=your_client_id_from_reddit
|
REDDIT_CLIENT_ID=your_client_id_from_reddit
|
||||||
REDDIT_CLIENT_SECRET=your_client_secret_from_reddit
|
REDDIT_CLIENT_SECRET=your_client_secret_from_reddit
|
||||||
REDDIT_USER_AGENT=A custom user agent string (e.g., python:rstat:v1.0)
|
REDDIT_USER_AGENT=A custom user agent string (e.g., python:rstat:v1.2)
|
||||||
```
|
```
|
||||||
**IMPORTANT:** Never commit your `.env` file to version control.
|
|
||||||
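A quick way to catch a misconfigured `.env` is to check for the three variables before starting a scan. The sketch below assumes the variables have already been loaded into the process environment by whatever loader the project uses; `missing_credentials` is a hypothetical helper, not part of `rstat_tool`.

```python
import os

# The three keys listed in the .env example above.
REQUIRED_KEYS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def missing_credentials(env=None):
    """Return the required keys that are absent or empty in `env`.

    `env` defaults to os.environ; passing a plain dict makes the check testable.
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

An empty list means all three credentials are present; otherwise the returned names can be shown in an error message.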
|
|
||||||
### 6. Set Up NLTK
|
### 6. Set Up NLTK
|
||||||
Run the included setup script **once** to download the required `vader_lexicon` for sentiment analysis.
|
Run the included setup script **once** to download the required `vader_lexicon` for sentiment analysis.
|
||||||
@@ -85,8 +99,8 @@ Run the included setup script **once** to download the required `vader_lexicon`
|
|||||||
python rstat_tool/setup_nltk.py
|
python rstat_tool/setup_nltk.py
|
||||||
```
|
```
|
||||||
|
|
||||||
### 7. Build and Install the `rstat` Command
|
### 7. Build and Install the Commands
|
||||||
Install the tool in "editable" mode. This creates the `rstat` command in your virtual environment and links it to your source code. Any changes you make to the code will be immediately available.
|
Install the tool in "editable" mode. This creates the `rstat` and `rstat-dashboard` commands in your virtual environment and links them to your source code.
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
pip install -e .
|
pip install -e .
|
||||||
@@ -95,60 +109,50 @@ The installation is now complete.
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Configuration
|
|
||||||
|
|
||||||
### Subreddits
|
|
||||||
Modify the `subreddits.json` file to define which communities the tool should scan.
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"subreddits": [
|
|
||||||
"wallstreetbets",
|
|
||||||
"stocks",
|
|
||||||
"investing",
|
|
||||||
"options"
|
|
||||||
]
|
|
||||||
}
|
|
||||||
```
|
|
||||||
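For reference, a file in this shape could be read with a few lines of Python. This is a minimal sketch under the assumption that the file contains a single `subreddits` array of strings, as in the example; `load_subreddits` is a hypothetical name and may not match how `rstat_tool/main.py` actually parses the file.

```python
import json


def load_subreddits(text: str) -> list[str]:
    """Parse the contents of subreddits.json and return the subreddit names."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object with a 'subreddits' key")
    names = data.get("subreddits", [])
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        raise ValueError("'subreddits' must be a list of strings")
    # Drop blank entries and surrounding whitespace.
    return [n.strip() for n in names if n.strip()]
```

Taking the file contents as a string, rather than a path, keeps the function easy to test without touching the filesystem.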
|
|
||||||
### Ticker Blacklist (Advanced)
|
|
||||||
To improve data quality, you can add common words that are mistaken for tickers to the `COMMON_WORDS_BLACKLIST` set inside the `rstat_tool/ticker_extractor.py` file. The tool will automatically clean the database of these tickers on the next run.
|
|
||||||
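The effect of the blacklist can be sketched as follows. The regular expression and the function name are assumptions made for illustration; only the `COMMON_WORDS_BLACKLIST` name and the example words ("YOLO", "CEO", "A") come from the README, and the real set in `rstat_tool/ticker_extractor.py` is presumably much larger.

```python
import re

# Example entries only; the real set lives in rstat_tool/ticker_extractor.py.
COMMON_WORDS_BLACKLIST = {"YOLO", "CEO", "A"}

# Assumed pattern: 1 to 5 uppercase letters as a whole word, optional leading "$".
TICKER_PATTERN = re.compile(r"\$?\b([A-Z]{1,5})\b")


def extract_tickers(text, blacklist=COMMON_WORDS_BLACKLIST):
    """Return candidate tickers in order of appearance, minus blacklisted words."""
    candidates = TICKER_PATTERN.findall(text)
    return [symbol for symbol in candidates if symbol not in blacklist]
```

For example, `extract_tickers("YOLO on $TSLA, the CEO of NVDA said buy")` keeps `TSLA` and `NVDA` and drops the two blacklisted words.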
|
|
||||||
---
|
|
||||||
|
|
||||||
## Usage
|
## Usage
|
||||||
|
|
||||||
Once installed, you can run the tool from any directory using the `rstat` command.
|
The tool is split into two commands: one for gathering data and one for viewing it.
|
||||||
|
|
||||||
### Basic Usage
|
### 1. The Scraper (`rstat`)
|
||||||
Run an analysis using the default settings (scans 25 posts, 100 comments/post, shows top 20 tickers).
|
|
||||||
|
|
||||||
|
This is the command-line tool you will use to populate the database. It is highly flexible.
|
||||||
|
|
||||||
|
**Common Commands:**
|
||||||
|
|
||||||
|
* **Run a daily scan (for cron jobs):** Scans subreddits from `subreddits.json` for posts in the last 24 hours.
|
||||||
```bash
|
```bash
|
||||||
rstat subreddits.json
|
rstat --config subreddits.json --days 1
|
||||||
```
|
```
|
||||||
|
|
||||||
### Advanced Usage with Arguments
|
* **Scan a single subreddit:** Ignores the config file and scans just one subreddit.
|
||||||
Use command-line arguments to control the scan and the report.
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Scan only 10 posts, 50 comments per post, and show a report of the top 5 tickers
|
rstat --subreddit wallstreetbets --days 1
|
||||||
rstat subreddits.json --posts 10 --comments 50 --limit 5
|
|
||||||
```
|
```
|
||||||
|
|
||||||
### Getting Help
|
* **Back-fill data for last week:** Scans a specific subreddit for all new posts in the last 7 days.
|
||||||
To see all available commands and their descriptions:
|
```bash
|
||||||
|
rstat --subreddit wallstreetbets --days 7
|
||||||
|
```
|
||||||
|
|
||||||
|
* **Get help and see all options:**
|
||||||
```bash
|
```bash
|
||||||
rstat --help
|
rstat --help
|
||||||
```
|
```
|
||||||
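The flags used in the commands above could be wired up with `argparse` roughly as follows. This is a sketch, not the actual code in `rstat_tool/main.py`: the defaults, the help strings, and the `cutoff_timestamp` helper are assumptions; only the flag names `--config`, `--subreddit`, and `--days` come from the README.

```python
import argparse
from datetime import datetime, timedelta, timezone


def build_parser():
    """Build a parser exposing the three flags shown in the README examples."""
    parser = argparse.ArgumentParser(
        prog="rstat", description="Scan Reddit for stock ticker mentions."
    )
    parser.add_argument("--config", default="subreddits.json",
                        help="JSON file listing subreddits to scan (assumed default)")
    parser.add_argument("--subreddit",
                        help="scan only this subreddit, ignoring the config file")
    parser.add_argument("--days", type=int, default=1,
                        help="how many days back to scan (assumed default: 1)")
    return parser


def cutoff_timestamp(days, now=None):
    """Return the Unix timestamp `days` days before `now` (UTC)."""
    now = now or datetime.now(timezone.utc)
    return (now - timedelta(days=days)).timestamp()
```

Posts whose creation time is older than `cutoff_timestamp(args.days)` would then be skipped, which is one way to implement both the daily scan and the 7-day back-fill with the same code path.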
|
|
||||||
### Example Output
|
### 2. The Web Dashboard (`rstat-dashboard`)
|
||||||
|
|
||||||
|
This command starts a local web server to let you explore the data you've collected.
|
||||||
|
|
||||||
|
**How to Run:**
|
||||||
|
1. Make sure you have run the `rstat` scraper at least once to populate the database.
|
||||||
|
2. Start the web server:
|
||||||
|
```bash
|
||||||
|
rstat-dashboard
|
||||||
```
|
```
|
||||||
--- Top 5 Tickers by Mention Count ---
|
3. Open your web browser and navigate to **http://127.0.0.1:5000**.
|
||||||
Ticker | Mentions | Bullish | Bearish | Neutral | Market Cap
|
|
||||||
---------------------------------------------------------------------------
|
**Dashboard Features:**
|
||||||
TSLA | 183 | 95 | 48 | 40 | $580.45B
|
* **Main Page:** Shows the Top 10 most mentioned tickers across all scanned subreddits.
|
||||||
NVDA | 155 | 110 | 15 | 30 | $1.15T
|
* **Subreddit Pages:** Click any subreddit in the navigation bar to see a dashboard specific to that community.
|
||||||
AAPL | 98 | 50 | 21 | 27 | $2.78T
|
* **Deep Dive:** In any table, click on a ticker's symbol to see a detailed breakdown of every post it was mentioned in.
|
||||||
SPY | 76 | 30 | 35 | 11 | N/A
|
* **Shareable Images:** On a subreddit's page, click "(View Daily Image)" or "(View Weekly Image)" to generate a polished, shareable summary card.
|
||||||
AMD | 62 | 45 | 8 | 9 | $175.12B
|
|
||||||
```
|
|
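The Top 10 view described under Dashboard Features boils down to a grouped count over stored mentions. The sketch below uses an in-memory SQLite database with a hypothetical `mentions` table; the real schema in `rstat_tool/database.py` is not shown in this diff and may differ.

```python
import sqlite3


def top_tickers(conn, limit=10, subreddit=None):
    """Return (ticker, mention_count) rows, most mentioned first.

    Pass `subreddit` to restrict the ranking to a single community.
    """
    query = "SELECT ticker, COUNT(*) AS mention_count FROM mentions"
    params = []
    if subreddit is not None:
        query += " WHERE subreddit = ?"
        params.append(subreddit)
    # Ties are broken alphabetically so the ordering is deterministic.
    query += " GROUP BY ticker ORDER BY mention_count DESC, ticker ASC LIMIT ?"
    params.append(limit)
    return conn.execute(query, params).fetchall()


# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mentions (ticker TEXT, subreddit TEXT, sentiment TEXT)")
conn.executemany(
    "INSERT INTO mentions VALUES (?, ?, ?)",
    [
        ("TSLA", "wallstreetbets", "Bullish"),
        ("TSLA", "wallstreetbets", "Bearish"),
        ("TSLA", "stocks", "Neutral"),
        ("NVDA", "wallstreetbets", "Bullish"),
        ("NVDA", "stocks", "Bullish"),
        ("AMD", "stocks", "Neutral"),
    ],
)
```

Calling `top_tickers(conn)` ranks across all subreddits, while `top_tickers(conn, subreddit="stocks")` mirrors the per-subreddit pages.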