Improve doc.

2025-07-21 15:46:41 +02:00
parent 8f385733ed
commit 03e6e56a35

README.md

@@ -1,107 +1,154 @@
# rstat - Reddit Stock Ticker Analyzer Tool
# rstat - Reddit Stock Analyzer Tool
This is a command-line tool to analyze stock ticker mentions across a predefined list of subreddits. It scrapes posts and comments, counts the number of times each ticker is mentioned, fetches the ticker's market capitalization, and will calculate a sentiment score for each mention.
A powerful, installable command-line tool to scan Reddit for stock ticker mentions, perform sentiment analysis, and generate insightful summary reports.
## Features
## Key Features
* Scans a user-defined list of subreddits from a JSON configuration file.
* Identifies stock tickers (e.g., `$AAPL`, `TSLA`) in Reddit posts and comments.
* Fetches market capitalization for each identified ticker using the Yahoo Finance API.
* Summarizes the findings in a clear, command-line-based report.
* (Future) Performs sentiment analysis on each mention.
* **Persistent Storage:** Scraped data is stored in a local SQLite database (`reddit_stocks.db`), so you can track trends over time.
* **Deep Scanning:** Analyzes both post titles and comments from a user-defined list of subreddits.
* **Sentiment Analysis:** Uses NLTK's VADER engine to calculate a sentiment score (Bullish, Bearish, or Neutral) for each mention.
* **Financial Data:** Enriches ticker data by fetching market capitalization from Yahoo Finance, with intelligent caching to minimize API calls.
* **Data Quality:** Utilizes a configurable blacklist and smart filtering to ignore common words and reduce false positives (e.g., "YOLO", "CEO", "A").
* **Automatic Cleanup:** When you update the ticker blacklist, the tool purges the stored data for the newly blacklisted tickers from the database.
* **Installable Command:** Packaged with `setuptools`, allowing you to install the tool and run it from anywhere on your system using the `rstat` command.
* **Flexible Reporting:** The final report can be customized using command-line arguments to control the number of results shown.
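
The persistent storage and cleanup features listed above could work roughly as follows. This is a minimal sketch using Python's built-in `sqlite3` module; the `mentions` table, its columns, and the helper names are assumptions made for illustration, not the tool's actual schema.

```python
import sqlite3


def init_db(conn):
    """Create the (hypothetical) mentions table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS mentions (
            id INTEGER PRIMARY KEY,
            ticker TEXT NOT NULL,
            subreddit TEXT NOT NULL,
            sentiment TEXT NOT NULL
        )
        """
    )


def add_mention(conn, ticker, subreddit, sentiment):
    """Store a single ticker mention."""
    conn.execute(
        "INSERT INTO mentions (ticker, subreddit, sentiment) VALUES (?, ?, ?)",
        (ticker, subreddit, sentiment),
    )


def purge_blacklisted(conn, blacklist):
    """Delete rows whose ticker is blacklisted; return how many were removed."""
    symbols = sorted(blacklist)
    if not symbols:
        return 0
    placeholders = ",".join("?" for _ in symbols)
    cursor = conn.execute(
        f"DELETE FROM mentions WHERE ticker IN ({placeholders})", symbols
    )
    return cursor.rowcount


def top_tickers(conn, limit=20):
    """Return (ticker, mention_count) pairs, most mentioned first."""
    return conn.execute(
        "SELECT ticker, COUNT(*) AS n FROM mentions "
        "GROUP BY ticker ORDER BY n DESC, ticker LIMIT ?",
        (limit,),
    ).fetchall()
```

Passing `sqlite3.connect("reddit_stocks.db")` instead of an in-memory connection would keep the rows between runs, which is what makes trend tracking possible.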
## Installation
## Project Structure
Follow these steps to set up the project and its dependencies on your local machine.
```
reddit_stock_analyzer/
├── .env # Your secret API keys
├── requirements.txt # Project dependencies
├── setup.py # Installation script for the tool
├── subreddits.json # Configuration for which subreddits to scan
├── rstat_tool/ # The main source code package
│ ├── __init__.py
│ ├── main.py # Main entry point and CLI logic
│ ├── database.py # All SQLite database functions
│ ├── sentiment_analyzer.py
│ ├── setup_nltk.py # One-time NLTK setup script
│ └── ticker_extractor.py
└── ...
```
### 1. Clone the Repository
## Setup and Installation
First, clone this repository to your local machine (or simply download and create the files as described).
Follow these steps to set up the project on your local machine.
### 1. Prerequisites
* Python 3.7+
* Git
### 2. Clone the Repository
```bash
git clone <your-repository-url>
cd rstat
cd reddit_stock_analyzer
```
### 2. Set Up a Python Virtual Environment
It is highly recommended to use a virtual environment to manage project-specific dependencies, preventing conflicts with your global Python installation.
### 3. Set Up a Python Virtual Environment
It is highly recommended to use a virtual environment to manage dependencies.
**On macOS / Linux:**
```bash
# Create a virtual environment named 'venv'
python3 -m venv venv
# Activate the virtual environment
source venv/bin/activate
python3 -m venv .venv
source .venv/bin/activate
```
*You will know it's active when you see `(venv)` at the beginning of your terminal prompt.*
**On Windows:**
```bash
# Create a virtual environment named 'venv'
python -m venv venv
# Activate the virtual environment
.\venv\Scripts\activate
python -m venv .venv
.\.venv\Scripts\activate
```
*You will know it's active when you see `(venv)` at the beginning of your command prompt.*
### 3. Install Dependencies
Once your virtual environment is activated, install the required Python libraries using the `requirements.txt` file.
### 4. Install Dependencies
```bash
pip install -r requirements.txt
```
### 5. Configure Reddit API Credentials
The tool needs API access to read data from Reddit.
1. Go to the [Reddit Apps preferences page](https://www.reddit.com/prefs/apps) and create a new "script" app.
2. Create a file named `.env` in the root of the project directory.
3. Add your credentials to the `.env` file like this:
```
REDDIT_CLIENT_ID=your_client_id_from_reddit
REDDIT_CLIENT_SECRET=your_client_secret_from_reddit
REDDIT_USER_AGENT=A custom user agent string (e.g., python:rstat:v1.0)
```
**IMPORTANT:** Never commit your `.env` file to version control.
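
To check that the `.env` file contains all three credentials before a scan, something like the following could be used. It is a standard-library sketch that assumes simple `KEY=value` lines; the helper names are hypothetical, and the tool itself may load the file differently (for example with a dotenv library).

```python
from pathlib import Path

# The three variable names come from the example .env file above.
REQUIRED_KEYS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def read_env_file(path):
    """Parse simple KEY=value lines, skipping blanks and '#' comments."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Calling `missing_keys(read_env_file(".env"))` returns an empty list when the file is complete.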
### 6. Set Up NLTK
Run the included setup script **once** to download the required `vader_lexicon` for sentiment analysis.
```bash
python rstat_tool/setup_nltk.py
```
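
Once the lexicon is downloaded, scoring a mention might look like the sketch below. `SentimentIntensityAnalyzer.polarity_scores` is NLTK's VADER API and returns a dict that includes a `compound` score between -1 and 1. The ±0.05 cutoff is a commonly used VADER convention and is an assumption here, as are the function names; the tool's real thresholds may differ.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score to a Bullish/Bearish/Neutral label."""
    if compound >= threshold:
        return "Bullish"
    if compound <= -threshold:
        return "Bearish"
    return "Neutral"


def score_text(text):
    """Return the VADER compound score for a piece of text."""
    # Imported lazily so label_sentiment works even without NLTK installed.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return analyzer.polarity_scores(text)["compound"]
```

A mention would then be labeled with `label_sentiment(score_text(comment_body))`, where `comment_body` is the text of a post title or comment.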
### 7. Build and Install the `rstat` Command
Install the tool in "editable" mode. This creates the `rstat` command in your virtual environment and links it to your source code. Any changes you make to the code will be immediately available.
```bash
pip install -e .
```
The installation is now complete.
---
## Configuration
Before running the tool, you need to configure the list of subreddits you want to analyze.
1. Open the `subreddits.json` file.
2. Modify the list of strings to include your desired subreddits.
**Example `subreddits.json`:**
### Subreddits
Modify the `subreddits.json` file to define which communities the tool should scan.
```json
{
  "subreddits": [
    "wallstreetbets",
    "stocks",
    "investing",
    "pennystocks",
    "options"
  ]
}
```
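
A configuration file in this shape can be read and validated with the standard `json` module. The sketch below is illustrative only; `load_subreddits` is a hypothetical helper name, not necessarily what `rstat_tool` uses.

```python
import json


def load_subreddits(path):
    """Load the subreddit list from a JSON file shaped like the example above."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)  # raises json.JSONDecodeError on malformed JSON
    subreddits = data.get("subreddits") if isinstance(data, dict) else None
    if not isinstance(subreddits, list) or not all(
        isinstance(name, str) for name in subreddits
    ):
        raise ValueError('expected {"subreddits": ["name", ...]}')
    return subreddits
```

Validating up front means a malformed file, such as one with a missing comma between entries, fails immediately instead of partway through a scan.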
## Usage
To run the tool, execute the `main.py` script from the root directory of the project, passing the path to your configuration file as an argument.
Make sure your virtual environment is activated before running the script.
```bash
python main.py subreddits.json
```
### Expected Output
The tool will first confirm the loaded subreddits and then proceed with its analysis, printing the results directly to the terminal.
```
Loading configuration...
Successfully loaded 4 subreddits: wallstreetbets, stocks, investing, pennystocks
------------------------------
Testing market data functionality...
Market Cap for AAPL: $2,912,488,124,416
------------------------------
Next up: Integrating the Reddit API to find tickers...
```
### Ticker Blacklist (Advanced)
To improve data quality, you can add common words that are mistaken for tickers to the `COMMON_WORDS_BLACKLIST` set inside the `rstat_tool/ticker_extractor.py` file. The tool will automatically clean the database of these tickers on the next run.
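
As a rough illustration of how blacklist filtering can work, the sketch below extracts `$`-prefixed and bare uppercase symbols and drops blacklisted words. The regular expression, the three-word blacklist, and the `extract_tickers` name are assumptions; the real logic in `ticker_extractor.py` may be more elaborate.

```python
import re

# Example entries only; the real set in ticker_extractor.py is assumed to be larger.
COMMON_WORDS_BLACKLIST = {"YOLO", "CEO", "A"}

# Matches "$AAPL"-style symbols (any case) or bare uppercase words of 1-5 letters.
TICKER_PATTERN = re.compile(r"\$([A-Za-z]{1,5})\b|\b([A-Z]{1,5})\b")


def extract_tickers(text, blacklist=COMMON_WORDS_BLACKLIST):
    """Return the set of candidate tickers in text, minus blacklisted words."""
    found = set()
    for dollar_symbol, bare_symbol in TICKER_PATTERN.findall(text):
        symbol = (dollar_symbol or bare_symbol).upper()
        if symbol not in blacklist:
            found.add(symbol)
    return found
```

Any uppercase word of up to five letters counts as a candidate here, which is exactly why a blacklist is needed to keep words like "CEO" out of the results.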
---
## Usage
Once installed, you can run the tool from any directory using the `rstat` command.
### Basic Usage
Run an analysis using the default settings (scans 25 posts, 100 comments/post, shows top 20 tickers).
```bash
rstat subreddits.json
```
### Advanced Usage with Arguments
Use command-line arguments to control the scan and the report.
```bash
# Scan only 10 posts, 50 comments per post, and show a report of the top 5 tickers
rstat subreddits.json --posts 10 --comments 50 --limit 5
```
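
These options map naturally onto `argparse`. The sketch below uses the flag names and defaults stated above (25 posts, 100 comments per post, top 20 tickers); the positional argument name `config` and the help texts are assumptions, and the actual parser in `main.py` may define more options.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented rstat options."""
    parser = argparse.ArgumentParser(
        prog="rstat",
        description="Scan subreddits for stock ticker mentions.",
    )
    parser.add_argument("config", help="path to the subreddits JSON file")
    parser.add_argument("--posts", type=int, default=25,
                        help="number of posts to scan per subreddit")
    parser.add_argument("--comments", type=int, default=100,
                        help="number of comments to scan per post")
    parser.add_argument("--limit", type=int, default=20,
                        help="number of tickers to show in the report")
    return parser
```

Parsing `["subreddits.json", "--posts", "10"]` with this parser overrides only the post count and leaves the other two values at their defaults.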
### Getting Help
To see all available commands and their descriptions:
```bash
rstat --help
```
### Example Output
```
--- Top 5 Tickers by Mention Count ---
Ticker | Mentions | Bullish | Bearish | Neutral | Market Cap
---------------------------------------------------------------------------
TSLA | 183 | 95 | 48 | 40 | $580.45B
NVDA | 155 | 110 | 15 | 30 | $1.15T
AAPL | 98 | 50 | 21 | 27 | $2.78T
SPY | 76 | 30 | 35 | 11 | N/A
AMD | 62 | 45 | 8 | 9 | $175.12B
```
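
The abbreviated values in the Market Cap column can be produced with a small formatter. This is a sketch under assumptions: `format_market_cap` is a hypothetical name, missing data is represented as `None`, and the "M" tier is an addition that does not appear in the sample output.

```python
def format_market_cap(value):
    """Format a market cap in dollars as a short string such as '$1.15T'."""
    if value is None:
        return "N/A"  # shown when no market cap is available
    for divisor, suffix in ((1e12, "T"), (1e9, "B"), (1e6, "M")):
        if value >= divisor:
            return f"${value / divisor:.2f}{suffix}"
    return f"${value:,.0f}"
```

For example, `format_market_cap(580_450_000_000)` yields `$580.45B`, matching the style of the TSLA row.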