Improve doc.

2025-07-21 15:46:41 +02:00
parent 8f385733ed
commit 03e6e56a35
1 changed files with 113 additions and 66 deletions
--- a/README.md
+++ b/README.md
@@ -1,107 +1,154 @@
-# rstat - Reddit Stock Ticker Analyzer Tool
+# rstat - Reddit Stock Analyzer Tool

-This is a command-line tool to analyze stock ticker mentions across a predefined list of subreddits. It scrapes posts and comments, counts the number of times each ticker is mentioned, fetches the ticker's market capitalization, and will calculate a sentiment score for each mention.
+An installable command-line tool that scans Reddit for stock ticker mentions, runs sentiment analysis on each mention, and prints a summary report.

-## Features
+## Key Features

-*   Scans a user-defined list of subreddits from a JSON configuration file.
-*   Identifies stock tickers (e.g., `$AAPL`, `TSLA`) in Reddit posts and comments.
-*   Fetches market capitalization for each identified ticker using the Yahoo Finance API.
-*   Summarizes the findings in a clear, command-line-based report.
-*   (Future) Performs sentiment analysis on each mention.
+*   **Persistent Storage:** Scraped data is stored in a local SQLite database (`reddit_stocks.db`), so you can track trends over time.
+*   **Deep Scanning:** Analyzes both post titles and comments from a user-defined list of subreddits.
+*   **Sentiment Analysis:** Uses NLTK's VADER engine to score each mention and classify it as Bullish, Bearish, or Neutral.
+*   **Financial Data:** Enriches ticker data by fetching market capitalization from Yahoo Finance, with intelligent caching to minimize API calls.
+*   **Data Quality:** Utilizes a configurable blacklist and smart filtering to ignore common words and reduce false positives (e.g., "YOLO", "CEO", "A").
+*   **Automatic Cleanup:** Removes previously stored entries for any ticker you later add to the blacklist.
+*   **Installable Command:** Packaged with `setuptools`, allowing you to install the tool and run it from anywhere on your system using the `rstat` command.
+*   **Flexible Reporting:** The final report can be customized using command-line arguments to control the number of results shown.
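
The Bullish/Bearish/Neutral labels in the feature list could be derived from VADER's compound score roughly as follows. This is a minimal sketch, not the code in `sentiment_analyzer.py`: the function name `label_sentiment` is hypothetical, and the ±0.05 cutoffs are the thresholds conventionally used with VADER, assumed here rather than taken from this commit.

```python
def label_sentiment(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a coarse label.

    The 0.05 threshold is an assumption (the usual VADER convention).
    """
    if compound >= threshold:
        return "Bullish"
    if compound <= -threshold:
        return "Bearish"
    return "Neutral"


# With NLTK installed and the vader_lexicon downloaded, the score would come from:
#   from nltk.sentiment.vader import SentimentIntensityAnalyzer
#   compound = SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
print(label_sentiment(0.62), label_sentiment(-0.40), label_sentiment(0.01))
```

Keeping the labeling separate from the scoring makes the cutoffs easy to tune without touching the NLTK call.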

-## Installation
+## Project Structure

-Follow these steps to set up the project and its dependencies on your local machine.
+```
+reddit_stock_analyzer/
+├── .env                  # Your secret API keys
+├── requirements.txt      # Project dependencies
+├── setup.py              # Installation script for the tool
+├── subreddits.json       # Configuration for which subreddits to scan
+├── rstat_tool/           # The main source code package
+│   ├── __init__.py
+│   ├── main.py           # Main entry point and CLI logic
+│   ├── database.py       # All SQLite database functions
+│   ├── sentiment_analyzer.py
+│   ├── setup_nltk.py     # One-time NLTK setup script
+│   └── ticker_extractor.py
+└── ...
+```

-### 1. Clone the Repository
+## Setup and Installation

-First, clone this repository to your local machine (or simply download and create the files as described).
+Follow these steps to set up the project on your local machine.

+### 1. Prerequisites
+*   Python 3.7+
+*   Git
+
+### 2. Clone the Repository
 ```bash
 git clone <your-repository-url>
-cd rstat
+cd reddit_stock_analyzer
 ```

-### 2. Set Up a Python Virtual Environment
-
-It is highly recommended to use a virtual environment to manage project-specific dependencies, preventing conflicts with your global Python installation.
+### 3. Set Up a Python Virtual Environment
+It is highly recommended to use a virtual environment to manage dependencies.

 **On macOS / Linux:**
-
 ```bash
-# Create a virtual environment named 'venv'
-python3 -m venv venv
-
-# Activate the virtual environment
-source venv/bin/activate
+python3 -m venv .venv
+source .venv/bin/activate
 ```
-*You will know it's active when you see `(venv)` at the beginning of your terminal prompt.*

 **On Windows:**
-
 ```bash
-# Create a virtual environment named 'venv'
-python -m venv venv
-
-# Activate the virtual environment
-.\venv\Scripts\activate
+python -m venv .venv
+.\.venv\Scripts\activate
 ```
-*You will know it's active when you see `(venv)` at the beginning of your command prompt.*
-
-### 3. Install Dependencies
-
-Once your virtual environment is activated, install the required Python libraries using the `requirements.txt` file.

+### 4. Install Dependencies
 ```bash
 pip install -r requirements.txt
 ```

+### 5. Configure Reddit API Credentials
+The tool needs API access to read data from Reddit.
+
+1.  Go to the [Reddit Apps preferences page](https://www.reddit.com/prefs/apps) and create a new "script" app.
+2.  Create a file named `.env` in the root of the project directory.
+3.  Add your credentials to the `.env` file like this:
+
+    ```
+    REDDIT_CLIENT_ID=your_client_id_from_reddit
+    REDDIT_CLIENT_SECRET=your_client_secret_from_reddit
+    REDDIT_USER_AGENT=python:rstat:v1.0 (by u/your_reddit_username)
+    ```
+    **IMPORTANT:** Never commit your `.env` file to version control.
+
+### 6. Set Up NLTK
+Run the included setup script **once** to download the required `vader_lexicon` for sentiment analysis.
+```bash
+python rstat_tool/setup_nltk.py
+```
+
+### 7. Build and Install the `rstat` Command
+Install the tool in "editable" mode. This creates the `rstat` command in your virtual environment and links it to your source code. Any changes you make to the code will be immediately available.
+
+```bash
+pip install -e .
+```
+The installation is now complete.
+
+---
+
 ## Configuration

-Before running the tool, you need to configure the list of subreddits you want to analyze.
-
-1.  Open the `subreddits.json` file.
-2.  Modify the list of strings to include your desired subreddits.
-
-**Example `subreddits.json`:**
+### Subreddits
+Modify the `subreddits.json` file to define which communities the tool should scan.
 ```json
 {
  "subreddits": [
    "wallstreetbets",
    "stocks",
    "investing",
-    "pennystocks"
+    "options"
  ]
 }
 ```

-## Usage
-
-To run the tool, execute the `main.py` script from the root directory of the project, passing the path to your configuration file as an argument.
-
-Make sure your virtual environment is activated before running the script.
-
-```bash
-python main.py subreddits.json
-```
-
-### Expected Output
-
-The tool will first confirm the loaded subreddits and then proceed with its analysis, printing the results directly to the terminal.
-
-```
-Loading configuration...
-Successfully loaded 4 subreddits: wallstreetbets, stocks, investing, pennystocks
------------------------------
-Testing market data functionality...
-Market Cap for AAPL: $2,912,488,124,416
------------------------------
-Next up: Integrating the Reddit API to find tickers...
-```
+### Ticker Blacklist (Advanced)
+To improve data quality, you can add common words that are mistaken for tickers to the `COMMON_WORDS_BLACKLIST` set inside the `rstat_tool/ticker_extractor.py` file. The tool will automatically clean the database of these tickers on the next run.
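
As an illustration of how a blacklist like this can filter candidates, here is a minimal sketch. It is not the contents of `ticker_extractor.py`: the regular expression and the `extract_tickers` function are assumptions, and the set below holds only the three example words the README mentions.

```python
import re

# Illustrative subset only; the real set lives in rstat_tool/ticker_extractor.py.
COMMON_WORDS_BLACKLIST = {"YOLO", "CEO", "A"}

# Assumed pattern: 1 to 5 uppercase letters as a whole word, optionally prefixed by "$".
TICKER_PATTERN = re.compile(r"\$?\b([A-Z]{1,5})\b")


def extract_tickers(text: str) -> set:
    """Return uppercase ticker candidates that are not blacklisted."""
    candidates = TICKER_PATTERN.findall(text)
    return {c for c in candidates if c not in COMMON_WORDS_BLACKLIST}


print(sorted(extract_tickers("YOLO on $TSLA, the CEO said AAPL is A buy")))
```

A pattern this permissive matches any short all-caps word, which is why the blacklist matters: each false positive you notice in a report is a candidate for the set.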

 ---

-This `README.md` provides a clear and concise guide for anyone (including your future self) to get the project up and running quickly.
+## Usage

-We are now ready to move on to the next implementation step. Shall we proceed with integrating the Reddit API using PRAW?
+Once installed, you can run the tool from any directory using the `rstat` command.
+
+### Basic Usage
+Run an analysis using the default settings (scans 25 posts, 100 comments/post, shows top 20 tickers).
+
+```bash
+rstat subreddits.json
+```
+
+### Advanced Usage with Arguments
+Use command-line arguments to control the scan and the report.
+
+```bash
+# Scan only 10 posts, 50 comments per post, and show a report of the top 5 tickers
+rstat subreddits.json --posts 10 --comments 50 --limit 5
+```
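
For readers curious how such options are typically wired up, the following is a hedged sketch using `argparse`. The flag names and defaults (25 posts, 100 comments, top 20) come from this README; the positional name `config`, the help strings, and the `build_parser` function are assumptions, not the actual code in `main.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented rstat options (sketch only)."""
    parser = argparse.ArgumentParser(
        prog="rstat", description="Scan Reddit for stock ticker mentions."
    )
    parser.add_argument("config", help="Path to the subreddits JSON file")
    parser.add_argument("--posts", type=int, default=25,
                        help="Posts to scan per subreddit")
    parser.add_argument("--comments", type=int, default=100,
                        help="Comments to scan per post")
    parser.add_argument("--limit", type=int, default=20,
                        help="Number of tickers shown in the report")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["subreddits.json", "--posts", "10", "--comments", "50", "--limit", "5"]
)
print(args.config, args.posts, args.comments, args.limit)
```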
+
+### Getting Help
+To see all available options and their descriptions:
+```bash
+rstat --help
+```
+
+### Example Output
+
+```
+--- Top 5 Tickers by Mention Count ---
+Ticker   | Mentions | Bullish  | Bearish  | Neutral  | Market Cap
+---------------------------------------------------------------------------
+TSLA     | 183      | 95       | 48       | 40       | $580.45B
+NVDA     | 155      | 110      | 15       | 30       | $1.15T
+AAPL     | 98       | 50       | 21       | 27       | $2.78T
+SPY      | 76       | 30       | 35       | 11       | N/A
+AMD      | 62       | 45       | 8        | 9        | $175.12B
+```
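
The `Market Cap` column above uses abbreviated values such as `$580.45B` and `$1.15T`, with `N/A` when no value is available. One way to produce that format is sketched below; the helper name `format_market_cap` and the handling of values under one million are assumptions, not code from this commit.

```python
from typing import Optional


def format_market_cap(value: Optional[float]) -> str:
    """Abbreviate a market cap in dollars; missing or zero values become 'N/A'."""
    if not value:
        return "N/A"
    for divisor, suffix in ((1e12, "T"), (1e9, "B"), (1e6, "M")):
        if value >= divisor:
            return f"${value / divisor:.2f}{suffix}"
    # Assumed fallback for small values: plain dollars with thousands separators.
    return f"${value:,.0f}"


for cap in (580_450_000_000, 1_150_000_000_000, None):
    print(format_market_cap(cap))
```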