From 03e6e56a35cd4560112e534baf0be2605e9f6b48 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A5l-Kristian=20Hamre?=
Date: Mon, 21 Jul 2025 15:46:41 +0200
Subject: [PATCH] Improve doc.

---
 README.md | 179 ++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 113 insertions(+), 66 deletions(-)

diff --git a/README.md b/README.md
index 68e84ad..3b377f6 100644
--- a/README.md
+++ b/README.md
@@ -1,107 +1,154 @@
-# rstat - Reddit Stock Ticker Analyzer Tool
+# rstat - Reddit Stock Analyzer Tool
 
-This is a command-line tool to analyze stock ticker mentions across a predefined list of subreddits. It scrapes posts and comments, counts the number of times each ticker is mentioned, fetches the ticker's market capitalization, and will calculate a sentiment score for each mention.
+A powerful, installable command-line tool to scan Reddit for stock ticker mentions, perform sentiment analysis, and generate insightful summary reports.
 
-## Features
+## Key Features
 
-* Scans a user-defined list of subreddits from a JSON configuration file.
-* Identifies stock tickers (e.g., `$AAPL`, `TSLA`) in Reddit posts and comments.
-* Fetches market capitalization for each identified ticker using the Yahoo Finance API.
-* Summarizes the findings in a clear, command-line-based report.
-* (Future) Performs sentiment analysis on each mention.
+* **Persistent Storage:** Scraped data is stored in a local SQLite database (`reddit_stocks.db`), so you can track trends over time.
+* **Deep Scanning:** Analyzes both post titles and comments from a user-defined list of subreddits.
+* **Sentiment Analysis:** Uses NLTK's VADER engine to calculate a sentiment score (Bullish, Bearish, or Neutral) for each mention.
+* **Financial Data:** Enriches ticker data by fetching market capitalization from Yahoo Finance, with intelligent caching to minimize API calls.
+* **Data Quality:** Utilizes a configurable blacklist and smart filtering to ignore common words and reduce false positives (e.g., "YOLO", "CEO", "A").
+* **Automatic Cleanup:** Automatically purges old, invalid data from the database if you update the ticker blacklist.
+* **Installable Command:** Packaged with `setuptools`, allowing you to install the tool and run it from anywhere on your system using the `rstat` command.
+* **Flexible Reporting:** The final report can be customized using command-line arguments to control the number of results shown.
 
-## Installation
+## Project Structure
 
-Follow these steps to set up the project and its dependencies on your local machine.
+```
+reddit_stock_analyzer/
+├── .env                    # Your secret API keys
+├── requirements.txt        # Project dependencies
+├── setup.py                # Installation script for the tool
+├── subreddits.json         # Configuration for which subreddits to scan
+├── rstat_tool/             # The main source code package
+│   ├── __init__.py
+│   ├── main.py             # Main entry point and CLI logic
+│   ├── database.py         # All SQLite database functions
+│   ├── sentiment_analyzer.py
+│   ├── setup_nltk.py       # One-time NLTK setup script
+│   └── ticker_extractor.py
+└── ...
+```
 
-### 1. Clone the Repository
+## Setup and Installation
 
-First, clone this repository to your local machine (or simply download and create the files as described).
+Follow these steps to set up the project on your local machine.
 
+### 1. Prerequisites
+* Python 3.7+
+* Git
+
+### 2. Clone the Repository
 ```bash
 git clone
-cd rstat
+cd reddit_stock_analyzer
 ```
 
-### 2. Set Up a Python Virtual Environment
-
-It is highly recommended to use a virtual environment to manage project-specific dependencies, preventing conflicts with your global Python installation.
+### 3. Set Up a Python Virtual Environment
+It is highly recommended to use a virtual environment to manage dependencies.
 
 **On macOS / Linux:**
-
 ```bash
-# Create a virtual environment named 'venv'
-python3 -m venv venv
-
-# Activate the virtual environment
-source venv/bin/activate
+python3 -m venv .venv
+source .venv/bin/activate
 ```
-*You will know it's active when you see `(venv)` at the beginning of your terminal prompt.*
 
 **On Windows:**
-
 ```bash
-# Create a virtual environment named 'venv'
-python -m venv venv
-
-# Activate the virtual environment
-.\venv\Scripts\activate
+python -m venv .venv
+.\.venv\Scripts\activate
 ```
-*You will know it's active when you see `(venv)` at the beginning of your command prompt.*
-
-### 3. Install Dependencies
-
-Once your virtual environment is activated, install the required Python libraries using the `requirements.txt` file.
 
+### 4. Install Dependencies
 ```bash
 pip install -r requirements.txt
 ```
 
+### 5. Configure Reddit API Credentials
+The tool needs API access to read data from Reddit.
+
+1. Go to the [Reddit Apps preferences page](https://www.reddit.com/prefs/apps) and create a new "script" app.
+2. Create a file named `.env` in the root of the project directory.
+3. Add your credentials to the `.env` file like this:
+
+    ```
+    REDDIT_CLIENT_ID=your_client_id_from_reddit
+    REDDIT_CLIENT_SECRET=your_client_secret_from_reddit
+    REDDIT_USER_AGENT=A custom user agent string (e.g., python:rstat:v1.0)
+    ```
+    **IMPORTANT:** Never commit your `.env` file to version control.
+
+### 6. Set Up NLTK
+Run the included setup script **once** to download the required `vader_lexicon` for sentiment analysis.
+```bash
+python rstat_tool/setup_nltk.py
+```
+
+### 7. Build and Install the `rstat` Command
+Install the tool in "editable" mode. This creates the `rstat` command in your virtual environment and links it to your source code. Any changes you make to the code will be immediately available.
+
+```bash
+pip install -e .
+```
+The installation is now complete.
+
+---
+
 ## Configuration
 
-Before running the tool, you need to configure the list of subreddits you want to analyze.
-
-1. Open the `subreddits.json` file.
-2. Modify the list of strings to include your desired subreddits.
-
-**Example `subreddits.json`:**
+### Subreddits
+Modify the `subreddits.json` file to define which communities the tool should scan.
 ```json
 {
   "subreddits": [
     "wallstreetbets",
     "stocks",
     "investing",
-    "pennystocks"
+    "options"
   ]
 }
 ```
 
-## Usage
-
-To run the tool, execute the `main.py` script from the root directory of the project, passing the path to your configuration file as an argument.
-
-Make sure your virtual environment is activated before running the script.
-
-```bash
-python main.py subreddits.json
-```
-
-### Expected Output
-
-The tool will first confirm the loaded subreddits and then proceed with its analysis, printing the results directly to the terminal.
-
-```
-Loading configuration...
-Successfully loaded 4 subreddits: wallstreetbets, stocks, investing, pennystocks
-------------------------------
-Testing market data functionality...
-Market Cap for AAPL: $2,912,488,124,416
-------------------------------
-Next up: Integrating the Reddit API to find tickers...
-```
+### Ticker Blacklist (Advanced)
+To improve data quality, you can add common words that are mistaken for tickers to the `COMMON_WORDS_BLACKLIST` set inside the `rstat_tool/ticker_extractor.py` file. The tool will automatically clean the database of these tickers on the next run.
 
 ---
 
-This `README.md` provides a clear and concise guide for anyone (including your future self) to get the project up and running quickly.
+## Usage
 
-We are now ready to move on to the next implementation step. Shall we proceed with integrating the Reddit API using PRAW?
\ No newline at end of file
+Once installed, you can run the tool from any directory using the `rstat` command.
+
+### Basic Usage
+Run an analysis using the default settings (scans 25 posts, 100 comments/post, shows top 20 tickers).
+
+```bash
+rstat subreddits.json
+```
+
+### Advanced Usage with Arguments
+Use command-line arguments to control the scan and the report.
+
+```bash
+# Scan only 10 posts, 50 comments per post, and show a report of the top 5 tickers
+rstat subreddits.json --posts 10 --comments 50 --limit 5
+```
+
+### Getting Help
+To see all available commands and their descriptions:
+```bash
+rstat --help
+```
+
+### Example Output
+
+```
+--- Top 5 Tickers by Mention Count ---
+Ticker | Mentions | Bullish | Bearish | Neutral | Market Cap
+---------------------------------------------------------------------------
+TSLA   | 183      | 95      | 48      | 40      | $580.45B
+NVDA   | 155      | 110     | 15      | 30      | $1.15T
+AAPL   | 98       | 50      | 21      | 27      | $2.78T
+SPY    | 76       | 30      | 35      | 11      | N/A
+AMD    | 62       | 45      | 8       | 9       | $175.12B
+```
\ No newline at end of file
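
Review note on the example output: the patch documents a `Market Cap` column with values such as `$580.45B`, `$1.15T`, and `N/A`, but does not show how those strings are produced. A minimal sketch of one way to render them follows. It assumes the raw value is a number of dollars, or `None` when no market cap is available. `format_market_cap` is a hypothetical helper introduced here for illustration and is not taken from `rstat_tool`.

```python
from typing import Optional


def format_market_cap(value: Optional[float]) -> str:
    """Render a raw market cap in dollars as a compact string.

    Hypothetical helper: the name and the None-means-missing convention
    are assumptions, not part of the patched README or the rstat code.
    """
    # Missing or non-positive values are shown as N/A, as in the SPY row.
    if value is None or value <= 0:
        return "N/A"
    # Try the largest unit first so 1.15e12 becomes "$1.15T", not "$1150.00B".
    for threshold, suffix in ((1e12, "T"), (1e9, "B"), (1e6, "M")):
        if value >= threshold:
            return f"${value / threshold:.2f}{suffix}"
    # Anything below one million is printed in full with thousands separators.
    return f"${value:,.0f}"


if __name__ == "__main__":
    for cap in (580.45e9, 1.15e12, None):
        print(format_market_cap(cap))
```

Values just under a unit boundary round up within the smaller unit, so a real implementation may want to handle that edge case explicitly.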