A lightweight, trainable transformer-based language model with automated daily training via GitHub Actions.
- Scalable Architecture: GPT-style transformer, easily adjustable from tiny (10M) to large (350M+) parameters
- Automated Training: Daily training on different Hugging Face datasets via GitHub Actions
- Day-based Dataset Rotation: Different dataset trains each day (Monday-Sunday)
- Hugging Face Integration: Model automatically uploaded to the Hugging Face Hub
- Wandb Integration: Real-time training metrics and visualization
- CPU-Optimized: Runs efficiently on GitHub Actions free tier (Ubuntu CPU)
- Easy Configuration: YAML-based model and dataset configuration
The trained model is available on Hugging Face:
```python
from huggingface_hub import hf_hub_download

# Download model files
hf_hub_download("MeridianAlgo/Fin.AI", "model.pt", local_dir="./model")
hf_hub_download("MeridianAlgo/Fin.AI", "config.json", local_dir="./model")
```

```python
# Load the downloaded model
from fin_ai.model import FinAIModel

model = FinAIModel.from_pretrained("./model")
```

```bash
# Install dependencies
pip install -r requirements.txt

# Train the model
python train.py --config config/model_config.yaml --datasets config/datasets.yaml

# Generate text
python generate.py --model checkpoints/model --prompt "Once upon a time"
```

The model trains automatically every day at 6 AM UTC. Each day uses a different dataset:
- Monday: WikiText-2 (encyclopedia text)
- Tuesday: TinyStories (short stories)
- Wednesday: CNN News (news articles)
- Thursday: Dolly (instruction data)
- Friday: arXiv (scientific papers)
- Saturday: SQuAD (Q&A data)
- Sunday: WikiText-103 (large encyclopedia)
After training, the model is automatically uploaded to Hugging Face.
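For illustration, here is a minimal sketch of how the weekday-to-dataset rotation above could be resolved at runtime. It assumes the `config/datasets.yaml` schema shown later in this README; it is not necessarily how `train.py` actually implements the selection.

```python
# Hypothetical sketch of the day-based rotation (not necessarily train.py's actual logic).
# Assumes the datasets.yaml schema shown in the configuration section below.
from datetime import datetime, timezone

import yaml  # PyYAML

def pick_todays_dataset(path="config/datasets.yaml"):
    with open(path) as f:
        entries = yaml.safe_load(f)["datasets"]
    today = datetime.now(timezone.utc).isoweekday()  # 1 = Monday ... 7 = Sunday
    return next(e for e in entries if e["day"] == today)

# On a Monday this would return the WikiText-2 entry (day: 1).
```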
| Size | Parameters | Layers | Heads | Embed Dim | Speed |
|---|---|---|---|---|---|
| tiny | ~10M | 4 | 4 | 256 | ⚡ Fast |
| small | ~25M | 6 | 6 | 384 | 🚀 Medium |
| medium | ~85M | 12 | 8 | 512 | 🐢 Slow |
| large | ~350M | 24 | 12 | 768 | 🐌 Very Slow |
Edit `config/model_config.yaml` to change model size:

```yaml
model:
  size_preset: "tiny"  # or small, medium, large
```

Edit `config/datasets.yaml` to customize datasets for each day:
```yaml
datasets:
  - name: "wikitext"
    subset: "wikitext-2-raw-v1"
    split: "train"
    text_column: "text"
    day: 1  # Monday
    max_samples: 100000
```
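To make the fields concrete, this is roughly how one such entry maps onto the Hugging Face `datasets` API. The actual loader lives in `fin_ai/data/dataset.py` and may differ in details.

```python
# Rough illustration of how a datasets.yaml entry is consumed
# (details may differ from fin_ai/data/dataset.py).
from datasets import load_dataset

entry = {"name": "wikitext", "subset": "wikitext-2-raw-v1", "split": "train",
         "text_column": "text", "max_samples": 100000}

ds = load_dataset(entry["name"], entry["subset"], split=entry["split"])
ds = ds.select(range(min(entry["max_samples"], len(ds))))  # respect max_samples
texts = ds[entry["text_column"]]                           # list of raw text strings
```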
Adjust training parameters in `config/model_config.yaml`:

```yaml
training:
  batch_size: 4
  learning_rate: 5.0e-4
  max_steps: 500
  warmup_steps: 100
  eval_steps: 100
```

Project structure:

```
fin-ai/
├── fin_ai/ # Main package
│ ├── model/ # Transformer architecture
│ │ ├── config.py # Model configuration
│ │ └── transformer.py # GPT-style model
│ ├── data/ # Dataset loading
│ │ └── dataset.py # HF dataset utilities
│ └── training/ # Training loop
│ └── trainer.py # Trainer with checkpointing
├── config/ # Configuration files
│ ├── model_config.yaml # Model & training config
│ └── datasets.yaml # Dataset configuration
├── train.py # Main training script
├── generate.py # Text generation script
├── requirements.txt # Python dependencies
└── .github/workflows/ # GitHub Actions
    └── train.yml           # Daily training workflow
```
Run training manually with:

```bash
# Train with default config
python train.py

# Override max steps
python train.py --max-steps 1000

# Limit dataset samples (for testing)
python train.py --max-samples 10000

# Custom output directory
python train.py --output-dir ./my_checkpoints
```

```bash
# Generate from prompt
python generate.py --prompt "The future of AI"
# Customize generation
python generate.py \
--model checkpoints/model \
--prompt "Hello world" \
--max-tokens 200 \
--temperature 0.8 \
--top-k 50 \
--top-p 0.9
```
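For reference, this is roughly what the `--temperature`, `--top-k`, and `--top-p` flags control during decoding. The function below is only an illustration; `generate.py` may implement sampling differently.

```python
# Illustrative temperature / top-k / top-p (nucleus) sampling for one decoding step.
# Not generate.py's exact code; it just shows what the flags above control.
import torch

def sample_next_token(logits, temperature=0.8, top_k=50, top_p=0.9):
    logits = logits / temperature                        # <1 sharpens, >1 flattens the distribution
    vals, idx = torch.topk(logits, top_k)                # keep the k most likely tokens (sorted desc)
    probs = torch.softmax(vals, dim=-1)
    tail = torch.cumsum(probs, dim=-1) - probs >= top_p  # nucleus: drop tokens past top_p mass
    probs = probs.masked_fill(tail, 0.0)
    probs = probs / probs.sum(dim=-1, keepdim=True)
    return idx.gather(-1, torch.multinomial(probs, num_samples=1))

next_id = sample_next_token(torch.randn(1, 32000))       # fake logits over a 32k-token vocabulary
```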
If you have a Wandb account, add your API key as a GitHub secret:

- Get your API key from wandb.ai
- Add `WANDB_API_KEY` to GitHub repo secrets
- View live training at wandb.ai/your-username/fin-ai
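If you run training locally instead, the Wandb hookup in a script typically looks like the following. The project name and logged values here are placeholders, not necessarily what `train.py` uses.

```python
# Typical Wandb usage from a training script (project/config values are placeholders).
import wandb

wandb.init(project="fin-ai", config={"size_preset": "tiny", "batch_size": 4})
wandb.log({"train/loss": 2.31, "step": 100})  # appears live on the dashboard
wandb.finish()
```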
Checkpoints are saved to `checkpoints/`:

```
checkpoints/
├── model/              # Latest model
│   ├── config.json
│   └── model.pt
├── checkpoint-100.pt   # Intermediate checkpoints
├── checkpoint-200.pt
└── best_model.pt       # Best evaluation checkpoint
```
On GitHub Actions free tier (Ubuntu CPU):
- Tiny model: ~16 seconds per step
- 500 steps: ~2.2 hours (fits in 3-hour limit)
- Daily training: ~500 steps per day
- Monthly: ~15,000 steps (~7.5M tokens)
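A quick check of the arithmetic behind those figures:

```python
# Sanity-check the throughput figures above.
seconds_per_step = 16
steps_per_day = 500
print(steps_per_day * seconds_per_step / 3600)  # ~2.2 hours per daily run
print(steps_per_day * 30)                       # ~15,000 steps per month
```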
Fin.AI uses a GPT-2 style transformer with the following components (sketched in code after the list):
- Multi-head self-attention with rotary positional embeddings
- Feed-forward layers with SwiGLU activation
- Pre-norm architecture for stable training
- Gradient accumulation for larger effective batch sizes
- Mixed precision training (when GPU available)
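As a rough, self-contained sketch of how those pieces fit together (pre-norm residual layout, rotary embeddings applied to attention queries and keys, SwiGLU feed-forward). Class and argument names here are illustrative and do not mirror `fin_ai/model/transformer.py`.

```python
# Minimal sketch of the block described above: pre-norm, rotary-embedding attention,
# SwiGLU feed-forward. Names are illustrative, not fin_ai/model/transformer.py verbatim.
import torch
import torch.nn as nn
import torch.nn.functional as F

def apply_rope(x, cos, sin):
    # x: (batch, heads, seq, head_dim); rotate-half convention
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

class Block(nn.Module):
    def __init__(self, dim=256, n_heads=4, max_seq_len=512):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)
        hidden = 4 * dim  # SwiGLU: down(silu(gate(x)) * up(x))
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)
        # Precomputed rotary tables over half the head dimension
        inv_freq = 1.0 / (10000 ** (torch.arange(0, self.head_dim, 2).float() / self.head_dim))
        freqs = torch.outer(torch.arange(max_seq_len).float(), inv_freq)
        self.register_buffer("cos", freqs.cos(), persistent=False)
        self.register_buffer("sin", freqs.sin(), persistent=False)

    def forward(self, x):
        B, T, C = x.shape
        # Pre-norm attention with rotary positions and a causal mask
        h = self.norm1(x)
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        q, k, v = (t.view(B, T, self.n_heads, self.head_dim).transpose(1, 2) for t in (q, k, v))
        q = apply_rope(q, self.cos[:T], self.sin[:T])
        k = apply_rope(k, self.cos[:T], self.sin[:T])
        att = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.proj(att.transpose(1, 2).reshape(B, T, C))
        # Pre-norm SwiGLU feed-forward
        h = self.norm2(x)
        return x + self.down(F.silu(self.gate(h)) * self.up(h))

print(Block()(torch.randn(2, 16, 256)).shape)  # torch.Size([2, 16, 256])
```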
To add a new dataset, edit `config/datasets.yaml`:

```yaml
datasets:
  - name: "your-dataset"
    subset: null
    split: "train"
    text_column: "text"
    day: 1
    max_samples: 50000
```

To change the training schedule, edit `.github/workflows/train.yml`:
```yaml
schedule:
  - cron: '0 6 * * *'  # Daily at 6 AM UTC
```

To train a larger model, edit `config/model_config.yaml`:
```yaml
model:
  size_preset: "small"  # Larger model
```

If training runs out of memory:

- Reduce `batch_size` in the config
- Use a smaller `size_preset` (tiny)
- Reduce `max_seq_len` to 256

If training is too slow:
- Reduce `batch_size`
max_seq_len - Use
gradient_accumulation_stepsto simulate larger batches
- Check dataset name on Hugging Face
- Verify
text_columnmatches dataset schema - Try with
max_sampleslimit first
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Areas for enhancement:
- GPU support for faster training
- Distributed training across multiple machines
- Model quantization for inference
- Web UI for generation
- Fine-tuning on custom data
For security concerns, please see SECURITY.md.
This project follows the Contributor Covenant Code of Conduct.
MIT License - see LICENSE file
- Built with PyTorch
- Models from Hugging Face Transformers
- Datasets from Hugging Face Datasets
- Monitoring with Weights & Biases
🚀 Active Development - Daily training on GitHub Actions
- Model: huggingface.co/MeridianAlgo/Fin.AI
- Training Logs: GitHub Actions
- Metrics: Wandb Dashboard
Questions? Open an issue on GitHub!