
[New Model] RWKV #1902

Open
JanFidor wants to merge 2 commits into unit8co:master from JanFidor:feature/rwkv-model

Conversation

@JanFidor
Contributor

@JanFidor JanFidor commented Jul 17, 2023

Fixes #1817 .

Quick summary

For now, the implementation closely follows what is described in the paper. The official RWKV repo includes quite a few improvements that the paper doesn't discuss, but my first goal was to get at least a workable model.
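
For readers who haven't seen the paper, here is a rough sketch of the WKV term it describes, for a single channel. This is not the code from this PR: the function names are made up for illustration, and it skips the numerical-stability tricks a real implementation would need. It only checks that the direct sum over the history and the recurrent (running-state) form give the same values.

```python
import math


def wkv_direct(k, v, w, u):
    """WKV for one channel by direct summation over the history, O(T^2)."""
    out = []
    for t in range(len(k)):
        # The current step gets the "bonus" u instead of a time decay.
        num = math.exp(u + k[t]) * v[t]
        den = math.exp(u + k[t])
        for i in range(t):
            # Past steps are weighted by exp(k_i) and decayed by w per step.
            weight = math.exp(-(t - 1 - i) * w + k[i])
            num += weight * v[i]
            den += weight
        out.append(num / den)
    return out


def wkv_recurrent(k, v, w, u):
    """Same quantity with a running state, O(T). Numerically naive."""
    a = 0.0  # decayed sum of exp(k_i) * v_i over past steps
    b = 0.0  # decayed sum of exp(k_i) over past steps
    decay = math.exp(-w)
    out = []
    for k_t, v_t in zip(k, v):
        bonus = math.exp(u + k_t)
        out.append((a + bonus * v_t) / (b + bonus))
        a = decay * a + math.exp(k_t) * v_t
        b = decay * b + math.exp(k_t)
    return out


# Arbitrary toy inputs, purely for illustration.
keys = [0.1, -0.3, 0.5, 0.2]
values = [1.0, 2.0, -1.0, 0.5]
direct = wkv_direct(keys, values, w=0.5, u=0.3)
recurrent = wkv_recurrent(keys, values, w=0.5, u=0.3)
```

The recurrent form is what makes long input and output chunks cheap, since each step only needs the two running sums rather than the whole history.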

Roadmap

  • Update the model initializations, which are hard-coded for now. I know, very bad idea, but the paper's initialization assigns different weights to different embeddings, which didn't feel like a good fit for a TS model.
  • Use teacher forcing for training
  • Make a benchmark against SOTA models (e.g. DLinear, NLinear, TFTModel).
  • Add support for past covariates (I wasn't 100% sure how to do it with the model being auto-regressive)
  • More initialization benchmarks
  • Browse the RWKV repo for improvements which would make sense in a TS model
  • This one is a long shot, but I was thinking about adding support for both future and static covariates. It would require fiddling with the attention mechanism, but it feels doable.
  • Add support for probabilistic forecasting
  • Even more benchmarks (especially performance-wise, as RWKV should do pretty well when it comes to long input and output chunk lengths)
  • Add tests and update Readme, Changelog and docstrings
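
To make the teacher forcing item concrete, here is a minimal toy sketch of the difference between a free-running rollout and a teacher-forced one. Everything in it is hypothetical (the helper names and the deliberately biased one-step "model"); it is not how this PR or Darts implements training, just an illustration of which value gets fed back at each step.

```python
def rollout_free_running(step, first_input, horizon):
    """Feed each prediction back in as the next input (inference-style)."""
    preds, current = [], first_input
    for _ in range(horizon):
        current = step(current)
        preds.append(current)
    return preds


def rollout_teacher_forced(step, first_input, targets):
    """Feed the true previous value in as the next input (training-style)."""
    preds, current = [], first_input
    for target in targets:
        preds.append(step(current))
        current = target  # use the ground truth, not the model's prediction
    return preds


def biased_step(x):
    """Toy one-step 'model' that overshoots the true increment of 1.0."""
    return x + 1.5


targets = [1.0, 2.0, 3.0, 4.0]  # true series grows by 1 per step
free = rollout_free_running(biased_step, 0.0, len(targets))
forced = rollout_teacher_forced(biased_step, 0.0, targets)
# free   -> [1.5, 3.0, 4.5, 6.0]  (error compounds along the horizon)
# forced -> [1.5, 2.5, 3.5, 4.5]  (error stays at 0.5 per step)
```

With teacher forcing the per-step error stays constant, because the model never sees its own mistakes during training. That is also why the two rollouts can behave quite differently at prediction time.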

There are still a lot of things to be done, but I wanted to put up a PR as a quick update on how everything's going, along with a simple roadmap for the future.

@JanFidor JanFidor requested a review from dennisbader as a code owner July 17, 2023 19:53
@codecov-commenter

codecov-commenter commented Jul 17, 2023

Codecov Report

❌ Patch coverage is 23.78049% with 125 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.97%. Comparing base (a5560cc) to head (7a1f0ee).
⚠️ Report is 379 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| darts/models/forecasting/rwkv_model.py | 23.31% | 125 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1902      +/-   ##
==========================================
- Coverage   93.95%   92.97%   -0.98%     
==========================================
  Files         125      126       +1     
  Lines       11773    11923     +150     
==========================================
+ Hits        11061    11086      +25     
- Misses        712      837     +125     


@dennisbader
Collaborator

Hi @JanFidor, and thanks for this PR. Just to let you know that we're wrapping up the last few things for the release in 1-2 weeks. Once that's done we'll come back to this and review 🚀

@gdevos010
Contributor

@JanFidor Were you able to benchmark this model?

@JanFidor
Contributor Author

JanFidor commented Aug 30, 2023

@gdevos010 just some basic ones; I still have to play around with parameter initializations. On SunspotsDataset, I noticed that the MAPE of NLinear and Transformer changed noticeably depending on output_chunk_length (ranging between roughly 60 and 200), while RWKV consistently performed around 100. I also tried the ETTh1 dataset, with input_chunk_length 720 and output_chunk_length 336. RWKV had terrible MAPE there. I'm not sure if the architecture was at fault or if it was caused by underfitting. I'll try to make a more comprehensive benchmark next week.
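
As a side note on reading these numbers, here is a small stand-alone sketch of how MAPE is usually defined. `mape_percent` is a hypothetical helper written for this comment, not the Darts metric, and the input values are made up. It shows how a single small true value can push MAPE far above 100, which may or may not be part of what is going on in the benchmarks above.

```python
def mape_percent(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when a true value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Made-up values: the absolute error is 10 in both positions,
# but the relative error is 10% for the first and 1000% for the second.
score = mape_percent([100.0, 1.0], [90.0, 11.0])
```

Here the average comes out at roughly 505, dominated entirely by the point whose true value is 1.0, so comparing models with a second, scale-aware metric alongside MAPE can help.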


Development

Successfully merging this pull request may close these issues.

[New model] RWKV

4 participants