About the Titans paper
Thoughts
The Titans paper presents interesting and innovative concepts, particularly the integration of different memory forms: long-term (neural memory), short-term (attention), and persistent (learned tokens), together with the “learning by surprise” mechanism. However, several areas need further clarification and validation. Below is a summary of key observations from this internal presentation about the paper.
Strengths
- The paper introduces diverse memory forms, which is a novel and valuable approach to long-term memory in AI models.
- The “learning by surprise” concept is intriguing and aligns (at least as claimed) with human cognitive processes; see the sketch after this list.
- The Memory as Context (MAC) approach appears to be the most promising, both from a logical standpoint and based on the experimental results.
- The inclusion of persistent memory is a positive addition, and concatenating different memory structures is an effective strategy.
- The idea of optimizing during test-time (meta-learning) offers a fresh perspective on learning dynamics.
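
To make the “learning by surprise” and test-time optimization points concrete, here is a minimal sketch of how I read the memory update: the long-term memory is a small network trained online on an associative loss, and the “surprise” is the gradient of that loss on the current token, accumulated with momentum and damped by a forgetting term. All names (SimpleMemory, eta, theta, alpha) and the exact update form are my own reading of the paper, not the authors’ code.

```python
# Minimal sketch of a surprise-driven test-time memory update (my interpretation).
import torch
import torch.nn as nn

class SimpleMemory(nn.Module):
    """Tiny MLP acting as the neural long-term memory M(k) -> v."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))

    def forward(self, k: torch.Tensor) -> torch.Tensor:
        return self.net(k)

def surprise_update(memory: SimpleMemory, k: torch.Tensor, v: torch.Tensor,
                    state: dict, eta: float = 0.9, theta: float = 0.1,
                    alpha: float = 0.01) -> None:
    """One online step: momentum over the 'surprise' gradient plus a forgetting term."""
    loss = ((memory(k) - v) ** 2).mean()           # associative-memory loss on the current token
    params = list(memory.parameters())
    grads = torch.autograd.grad(loss, params)      # the "surprise" signal
    with torch.no_grad():
        for p, g in zip(params, grads):
            s = state.setdefault(p, torch.zeros_like(p))
            s.mul_(eta).add_(g, alpha=-theta)      # past surprise (momentum) + momentary surprise
            p.mul_(1.0 - alpha).add_(s)            # forget a little, then write the update

# Usage: stream (key, value) pairs (random here) and update the memory at "test time".
dim = 32
memory, state = SimpleMemory(dim), {}
for _ in range(5):
    k, v = torch.randn(1, dim), torch.randn(1, dim)
    surprise_update(memory, k, v, state)
```

Under this reading, the update is essentially online gradient descent with momentum and weight decay on the memory’s parameters, which is why the authors can frame it as meta-learning / test-time training.
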
Areas for Improvement
- Lack of empirical validation: While the theory is compelling, real-world applications and empirical results are needed to substantiate the claims (especially concerning reproducibility).
- Hyperparameters: The method requires several hyperparameters that are not given in the text.
- Code availability: The absence of public code makes it difficult for the community to verify and reproduce the results.
- Unclear performance impact: It remains uncertain how much of the reported improvement stems from the additional context rather than from the model’s inherent capabilities.
- Inference speed and efficiency: The paper does not provide enough details on how the proposed approach affects real-world computational performance.
- Inner and outer training loops: While an interesting approach, the way it is presented may be confusing to some readers and requires further clarification.
- Concept paper/draft stage: The paper feels unfinished, with some unclear aspects, such as the effect of persistent memory size and its overall impact.
- 2M-token context length: It is still unclear whether this context length comes from the neural memory module, from the internal chunking (in MAC), or from the sliding-window attention (in MAG and MAL); see the sketch after this list.
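
Regarding the last point (and the MAC bullet above), the following rough sketch of the Memory-as-Context data flow illustrates why attribution is hard: attention only ever sees one chunk plus a handful of persistent and retrieved tokens, so the effective attention window stays short regardless of how good the memory is. The chunk size, the read/write interface, and all names here are my assumptions for illustration, not the authors’ implementation.

```python
# Rough sketch of MAC-style chunked processing (my assumptions, not the paper's code).
import torch

def mac_forward(tokens: torch.Tensor, persistent: torch.Tensor,
                memory_read, memory_write, attention, chunk_size: int = 512):
    """tokens: (seq_len, dim); persistent: (n_p, dim). Returns per-chunk outputs."""
    outputs = []
    for start in range(0, tokens.size(0), chunk_size):
        chunk = tokens[start:start + chunk_size]          # current segment of the sequence
        retrieved = memory_read(chunk)                    # query the long-term memory with the chunk
        ctx = torch.cat([persistent, retrieved, chunk])   # attention context = persistent + memory + chunk
        out = attention(ctx)                              # standard attention over this short context
        outputs.append(out[-chunk.size(0):])              # keep only the chunk positions
        memory_write(chunk, out[-chunk.size(0):])         # inner loop: update memory at test time
    return torch.cat(outputs)

# Toy usage with stand-in components (identity attention, dummy memory):
dim, seq_len = 16, 2048
toks = torch.randn(seq_len, dim)
pers = torch.randn(4, dim)
out = mac_forward(toks, pers,
                  memory_read=lambda c: c[:2],   # pretend "retrieval": first 2 rows of the chunk
                  memory_write=lambda c, o: None,
                  attention=lambda x: x)         # identity stand-in for attention
assert out.shape == toks.shape
```

The memory_write call inside the loop is also where the (somewhat confusingly presented) inner loop sits: the memory is updated at test time within each chunk, nested inside the usual outer training loop that learns the attention and persistent parameters.
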
Conclusion
Overall, the paper presents exciting ideas but is still at a conceptual stage. Public code from the authors would make me very happy and would let me dig into it even more!
If you have any questions about this, feel free to reach out.