About the Titans paper

Thoughts

The Titans paper presents interesting and innovative concepts, particularly the integration of three memory forms: long-term (neural memory), short-term (attention), and persistent (learned tokens), together with the “learning by surprise” mechanism. However, several areas still need clarification and validation. Below is a summary of key observations from this internal presentation about the paper.


Strengths

  • The paper introduces diverse memory forms, which is a novel and valuable approach to long-term memory in AI models.
  • The “learning by surprise” concept is intriguing and, at least as claimed, aligns with human cognitive processes (see the sketch after this list).
  • The Memory as Context (MAC) approach appears to be the most promising variant, both logically and in the reported experimental results.
  • The inclusion of persistent memory is a positive addition, and concatenating different memory structures is an effective strategy.
  • The idea of optimizing at test time (meta-learning) offers a fresh perspective on learning dynamics.
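
To make the “learning by surprise” and test-time optimization ideas more concrete, here is a minimal sketch of a surprise-driven memory update. This is my own toy reconstruction, not the authors’ code: it assumes the long-term memory is a small MLP trained on an associative key-value loss at test time, and it uses fixed scalar gates (`lr`, `eta`, `alpha`) where the paper uses data-dependent ones.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralMemory(nn.Module):
    """Toy long-term memory: a small MLP updated *at test time* to map keys to values."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
        # Momentum buffers ("past surprise"), one per parameter.
        self.momentum = [torch.zeros_like(p) for p in self.net.parameters()]

    def write(self, k, v, lr=0.1, eta=0.9, alpha=0.01):
        """One surprise-driven update: gradient of the associative loss ||M(k) - v||^2."""
        loss = F.mse_loss(self.net(k), v)            # how "surprising" the new pair is
        grads = torch.autograd.grad(loss, list(self.net.parameters()))
        with torch.no_grad():
            for p, m, g in zip(self.net.parameters(), self.momentum, grads):
                m.mul_(eta).add_(g, alpha=-lr)       # momentary surprise + decayed past surprise
                p.mul_(1 - alpha).add_(m)            # weight decay acts as a forget gate
        return loss.item()

    def read(self, q):
        with torch.no_grad():
            return self.net(q)

# Toy usage: repeatedly writing the same pairs makes them less "surprising".
dim = 32
mem = NeuralMemory(dim)
k, v = torch.randn(8, dim), torch.randn(8, dim)
for _ in range(50):
    surprise = mem.write(k, v)
print("final associative loss:", surprise)
print("retrieved values shape:", mem.read(k).shape)
```

The point of the sketch is only the mechanism: the memory is not written by a learned write head but by a gradient step on an associative loss, with momentum carrying past surprise forward and a decay term forgetting old content.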

Areas for Improvement

  • Lack of empirical validation: while the theory is compelling, real-world applications and reproducible empirical results are needed to substantiate the claims.
  • Hyperparameters: the method requires several hyperparameters that are not reported in the text.
  • Code availability: The absence of public code makes it difficult for the community to verify and reproduce the results.
  • Unclear performance impact: It remains uncertain how much of the reported improvements stem from additional context rather than the model’s inherent capabilities.
  • Inference speed and efficiency: The paper does not provide enough details on how the proposed approach affects real-world computational performance.
  • Inner and outer training loops: While an interesting approach, the way it is presented may be confusing to some readers and requires further clarification.
  • Concept paper/draft stage: The paper feels unfinished, with some unclear aspects, such as the effect of persistent memory size and its overall impact.
  • 2M-token context length: it is still unclear whether this context length is enabled by the neural memory module, the internal chunking (in MAC), or the sliding-window attention (in MAG and MAL). A sketch of my reading of the MAC chunk layout follows this list.
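
For the MAC variant, my understanding of the chunk-wise layout is sketched below: each chunk attends over the concatenation of learned persistent tokens, tokens retrieved from the neural memory, and the chunk itself. This is again a hedged reconstruction; the class name `MACBlock`, the token counts, and the use of `nn.MultiheadAttention` are my assumptions, not the paper’s implementation.

```python
import torch
import torch.nn as nn

class MACBlock(nn.Module):
    """Sketch of Memory-as-Context: each chunk attends over
    [persistent tokens | tokens retrieved from long-term memory | chunk tokens]."""
    def __init__(self, dim, n_persistent=4, n_heads=4):
        super().__init__()
        # Persistent memory: learned, input-independent tokens.
        self.persistent = nn.Parameter(torch.randn(n_persistent, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, chunk, retrieved):
        # chunk:     (B, chunk_len, dim)  current segment of the long sequence
        # retrieved: (B, mem_len, dim)    tokens read from the neural memory for this chunk
        B = chunk.size(0)
        persistent = self.persistent.unsqueeze(0).expand(B, -1, -1)
        context = torch.cat([persistent, retrieved, chunk], dim=1)  # concatenated context
        out, _ = self.attn(chunk, context, context)  # chunk queries attend over full context
        return out

block = MACBlock(dim=32)
chunk = torch.randn(2, 16, 32)
retrieved = torch.randn(2, 8, 32)
print(block(chunk, retrieved).shape)  # torch.Size([2, 16, 32])
```

Seen this way, attention only ever operates inside a chunk plus a bounded prefix of memory tokens, which is why I find it hard to tell how much of the 2M-token capability comes from the memory itself versus the chunking.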

Conclusion

Overall, the paper presents exciting ideas but still feels like it is at a conceptual stage. Public code from the authors would make it much easier (and more fun) to dig into the details!

If you have any questions about this, feel free to reach out.