About the Titans paper
Thoughts
The Titans paper presents interesting and innovative concepts, particularly the integration of different memory forms: long-term (neural memory), short-term (attention), and persistent (learned tokens), together with the “learning by surprise” mechanism. However, several areas need further clarification and validation. Below is a summary of key observations from this internal presentation about the paper.
Strengths
- The paper introduces diverse memory forms, which is a novel and valuable approach to long-term memory in AI models.
- The “learning by surprise” concept is intriguing and aligns (at least as claimed) with human cognitive processes; see the sketch after this list.
- The Memory as Context (MAC) approach appears to be the most promising, both from a logical standpoint and based on the experimental results.
- The inclusion of persistent memory is a positive addition, and concatenating different memory structures is an effective strategy.
- The idea of optimizing during test-time (meta-learning) offers a fresh perspective on learning dynamics.
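
To make the “learning by surprise” and test-time optimization points concrete, here is a minimal sketch of how I read the memory update: the long-term memory is a small network trained online on an associative loss, and the “surprise” is the gradient of that loss on the current token, accumulated with momentum and damped by a forgetting term. All names (SimpleMemory, eta, theta, alpha) and the exact update form are my own reading of the paper, not the authors’ code.

```python
# Minimal sketch of a surprise-driven test-time memory update (my interpretation).
import torch
import torch.nn as nn

class SimpleMemory(nn.Module):
    """Tiny MLP acting as the neural long-term memory M(k) -> v."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))

    def forward(self, k: torch.Tensor) -> torch.Tensor:
        return self.net(k)

def surprise_update(memory: SimpleMemory, k: torch.Tensor, v: torch.Tensor,
                    state: dict, eta: float = 0.9, theta: float = 0.1,
                    alpha: float = 0.01) -> None:
    """One online step: momentum over the 'surprise' gradient plus a forgetting term."""
    loss = ((memory(k) - v) ** 2).mean()           # associative-memory loss on the current token
    params = list(memory.parameters())
    grads = torch.autograd.grad(loss, params)      # the "surprise" signal
    with torch.no_grad():
        for p, g in zip(params, grads):
            s = state.setdefault(p, torch.zeros_like(p))
            s.mul_(eta).add_(g, alpha=-theta)      # past surprise (momentum) + momentary surprise
            p.mul_(1.0 - alpha).add_(s)            # forget a little, then write the update

# Usage: stream (key, value) pairs (random here) and update the memory at "test time".
dim = 32
memory, state = SimpleMemory(dim), {}
for _ in range(5):
    k, v = torch.randn(1, dim), torch.randn(1, dim)
    surprise_update(memory, k, v, state)
```

Under this reading, the update is essentially online gradient descent with momentum and weight decay on the memory’s parameters, which is why the authors can frame it as meta-learning / test-time training.
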
Areas for Improvement
- Lack of empirical validation: While the theory is compelling, real-world applications and empirical results are needed to substantiate the claims (especially concerning reproducibility).
- Hyperparameters: The method requires several hyperparameters that are not given in the text.
- Code availability: The absence of public code makes it difficult for the community to verify and reproduce the results.
- Unclear performance impact: It remains uncertain how much of the reported improvement stems from the additional context rather than from the model’s inherent capabilities.
- Inference speed and efficiency: The paper does not provide enough details on how the proposed approach affects real-world computational performance.
- Inner and outer training loops: While an interesting approach, the way it is presented may be confusing to some readers and requires further clarification.
- Concept paper/draft stage: The paper feels unfinished, with some unclear aspects, such as the effect of persistent memory size and its overall impact.
- 2M-token context length: It is still unclear whether this context length comes from the neural memory module, from the internal chunking (in MAC), or from the sliding-window attention (in MAG and MAL); see the sketch after this list.
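
Regarding the last point (and the MAC bullet above), the following rough sketch of the Memory-as-Context data flow illustrates why attribution is hard: attention only ever sees one chunk plus a handful of persistent and retrieved tokens, so the effective attention window stays short regardless of how good the memory is. The chunk size, the read/write interface, and all names here are my assumptions for illustration, not the authors’ implementation.

```python
# Rough sketch of MAC-style chunked processing (my assumptions, not the paper's code).
import torch

def mac_forward(tokens: torch.Tensor, persistent: torch.Tensor,
                memory_read, memory_write, attention, chunk_size: int = 512):
    """tokens: (seq_len, dim); persistent: (n_p, dim). Returns per-chunk outputs."""
    outputs = []
    for start in range(0, tokens.size(0), chunk_size):
        chunk = tokens[start:start + chunk_size]          # current segment of the sequence
        retrieved = memory_read(chunk)                    # query the long-term memory with the chunk
        ctx = torch.cat([persistent, retrieved, chunk])   # attention context = persistent + memory + chunk
        out = attention(ctx)                              # standard attention over this short context
        outputs.append(out[-chunk.size(0):])              # keep only the chunk positions
        memory_write(chunk, out[-chunk.size(0):])         # inner loop: update memory at test time
    return torch.cat(outputs)

# Toy usage with stand-in components (identity attention, dummy memory):
dim, seq_len = 16, 2048
toks = torch.randn(seq_len, dim)
pers = torch.randn(4, dim)
out = mac_forward(toks, pers,
                  memory_read=lambda c: c[:2],   # pretend "retrieval": first 2 rows of the chunk
                  memory_write=lambda c, o: None,
                  attention=lambda x: x)         # identity stand-in for attention
assert out.shape == toks.shape
```

The memory_write call inside the loop is also where the (somewhat confusingly presented) inner loop sits: the memory is updated at test time within each chunk, nested inside the usual outer training loop that learns the attention and persistent parameters.
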
Conclusion
Overall, the paper presents exciting ideas but is still at a conceptual stage. Public code from the authors would make me very happy and would let me dig into it even more!
If you have any questions about this, feel free to reach out.