Training Data

Just as an athlete needs to train, LLMs need to be trained, and data is their fuel. Bloomberg’s Help Desk might be the most powerful source of training data for financial markets LLMs.
It likely holds 20+ years of Q&A data on markets, the Bloomberg Terminal, relevant events, cross-asset topics, charting with technical analysis, and more. Can you think of a better training dataset for building a financial markets LLM? FactSet likely has some Q&A data, but I can’t imagine it is anywhere near the depth and breadth of Bloomberg’s. Quora and Reddit have some financial markets data as well, but nothing like the quality you get from Bloomberg subscribers.
In the data-hungry AI world we live in, training datasets are a super powerful asset. Does Bloomberg think about monetizing this asset for clients willing to spend millions? Or does it keep the data internal and just build its own powerful LLMs? With leadership and board changes coming at Bloomberg, I would love for them to think outside the box, and I challenge them to start licensing this data: anonymized Q&A data from the Help Desk to train LLMs.
Training data is a new opportunity for data and information service providers to think about. How do you price this data use case? I can think of a dozen companies with very rich datasets that anyone building a financial markets-focused LLM would love to get their hands on. Would Citadel, Two Sigma, Millennium, and the other hedge fund titans that are likely building their own LLMs pay millions for this training dataset? What about the banks and the largest asset managers in the world? Could this be used not only to assist analysts and PMs, but also to reduce the size of the workforce or make analysts 10x more efficient? Feels like the sky is the limit!
