Data Exploration and Predictive Analytics for Music Publishing

Lead Research Organisation: Imperial College London
Department Name: Dept of Mathematics


The PDRA will liaise with the developers at Sentric Music to ensure that a broad array of diverse data sources is linked and
preprocessed in a statistically sound manner, and that the final version of the data is in a format conducive to
machine learning and statistical inference (e.g., unstructured data will need to be pre-parsed into structured data). The
PDRA will need to use a broad suite of "data science" skills to achieve this - including computing skills, as well as statistical
The second objective will involve representing the problem from a statistical viewpoint, as a problem of predicting the future
value of a quantity of interest (in this case earnings), on the basis of attributes about the artist and/or their songs, such as
past earnings, genre, fan-base, etc. To choose an appropriate model, two types of considerations come into play: the
format of the data, as well as our expectations about the types of relationships we are trying to capture. We discuss both in
With regard to data format, this particular application is likely to give rise to a large number of attributes, of various types
(e.g., each song, or artist, will be represented in numeric ways, placed into categories, or rated according to possibly
different scales, etc.). Automatic feature selection techniques will be required to ensure that information-poor attributes are
excluded from consideration to avoid contaminating the results. Moreover, there is a natural hierarchical structure to this
problem, introduced by the relationship between an artist and their songs. Both these aspects challenge off-the-shelf
statistical models, and require a bespoke model.
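As a rough illustration of the feature selection step described above, the following minimal Python sketch scores each numeric attribute by its absolute Pearson correlation with the target and keeps only those above a cut-off. It is a deliberately simple stand-in, not the technique proposed for the project: the function names, the 0.3 threshold and the toy attribute names are all assumptions made for illustration, and categorical or hierarchical attributes would need different treatment.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant attribute carries no information about the target
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def select_features(columns, target, min_abs_corr=0.3):
    """Return (kept attribute names, all scores); min_abs_corr is an assumed cut-off."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    ranked = sorted(scores.items(), key=lambda item: -item[1])
    kept = [name for name, score in ranked if score >= min_abs_corr]
    return kept, scores


# Hypothetical toy data: one informative attribute, one constant, one unrelated.
columns = {
    "past_earnings": [1, 2, 3, 4, 5],
    "constant_flag": [1, 1, 1, 1, 1],
    "unrelated": [2, -1, 0, -1, 2],
}
future_earnings = [2, 4, 6, 8, 10]
kept, scores = select_features(columns, future_earnings)
print(kept)
```

In this toy example only the hypothetical "past_earnings" attribute survives the filter, since the other two are uncorrelated with the target.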
With regard to the choice of model, it is known that typically in Big Data, as the data set size increases, so does the
heterogeneity in the data, and failing to account for this can lead to over-confident and inaccurate predictions. One solution
is to employ a "divide and conquer" approach by using decision trees, which segment the initial dataset and fit a separate
statistical model in each segment. This approach achieves flexibility without compromising on computational efficiency.
Notably, the output of such models remains interpretable by the end user because it closely resembles the manual
segmentation already used extensively in marketing and, currently, by Sentric. The difference is that the segmentation
rules are extracted from the data in a principled, automatic fashion. Another consideration in choosing the model is the
ability for it to output the confidence of its own predictions. Failure to do so can introduce risks since only confident
predictions should be used for decision-making. Adopting a Bayesian framework is a natural way to achieve this objective.
Our favoured approach overall is the framework of Bayesian Dynamic Trees, which combines flexibility, statistical
soundness, scalability using cutting-edge methods, as well as a built-in ability to adapt to data evolution at no extra
computational cost [Anagnostopoulos, 2013]. This framework will have to be extended for this problem to handle the
hierarchical relationship between artists and their songs, the diversity of available attributes, and the need to produce
forecasts over possibly longer time horizons.
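To make the "divide and conquer" idea concrete, here is a minimal Python sketch, assuming a single numeric attribute and a single split. It segments the data at the threshold that minimises within-segment squared error, fits a conjugate normal model for the mean in each segment, and returns a prediction together with an interval expressing its uncertainty. This is not the Bayesian Dynamic Trees framework itself: the function names, the known noise variance, the vague prior and the toy numbers are all assumptions chosen for illustration, and the sketch requires at least two distinct attribute values.

```python
import math


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Threshold on x (midpoint of adjacent values) minimising within-segment error."""
    candidates = sorted(set(xs))
    best_t, best_cost = None, math.inf
    for low, high in zip(candidates, candidates[1:]):
        t = (low + high) / 2.0
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t


def posterior_mean_var(values, prior_mean, prior_var, noise_var):
    """Normal-normal conjugate update for a segment mean, noise variance assumed known."""
    precision = 1.0 / prior_var + len(values) / noise_var
    var = 1.0 / precision
    mean = var * (prior_mean / prior_var + sum(values) / noise_var)
    return mean, var


def fit(xs, ys, prior_mean=0.0, prior_var=1e6, noise_var=1.0):
    """Split once, then fit a separate Bayesian model in each segment."""
    t = best_split(xs, ys)
    left = [y for x, y in zip(xs, ys) if x <= t]
    right = [y for x, y in zip(xs, ys) if x > t]
    return {
        "threshold": t,
        "left": posterior_mean_var(left, prior_mean, prior_var, noise_var),
        "right": posterior_mean_var(right, prior_mean, prior_var, noise_var),
        "noise_var": noise_var,
    }


def predict(model, x, z=1.96):
    """Predictive mean and an approximate 95% interval for a new observation at x."""
    mean, var = model["left"] if x <= model["threshold"] else model["right"]
    sd = math.sqrt(var + model["noise_var"])  # parameter uncertainty plus noise
    return mean, (mean - z * sd, mean + z * sd)


# Hypothetical toy data: an attribute (e.g. fan-base size) against earnings.
model = fit([1, 2, 3, 10, 11, 12], [10, 11, 9, 100, 101, 99])
print(model["threshold"], predict(model, 2))
```

The width of the returned interval is what a decision-maker would inspect before acting on a forecast: segments with few observations yield wider intervals and therefore less confident predictions.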
Finally, the PDRA will supervise and contribute to the deployment of the model within Sentric, as well as the design of the
User Interface that will be made available to the artists. The former will involve scalability considerations, and the latter will
involve innovation in visualisation, and communication of uncertainty.

Planned Impact

Sentric is likely to gain a competitive advantage on the basis of the work proposed in various ways:
1. rationalise its pricing model and user engagement costs by focusing on artists of greater predicted future value
2. allow its users to capture value with less risk by advancing payments on future royalties in cases where this is strongly
supported by the data
3. embark on joint, data-based strategies for career progression of its users by giving them access to predictive models that
not only offer forecasts of future earnings, but are also able to identify attributes that are likely to be "critical levers" in future
growth. This is facilitated by the emphasis placed by the PI on interpretable models.
4. identify cases where predicted earnings grossly disagree with actual earnings, and investigate the possibility of
under-reporting, hence capturing lost value
Direct benefits to the UK from strengthening Sentric's position in the UK include increased hiring needs in the Liverpool
area, and boosting the reputation of the UK in general, and Liverpool and the North in particular, as a hotspot of musical
creativity and digital innovation. This can also help drive partnerships between established music catalogues and Sentric,
building further bridges between SMEs and large firms in this sector. Additional direct benefits arise from the added value
that will be made available to Sentric's artists, who are predominantly small independent artists who at present have little
access to support in comparison to top selling artists managed by large music publishers.
More generally, leveraging public data to offer a competitive advantage to an SME, as well as to the broader user base
of independent artists, is squarely in scope of the Research Councils' vision on Big Data. Music publishing is a sector
undergoing rapid evolution, and the ability to use data to enable artists and small media publishers to navigate this complex field might play a critical role in the long run in ensuring artistic diversity and independence. This ultimately benefits the
public, as well as the UK economy, for which music has traditionally been a major exported good.
Public dissemination of the results of this work will also contribute to sustaining the momentum of the Big Data
phenomenon in the UK, which is currently reshaping the UK's digital economy.
To sum up, the beneficiaries include:
1. Sentric, a UK SME, which is likely to gain a competitive edge by offering better services and optimising its pricing and client
engagement model
2. The artists represented by Sentric, predominantly comprising small, independent artists that are not well served by larger
media publishers
3. The local Liverpool economy, via potential growth of Sentric as well as gained "brand" value as a hotspot for innovation in
music publishing
4. The UK culture as a whole, by helping small independent artists capture value and survive in an increasingly competitive
5. The UK economy as a whole, by ensuring that it remains on the cutting edge of digital innovation in the rapidly evolving
entertainment industry


Description On the methodological front, we have discovered that there is a gap between the academic state-of-the-art in Bayesian statistical modelling for time-series data, and standard practice as part of business intelligence and analytics software in industry. One particular cause for this is the lack of available tools that handle temporal variation in flexible ways for highly dynamic real-life settings (e.g., social media).

On the applied side, we have identified that it is possible, using data analysis, to forecast accurately the likely evolution of a small artist's career in financial terms, as well as to recommend actions the artist can take to improve their chances of success (such as performing in a specific venue, or becoming active in a particular broadcast channel).
Exploitation Route We are in the process of publishing our research findings, making available an open-source software package of our methodology, and organising a seminar aimed at practitioners where we will explain the business problems addressed by our methodology.
Sectors Creative Economy, Culture, Heritage, Museums and Collections
Description One of the biggest impacts of this grant has been the use of Big Data and forecasting methodology to identify artists that are likely to increase their royalty earnings in the immediate future. These artists are then offered micro-lending facilities by Sentric Music, which can be a critical factor in enabling success for an early-career artist, but also a critical factor in ensuring customer retention for Sentric Music, allowing it to compete with larger international music publishers. This use case is important in that Data and Statistical Analysis opens up new financing avenues by correctly estimating risk.
First Year Of Impact 2016
Sector Creative Economy
Impact Types Cultural, Economic