Highlights
- AI-powered ad scoring system quantifies creativity by analyzing multimodal, structural, stylistic, and contextual features to predict ad performance before launch.
- Client-specific normalization ensures fair, accurate scoring across brands of different scales and performance baselines.
- Neural network models with explainability tools like SHAP or LIME turn ad scores into actionable insights, helping teams understand why an ad performs the way it does.
- Ad scoring transforms agency workflows—enabling data-driven creativity, optimized resource allocation, and continuous improvement in ad performance and client results.
Introduction: The challenges of measuring ad performance
With global digital ad spend projected to surpass $600 billion, the pressure to deliver effective campaigns has never been higher. Yet, a common frustration in the advertising world is the unpredictability of performance; an ad might excel in Click-Through Rate (CTR) but fail on Return on Ad Spend (ROAS). To move beyond intuition and unlock consistent success, we can leverage machine learning to analyze historical data and identify the patterns that separate high-performing ads from the rest.
This post outlines an engineering approach to building an AI-powered ad scoring system that helps agencies improve ad performance by evaluating and optimizing their ads before they go live.

Inside the foundation of the ad scoring system
System design and feature engineering
The foundation of our ad scoring system is a robust set of features that comprehensively describe an advertisement.
- Multimodal features (image & text): The core of any ad is its creative content. We must convert these unstructured elements into numerical vectors that a model can understand. A text encoder such as BERT can capture the semantic meaning of the ad's copy, while a vision-language model such as CLIP can embed both images and text in a shared vector space. The images themselves could range from professional product shots and user-generated content to abstract doodles and even memes, depending on the campaign's style and target audience.
- Structural & positional features: The layout of an ad is critical. We can capture text position and image position by overlaying a categorical grid on the ad creative. The grid cell where the main text or image is located can be encoded as a feature.

- Stylistic & contextual features:
  - Font attributes: Font color and font size are important stylistic elements that can be included as features.
  - Contextual factors: To provide a broader context, we can include socio-political, economic, and cultural factors. These can be represented as categorical features, such as the country of publication or the target audience demographic.
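As a rough illustration of the positional features described above, the sketch below maps the center of a text or image element to a cell in a 3x3 grid and one-hot encodes the result. The function names, the grid size, and the 1200x628 creative dimensions are assumptions made for this example, not part of any specific system.

```python
def grid_cell(x_center, y_center, width, height, rows=3, cols=3):
    """Return the index (0 .. rows*cols - 1) of the grid cell containing a point."""
    # Clamp so that a point exactly on the right or bottom edge stays in the last cell.
    col = min(int(x_center / width * cols), cols - 1)
    row = min(int(y_center / height * rows), rows - 1)
    return row * cols + col


def one_hot(index, size):
    """Encode a categorical index as a one-hot list of floats."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Hypothetical 1200x628 creative with the headline centered near the top-left corner.
headline_cell = grid_cell(x_center=150, y_center=80, width=1200, height=628)
headline_features = one_hot(headline_cell, size=9)
```

The same one-hot helper could be reused for other categorical features, such as the country of publication, once each category is mapped to an integer index.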
Data preparation and normalization
A significant challenge is that ad performance metrics are not uniform across different clients. For example, a Fortune 500 company's ad has different success standards than a small business's.
A practical approach is percentile binning. By analyzing each company’s historical data individually, we can bin the performance metrics into percentiles (e.g., top 10%, 10-20%, etc.). This client-specific approach is crucial, as it ensures the resulting performance bins are relevant and fair for each client's unique context. This process also frames our modeling choice: using coarse bins (e.g., "Poor," "Good") turns the task into a classification problem, while using more granular bins can be treated as a regression problem.
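The client-specific binning described above can be sketched with the standard library alone. This is a minimal sketch, assuming records arrive as (client_id, ad_id, metric_value) tuples; the function name, the record layout, and the sample CTR values are all hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def percentile_bins(records, n_bins=4):
    """Assign each ad a bin based on its rank within its own client's history.

    records: list of (client_id, ad_id, metric_value) tuples.
    Returns {ad_id: bin_index}, where 0 is the lowest-performing bin.
    """
    by_client = defaultdict(list)
    for client_id, _, value in records:
        by_client[client_id].append(value)
    for values in by_client.values():
        values.sort()

    bins = {}
    for client_id, ad_id, value in records:
        values = by_client[client_id]
        # Number of this client's ads that performed strictly worse than this ad.
        rank = bisect_left(values, value)
        bins[ad_id] = rank * n_bins // len(values)
    return bins


# Made-up CTRs: the large brand's baseline is ten times the small shop's.
records = [
    ("big", "big_1", 0.040), ("big", "big_2", 0.050),
    ("big", "big_3", 0.060), ("big", "big_4", 0.080),
    ("small", "small_1", 0.004), ("small", "small_2", 0.005),
    ("small", "small_3", 0.006), ("small", "small_4", 0.008),
]
bins = percentile_bins(records)
```

In this example the best ad of each client lands in the top bin even though their raw CTRs differ by an order of magnitude, which is the point of normalizing per client.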

The modeling approach
Several algorithms can serve as our model, such as SVM, Random Forest, XGBoost, or a Neural Network. When a large volume of complex ad data is available, a Neural Network is often the best-suited option; with smaller datasets, a tree-based model such as XGBoost may be the more practical choice.
We can either train a separate model for each metric (e.g., one for CTR, one for ROAS) or opt for a more sophisticated multi-output Neural Network architecture. This design is highly effective, as it allows the model to learn from a single set of shared features to simultaneously predict different, sometimes conflicting, performance metrics.
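To make the multi-output idea concrete, here is a deliberately tiny, untrained forward pass in plain Python: one shared hidden layer feeds a separate classification head per metric. The class name, layer sizes, and metric names are assumptions for illustration; a real system would use a deep learning framework and train the heads jointly, for example by summing a per-head loss.

```python
import math
import random


def linear(inputs, weights, biases):
    """Dense layer: weights holds one row of input weights per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_in, n_out):
    """Random weights and zero biases for a dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiOutputScorer:
    """Shared trunk with one classification head per metric (untrained sketch)."""

    def __init__(self, n_features, n_hidden, n_tiers, metrics=("ctr", "roas"), seed=0):
        rng = random.Random(seed)
        self.trunk = init_layer(rng, n_features, n_hidden)
        self.heads = {m: init_layer(rng, n_hidden, n_tiers) for m in metrics}

    def forward(self, features):
        # Shared representation, computed once and reused by every head (ReLU).
        hidden = [max(0.0, h) for h in linear(features, *self.trunk)]
        # Each head maps the shared representation to tier probabilities.
        return {m: softmax(linear(hidden, *head)) for m, head in self.heads.items()}


model = MultiOutputScorer(n_features=6, n_hidden=4, n_tiers=3)
probs = model.forward([0.2, -1.0, 0.5, 0.0, 1.0, 0.3])
```

Because the weights are random, the probabilities carry no meaning here; the sketch only shows how a single feature vector yields one tier distribution per metric.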

Model evaluation
To ensure the ad scoring system is reliable, rigorous evaluation is critical. As we have binned our output, we can use standard classification metrics to assess the model's performance for each performance tier:
- Precision and recall: Precision measures how many of the ads the model assigns to a tier actually belong there, while recall measures how many of the ads that truly belong to a tier the model manages to identify.
- F1-score: The harmonic mean of Precision and Recall, providing a balanced measure of performance.
Crucially, the model must be evaluated on a held-out test set—a portion of data it has never seen during training. This provides an unbiased estimate of how the system will perform on new, real-world advertisements.
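The per-tier metrics above can be computed directly from held-out labels and predictions. The following is a minimal sketch; the function name and the two-tier sample labels are made up for the example.

```python
def per_tier_metrics(y_true, y_pred):
    """Compute precision, recall, and F1 for each tier label."""
    report = {}
    pairs = list(zip(y_true, y_pred))
    for tier in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == tier and p == tier)
        fp = sum(1 for t, p in pairs if t != tier and p == tier)
        fn = sum(1 for t, p in pairs if t == tier and p != tier)
        # Fall back to 0.0 when a tier is never predicted or never occurs.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[tier] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Hypothetical held-out labels and model predictions.
y_true = ["good", "good", "poor", "poor", "good", "poor"]
y_pred = ["good", "poor", "poor", "poor", "good", "good"]
report = per_tier_metrics(y_true, y_pred)
```

In this sample each tier has two correct predictions, one false positive, and one false negative, so precision, recall, and F1 all come out to 2/3 for both tiers.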
Model interpretability: From scores to actionable insights
While working on an advertisement, knowing what could improve ad performance is as important as knowing its score. To facilitate this, logic based on explainability algorithms like SHAP or LIME can be used. These tools provide feature-level attributions, showing advertisers exactly which elements (e.g., the image, a specific headline word, the color choice) are positively or negatively impacting the predicted score.
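To show what a feature-level attribution looks like, the sketch below computes exact Shapley values, the quantity that SHAP approximates, for a toy scoring function. It does not use the SHAP library itself. The scoring function, feature names, and baseline are hypothetical stand-ins for a trained model, and exact enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(score_fn, instance, baseline):
    """Exact Shapley attributions for one instance.

    score_fn: maps a {feature: value} dict to a number.
    instance: the ad being explained.
    baseline: reference values used when a feature is treated as absent.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        # Features in the coalition take the ad's values; the rest stay at baseline.
        mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return score_fn(mixed)

    attributions = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {name}) - value(members))
        attributions[name] = total
    return attributions


def toy_score(f):
    """Hypothetical stand-in for a trained model's predicted score."""
    return (0.5 * f["image_quality"] + 0.3 * f["headline_clarity"]
            - 0.2 * f["text_density"]
            + 0.4 * f["image_quality"] * f["headline_clarity"])


ad = {"image_quality": 1.0, "headline_clarity": 1.0, "text_density": 1.0}
baseline = {"image_quality": 0.0, "headline_clarity": 0.0, "text_density": 0.0}
attributions = shapley_values(toy_score, ad, baseline)
```

The attributions sum to the difference between the ad's score and the baseline score, so each value can be read as that feature's share of the prediction: here a negative share for text density and positive shares for the image and headline.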

Benefits and drawbacks of an AI-powered approach
Key benefits
- Pre-launch evaluation: The system allows agencies to evaluate the potential of their ads before publishing them, which can help drive up ad performance metrics.
- Cost and effort savings: By identifying potentially poor ads early, the system helps save the cost and effort that would have been spent on underperforming campaigns.
- Data-driven insights at scale: AI models can process vast amounts of historical ad data to identify subtle patterns in good versus poor ads that a human analyst might miss.
- Actionable feedback: The system can go beyond a simple score to explain what could improve ad performance, which is a critical part of the creative process.
Drawbacks and considerations
- Data dependency: The effectiveness of the system is highly dependent on the volume and quality of historical data. Certain models, like Neural Networks, are best suited to very large datasets and may not perform well without them.
- Limitation to past patterns: Because the model learns from a firm's historical ads, it may struggle to accurately score truly novel or groundbreaking ad concepts that have no precedent.
- Complexity of feature representation: Accurately representing abstract contextual factors, such as socio-political, economic, and cultural conditions, is extremely challenging and can limit the model's predictive power.
- Continuous maintenance and retraining: The advertising landscape is constantly evolving. Consumer tastes, design trends, and platform algorithms change over time. Therefore, the model needs to be retrained regularly with new data to prevent its performance from degrading and to ensure its predictions remain relevant.
Impact of an ad scoring system on advertising firms
Implementing such a system has impacts beyond just scoring ads, fundamentally affecting an ad firm's workflow, strategy, and culture.
- Augmenting the creative process: Rather than replacing creativity, the tool serves as a data-driven partner. It allows creative teams to test ideas and understand what could improve an ad before committing to a final design.
- Streamlining resource allocation: The system provides objective data to help teams decide which ads to launch. This reduces the financial risk of investing in campaigns that are likely to fail, saving the firm cost and effort.
- Driving performance and client value: By consistently optimizing ads pre-launch, the firm can improve its key performance metrics. For agencies, this translates to better client results, higher satisfaction, and a stronger competitive position.
- Building institutional knowledge: The system effectively creates a dynamic, evolving database of what works. This institutional knowledge, learned from all the firm's historical ads, can accelerate the onboarding of new team members and standardize best practices.

The future of AI-powered ad scoring systems: Ad performance and creative collaboration
An AI-powered ad scoring system can be a transformative tool. By helping advertising agencies evaluate their ads before publishing, it saves cost and effort while simultaneously driving up their key performance metrics. It provides a data-driven layer of intelligence that complements creative expertise, leading to more effective advertising. As this technology evolves, the next frontier may involve AI not just scoring ads, but generating creative suggestions or even co-creating high-performance ad variants alongside human designers, further blurring the line between data science and creative art.
AI is evolving. So are we. At KeyValue, we’re shaping the next frontier of artificial intelligence. Let’s build the future together.
FAQs
- What is the AI scoring method?
The AI scoring method uses machine learning to evaluate ads by analyzing text, images, layout, and context. It predicts key metrics like CTR or ROAS and, with tools such as SHAP or LIME, explains which creative elements most influence ad performance.
- What is an AI-powered ad scoring system?
An AI-powered ad scoring system uses machine learning to predict how well an advertisement will perform before it goes live. It analyzes creative elements like text, images, layout, and context to generate a performance score and reveal which factors most affect results such as CTR or ROAS.
- What are the benefits of implementing an AI-powered ad scoring system?
It enables pre-launch ad evaluation, cost and time savings, improved ad performance, and actionable insights that support creative and strategic decisions.
- How does an AI scoring system impact an ad agency’s workflow?
It streamlines resource allocation, enhances collaboration between creative and data teams, and builds institutional knowledge of what drives ad success.
- How to measure ad performance?
Ad performance can be measured using metrics like CTR, Conversion Rate, ROAS, and engagement rate. An AI-powered ad scoring system improves this by analyzing creative and contextual elements to predict and optimize results before launch.