
Generating keywords by leveraging BERT embeddings

When we want to understand the most valuable information and search topics from a website, we often utilise keyword extraction. Keyword extraction is the process of extracting words and/or phrases that are highly relevant to a website. 

Traditionally in SEO, this process has been largely manual and very time-consuming. At Cubed, we wanted to reduce the heavy lifting our search team had to do when generating or enriching a keyword set for a client. 

 

Elevating the traditional approach 

There are plenty of methods for extracting keywords from a body of text, from tried-and-tested techniques such as TF-IDF to more recent approaches such as YAKE! and RAKE. While effective to a degree, these methods rely on statistical properties of the text, such as word frequency, position and co-occurrence. As a result, they fail to capture the semantic relationships between words and phrases on a page. 
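The sketch below makes this limitation concrete with a minimal TF-IDF scorer in pure Python. It is our own illustration, not Cubed's production code, and the sample pages are made up.

The scorer ranks a word by how often it appears on one page, weighted down by how many pages contain it. It has no notion that "car hire" and "automobile rental" mean the same thing.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(pages: list[str], page_index: int, top_n: int = 3) -> list[tuple[str, float]]:
    """Rank the words of one page by TF-IDF against a small corpus of pages.

    TF is the term's relative frequency on the page; IDF is log(N / df),
    where df is the number of pages that contain the term.
    """
    tokenized = [tokenize(p) for p in pages]
    n_pages = len(tokenized)
    # Document frequency: how many pages each term appears on at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    tokens = tokenized[page_index]
    counts = Counter(tokens)
    scores = {
        term: (count / len(tokens)) * math.log(n_pages / df[term])
        for term, count in counts.items()
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical sample pages, for illustration only.
pages = [
    "Cheap car hire in Edinburgh: car hire deals",
    "Automobile rental in Glasgow",
    "Hotels in Edinburgh",
]
print([term for term, _ in tfidf_keywords(pages, page_index=0)])  # ['car', 'hire', 'cheap']
```

Nothing here links the first page to the second page's "automobile rental", because the two pages share no words apart from "in". A word that appears on every page, such as "in", scores zero.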

Enter BERT (Bidirectional Encoder Representations from Transformers) embeddings. Unlike the methods mentioned above, BERT is pre-trained on unlabelled text to learn deep bidirectional representations, jointly conditioning on both left and right context in all layers. 

In layman’s terms, BERT interprets each word in light of the words around it, so its embeddings capture the semantic relationships between words and phrases that purely statistical methods miss. This makes BERT far more effective than traditional approaches when we want to extract long-tail keywords accurately and quickly, with minimal manual effort. 

Harnessing BERT is straightforward thanks to Python’s extensive data science ecosystem. We can build on open-source work contributed by people across the globe, such as KeyBERT, a keyword extraction library that uses pre-trained BERT embeddings. 
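KeyBERT's core idea works like this. It embeds the whole document and each candidate phrase, then ranks the candidates by their cosine similarity to the document embedding.

The sketch below reproduces that idea in plain Python. It is not KeyBERT's actual code, and it makes three simplifying assumptions:

- The `embed` callable is a placeholder for a BERT-based sentence encoder.
- Candidates are simple whitespace n-grams rather than KeyBERT's own tokenisation.
- The 2-D toy vectors are hypothetical and chosen only to make the example runnable.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidate_ngrams(text: str, max_n: int = 2) -> list[str]:
    """All unique word n-grams of length 1..max_n, in first-seen order."""
    words = text.lower().split()
    seen: dict[str, None] = {}
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            seen.setdefault(" ".join(words[i:i + n]), None)
    return list(seen)


def rank_keywords(doc: str, embed: Callable[[str], Vector],
                  max_n: int = 2, top_n: int = 5) -> list[tuple[str, float]]:
    """Rank candidate phrases by cosine similarity to the whole-document embedding."""
    doc_vec = embed(doc)
    scored = [(c, cosine(embed(c), doc_vec)) for c in candidate_ngrams(doc, max_n)]
    return sorted(scored, key=lambda kv: -kv[1])[:top_n]


# Hypothetical 2-D "embeddings", for illustration only; a real encoder
# produces hundreds of dimensions learned from data.
TOY_VECTORS = {"hiking": (1.0, 0.0), "trails": (0.8, 0.2), "tickets": (0.0, 1.0)}


def toy_embed(text: str) -> tuple[float, ...]:
    """Average the toy vectors of the known words in `text`."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))


print([p for p, _ in rank_keywords("hiking trails tickets", toy_embed, max_n=1)])
# ['trails', 'hiking', 'tickets']
```

With the `sentence-transformers` package installed, you could instead pass `embed=lambda text: model.encode(text)`, where `model = SentenceTransformer("all-MiniLM-L6-v2")`. Encoding candidates one at a time is slow, so a real implementation would batch them.

KeyBERT itself wraps all of this in a single call, for example:

`KeyBERT().extract_keywords(doc, keyphrase_ngram_range=(1, 2), stop_words="english", top_n=5)`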

 

A highly effective solution 

Cubed has already customised and deployed this keyword generation tool for clients across a range of industries, including VisitScotland, Three and Epson. 

On a case-by-case basis, we can tailor the tool to each client’s needs by adding custom blacklisted keywords and phrases, setting an upper bound on phrase length, and even parsing multiple languages! These adjustments allow Cubed to generate a keyword pool that is precise and relevant not only to a single page or website section, but also to our clients’ holistic goals. 

Businesses are facing an endless wave of artificial-intelligence tools, many of which prove to be no better than “traditional” and arguably simpler approaches.  

However, as I hope this brief introduction shows, Cubed takes a far more pragmatic approach to deploying AI tools. We take the best of what AI can offer and combine it with our in-house expertise, tailoring it to our clients’ business priorities. 

We do the work of sifting through the myriad AI options so that our wider team, and in turn our clients, can enjoy the benefits of automation that works as effectively and efficiently as possible for them. 
