Combining Semantic Search with Elasticsearch to improve search efficiency for Accessibility & Inclusion using Intel’s Quantization and Neural-Chat-7b

Usha Rengaraju
6 min read · Apr 26, 2024


Traditional search engines rely on keyword matching, often missing the user’s intent. This example explores how semantic search, combined with Elasticsearch, enhances search accuracy. Semantic search understands the meaning behind words, leading to more relevant results. This approach not only improves overall search efficiency but also promotes accessibility and inclusion. For users with disabilities, like visual impairments, semantic search can interpret the intent behind a voice search query and deliver optimized results for assistive technologies. In essence, this approach makes information retrieval more intuitive and effective for everyone.

Elasticsearch

Imagine a search engine that dives deeper than keywords, understanding the essence of your query. Elasticsearch is that very tool, a powerful force for data exploration. It sifts through massive datasets with lightning speed, like sunlight through leaves, revealing the precise information you seek. Its flexibility embraces any kind of content, making it a haven for websites, applications, and even security logs. This open-source marvel empowers discovery, working alongside visualization tools to paint a clear picture of your data. Whether you’re a developer crafting a seamless search experience or an analyst unearthing hidden patterns, Elasticsearch becomes the key to unlock a world of knowledge.

But Elasticsearch is, at its core, a keyword-based search engine, and that approach has some disadvantages.

  1. Misinterpreting Intent: Keyword-based search often struggles to understand the user’s true meaning behind the keywords. Synonyms, slang, or different phrasings for the same concept can lead to irrelevant results.
  2. Focuses on Words, Not Meaning: Keyword searches prioritize documents with the exact keywords, even if they’re not the most relevant. This can miss content that uses different words but conveys the same information.
  3. Susceptible to Manipulation: Keyword stuffing (adding irrelevant keywords to a webpage) can trick keyword-based search engines, leading to low-quality content ranking higher.
  4. Limited Understanding of Context: Keyword searches don’t consider the context of the query. For example, searching for “bass” could return results about fish or music depending on the context, which keyword searches can’t differentiate.
  5. Struggles with Complex Queries: Keyword searches struggle with nuanced or multi-faceted queries. For example, a user searching for “best laptops for students” might miss relevant results that don’t contain the exact phrase.

Semantic Search

Gone are the days of clunky keyword searches that leave you sifting through irrelevant results. Semantic search emerges as a game-changer, delving deeper than surface-level keywords to understand the true intent behind your query. Imagine a search engine that grasps the nuances of language, recognizing synonyms, and factoring in context. With semantic search, you can search for “healthy meal ideas” and discover recipes packed with vegetables, even if they don’t explicitly mention “healthy” in the title. This intelligent approach unlocks a new level of information retrieval, making search experiences more intuitive and delivering results that truly resonate with your needs.

And if we combine both of these, we can get the best of both worlds.

Here are 5 advantages of combining keyword-based search and semantic search:

  1. Improved Accuracy: Leveraging both approaches strengthens search accuracy. Keywords provide a foundation for the search, while semantic analysis refines results by understanding intent and context. This reduces irrelevant results and surfaces the most relevant information.
  2. Enhanced User Experience: The combination caters to users who might not express their needs perfectly. Keywords offer a familiar starting point, while semantic search refines the experience by understanding the user’s true meaning behind the keywords. This leads to a more intuitive and satisfying search journey.
  3. Flexibility and Control: The hybrid approach offers flexibility. Keywords provide a clear starting point, while semantic search allows users to explore related concepts and synonyms. This empowers users to refine their search and uncover a broader range of relevant information.
  4. Better Handling of Ambiguity: Natural language can be ambiguous. Combining keywords with semantic analysis helps address this. Keywords provide a base for interpretation, while semantic search considers synonyms, context, and user intent to deliver the most relevant results even for ambiguous queries.
  5. Effective Targeting: For businesses and content creators, this combination is powerful. Keywords ensure content is discoverable for specific searches, while semantic search broadens reach by surfacing content relevant to the user’s intent even if they don’t use the exact keywords.

Let’s start with the code!

!pip install datasets pandas chromadb sentence_transformers langchain unstructured -q
!pip install -U intel-extension-for-pytorch==2.2 transformers==4.35.2 torch==2.2.0 -q
!pip install accelerate bitsandbytes -q
!pip install elasticsearch==7.13.0 -q

Elasticsearch has its own Python client, which we can easily invoke, and we will also leverage LangChain.

The data we are going to use for this demo is about pets, covering health care, nutrition, emotional bonding, training, and behavior. You can access the website here — Link

Let’s set up Elasticsearch now.

# Download Elasticsearch 7.9.2 (OSS build) and its checksum, verify, then extract
!wget -q https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-oss-7.9.2-linux-x86_64.tar.gz
!wget -q https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-oss-7.9.2-linux-x86_64.tar.gz.sha512
!shasum -a 512 -c elasticsearch-oss-7.9.2-linux-x86_64.tar.gz.sha512
!tar -xzf elasticsearch-oss-7.9.2-linux-x86_64.tar.gz
# Elasticsearch refuses to run as root, so hand ownership to the daemon user
!sudo chown -R daemon:daemon elasticsearch-7.9.2/

To start the engine, run the following command:

# Launch Elasticsearch as the daemon user in the background so the cell returns
!sudo -H -u daemon nohup elasticsearch-7.9.2/bin/elasticsearch > elasticsearch.log 2>&1 &

Wait 20–25 seconds after this for Elasticsearch to start. The data is stored in the pets folder; we will set up the engine with that data.
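Before loading the data, you can confirm that the node is actually up rather than just waiting (a minimal check of my own, using the requests library, which is not part of the original post):

import requests

# Poll the cluster health endpoint; "yellow" or "green" means the node is ready
print(requests.get("http://localhost:9200/_cluster/health").json()["status"])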

from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from elasticsearch import Elasticsearch, helpers

directory = 'pets'

def load_docs(directory):
    # Load every document in the folder with LangChain's DirectoryLoader
    loader = DirectoryLoader(directory)
    documents = loader.load()
    return documents

def split_docs(documents, chunk_size=1000, chunk_overlap=20):
    # Split documents into overlapping chunks sized for the embedding model
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    docs = text_splitter.split_documents(documents)
    return docs

documents = load_docs(directory)
docs = split_docs(documents)

# Wrap each chunk as a bulk-indexing action targeting the 'test' index
all_docs = []
for id, doc in enumerate(docs):
    all_docs.append({"_index": 'test', "_id": id, "_source": {"text": doc.page_content}})

es = Elasticsearch("http://localhost:9200")
helpers.bulk(es, all_docs)
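One caveat: Elasticsearch makes newly indexed documents searchable only after an index refresh, which normally happens on an interval. To avoid a race when querying right after the bulk load, you can force a refresh (a small addition of mine, not in the original post):

# Force a refresh so the bulk-indexed documents are searchable right away
es.indices.refresh(index='test')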

To run a query against the index, we do the following:

query = "benefits of pet"

resp = es.search(index='test', size=100, body={
"query": {
"match": {
"text":query
}
}
})

res = [hit['_source']['text'] for hit in resp['hits']['hits']]
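To sanity-check what the keyword stage returned, each hit also carries its BM25 relevance score (an illustrative snippet of mine, not from the original post):

# Print the top three keyword matches with their BM25 scores
for hit in resp['hits']['hits'][:3]:
    print(round(hit['_score'], 2), hit['_source']['text'][:80])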

The query variable holds the input query, and res holds the text of the matching documents. Now we will take the results in res and build a vector database over them, on which we will perform the semantic search.

from langchain.embeddings import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma

# Embed the keyword-search results and index them in an in-memory Chroma store
embeddings = SentenceTransformerEmbeddings(model_name="sentence-transformers/sentence-t5-large")
db = Chroma.from_texts(res, embeddings)

We will use the sentence-t5-large model to create the embeddings. It has roughly 335M parameters and produces output embeddings of size 768. Chroma is used to store the vector embeddings.
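As a quick sanity check on that embedding size (a sketch of mine, not from the original post):

# Embed a sample query and confirm the 768-dimensional output
vec = embeddings.embed_query("benefits of pet")
print(len(vec))  # 768 for sentence-t5-large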

Intel Neural Chat

Intel Neural Chat is a large language model fine-tuned on the Intel Gaudi 2 processor. It was aligned using the Direct Preference Optimization (DPO) method with the Intel/orca_dpo_pairs dataset. The model was originally fine-tuned from mistralai/Mistral-7B-v0.1. We will use it to complete the pipeline.

We are going to compress this model using Intel’s weight-only quantization from intel-extension-for-pytorch.

import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoTokenizer, AutoModelForCausalLM

Model = 'Intel/neural-chat-7b-v3-3'

# Load the full-precision model and its tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained(Model)
tokenizer = AutoTokenizer.from_pretrained(Model)

1. Setting Up Quantization:

This part focuses on configuring the quantization recipe for Weight-Only Quantization (WOQ). You can choose the desired data type for the weights in memory. There are two options:

  • torch.quint4x2: This represents 4-bit unsigned integer weights packed in pairs.
  • torch.qint8: This represents 8-bit signed integer weights.

2. Deactivating Low-Precision Computation (Optional):

We can define the dequantization precision using the lowp_mode argument. However, for now, we'll leave it as ipex.quantization.WoqLowpMode.NONE to maintain the default computation precision of bf16 (Brain Floating Point 16-bit).

3. Applying WOQ and Freeing Memory:

Finally, we use the ipex.llm.optimize() function to apply WOQ to the model. After that, we use del model to delete the original model from memory, freeing up valuable RAM space.

# Weight-only quantization recipe: 4-bit weights, default (bf16) compute
qconfig = ipex.quantization.get_weight_only_quant_qconfig_mapping(
    weight_dtype=torch.quint4x2,
    lowp_mode=ipex.quantization.WoqLowpMode.NONE,
)

# Apply weight-only quantization to the model
model_ipex = ipex.llm.optimize(model, quantization_config=qconfig)

# Drop the original full-precision model to free RAM
del model
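Note that del model only drops the Python reference; an explicit garbage-collection pass (my addition, not in the original post) helps make sure the full-precision weights are actually reclaimed:

import gc

# Reclaim the memory previously held by the unquantized weights
gc.collect()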

Now we will complete the pipeline so that we get output in response to user queries.

from transformers import pipeline
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline

# Wrap the quantized model in a text-generation pipeline;
# max_new_tokens=10 keeps generation very short, so raise it for fuller answers
pipe = pipeline("text-generation", model=model_ipex, tokenizer=tokenizer, max_new_tokens=10)
hf = HuggingFacePipeline(pipeline=pipe)
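Before wiring the wrapper into a chain, you can smoke-test it directly (a usage sketch of mine; the prompt is arbitrary):

# Quick test of the quantized model through the LangChain wrapper
print(hf.invoke("What are the benefits of owning a pet?"))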

This completes the pipeline. Let’s invoke the full question-answering chain:

from langchain.chains.question_answering import load_qa_chain

# "stuff" chain: all matching documents are stuffed into a single prompt
chain = load_qa_chain(hf, chain_type="stuff", verbose=False)
# Semantic search over the Chroma store built from the keyword results
matching_docs = db.similarity_search(query)
answer = chain.run(input_documents=matching_docs, question=query)
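Putting the whole hybrid flow together, here is one way to wrap it in a single helper (a sketch under the assumptions above; answer_query is a name I am introducing, not something from the original post):

def answer_query(query, k=100):
    # Stage 1: keyword retrieval with Elasticsearch (BM25)
    resp = es.search(index='test', size=k, body={"query": {"match": {"text": query}}})
    texts = [hit['_source']['text'] for hit in resp['hits']['hits']]
    # Stage 2: semantic re-ranking of the keyword candidates via embeddings
    db = Chroma.from_texts(texts, embeddings)
    matching_docs = db.similarity_search(query)
    # Stage 3: answer generation with the quantized Neural Chat model
    return chain.run(input_documents=matching_docs, question=query)

print(answer_query("benefits of pet"))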

Now, this system can easily be integrated with voice-based systems for querying.

Here’s a demo hosted on Huggingface:

https://huggingface.co/spaces/tensorgirl/Reranking

References:

  1. https://cohere.com/blog/rerank
  2. https://towardsdatascience.com/meta-llama-3-optimized-cpu-inference-with-hugging-face-and-pytorch-9dde2926be5c
  3. https://huggingface.co/Intel/neural-chat-7b-v3-3
  4. https://www.elastic.co/


Usha Rengaraju

Chief of Research at Exa Protocol | Autism Advocate | Corporate Trainer | Adjunct Faculty