Search as you type using Elasticsearch on multiple fields
Dipan Saha
February 14, 2019
Category
Elasticsearch
I recently had the pleasure of building a service endpoint for search-as-you-type functionality, which gives instant feedback to users as they type. The endpoint is called for every keystroke in the front-end application, so the response needs to be quick and able to handle queries over a large volume of records. In this article, I will share my experience of achieving this functionality using Elasticsearch.

The tables below show the example data and the expected results I would like to achieve.
Example Data:

Expected Result:

Key Components:
- Tokenizer: splits the input into tokens. In our case, I will be using the built-in edge_ngram tokenizer at index time and the keyword tokenizer at search time.
- Token filter: applies a transformation to each token. I will be using the built-in lowercase filter.
- Analyzer: the way Lucene (the search engine at the core of Elasticsearch) processes and indexes the data. Each analyzer is composed of one tokenizer and zero or more token filters. In our case, I will be creating a custom analyzer.
Let’s build a demo index and try it out in a local Kibana instance.
"tokenizer": {
  "autocomplete": {
    "type": "edge_ngram",
    "min_gram": 1,
    "max_gram": 20,
    "token_chars": [
      "letter",
      "digit",
      "whitespace"
    ]
  }
}
In the above we configure the edge_ngram tokenizer to treat letters, digits and whitespace as token characters, producing grams with a minimum length of 1 and a maximum length of 20.
"analyzer": {
  "autocomplete_analyzer": {
    "tokenizer": "autocomplete",
    "filter": [
      "lowercase"
    ]
  },
  "autocomplete_search_analyzer": {
    "tokenizer": "keyword",
    "filter": [
      "lowercase"
    ]
  }
}
The autocomplete_analyzer uses the edge_ngram tokenizer and the lowercase filter, which indexes the input “Quick Fox” into the terms below:
[“q”, “qu”, “qui”, “quic”, “quick”, “quick “, “quick f”, “quick fo”, “quick fox”]
The autocomplete_search_analyzer uses the keyword tokenizer and the lowercase filter, which translates the input “Quick Fo” into the single term [quick fo]. This way the query finds the relevant documents as you type each key.
Below are the complete settings for the index:
PUT demo_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete_analyzer": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase"
          ]
        },
        "autocomplete_search_analyzer": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase"
          ]
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit",
            "whitespace"
          ]
        }
      }
    }
  }
}
POST demo_index/_analyze
{
  "analyzer": "autocomplete_analyzer",
  "text": "Quick Fox"
}

GET demo_index/_analyze
{
  "analyzer": "autocomplete_search_analyzer",
  "text": "Quick fo"
}
Now let’s build the field mapping using the newly created index settings. We will add a sub-field to two fields to implement the autocomplete feature on multiple fields.
PUT demo_index/_doc/_mapping
{
  "_doc": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "complete": {
            "type": "text",
            "analyzer": "autocomplete_analyzer",
            "search_analyzer": "autocomplete_search_analyzer"
          }
        }
      },
      "headline": {
        "type": "text",
        "fields": {
          "complete": {
            "type": "text",
            "analyzer": "autocomplete_analyzer",
            "search_analyzer": "autocomplete_search_analyzer"
          }
        }
      }
    }
  }
}
In the above example,
title: uses the standard analyzer, which for the input “Quick Fox” produces the terms [“quick”, “fox”]
title.complete: uses the autocomplete_analyzer, which produces the terms
[“q”, “qu”, “qui”, “quic”, “quick”, “quick “, “quick f”, “quick fo”, “quick fox”]
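If you want to sanity-check the mapping, the _analyze API also accepts a field name, so you can see exactly which terms title.complete produces — for example:
GET demo_index/_analyze
{
  "field": "title.complete",
  "text": "Quick Fox"
}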
Let’s insert some records into demo_index, and then we can write some queries.
PUT demo_index/_doc/1
{
  "title": "Quick Foxes",
  "headline": "Quick Foxes headline"
}

PUT demo_index/_doc/2
{
  "title": "Quick Fix",
  "headline": "Quick Fix headline"
}

PUT demo_index/_doc/3
{
  "title": "Apple Pie 34 Quick foxes",
  "headline": "Apple Pie new headline"
}
Search:
The query below finds matches across multiple fields. It also adds a highlight section for the matched terms, which can be displayed in the front-end dropdown.
Multi Match Query:
GET demo_index/_search
{
  "query": {
    "multi_match": {
      "fields": ["title.complete", "headline.complete"],
      "query": "q",
      "type": "best_fields"
    }
  },
  "highlight": {
    "fields": {
      "title.complete": {},
      "headline.complete": {}
    }
  }
}
Match Query:
GET demo_index/_search
{
  "query": {
    "match": {
      "title.complete": {
        "query": "Quick f"
      }
    }
  },
  "highlight": {
    "fields": {
      "title.complete": {}
    }
  }
}
If you use the .NET client NEST, you can use the query DSL below to get the search response across multiple fields.
// <object> is used as a placeholder document type here; substitute your own POCO.
var response = await this.client.SearchAsync<object>(
    s => s.Index("demo_index").Type("_doc")
        .From(0)
        .Size(10)
        .Highlight(h => h
            .Fields(
                f => f.Field("title.complete"),
                f => f.Field("headline.complete")))
        .Query(hcq => hcq
            .MultiMatch(m => m
                .Fields(f => f
                    .Field("title.complete")
                    .Field("headline.complete"))
                .Query("Quick fo").Operator(Operator.And)
                .Type(TextQueryType.BestFields))));
Wrapping Up
Search-as-you-type functionality can be implemented in different ways; it comes down to your requirements. In my case I wanted an exact match from the beginning of the input, which is why I used the keyword tokenizer for the search analyzer. If you are happy to match from the middle of the text, you can use the standard analyzer for search instead. Also, if you would like searches to support punctuation and symbols, you need to add them to the token_chars of the autocomplete tokenizer.
It is always best to use a source filter if you have lots of fields because, as you can imagine, this endpoint will be called on every keypress.
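For example, the multi match query from earlier could return only the title field in the response — a minimal sketch:
GET demo_index/_search
{
  "_source": ["title"],
  "query": {
    "multi_match": {
      "fields": ["title.complete", "headline.complete"],
      "query": "q",
      "type": "best_fields"
    }
  }
}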
Happy Coding!
Capture network traffic using NodeJS, BrowserMob Proxy, and Selenium
Dipan Saha
April 21, 2018
Category
Javascript
We can manually monitor the performance of a web application using browser tools such as the developer tools or Firebug. At the moment only PhantomJS has an out-of-the-box feature to capture network traffic in acceptance tests. In this post I will explain how to capture network traffic for cross-browser acceptance tests. It is very important to have acceptance tests in all browsers when you are building a web application, or a JavaScript library that will be consumed by different applications, for example when dealing with third-party tracking libraries such as Facebook, Google, AppNexus etc.
We want to make it work for all browsers. BrowserMob Proxy helps us capture client-side performance data for a web application from Selenium WebDriver automated tests.
I will share my experience and guide you through each step to capture network traffic in HAR format using NodeJS, BrowserMob Proxy and Selenium WebDriver.

Technology & Environment:
- OS: Windows
- IDE: Visual Studio Code
- Proxy: BrowserMob Proxy
- Selenium WebDriver (4.0.0-alpha.1)
- Test: Mocha, Chai
- Server: Express
- Browser Test: Chrome, Firefox, IE, Headless – Chrome, Headless – Firefox
Let’s jump into the code!
For reference, you can find the full example code for this walk-through on my GitHub.
#1 Firstly, create an empty node project.
npm init
#2 Create an html file called index.html under the html folder.
Then add two images, one with a secure (https) source and the other with a non-secure (http) source.
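Something along these lines will do; the two image URLs below are the same ones the test helper asserts against later in this post:
<!DOCTYPE html>
<html>
<head>
  <title>BrowserMob Proxy demo page</title>
</head>
<body>
  <!-- Secure (https) image -->
  <img src="https://images.pexels.com/photos/248797/pexels-photo-248797.jpeg?auto=compress&cs=tinysrgb&h=350" alt="https image" />
  <!-- Non-secure (http) image -->
  <img src="http://www.abc.net.au/news/image/9209116-3x2-940x627.jpg" alt="http image" />
</body>
</html>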
#3 Then we set up an Express server to serve the page for our acceptance tests. Follow the steps below.
npm i express -D
Create a server.js file in the root of your application.
const express = require('express');
const app = express();

app.use(express.static('html'));
app.listen(3000, () => console.log('Listening on port 3000'));
Create a task called start in the package.json to start the server.
"scripts": {
  "start": "node server.js"
},
#4 Now we will set up our test environment. I will be using Mocha and Chai for our tests.
npm i mocha -D
npm i chai -D
Add a task called `test` in the package.json.
"scripts": {
  "start": "node server.js",
  "test": "mocha --timeout 300000"
},
#5 Set up and run the BrowserMob Proxy server. You can download it here and read the documentation for the setup process. You can also see my GitHub link for this demo at the end of this post; I have included the BrowserMob Proxy files, including a PowerShell script which automates running the server and importing the required certificate into the trusted store. The SSL certificate is required in order to capture https requests.
#6 Let’s start writing our test. We need to install the below npm packages.
npm i browsermob-proxy-client -D
npm i selenium-webdriver -D
npm i chromedriver -D
npm i geckodriver -D
npm i iedriver -D
Create a test helper class called testHelper.js with a few utility functions to avoid duplication. It will make more sense once you start writing the tests.
class TestHelper {
  static get HTTPS_SRC() {
    return "https://images.pexels.com/photos/248797/pexels-photo-248797.jpeg?auto=compress&cs=tinysrgb&h=350";
  }

  static get HTTP_SRC() {
    return "http://www.abc.net.au/news/image/9209116-3x2-940x627.jpg";
  }

  static get TEST_PAGE_URL() {
    return "http://localhost:3000";
  }

  static getRequestUrls(requestEntries) {
    var urls = [];
    requestEntries.forEach(obj => {
      console.log('request: ', obj.request.url);
      urls.push(obj.request.url);
    });
    return urls;
  }

  static getManualProxy(port) {
    return { http: 'localhost:' + port, https: 'localhost:' + port };
  }

  static getCapabilities(browserName) {
    return { 'browserName': browserName, acceptSslCerts: true, acceptInsecureCerts: true };
  }
}

module.exports = TestHelper;
Create a test for each browser. Test case: load the local test page in a specific browser and capture all network traffic.
We require the packages below for each browser test. I have used the browsermob-proxy-client library to interact with BrowserMob Proxy’s REST API.
const chai = require('chai');
const expect = chai.expect;
const BrowserMob = require('browsermob-proxy-client');
const webdriver = require('selenium-webdriver');
const selProxy = require('selenium-webdriver/proxy');
const TestHelper = require('./testHelper'); // path assumes testHelper.js sits next to the test file
The code snippet below is the common test structure for each browser test.
defaultProxy.start() – starts the proxy on the default port; you can optionally specify a port.
defaultProxy.end() – shuts down the proxy server.
defaultProxy.closeProxies() – closes all running proxies.
So before each test we start the BrowserMob Proxy client, and after each test we shut it down.
var defaultProxy;

describe('Browser - Chrome', function () {
  beforeEach(async () => {
    defaultProxy = BrowserMob.createClient();
    await defaultProxy.start();
  });

  afterEach(async () => {
    await defaultProxy.closeProxies();
    await defaultProxy.end();
  });

  it("Chrome", async () => {
  });
});
Browser – Chrome:
We need to load chromedriver so Selenium can open Google Chrome and load our test page.
const chrome = require('selenium-webdriver/chrome');
const chromedriver = require('chromedriver');
The code below loads our test page in Google Chrome and collects the network traffic requests in HAR format.
defaultProxy.createHar() – creates a HAR with a default page name. You can specify whatever name you like.
defaultProxy.getHar() – returns the HAR in JSON format.
Both methods return promises, so we can use async/await to handle the asynchronous behaviour.
After creating the HAR we configure the driver and set the proxy to the BrowserMob Proxy port. It is important to set the capabilities for each driver correctly to load both http and https resources.
it("Chrome", async () => {
  await defaultProxy.createHar();
  let driver = new webdriver.Builder()
    .withCapabilities(TestHelper.getCapabilities('chrome'))
    .setProxy(selProxy.manual(TestHelper.getManualProxy(defaultProxy.proxy.port)))
    .build();
  await driver.get(TestHelper.TEST_PAGE_URL);
  const har = await defaultProxy.getHar();
  var urls = TestHelper.getRequestUrls(har.log.entries);
  await driver.close();
  expect(urls.includes(TestHelper.HTTPS_SRC)).to.be.eql(true);
  expect(urls.includes(TestHelper.HTTP_SRC)).to.be.eql(true);
});
Below is the code for the other browsers, which have almost the same setup.
Browser – Firefox
Driver:
Required to open Firefox.
const firefox = require('selenium-webdriver/firefox');
const firefoxdriver = require('geckodriver');
Test:
The test opens Firefox and loads the local test page. After the page loads, we collect the HAR and verify that all anticipated requests exist in the HAR response.
it("Firefox", async () => {
  await defaultProxy.createHar();
  let driver = new webdriver.Builder()
    .withCapabilities(TestHelper.getCapabilities('firefox'))
    .setProxy(selProxy.manual(TestHelper.getManualProxy(defaultProxy.proxy.port)))
    .build();
  await driver.get(TestHelper.TEST_PAGE_URL);
  const har = await defaultProxy.getHar();
  var urls = TestHelper.getRequestUrls(har.log.entries);
  await driver.close();
  expect(urls.includes(TestHelper.HTTPS_SRC)).to.be.eql(true);
  expect(urls.includes(TestHelper.HTTP_SRC)).to.be.eql(true);
});
Browser – IE
Driver:
Required to open IE.
const ie = require('selenium-webdriver/ie');
const iedriver = require('iedriver');
Test:
it("Internet Explorer", async () => {
  await defaultProxy.createHar();
  let driver = new webdriver.Builder()
    .forBrowser('internet explorer')
    .setProxy(selProxy.manual(TestHelper.getManualProxy(defaultProxy.proxy.port)))
    .build();
  await driver.get(TestHelper.TEST_PAGE_URL);
  const har = await defaultProxy.getHar();
  var urls = TestHelper.getRequestUrls(har.log.entries);
  await driver.close();
  expect(urls.includes(TestHelper.HTTPS_SRC)).to.be.eql(true);
  expect(urls.includes(TestHelper.HTTP_SRC)).to.be.eql(true);
});
Browser – Headless
Driver:
Required to configure Chrome and Firefox for headless mode.
const firefox = require('selenium-webdriver/firefox');
const chrome = require('selenium-webdriver/chrome');
We are going to run the headless test for Chrome and Firefox, so I have written a common function that both tests share. For this test we need to configure the headless behaviour of Chrome and Firefox.
setChromeOptions() – sets the Chrome headless options
setFirefoxOptions() – sets the Firefox headless options
async function SetupHeadlessTest(browserName) {
  await defaultProxy.createHar();
  const screen = { width: 1024, height: 768 }; // example window size; adjust as needed
  let driver = new webdriver.Builder()
    .withCapabilities(TestHelper.getCapabilities(browserName))
    .setChromeOptions(new chrome.Options().headless().windowSize(screen))
    .setFirefoxOptions(new firefox.Options().headless().windowSize(screen))
    .setProxy(selProxy.manual(TestHelper.getManualProxy(defaultProxy.proxy.port)))
    .build();
  await driver.get(TestHelper.TEST_PAGE_URL);
  const har = await defaultProxy.getHar();
  var urls = TestHelper.getRequestUrls(har.log.entries);
  await driver.close();
  return urls;
}
Test:
Headless – Chrome:
it("Headless - Chrome", async () => {
  var urls = await SetupHeadlessTest('chrome');
  expect(urls.includes(TestHelper.HTTPS_SRC)).to.be.eql(true);
  expect(urls.includes(TestHelper.HTTP_SRC)).to.be.eql(true);
});
Test:
Headless – Firefox:
it("Headless - Firefox", async () => {
  var urls = await SetupHeadlessTest('firefox');
  expect(urls.includes(TestHelper.HTTPS_SRC)).to.be.eql(true);
  expect(urls.includes(TestHelper.HTTP_SRC)).to.be.eql(true);
});
Now it’s time to run our tests.
- Run BrowserMob Proxy Server
- Run server `npm run start`
- Run test `npm run test`
If you are successful, you will see the results like below:

Wrapping Up
It is important that we set up the capabilities for each browser correctly in order to capture https requests. Recent Selenium WebDriver releases no longer support PhantomJS, but they introduced support for headless Chrome and Firefox. To make it work in IE, make sure you import the SSL certificate; for your convenience I have added an automation script that does it for you. For IE you are also required to change the security settings: go to Internet Explorer -> Internet Options -> Security and check the Enable Protected Mode checkbox for all zones.
This code will be available at my GitHub: https://github.com/dipansaha/BrowserMobNodeSelenium.
Thanks for reading this article 🙂
Read WordPress Blog Using Rss feed and C#
Dipan Saha
April 05, 2016
Category
ASP.NET,C#
This post is about reading a WordPress blog using its RSS feed and C#. This solution is useful when you are not able to work with the WordPress REST API. I will be using the .NET Framework’s built-in SyndicationFeed class to read blog items from WordPress; you need to reference System.ServiceModel in your project. I will create a helper method to which you can pass the feed URL and which returns a list of blog items. Below I will go through an example implementing this approach.
using System;
using System.Collections.Generic;
using System.ServiceModel.Syndication;
using System.Text;
using System.Xml;
using System.Xml.Linq;
namespace RssReadService.Util
{
    public static class RssHelper
    {
        public static IEnumerable<BlogItem> ReadFeed(string url)
        {
            var reader = XmlReader.Create(url);
            var feed = SyndicationFeed.Load<SyndicationFeed>(reader);
            var result = new List<BlogItem>();
            if (feed == null) return result;
            foreach (var item in feed.Items)
            {
                var blogItem = ParseFeedItem(item);
                if (blogItem != null)
                    result.Add(blogItem);
            }
            return result;
        }

        private static BlogItem ParseFeedItem(SyndicationItem item)
        {
            var title = item.Title.Text;
            var summary = item.Summary.Text;
            var publishedDate = item.PublishDate.Date;
            var content = new StringBuilder();
            // The full post body is exposed via the <content:encoded> extension element.
            foreach (var extension in item.ElementExtensions)
            {
                var ele = extension.GetObject<XElement>();
                if (ele.Name.LocalName == "encoded" && ele.Name.Namespace.ToString().Contains("content"))
                {
                    content.Append(ele.Value + "<br/>");
                }
            }
            return new BlogItem(title, summary, publishedDate, content.ToString());
        }
    }

    public class BlogItem
    {
        public string Title { get; private set; }
        public string Summary { get; private set; }
        public DateTime PublishDate { get; private set; }
        public string Content { get; private set; }

        public BlogItem(string title, string summary, DateTime publishDate, string content)
        {
            Title = title;
            Summary = summary;
            PublishDate = publishDate;
            Content = content;
        }
    }
}
Then use the function below to get all blog posts:
var blogs = RssHelper.ReadFeed("http://dipansaha.wordpress.com/feed");
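As a quick illustration of the returned BlogItem objects, you could print the title and publish date of each post:
foreach (var blog in blogs)
{
    Console.WriteLine("{0} ({1:d})", blog.Title, blog.PublishDate);
}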
If you like this post, please don’t forget to give it a rating.
Read wordpress blog using REST API and C#
Dipan Saha
September 20, 2013
Category
ASP.NET,C#
This post is about reading a WordPress blog using the REST API and C#. Earlier I used an RSS feed to read the WordPress blog in a web application, but it had a few limitations: for example, I could not access the full contents, categories, tags or comments. Later I found a good way to retrieve all WordPress blog posts, including full contents and all options, using the REST API. Below I will go through an example implementing this approach.
First I will create a class called BlogPost to hold all the blog properties I need to display on the website. For this example I have created just a few fields, but you can create as many as you want. You can see all the available properties at this link: http://public-api.wordpress.com/rest/v1/sites/{your_blog_url}/posts
your_blog_url = dipansaha.wordpress.com (make sure you don’t add http or www)
using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json;
public class BlogPost
{
    [JsonProperty("ID")]
    public int Id { get; set; }

    [JsonProperty("date")]
    public DateTime DateCreated { get; set; }

    [JsonProperty("title")]
    public string Title { get; set; }

    [JsonProperty("URL")]
    public string Url { get; set; }

    [JsonProperty("content")]
    public string Content { get; set; }

    [JsonProperty("excerpt")]
    public string Excerpt { get; set; }

    // The API returns categories keyed by name; the value type is not important here.
    [JsonProperty("categories")]
    public Dictionary<string, object> Categories { get; set; }

    public List<string> CategoryNames
    {
        get { return Categories.Select(x => x.Key).ToList(); }
    }
}
Then use the function below to get all blog posts:
using System.Net.Http;
using Newtonsoft.Json.Linq;

public List<BlogPost> GetBlogs()
{
    const string baseurl = "http://public-api.wordpress.com/rest/v1/sites/dipansaha.wordpress.com/posts";
    var client = new HttpClient();
    var jsonData = client.GetStringAsync(baseurl).Result;
    JToken token = JObject.Parse(jsonData);
    var postCount = (int)token.SelectToken("found"); // total number of posts reported by the API
    var postArray = token.SelectToken("posts");
    // Json.NET honours the [JsonProperty] names on BlogPost (e.g. "date" -> DateCreated).
    var posts = postArray.ToObject<List<BlogPost>>();
    return posts;
}
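As a quick illustration, you could then list each post’s title together with its category names:
var blogPosts = GetBlogs();
foreach (var post in blogPosts)
{
    Console.WriteLine("{0} - {1}", post.Title, string.Join(", ", post.CategoryNames));
}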
If you like this post, please don’t forget to give it a rating.
Find index from list using LINQ
Dipan Saha
March 26, 2013
Category
C#,LINQ
This post explains how to find the indexes of items in a list using LINQ.
// Create a Product class.
public class Product
{
    public int ProductId { get; set; }
    public string Name { get; set; }
    public double Price { get; set; }
}

// Get a list of products.
public List<Product> GetProductLists()
{
    return new List<Product>
    {
        new Product {ProductId = 1, Name = "Product A", Price = 25.99},
        new Product {ProductId = 2, Name = "Product B", Price = 55.99},
        new Product {ProductId = 3, Name = "Product C", Price = 35.99},
        new Product {ProductId = 4, Name = "Product D", Price = 75.99},
        new Product {ProductId = 5, Name = "Product E", Price = 95.99},
    };
}
protected void Page_Load(object sender, EventArgs e)
{
    var products = GetProductLists();

    // Pair each product with its index in the list.
    var listOfProductsWithIndex = products.Select((product, index) => new { product, index });

    // All indexes: 0, 1, 2, 3, 4
    var listOfIndex = products.Select((product, index) => new { product, index })
        .Select(z => z.index);

    // Indexes of products priced above 50: 1, 3, 4
    var listOfIndexGreaterThan50 = products.Select((product, index) => new { product, index })
        .Where(z => z.product.Price > 50)
        .Select(z => z.index);
}
Get the first line from the Multiline content
Dipan Saha
January 08, 2013
Category
ASP.NET,C#
I found the following code snippet very useful for retrieving the first line from text content; I hope it helps others. Just to point out, this snippet works when the content is multiline. This is basic code.
using System;
using System.Text.RegularExpressions;

protected void Page_Load(object sender, EventArgs e)
{
    var content = @"This is a demo test.
This is another test.";
    var firstline = GetFirstlineFromText(content);
}

public string GetFirstlineFromText(string text)
{
    string content = "";
    try
    {
        if (!String.IsNullOrWhiteSpace(text))
        {
            // In multiline mode, ^(.*) matches everything up to the first line break.
            var m = Regex.Match(text, "^(.*)", RegexOptions.Multiline);
            if (m.Success)
                content = m.Groups[0].Value;
        }
    }
    catch (Exception)
    {
        return content;
    }
    return content;
}
Output will be :
This is a demo test.