Simulating browsers using JMeter

December 27th, 2011

JMeter is commonly used to stress-test web pages by simulating multiple users concurrently visiting a URL. However, for this simulation to be accurate, JMeter must be configured so that it behaves like a browser.

In this article, I explain which settings to configure to make JMeter simulate browser requests fairly accurately.

 

Before configuring JMeter, let’s understand how browsers work:

  • When a user enters a webpage URL, the browser connects to the server, starts downloading the page, and begins parsing it.
  • While parsing, it encounters embedded URLs for resources like JavaScript, CSS and image files.
  • The browser then creates more threads, each of which opens a new connection and fetches one of these embedded URLs. Most browsers limit the number of connections per server (6 in the case of Firefox at the time of writing) and cap the total number of downloading threads (48 in the case of Firefox at the time of writing).
  • The page is considered loaded when all these embedded URLs have been fetched.

JMeter can simulate this behaviour if the following two settings are configured:

  • Retrieve All Embedded Resources from HTML Files


    This checkbox is found near the bottom of the HTTP Request Defaults config element and HTTP Request samplers.

    Check it to make JMeter download embedded resources like JavaScript, CSS and images, just as a browser would.

    Add a View Results Tree listener element if you want to see which embedded resources are downloaded, along with their metrics. Note that the bytes reported by View Results in Table don’t include the embedded resources.

  • Use concurrent pool. Size=n


    The behaviour of this checkbox and its pool size is as follows:

    Retrieve all embedded resources: checked. Use concurrent pool: unchecked.

    The main page and its embedded resources are downloaded in the same thread. For example, if a Thread Group is simulating 3 users, JMeter creates 3 threads – one for each simulated user – named "Thread Group 1-1" to "Thread Group 1-3". Each of these threads downloads all embedded resources sequentially. If page P has resources A, B and C, JMeter downloads them as follows:

    Thread Group 1-1 : P, A, B, C (one after another)
    Thread Group 1-2 : P, A, B, C (one after another)
    Thread Group 1-3 : P, A, B, C (one after another)

    Retrieve all embedded resources: checked. Use concurrent pool: checked, with pool size = x.

    As before, JMeter creates threads named "Thread Group 1-k" to simulate users. In addition, for each of these user threads, JMeter creates a separate thread pool of size x, with thread names like "pool-n-thread-m". The main page is downloaded by the user’s thread "Thread Group 1-k", while its embedded resources are downloaded by the threads of its associated pool.

    So, to simulate browsers, check the ‘Use concurrent pool’ checkbox and specify a reasonable pool size (4-8 seems typical for browsers).

    However, when setting the concurrent pool size, keep in mind the number of users being simulated, because a separate thread pool is created for each simulated user. With many users, too many threads may be created, adversely affecting response times due to bandwidth contention on the JMeter side. If many users are to be simulated, it’s recommended to distribute the test across multiple JMeter machines. A sketch of how these two settings appear in a saved test plan follows this list.
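
For reference, here is a minimal sketch of how these two settings are stored in a saved test plan (.jmx file). The property names below are those used by JMeter 2.x, and the rest of the sampler element is elided:

<HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="HTTP Request">
  <!-- ... other sampler properties ... -->
  <!-- "Retrieve All Embedded Resources from HTML Files" -->
  <boolProp name="HTTPSampler.image_parser">true</boolProp>
  <!-- "Use concurrent pool" and its size -->
  <boolProp name="HTTPSampler.concurrentDwn">true</boolProp>
  <stringProp name="HTTPSampler.concurrentPool">6</stringProp>
</HTTPSamplerProxy>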

Testing Solr schema, analyzers and tokenization

November 19th, 2010

Introduction

Using tests to tune the accuracy of search results is critical. That accuracy depends to a great extent on the analyzers, tokenizers and filters used in the Solr schema.

Testing and refining their behaviour on a standalone Solr server is unproductive and time consuming, involving cycles of deleting documents, stopping the server, changing the schema, restarting the server, and reindexing documents.

It would be desirable if these analyzer tweaks could be tested quickly on small fragments of text to ascertain how they’ll be tokenized and searched, before modifying the Solr schema.

The following snippets help you unit test and functionally test tokenization behaviour.

The first snippet below can be used to test the behaviour of combinations of tokenizers, token filters and char filters by examining their resulting token streams. Such tests are useful in a unit test suite.

The second snippet can be used for integration tests, where a Solr schema, as it would exist on a production server, is loaded and tested.

 

Unit testing Solr tokenizers, token filters and char filters

This Java snippet uses Solr core, SolrJ and Lucene classes to run a piece of text through a tokenizer-filter chain and print the output. The code can easily be adapted into a JUnit test case with automated result matching; a sketch follows the Solr 3.3.x listing below.

For Solr 1.4.x:

import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.solr.analysis.LowerCaseFilterFactory;
import org.apache.solr.analysis.SnowballPorterFilterFactory;
import org.apache.solr.analysis.TokenFilterFactory;
import org.apache.solr.analysis.TokenizerFactory;
import org.apache.solr.analysis.WhitespaceTokenizerFactory;

public class TokenizerChainTester {
	public static void main(String[] args) {
		try {
			StringReader inputText = new StringReader(args[0]);

			// Split the input on whitespace.
			TokenizerFactory tkf = new WhitespaceTokenizerFactory();
			Tokenizer tkz = tkf.create(inputText);

			// Lowercase each token.
			LowerCaseFilterFactory lcf = new LowerCaseFilterFactory();
			TokenStream lcts = lcf.create(tkz);

			// Stem each token with the English Snowball stemmer.
			TokenFilterFactory fcf = new SnowballPorterFilterFactory();
			Map<String, String> params = new HashMap<String, String>();
			params.put("language", "English");
			fcf.init(params);
			TokenStream ts = fcf.create(lcts);

			// Print every token produced by the chain.
			TermAttribute termAttrib = (TermAttribute) ts.getAttribute(TermAttribute.class);
			while (ts.incrementToken()) {
				System.out.println(termAttrib.term());
			}
		} catch (Exception e) {
			e.printStackTrace();
		}

		System.exit(0);
	}
}

 

For Solr 3.3.x:

The code for Solr 3.3.x is slightly different, because some portions of the API have been changed or deprecated:

import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.solr.analysis.LowerCaseFilterFactory;
import org.apache.solr.analysis.SnowballPorterFilterFactory;
import org.apache.solr.analysis.TokenFilterFactory;
import org.apache.solr.analysis.TokenizerFactory;
import org.apache.solr.analysis.WhitespaceTokenizerFactory;

public class TokenizerChainTester {
	public static void main(String[] args) {
		try {
			StringReader inputText = new StringReader("RUNNING runnable");

			// Solr 3.x factories expect a luceneMatchVersion argument.
			Map<String, String> tkargs = new HashMap<String, String>();
			tkargs.put("luceneMatchVersion", "LUCENE_33");

			TokenizerFactory tkf = new WhitespaceTokenizerFactory();
			tkf.init(tkargs);
			Tokenizer tkz = tkf.create(inputText);

			LowerCaseFilterFactory lcf = new LowerCaseFilterFactory();
			lcf.init(tkargs);
			TokenStream lcts = lcf.create(tkz);

			TokenFilterFactory fcf = new SnowballPorterFilterFactory();
			Map<String, String> params = new HashMap<String, String>();
			params.put("language", "English");
			fcf.init(params);
			TokenStream ts = fcf.create(lcts);

			// Lucene 3.x replaced TermAttribute with CharTermAttribute.
			CharTermAttribute termAttrib = ts.getAttribute(CharTermAttribute.class);
			while (ts.incrementToken()) {
				System.out.println(termAttrib.toString());
			}
		} catch (Exception e) {
			e.printStackTrace();
		}

		System.exit(0);
	}
}
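
As noted above, this adapts readily into a JUnit test case. Below is a minimal sketch against the Solr 3.3.x API; the class name, the analyze() helper and the expected stems are illustrative assumptions:

import static org.junit.Assert.assertEquals;

import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.solr.analysis.LowerCaseFilterFactory;
import org.apache.solr.analysis.SnowballPorterFilterFactory;
import org.apache.solr.analysis.WhitespaceTokenizerFactory;
import org.junit.Test;

public class TokenizerChainTest {

	// Hypothetical helper: runs text through the same whitespace ->
	// lowercase -> Snowball chain as above and collects the terms.
	private List<String> analyze(String text) throws Exception {
		Map<String, String> args = new HashMap<String, String>();
		args.put("luceneMatchVersion", "LUCENE_33");

		WhitespaceTokenizerFactory tkf = new WhitespaceTokenizerFactory();
		tkf.init(args);
		LowerCaseFilterFactory lcf = new LowerCaseFilterFactory();
		lcf.init(args);
		SnowballPorterFilterFactory spf = new SnowballPorterFilterFactory();
		Map<String, String> params = new HashMap<String, String>();
		params.put("language", "English");
		spf.init(params);

		TokenStream ts = spf.create(lcf.create(tkf.create(new StringReader(text))));
		CharTermAttribute term = ts.getAttribute(CharTermAttribute.class);
		List<String> terms = new ArrayList<String>();
		while (ts.incrementToken()) {
			terms.add(term.toString());
		}
		return terms;
	}

	@Test
	public void lowercasesAndStems() throws Exception {
		List<String> terms = analyze("RUNNING runs");
		// Both tokens should be lowercased and stemmed to "run".
		assertEquals("run", terms.get(0));
		assertEquals("run", terms.get(1));
	}
}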

 

Functional testing of Solr schema.xml

For functional tests, it is more useful to test the actual Solr search model itself, instead of individual tokenizer chains.

The snippet below shows how the schema.xml can be loaded and an analysis then run on a piece of input text against a dummy field, to examine the resulting index and query tokens:

For Solr 1.4.x:

import java.io.FileInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.util.Map;
import java.util.Map.Entry;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.solr.core.SolrConfig;
import org.apache.solr.schema.FieldType;
import org.apache.solr.schema.IndexSchema;

public class SchemaTester {
	public static void main(String[] args) {
		try {
			// Load solrconfig.xml and schema.xml just as a Solr server would.
			InputStream solrCfgIs = new FileInputStream(
					"solr/conf/solrconfig.xml");
			SolrConfig solrConfig = new SolrConfig(null, solrCfgIs);

			InputStream solrSchemaIs = new FileInputStream(
					"solr/conf/schema.xml");
			IndexSchema solrSchema = new IndexSchema(solrConfig, null,
					solrSchemaIs);

			// Dump all analyzer definitions in the schema...
			Map<String, FieldType> fieldTypes = solrSchema.getFieldTypes();
			for (Entry<String, FieldType> entry : fieldTypes.entrySet()) {
				FieldType fldType = entry.getValue();
				Analyzer analyzer = fldType.getAnalyzer();
				System.out.println(entry.getKey() + ":" + analyzer.toString());
			}

			// Example input: "HELLO_WORLD d:\\filepath\\filename.ext wi-fi wi-fi-3500 running TV camelCase test-hyphenated file.txt"
			String inputText = args[0];

			// Name of the field type in your schema.xml, e.g. "textgen"
			FieldType fieldTypeText = fieldTypes.get("textgen");

			System.out.println("Indexing analysis:");
			Analyzer analyzer = fieldTypeText.getAnalyzer();
			TokenStream tokenStream = analyzer.tokenStream("dummyfield",
					new StringReader(inputText));
			TermAttribute termAttr = (TermAttribute) tokenStream.getAttribute(TermAttribute.class);
			while (tokenStream.incrementToken()) {
				System.out.println(termAttr.term());
			}

			System.out.println("\n\nQuerying analysis:");
			Analyzer qryAnalyzer = fieldTypeText.getQueryAnalyzer();
			TokenStream qrytokenStream = qryAnalyzer.tokenStream("dummyfield",
					new StringReader(inputText));
			TermAttribute termAttr2 = (TermAttribute) qrytokenStream.getAttribute(TermAttribute.class);
			while (qrytokenStream.incrementToken()) {
				System.out.println(termAttr2.term());
			}

		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}
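
Assuming the jars listed under Dependencies below are on the classpath, the tester can be run against an arbitrary phrase, for example:

java -cp .:<solr-and-lucene-jars> SchemaTester "wi-fi wi-fi-3500 running camelCase test-hyphenated file.txt"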

 

For Solr 3.3.x:

import java.io.FileReader;
import java.io.StringReader;
import java.util.Map;
import java.util.Map.Entry;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.solr.core.SolrConfig;
import org.apache.solr.schema.FieldType;
import org.apache.solr.schema.IndexSchema;
import org.xml.sax.InputSource;

public class SchemaTester {
	public static void main(String[] args) {
		try {
			// Solr 3.x reads its configuration from an InputSource.
			InputSource solrCfgIs = new InputSource(
					new FileReader("solr/conf/solrconfig.xml"));
			SolrConfig solrConfig = new SolrConfig(null, solrCfgIs);

			InputSource solrSchemaIs = new InputSource(
					new FileReader("solr/conf/schema.xml"));
			IndexSchema solrSchema = new IndexSchema(solrConfig, null,
					solrSchemaIs);

			// Dump all analyzer definitions in the schema...
			Map<String, FieldType> fieldTypes = solrSchema.getFieldTypes();
			for (Entry<String, FieldType> entry : fieldTypes.entrySet()) {
				FieldType fldType = entry.getValue();
				Analyzer analyzer = fldType.getAnalyzer();
				System.out.println(entry.getKey() + ":" + analyzer.toString());
			}

			String inputText = "Proof of the pudding lies in its eating";
			// Name of the field type in your schema.xml, e.g. "text_en"
			FieldType fieldTypeText = fieldTypes.get("text_en");

			System.out.println("Indexing analysis:");
			Analyzer analyzer = fieldTypeText.getAnalyzer();
			TokenStream tokenStream = analyzer.tokenStream("dummyfield",
					new StringReader(inputText));
			CharTermAttribute termAttr = tokenStream.getAttribute(CharTermAttribute.class);
			while (tokenStream.incrementToken()) {
				System.out.println(termAttr.toString());
			}

			System.out.println("\n\nQuerying analysis:");
			Analyzer qryAnalyzer = fieldTypeText.getQueryAnalyzer();
			TokenStream qrytokenStream = qryAnalyzer.tokenStream("dummyfield",
					new StringReader(inputText));
			CharTermAttribute termAttr2 = qrytokenStream.getAttribute(CharTermAttribute.class);
			while (qrytokenStream.incrementToken()) {
				System.out.println(termAttr2.toString());
			}

		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}

Dependencies

These snippets require the following jars from the Solr package:

  • apache-solr-core-*.jar
  • apache-solr-solrj-*.jar
  • lucene-analyzers-*.jar
  • lucene-core-*.jar
  • lucene-snowball-*.jar
  • lucene-spatial-*.jar (only for v3.3)
  • commons-io-*.jar (only for v3.3)
  • slf4j-api-*.jar
  • slf4j-jdk14-*.jar
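
If you use Maven, the following dependency declarations should pull in most of these jars transitively. The coordinates below are an assumption based on the 3.3.0 artifacts in the central repository – verify them for your Solr version:

<dependencies>
  <dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-core</artifactId>
    <version>3.3.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-solrj</artifactId>
    <version>3.3.0</version>
  </dependency>
</dependencies>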