Package redisearch Documentation¶
Overview¶
redisearch-py
is a python search engine library that utilizes the RediSearch Redis Module API.
It is the "official" client of redisearch, and should be regarded as its canonical client implementation.
The source code can be found at http://github.com/RedisLabs/redisearch-py
Example: Using the Python Client¶
from redisearch import Client, TextField, NumericField # Creating a client with a given index name client = Client('myIndex') # Creating the index definition and schema client.create_index([TextField('title', weight=5.0), TextField('body')]) # Indexing a document client.add_document('doc1', title = 'RediSearch', body = 'Redisearch impements a search engine on top of redis') # Simple search res = client.search("search engine") # the result has the total number of results, and a list of documents print res.total # "1" print res.docs[0].title # Searching with snippets res = client.search("search engine", snippet_sizes = {'body': 50}) # Searching with complext parameters: q = Query("search engine").verbatim().no_content().paging(0,5) res = client.search(q)
Example: Using the Auto Completer Client:¶
# Using the auto-completer ac = AutoCompleter('ac') # Adding some terms ac.add_suggestions(Suggestion('foo', 5.0), Suggestion('bar', 1.0)) # Getting suggestions suggs = ac.get_suggestions('goo') # returns nothing suggs = ac.get_suggestions('goo', fuzzy = True) # returns ['foo']
Installing¶
-
Install redis 4.0 RC2 or above
-
Install the python client
$ pip install redisearch
Class AutoCompleter¶
A client to RediSearch's AutoCompleter API
It provides prefix searches with optionally fuzzy matching of prefixes
__init__¶
def __init__(self, key, host='localhost', port=6379, conn=None)
Create a new AutoCompleter client for the given key, and optional host and port
If conn is not None, we employ an already existing redis connection
add_suggestions¶
def add_suggestions(self, *suggestions, **kwargs)
Add suggestion terms to the AutoCompleter engine. Each suggestion has a score and string.
If kwargs['increment'] is true and the terms are already in the server's dictionary, we increment their scores
delete¶
def delete(self, string)
Delete a string from the AutoCompleter index. Returns 1 if the string was found and deleted, 0 otherwise
get_suggestions¶
def get_suggestions(self, prefix, fuzzy=False, num=10, with_scores=False)
Get a list of suggestions from the AutoCompleter, for a given prefix
Parameters:¶
- prefix: the prefix we are searching. Must be valid ascii or utf-8
- fuzzy: If set to true, the prefix search is done in fuzzy mode. NOTE: Running fuzzy searches on short (<3 letters) prefixes can be very slow, and even scan the entire index.
- with_scores: if set to true, we also return the (refactored) score of each suggestion. This is normally not needed, and is NOT the original score inserted into the index
- num: The maximum number of results we return. Note that we might return less. The algorithm trims irrelevant suggestions.
Returns a list of Suggestion objects. If with_scores was False, the score of all suggestions is 1.
len¶
def len(self)
Return the number of entries in the AutoCompleter index
Class Client¶
A client for the RediSearch module. It abstracts the API of the module and lets you just use the engine
__init__¶
def __init__(self, index_name, host='localhost', port=6379, conn=None)
Create a new Client for the given index_name, and optional host and port
If conn is not None, we employ an already existing redis connection
add_document¶
def add_document(self, doc_id, nosave=False, score=1.0, payload=None, replace=False, **fields)
Add a single document to the index.
Parameters¶
- doc_id: the id of the saved document.
- nosave: if set to true, we just index the document, and don't save a copy of it. This means that searches will just return ids.
- score: the document ranking, between 0.0 and 1.0
- payload: optional inner-index payload we can save for fast access in scoring functions
- replace: if True, and the document already is in the index, we perform an update and reindex the document
- fields kwargs dictionary of the document fields to be saved and/or indexed. NOTE: Geo points shoule be encoded as strings of "lon,lat"
batch_indexer¶
def batch_indexer(self, chunk_size=100)
Create a new batch indexer from the client with a given chunk size
create_index¶
def create_index(self, fields, no_term_offsets=False, no_field_flags=False, no_score_indexes=False)
Create the search index. Creating an existing index juts updates its properties
Parameters:¶
- fields: a list of TextField or NumericField objects
- no_term_offsets: If true, we will not save term offsets in the index
- no_field_flags: If true, we will not save field flags that allow searching in specific fields
- no_score_indexes: If true, we will not save optimized top score indexes for single word queries
drop_index¶
def drop_index(self)
Drop the index if it exists
info¶
def info(self)
Get info an stats about the the current index, including the number of documents, memory consumption, etc
load_document¶
def load_document(self, id)
Load a single document by id
search¶
def search(self, query, snippet_sizes=None)
Search the index for a given query, and return a result of documents
Parameters¶
- query: the search query. Either a text for simple queries with default parameters, or a Query object for complex queries. See RediSearch's documentation on query format
- snippet_sizes: A dictionary of {field: snippet_size} used to trim and format the result. e.g.e {'body': 500}
Class BatchIndexer¶
A batch indexer allows you to automatically batch document indexeing in pipelines, flushing it every N documents.
__init__¶
def __init__(self, client, chunk_size=1000)
add_document¶
def add_document(self, doc_id, nosave=False, score=1.0, payload=None, replace=False, **fields)
Add a document to the batch query
commit¶
def commit(self)
Manually commit and flush the batch indexing query
Class Document¶
Represents a single document in a result set
__init__¶
def __init__(self, id, payload=None, **fields)
snippetize¶
def snippetize(self, field, size=500, bold_tokens=())
Create a shortened snippet from the document's content :param size: the szie of the snippet in characters. It might be a bit longer or shorter :param boldTokens: a list of tokens we want to make bold (basically the query terms)
Class GeoField¶
GeoField is used to define a geo-indexing field in a schema defintion
__init__¶
def __init__(self, name)
redis_args¶
def redis_args(self)
Class GeoFilter¶
None
__init__¶
def __init__(self, field, lon, lat, radius, unit='km')
Class NumericField¶
NumericField is used to define a numeric field in a schema defintion
__init__¶
def __init__(self, name)
redis_args¶
def redis_args(self)
Class NumericFilter¶
None
__init__¶
def __init__(self, field, minval, maxval, minExclusive=False, maxExclusive=False)
Class Query¶
Query is used to build complex queries that have more parameters than just the query string. The query string is set in the constructor, and other options have setter functions.
The setter functions return the query object, so they can be chained,
i.e. Query("foo").verbatim().filter(...)
etc.
__init__¶
def __init__(self, query_string)
Create a new query object.
The query string is set in the constructor, and other options have setter functions.
add_filter¶
def add_filter(self, flt)
Add a numeric or geo filter to the query. Currently only one of each filter is supported by the engine
- flt: A NumericFilter or GeoFilter object, used on a corresponding field
get_args¶
def get_args(self)
Format the redis arguments for this query and return them
in_order¶
def in_order(self)
Match only documents where the query terms appear in the same order in the document. i.e. for the query 'hello world', we do not match 'world hello'
limit_fields¶
def limit_fields(self, *fields)
Limit the search to specific TEXT fields only
- fields: A list of strings, case sensitive field names from the defined schema
limit_ids¶
def limit_ids(self, *ids)
Limit the results to a specific set of pre-known document ids of any length
no_content¶
def no_content(self)
Set the query to only return ids and not the document content
no_stopwords¶
def no_stopwords(self)
Prevent the query from being filtered for stopwords. Only useful in very big queries that you are certain contain no stopwords.
paging¶
def paging(self, offset, num)
Set the paging for the query (defaults to 0..10).
- offset: Paging offset for the results. Defaults to 0
- num: How many results do we want
query_string¶
def query_string(self)
Return the query string of this query only
slop¶
def slop(self, slop)
Allow a masimum of N intervening non matched terms between phrase terms (0 means exact phrase)
verbatim¶
def verbatim(self)
Set the query to be verbatim, i.e. use no query expansion or stemming
with_payloads¶
def with_payloads(self)
Ask the engine to return document payloads
Class Result¶
Represents the result of a search query, and has an array of Document objects
__init__¶
def __init__(self, res, hascontent, query_text, duration=0, snippets=None, has_payload=False)
- snippets: An optional dictionary of the form {field: snippet_size} for snippet formatting
Class Suggestion¶
Represents a single suggestion being sent or returned from the auto complete server
__init__¶
def __init__(self, string, score=1.0)
Class TextField¶
TextField is used to define a text field in a schema definition
__init__¶
def __init__(self, name, weight=1.0)
redis_args¶
def redis_args(self)