String matching: search Baidu for "I love playing pet Lianliankan" and the result ranked first has a title that matches the whole long-tail phrase exactly, and that title is what the result displays for the site. Having the long-tail phrase in the title is therefore very important for ranking. On the second page of results for the same query, the Baidu snapshot shows that the long-tail phrase has clearly been split into "I love, play, pet Lianliankan", and in later results into "I love playing, pet Lianliankan"; this way of matching is minimum segmentation.
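Minimum segmentation can be sketched as a search for the split with the fewest pieces. The snippet below is only an illustration under stated assumptions: the Chinese string is the presumed original of "I love playing pet Lianliankan", the lexicon is a hypothetical toy set, and nothing here is claimed to be how the search engine actually implements it.

```python
def min_segmentation(text, lexicon):
    """Split text into the fewest pieces; single characters are always allowed."""
    n = len(text)
    best = [0] + [n + 1] * n   # best[i] = fewest pieces covering text[:i]
    back = [0] * (n + 1)       # back[i] = start index of the last piece in text[:i]
    for end in range(1, n + 1):
        for start in range(end):
            piece = text[start:end]
            if end - start == 1 or piece in lexicon:
                if best[start] + 1 < best[end]:
                    best[end] = best[start] + 1
                    back[end] = start
    # Walk the back-pointers from the end to recover the pieces.
    pieces, end = [], n
    while end > 0:
        pieces.append(text[back[end]:end])
        end = back[end]
    pieces.reverse()
    return pieces


# Hypothetical toy lexicon; assumed original of the example query.
LEXICON = {"我爱玩", "宠物", "连连看", "宠物连连看"}
QUERY = "我爱玩宠物连连看"

print(min_segmentation(QUERY, LEXICON))
```

With this toy lexicon the query comes out as two pieces, mirroring the "I love playing, pet Lianliankan" split described above.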
Two, the understanding method;
Segmentation methods most commonly used by search engines:
Chinese word segmentation: "by match length, matching can be divided into maximum (longest) matching and minimum (shortest) matching". The distance between the segments of a long-tail phrase is also a factor in ranking. For example, by the thirteenth page of Baidu results for "I love playing pet Lianliankan", the phrase has been split into "I, love, play, pet, Lianlian, kan".
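One way to picture "distance between segments" is to count how many extra characters sit between the matched pieces in a title. This is a rough sketch only: the function name `segment_spread`, the titles, and the measure itself are assumptions for illustration, since the actual ranking formula is not public.

```python
def segment_spread(title, segments):
    """Count characters inside the matched span that belong to no segment.

    Returns None if any segment is missing from the title.
    Naive: uses the first occurrence of each segment and assumes the
    segments are distinct and do not overlap.
    """
    spans = []
    for seg in segments:
        pos = title.find(seg)
        if pos < 0:
            return None
        spans.append((pos, pos + len(seg)))
    start = min(s for s, _ in spans)
    end = max(e for _, e in spans)
    return (end - start) - sum(len(seg) for seg in segments)


SEGMENTS = ["我爱玩", "宠物连连看"]  # assumed split of the example query

print(segment_spread("我爱玩宠物连连看", SEGMENTS))        # segments adjacent
print(segment_spread("我爱玩的一款宠物连连看", SEGMENTS))  # segments separated
```

A smaller value means the pieces of the query sit closer together in the title, which by the observation above would favour the page.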
There are three kinds of segmentation methods.
After segmenting a phrase, Baidu also removes the meaningless words in the sentence.
A result that matches the phrase as a whole ranks higher than one that matches only the separated words.
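Removing meaningless words after segmentation amounts to filtering tokens against a stop-word list. A minimal sketch, assuming a hypothetical stop-word set and an already segmented token list:

```python
# Hypothetical stop-word list; a real engine's list is not public.
STOP_WORDS = {"的", "了", "吗", "呢"}


def drop_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop-word set."""
    return [tok for tok in tokens if tok not in stop_words]


print(drop_stop_words(["好玩", "的", "宠物", "连连看"]))
```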
Statistical method: the more often adjacent characters appear together, the more likely they are to form a word, so Chinese segmentation may treat such adjacent characters as one word. For example, enter the single character "net" in Baidu and the character "station" is also highlighted in red. This shows that the two adjacent characters "net" and "station" appear together so often that the statistical method has added "website" to the lexicon.
From my own observation, Baidu now mostly uses forward matching.
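The idea behind the statistical method can be sketched by counting adjacent character pairs in a corpus and promoting the frequent ones to words. The corpus, the threshold `min_count`, and the function name are all assumptions made for illustration; "网站" is the Chinese for "website", formed from "net" and "station".

```python
from collections import Counter


def frequent_pairs(corpus, min_count=3):
    """Return adjacent character pairs that occur at least min_count times."""
    pair_counts = Counter()
    for sentence in corpus:
        # zip pairs each character with its right-hand neighbour.
        pair_counts.update(a + b for a, b in zip(sentence, sentence[1:]))
    return {pair for pair, count in pair_counts.items() if count >= min_count}


# Hypothetical four-phrase corpus.
CORPUS = ["网站优化", "网站排名", "这个网站", "排名优化"]

print(frequent_pairs(CORPUS))
```

In this toy corpus only "网站" reaches the threshold, so it is the only pair that would be added to the lexicon.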
Chinese word segmentation technology: when a user submits query keywords, a Chinese search engine segments them according to certain rules, splitting a long-tail phrase into several parts so that users can more quickly find the content they want.
Three, the statistical method.
One, string matching; (string matching generally comes in 3 kinds: 1. maximum matching; 2. reverse maximum matching; 3. minimum segmentation)
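The first two kinds of string matching can be sketched as below: maximum matching scans left to right and reverse maximum matching scans right to left, each taking the longest lexicon word it can at every step. This is a minimal sketch, not the search engine's implementation; the lexicon, the `max_len` limit, and the Chinese string (the presumed original of "I love playing pet Lianliankan") are assumptions.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Scan left to right, taking the longest lexicon word at each position."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def reverse_max_match(text, lexicon, max_len=4):
    """Scan right to left, taking the longest lexicon word ending at each position."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # pieces were collected from the end backwards
    return tokens


# Hypothetical toy lexicon; a real lexicon is far larger.
LEXICON = {"爱玩", "宠物", "连连", "连连看"}
QUERY = "我爱玩宠物连连看"

print(forward_max_match(QUERY, LEXICON))
print(reverse_max_match(QUERY, LEXICON))
```

On this query both directions agree; the two scans only diverge on ambiguous strings, which is why engines may compare or combine them.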
Understanding method: when a Chinese string contains fewer than 3 characters, Baidu submits the string directly to its lexicon database for index lookup; when the string is longer than 4 Chinese characters, Baidu splits it into several words during segmentation. For example: "Baidu search electric vehicles".
My understanding of Baidu's Chinese word segmentation: