1 TodoList
2 Introduction
This document is not finished. It is a set of working notes for analysing the lucene query parser and the lucene searcher. It may become a tidy document someday, but it is not one yet, and until it is cleaned up it will not be easy to read.
3 Query parsing
A search starts with parsing the user's QueryString, so we first look at how Lucene and Nutch parse queries.
3.1 Nutch query parsing
Query parsing is provided by lucene. Nutch ships only the simplest form of query parser (close to test-grade), so a search system run in production needs to use the lucene query parsing engine.
The following is the query parsing that Nutch supports.
In short, Nutch has no real engine of its own here, so there is nothing worth examining.
3.2 Lucene Query parsing engine
3.2.1 Data structure
I have summarised the structure in C/C++ style. Building queries directly with Lucene.QueryParser and stepping through them in a debugger seems to be the most reliable way to check the data structure. I checked it with the main function provided in org.apache.lucene.queryParser.
#include <string>
#include <vector>

struct ElementList;                     // forward declaration: the structure is recursive

struct Term
{
    std::string field;
    std::string text;
};

enum Occur { SHOULD, MUST, MUSTNOT };
enum QueryKind { WildcardQuery, TermQuery, RangeQuery };

struct Element
{
    Occur type;
    QueryKind query;
    float boost;
    std::vector<Term> terms;
    ElementList *phrase;                // PhraseQuery
};

struct Clauses
{
    ElementList *elementList;
};

struct ElementList
{
    float boost;                        // default boost
    Clauses *clauses;                   // grouping query
    std::vector<Element> elements;
};

struct Query
{
    float boost;
    Clauses clauses;
};
Expressed as a tree, it has the following structure.
Query -+--- boost
       |
       +---- clauses ---+--- elementList ---+-- Element1 --+-- TYPE
                                            |              |
                                            |              +-- boost
                                            |              |
                                            |              +--- Term1 {field, Term}
                                            |
                                            +-- Element2 --+-- TYPE
                                                           |
                                                           +-- boost
                                                           |
                                                           +--- Term2 {field, Term}
Query -+--- boost
       |
       +---- clauses ---+--- elementList ---+-- Element1 --+-- TYPE {MUST} {TermQuery}
                                            |              |
                                            |              +-- boost {1.0}
                                            |              |
                                            |              +--- Term1 {"field", "tcl"}
                                            |
                                            +-- Element2 --+-- TYPE {MUSTNOT} {WildcardQuery}
                                                           |
                                                           +-- boost {1.0}
                                                           |
                                                           +--- Term2 {"field", "ap*che"}
For a grouped search, elementList.clauses is extended.
Query -+--- boost
       |
       +---- clauses ---+--- elementList ---+-- Element1 --+-- TYPE {MUST} {TermQuery}
                                            |              |
                                            |              +-- boost {1.0}
                                            |              |
                                            |              +--- Term1 {"field", "tcl"}
                                            |
                                            +-- clauses --+
                                                          |
       +--------------------------------------------------+
       |
       +-- elementList --+--- Element1 --+-- TYPE{SHOULD} {TermQuery}
                         |               |
                         |               +-- boost {1.0}
                         |               |
                         |               +-- Term1 {"field", "linux"}
                         |
                         +--- Element1 --+-- TYPE{SHOULD} {WildcardQuery}
                                         |
                                         +-- boost {1.0}
                                         |
                                         +-- Term1 {"field", "linux"}
Query -+--- boost
       |
       +---- clauses ---+--- elementList ---+-- Element1 --+-- TYPE {MUST} {RangeQuery}
                                            |              |
                                            |              +-- boost {1.0}
                                            |              |
                                            |              +--- lowerTerm {"field", "apache"}
                                            |              |
                                            |              +--- upperTerm {"field", "tcl"}
                                            |
                                            +-- Element2 --+-- TYPE {MUST} {WildcardQuery}
                                                           |
                                                           +-- boost {1.0}
                                                           |
                                                           +--- Term2 {"field", "ap*che"}
Query -+--- boost
       |
       +---- clauses ---+--- elementList ---+-- Element1 --+-- TYPE {SHOULD} {RangeQuery}
                                            |              |
                                            |              +-- boost {1.0}
                                            |              |
                                            |              +-- Term {"field", "tcl"}
                                            |
                                            +-- Element2 --+-- TYPE {MUST} {FuzzyQuery}
                                                           |
                                                           +-- minimumSimilarity {0.5}
                                                           |
                                                           +-- boost {1.0}
                                                           |
                                                           +--- Term {"field", "apche"}
Query -+--- boost
       |
       +---- clauses --+-- elementList(6)-+-- Element1 --+-- TYPE {SHOULD} {TermQuery}
                                          |              |
                                          |              +-- boost {1.0}
                                          |              |
                                          |              +-- Term {"title", "apachle"}
                                          |
                                          +-- Element2 --+-- TYPE {MUST} {BooleanQuery}
                                          |              |
                                          |              +-- minimumSimilarity {0.5}
                                          |              |
                                          |              +-- boost {1.0}
                                          |              |
                                          |              +--- clauses --+-- Element1 --+-- TYPE {MUST} {TermQuery}
                                          |                             |              |
                                          |                             |              +-- boost {4.0}
                                          |                             |              |
                                          |                             |              +-- Term {"content","tcl"}
                                          |                             |
                                          |                             +-- Element2 --+-- TYPE {MUSTNOT} {TermQuery}
                                          |                             |              |
                                          |                             |              +-- boost {1.0}
                                          |                             |              |
                                          |                             |              +-- Term {"content","windows"}
                                          |                             |
                                          |                             +-- Element3 --+-- TYPE {MUST} {RangeQuery}
                                          |                                            |
                                          |                                            +-- boost {4.0}
                                          |                                            |
                                          |                                            +-- TopTerm {"title","1999"}
                                          |                                            |
                                          |                                            +-- BooTerm {"title","2006"}
                                          |
                                          +-- Element3 --+-- TYPE {MUST} {TermQuery}
                                          |              |
                                          |              +-- boost {3.0}
                                          |              |
                                          |              +-- Term {"field", "tcl"}
                                          |
                                          +-- Element4 --+-- TYPE {SHOULD} {BooleanQuery}
                                          |              |
                                          |              +-- boost {1.0}
                                          |              |
                                          |              +--- clauses --+-- Element1 --+-- TYPE {SHOULD} {TermQuery}
                                          |                             |              |
                                          |                             |              +-- boost {1.0}
                                          |                             |              |
                                          |                             |              +-- slop {0}
                                          |                             |              |
                                          |                             |              +-- Term {"field","hello"}
                                          |                             |              |
                                          |                             |              +-- Term {"field","world"}
                                          |                             |
                                          |                             +-- Element2 --+-- TYPE {MUSTNOT} {PrefixQuery}
                                          |                                            |
                                          |                                            +-- boost {1.0}
                                          |                                            |
                                          |                                            +-- Term {"field","cra"}
                                          |
                                          +-- Element5 --+-- TYPE {MUSTNOT} {TermQuery}
                                          |              |
                                          |              +-- boost {1.0}
                                          |              |
                                          |              +-- Term {"field", "tcl"}
                                          |
                                          +-- Element6 --+-- TYPE {SHOULD} {TermQuery}
                                                         |
                                                         +-- boost {1.0}
                                                         |
                                                         +-- Term {"field", "tcl"}
When the boolean operator is omitted, the first Term in each group is marked SHOULD. Unless AND is given explicitly, every Term is marked SHOULD. QueryParser does only the work of a parser; it does not check for duplicate Terms. The default boost is set to 1. For phrase searches the slop is set to 0 (DEFAULT_PARASE_SLOP), and it can be changed with QueryParser.setPhraseSlop().
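The defaults described above can be illustrated with a small sketch. This is not Lucene's QueryParser: `OccurSketch`, `Clause` and `parse` are hypothetical names, and the sketch only assumes that a leading `+` means MUST, a leading `-` means MUSTNOT, and a trailing `^n` sets the boost. Everything else falls back to SHOULD with boost 1.0, and duplicate terms are not checked.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the default rules; not part of Lucene.
class OccurSketch {
    enum Occur { SHOULD, MUST, MUSTNOT }

    static final class Clause {
        final Occur occur;
        final String text;
        final float boost;

        Clause(Occur occur, String text, float boost) {
            this.occur = occur;
            this.text = text;
            this.boost = boost;
        }

        @Override
        public String toString() {
            return occur + " " + text + "^" + boost;
        }
    }

    // Splits on whitespace only; grouping, ranges and phrases are out of scope.
    static List<Clause> parse(String queryString) {
        List<Clause> clauses = new ArrayList<>();
        for (String raw : queryString.trim().split("\\s+")) {
            String token = raw;
            Occur occur = Occur.SHOULD;          // default when no operator is given
            if (token.startsWith("+")) {
                occur = Occur.MUST;
                token = token.substring(1);
            } else if (token.startsWith("-")) {
                occur = Occur.MUSTNOT;
                token = token.substring(1);
            }
            float boost = 1.0f;                  // default boost
            int caret = token.lastIndexOf('^');
            if (caret > 0) {
                boost = Float.parseFloat(token.substring(caret + 1));
                token = token.substring(0, caret);
            }
            if (!token.isEmpty()) {
                clauses.add(new Clause(occur, token, boost));
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        for (Clause c : parse("+tcl -windows linux^4")) {
            System.out.println(c);
        }
    }
}
```

Running it on `+tcl -windows linux^4` yields one MUST clause, one MUSTNOT clause and one SHOULD clause with boost 4.0.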
The following is an example of the data structure that an actual QueryString produces.
[Figure: data structure for an example QueryString]
In the end this is a typical parse-tree data structure in which clauses are the nodes and terms are the values. Since the parser is implemented with JavaCC, this is the natural result.
When one or more Terms are used together with one or more grouping queries or range queries, they are treated as clauses and the node is expanded.
3.2.2 Lucene QueryParser
The Lucene QueryParser is built with JavaCC. See https://javacc.dev.jsva.net for details. Regular expressions, lex and yacc are also worth a look if you are interested.
4 Lucene Searcher
4.1 Setting up a debugging environment
I think the best way to analyse the engine is to read the source code and, in parallel, step through it in a debugger to confirm how what was read is actually implemented. So I decided to set up a debugging environment on top of nutch-hadoop-lucene.
For debugging I will use the documents of http://tcl.apache.org collected by a nutch crawl. Because nutch was used, the collected documents are stored in the distributed file system through hadoop.
The test code for debugging is based on the main function of org.apache.lucene.queryParser, because a search requires passing search a Query object whose QueryString has already been parsed.
// Replaces main() in QueryParser.java; it also needs imports of
// org.apache.lucene.search.IndexSearcher and org.apache.lucene.search.Hits.
public static void main(String[] args) throws Exception {
    if (args.length == 0) {
        System.out.println("Usage: java org.apache.lucene.queryParser.QueryParser <input>");
        System.exit(0);
    }
    // Parse the QueryString with "content" as the default field.
    QueryParser qp = new QueryParser("content",
            new org.apache.lucene.analysis.SimpleAnalyzer());
    Query q = qp.parse(args[0]);
    // Search the index that was copied to the local file system.
    IndexSearcher searcher = new IndexSearcher("/usr/apache/index");
    Hits hits = searcher.search(q);
    System.out.println(q.toString("field"));
}
Of the searchers lucene provides I will use IndexSearcher, which needs the path of the index on the local file system. The index is currently stored in the distributed file system through hadoop, so it has to be dumped to the local file system with hadoop dfs.
# ./hadoop dfs -copyToLocal apache /usr/apache
Now use the eclipse debugger to check that the search runs properly and a Hits object is returned. Once that is confirmed, dig into searcher.search and analyse from there.
4.2 IndexSearcher
4.2.1 Document scoring
lucene is designed so that core features can be loaded as plugins, and the search engine is no exception. Of the basic search modules lucene provides, IndexSearcher is the one most generally used.
Because the index has already been built, the Searcher has little to do if searching is limited to finding the documents that contain a given word. But merely listing the documents that contain the word cannot deliver the quality of results that users expect. So a document ranking is introduced, and higher-ranked documents are shown first.
The most important factor in document ranking is Term Weighting, which can be thought of as the weight of a word. It starts from the premise below.
Put the other way round, if the word Linux occurs frequently in 9 out of 10 documents, the significance of the word Linux within that document set is relatively low.
The following is the formula the lucene Searcher uses to measure the importance of a document.
[Figure: lucene scoring formula]
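The formula figure itself is not available in this text. As a hedged reconstruction, assuming it showed the scoring formula documented for Lucene's Similarity class (the factor names match the subsections that follow), it would read roughly as:

```latex
\mathrm{score}(q,d) \;=\; \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
\sum_{t \in q}\Big(\mathrm{tf}(t,d)\cdot\mathrm{idf}(t)^{2}\cdot\mathrm{boost}(t)\cdot\mathrm{lengthNorm}(t,d)\Big)
```

Here idf appears squared because it enters once through the query weight and once through the document-side score, which agrees with the searcher walkthrough in section 4.3 (queryWeight = idf * boost, then WeightValue = queryWeight * idf).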
Quite a few factors go into computing document importance, but the core ones are idf and tf. These two are used to compute the weight of a word. The weight is determined by (1) how often the word occurs in the document and (2) how many documents the word occurs in. The weight of a word is tf multiplied by idf. The remaining factors are used for normalisation.
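A minimal sketch of the tf times idf weight, assuming the two definitions given later in this document: tf as the term frequency divided by the frequency of the most frequent term in the document, and idf as log(maxDoc / (docFreq + 1)) + 1.0. The class and method names are hypothetical and the numbers are made up for illustration.

```java
// Hypothetical illustration of weight = tf * idf; not Lucene code.
class TermWeightSketch {
    // tf: frequency of the term relative to the most frequent term in the document.
    static double tf(int freq, int maxFreq) {
        return (double) freq / maxFreq;
    }

    // idf: grows as fewer documents (docFreq) out of maxDoc contain the term.
    static double idf(int docFreq, int maxDoc) {
        return Math.log((double) maxDoc / (docFreq + 1)) + 1.0;
    }

    static double weight(int freq, int maxFreq, int docFreq, int maxDoc) {
        return tf(freq, maxFreq) * idf(docFreq, maxDoc);
    }

    public static void main(String[] args) {
        // Same in-document frequency, but one term is rare in the collection
        // and the other appears in almost every document.
        double rare = weight(5, 10, 5, 1000);
        double common = weight(5, 10, 900, 1000);
        System.out.printf("rare term weight   = %.4f%n", rare);
        System.out.printf("common term weight = %.4f%n", common);
    }
}
```

With the in-document frequency held equal, the term that occurs in fewer documents ends up with the larger weight.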
The formula above is the most common one, and in many cases tf and idf alone are enough to compute document importance (ranking) without much trouble. But as the kinds of documents have diversified, there are more and more cases where this method alone is not enough to decide the ranking. The way ranking is computed has to differ with the characteristics of the documents being searched, such as blogs, libraries, newspapers and web documents. The lucene ranking formula mentioned above is itself a reworked version of the basic formula.
More advanced ranking formulas include the following.
4.2.2 boost
boost is used to set the weight of a term. The word apache can occur in content, url, title, anchor and so on. If apache occurs in the title, the document is more likely to be the one being looked for; if it occurs in the content (body), its importance is likely to be lower. boost is used to decide this kind of weighting. The default boost is 1.0, and it can be set with the setBoost method. The default boost per field is as follows.
4.2.3 QueryNorm
QueryNorm is used to normalise the Term Weights of the query.
[Figure: queryNorm formula]
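The searcher walkthrough in section 4.3 computes queryNorm as 1.0/sqrt(sumOfSquaredWeights). The sketch below follows that definition; the class and method names are hypothetical, and it simplifies the walkthrough by using one boost per term and no query-level boost.

```java
// Hypothetical illustration of queryNorm = 1 / sqrt(sumOfSquaredWeights).
class QueryNormSketch {
    // Sum of (idf * boost)^2 over the query terms.
    static double sumOfSquaredWeights(double[] idfs, double[] boosts) {
        double sum = 0.0;
        for (int i = 0; i < idfs.length; i++) {
            double queryWeight = idfs[i] * boosts[i];
            sum += queryWeight * queryWeight;
        }
        return sum;
    }

    static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    public static void main(String[] args) {
        // queryNorm shrinks as the sum of squared weights grows.
        for (double s = 0.5; s <= 45.0; s += 5.0) {
            System.out.printf("%.1f %.4f%n", s, queryNorm(s));
        }
    }
}
```

Because the norm depends only on the query, it does not change the order of documents for a single query; it makes scores from different queries more comparable.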
Assumption
The following graph shows how queryNorm changes as SumOfSquaredWeights increases from 0.5 to 45.
[Figure: queryNorm against SumOfSquaredWeights]
4.2.4 lengthNorm
A lucene search is a field:term search. A document is divided into fields such as content, url, anchor and title, and the term search is run against each field. Each field therefore needs to be normalised.
For example, suppose a search for documents whose title contains linux finds two documents with the titles below. The query used would be title:linux.
The following shows how LengthNorm changes as the number of tokens increases, with DocScore set to 0.5.
[Figure: LengthNorm against number of tokens]
For content, the normalised value does not change until the document exceeds 1000 tokens. The graph above was drawn with gnuplot, and the data for gnuplot was generated with the code below.
#include <stdio.h>
#include <math.h>

int max(int a, int b)
{
    if (a > b)
        return a;
    else
        return b;
}

int main(int argc, char **argv)
{
    int i = 0;
    double docscore = 0.5;

    /* series 1: sqrt(docscore) / log(e + tokens) */
    for (i = 1; i < 2000; i++)
    {
        printf("%d %lf\n", i, sqrt(docscore)/log(2.71828182 + (double)i));
    }
    /* series 2: sqrt(docscore) / sqrt(max(tokens, 1000)), flat up to 1000 tokens */
    for (i = 1; i < 2000; i++)
    {
        printf("%d %lf\n", i, sqrt(docscore)/sqrt((double)max(i, 1000)));
    }
    /* series 3: sqrt(docscore) / sqrt(tokens) */
    for (i = 1; i < 2000; i++)
    {
        printf("%d %lf\n", i, sqrt(docscore)/sqrt((double)i));
    }
    return 0;
}
4.2.5 Coord
As the name says, it is a coordinator. It is used to bring values onto a comparable scale. For example, if score values vary at around the 0.0000000001 level, it is hard to produce meaningful values; in that case the score is multiplied by a suitable value. The default value is 1.0.
4.2.6 tf
public float tf(int freq)
tf computes a score for how often a word or phrase occurs within a document. The larger the value, the more often the word or phrase appears. The formula is as follows.
[Figure: tf formula]
The denominator is the frequency of the most frequent term in the document.
If the denominator were the count of all words in the document, the value would be relatively small for large documents and relatively large for small documents, so normalisation is needed. That is why the denominator is the frequency of the most frequent term.
With Freq fixed at 5, tf changes with MaxFreq as follows.
[Figure: tf against MaxFreq]
4.2.7 idf
public float idf(Term term, Searcher searcher) throws IOException
idf is short for Inverse Document Frequency. By examining the index table, which has the form <Term, DID List>, it determines how many documents the term occurs in. The resulting value is passed on as a factor in the score calculation.
[Figure: idf formula]
The logarithm is used to adjust the scale. An important word is likely to occur heavily in the few documents that deal with it specifically, whereas everyday words occur in many documents. If the word linux occurs in 5 documents, idf changes with maxDoc as follows.
[Figure: idf against maxDoc]
The larger maxDoc is, the larger idf becomes: linux occurring in 5 out of 1000 documents can be expected to make those documents more important than linux occurring in 5 out of 10 documents.
The following is the result of measuring idf with maxDoc fixed at 1000 while df increases from 5 to 1000.
[Figure: idf against df]
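The same sweep can be reproduced with a short sketch, assuming the idf expression used in the searcher walkthrough in section 4.3, log(maxDocs/(docFreq+1)) + 1.0. The class name and the sample df values are hypothetical.

```java
// Hypothetical illustration: idf for a fixed maxDoc and a growing document frequency.
class IdfSweepSketch {
    static double idf(int docFreq, int maxDoc) {
        return Math.log((double) maxDoc / (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        int maxDoc = 1000;
        int[] docFreqs = {5, 10, 50, 100, 500, 1000};
        // Prints "df idf" pairs, the same two-column shape gnuplot accepts.
        for (int df : docFreqs) {
            System.out.printf("%d %.4f%n", df, idf(df, maxDoc));
        }
    }
}
```

The values fall as df rises, and the logarithm keeps the fall gradual rather than proportional to df.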
For example, Terms such as the and a, which can appear across many documents, get a low idf value.
4.3 Lucene Searcher
Lucene.Searcher is the class that performs a search with a given Query. To support a variety of search methods, search engines can be loaded as plugins. The basic search plugins are search.IndexSearcher, which searches the index, search.HitCollector and search.TopFieldDocCollector.
Take the user's search string and create a Query.
for(i = 0; i < Query.Term.size(); i++)
{
    Compute Term.Weight.
    {
        In the TermInfoIndex file, find the pointer to the TermInfos.block that contains the Term.
        Once the block is found, scan it linearly for a matching <field:term>.
        If found, read the following information from the TermInfos table.
        {
            @ DocFreq : how many documents contain the Term
            @ freq pointer : in the file holding the <did,freq> data for TermFreq, the offset where the <did,freq> entries for the current Term start
            @ prox pointer : in the file holding the positions at which Terms occur in documents, the offset where the current Term starts
        }
        /*
        The freq pointer now tells us which documents contain the Term and how many times.
        The prox pointer can be used to build summaries of the search results.
        */
        // Compute the term's idf and create the weight object.
        term.idf = log(maxDocs/(docFreq+1))+1.0;
        weights.add(term);
    }
    Iterate over the weight objects and compute sumOfSquaredWeights.
    for (i = 0; i < weights.size(); i++)
    {
        queryWeight = weights[i].idf * getboost();
        Squared = queryWeight * queryWeight;
        sum += Squared;
    }
    sum *= getboost()^2;
    sumOfSquaredWeights = sum;
    // Compute queryNorm(sumOfSquaredWeights).
    {
        1.0/sqrt(sumOfSquaredWeights);
    }
    // Use queryNorm(sumOfSquaredWeights) to normalise the idf query weight.
    for (i = 0; i < weights.size(); i++)
    {
        queryWeight *= queryNorm;
        WeightValue = queryWeight * idf;
    }
}
Create the HitsQueue.
It keeps the score objects that have the highest Scores.
Iterate over the weights, fetch the scorer for every weight that contains the weight.term,
and add it to the BooleanScorer.
BooleanScorer result;
for (i = 0; i < weights.size(); i++)
{
    // tis.freqpointer gives access to the freq pointer, prox pointer, etc. needed for scoring.
    Weight w = weights.elementAt(i);
    Call w.scorer() to compute the scorer for this weight.
    {
        Depending on the nature of the weight,
        choose SloppyPhraseScorer or ExactPhraseScorer.
        SloppyPhraseScorer when slop is not 0
        ExactPhraseScorer when slop is 0
    }
    // Add weight.scorer to BooleanScorer.result.
    result.add(w.scorer, c.isRequired(), c.isProhibited())
    {
        Check whether the clause is isRequired (must be present) or isProhibited (must be excluded), then
        // isRequired: the query Term was given AND or +
        // isProhibited: the query Term was given -
        if(isRequired)
        {
            requiredScorers.add(scorer);
        }
        else if (prohibited)
        {
            prohibitedScorers.add(scorer);
        }
        else
        {
            optionalScorers.add(scorer);
        }
    }
    return scorer.BooleanScorer;
}
Call scorer.scorer()
{
    if (requiredScorers.size() is one or more)
    {
        Call makeCountingSumScorerSomeReq()
        {
            When there is no optionalScorer
            When there is an optionalScorer
            {
                If there is a single requiredScorer
                    run SingleMatchScorer (return that scorer as it is)
                Otherwise
                    run countingConjunctionSumScorer
            }
        }
    }
    Otherwise
    {
        Call makeCountingSumScorerNoReq()
    }
    After the sumScorer step above, a heap of scorers, one per weight, has been built.
    Each scorer in the heap holds a priorityQueue of DIDs.
    // Take the scorers one at a time from the top of the heap.
    for (i =0; i < heap.size(); i++)
    {
        Scorer top = heap[i];
        // Get the score for this doc.
        currentDoc = top.doc();
        currentScore = top.score();
        // Check whether another scorer in the heap has the same doc as currentDoc.
        while(!top.next())
        {
            top=scorerQueue.top();
            if (top.doc() == currentDoc)
            {
                currentScore += top.score();
            }
        }
        HitCollector(currentDoc,currentScore);
    }
}
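The final loop of the walkthrough, which sums the scores of every scorer positioned on the same document, can be sketched as follows. This is a simplified illustration and not Lucene's BooleanScorer: `TermScorer`, `Hit` and `collect` are hypothetical names, each scorer is just a sorted array of doc ids with one score per doc, and only the optional (SHOULD) case is covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of merging per-term scorers by document id.
class DisjunctionSumSketch {
    // One term's postings: ascending doc ids with a score for each doc.
    static final class TermScorer {
        final int[] docs;
        final double[] scores;
        int pos = 0;

        TermScorer(int[] docs, double[] scores) {
            this.docs = docs;
            this.scores = scores;
        }

        int doc() { return docs[pos]; }
        double score() { return scores[pos]; }
        boolean next() { return ++pos < docs.length; }   // false when exhausted
    }

    static final class Hit {
        final int doc;
        final double score;

        Hit(int doc, double score) {
            this.doc = doc;
            this.score = score;
        }
    }

    static List<Hit> collect(List<TermScorer> scorers) {
        // Heap ordered by the doc id each scorer is currently positioned on.
        PriorityQueue<TermScorer> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));
        for (TermScorer s : scorers) {
            if (s.docs.length > 0) {
                heap.add(s);
            }
        }
        List<Hit> hits = new ArrayList<>();
        while (!heap.isEmpty()) {
            int currentDoc = heap.peek().doc();
            double currentScore = 0.0;
            // Pull every scorer sitting on currentDoc, add its score, then advance it.
            while (!heap.isEmpty() && heap.peek().doc() == currentDoc) {
                TermScorer top = heap.poll();
                currentScore += top.score();
                if (top.next()) {
                    heap.add(top);
                }
            }
            hits.add(new Hit(currentDoc, currentScore));
        }
        return hits;
    }

    public static void main(String[] args) {
        List<TermScorer> scorers = new ArrayList<>();
        scorers.add(new TermScorer(new int[] {1, 3}, new double[] {0.5, 0.5}));
        scorers.add(new TermScorer(new int[] {3, 4}, new double[] {0.25, 1.0}));
        for (Hit h : collect(scorers)) {
            System.out.println(h.doc + " " + h.score);
        }
    }
}
```

A scorer is removed from the heap before it is advanced and re-inserted afterwards, so the heap ordering is never invalidated by a changing doc id.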
4.4 Creating the HitsCollector
A scorer is created for each term. If the query has 2 terms, 2 scorers are created. With nutch, entering a query made of the single word linux makes nutch internally build a query with 5 Terms, one per field (url:linux OR content:linux OR anchor:linux OR site:linux OR title:linux). In this case scorers are created for the 5 terms, and each scorer goes into a heap data structure. Each scorer also maintains a priorityscorerqueue for its term.
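The per-field expansion described above can be sketched as plain string building. This only illustrates the shape of the expanded query; it is not Nutch's own code, and `FieldExpansionSketch`, `FIELDS` and `expand` are hypothetical names. The field list is the one given in the example.

```java
// Hypothetical illustration of expanding one word into one Term per field.
class FieldExpansionSketch {
    static final String[] FIELDS = {"url", "content", "anchor", "site", "title"};

    static String expand(String word) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < FIELDS.length; i++) {
            if (i > 0) {
                sb.append(" OR ");
            }
            sb.append(FIELDS[i]).append(':').append(word);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Five Terms means five scorers for a single-word query.
        System.out.println(expand("linux"));
    }
}
```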
The following figure shows how the HitsQueue is built when the query linux is given.
[Figure: HitsQueue for the query linux]
4.5 Score data structure
It has a Stack data structure built on recursive calls of Scorer. When a score for a term is created through weight.scorer, it is classified by type as shown below and added.
The following is another example.
This time we look at how the scorer works using a fairly complex query. The query is as follows.
4.6 Distributed Search
Introduction
By default Nutch is built so that searching happens on the hadoop global file system. Using a distributed file system makes this very flexible, but it has the drawback that search can become extremely slow when the volume of documents to search grows large.
Hadoop itself is a tool that abstracts a file system on top of the Java virtual machine, so it is inherently slow.
To improve performance in this case, the Segments can be split into several pieces and placed on a few servers, with each server searching on its Local file system rather than Hadoop and handing the results to the Web Server. Nutch supports this kind of Distributed Search. The layout is as follows.
[Figure: Distributed Search layout]
It follows the typical Server & Client model. Here the Web Server becomes the Search Client and the other subordinate nodes become Search Servers. A Search Server waits with its port open; when a request arrives from the Search Client, it searches its local index files and sends the results back to the Search Client. Requests are sent over rpc, which runs the procedure and returns the result.
To apply distributed search, the maximum segment size each system will handle has to be specified at indexing time so that several segments are produced. If 5 million documents are expected to be indexed, they can be generated as 5 segments of 1 million documents each. The basic setup is then completed by having each of the 5 search servers copy the segment it is responsible for from Hadoop to local disk.
A single server handling about 1 million documents does not seem to have any major performance problem.
Configuration
Server layout
I tested with the following server layout.
scluster01 scluster02 scluster03 scluster04
scluster01 is the master node and acts as the search client; the tomcat server runs on it. The remaining nodes, 02 to 04, are search servers.
search client
The search client has to be told about the search servers it should connect to. This information is stored in a file named search-servers.txt in the following <Server Name, Port> format.
# ServerName Port
scluster02 1234
scluster03 1235
scluster04 1236
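To make the <Server Name, Port> format concrete, here is a small sketch that reads such lines into host and port pairs. It is not Nutch's own loader: `SearchServersSketch`, `Server` and `parse` are hypothetical names, and treating lines that start with `#` as comments is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of reading "<Server Name> <Port>" lines.
class SearchServersSketch {
    static final class Server {
        final String host;
        final int port;

        Server(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    static List<Server> parse(List<String> lines) {
        List<Server> servers = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Assumption of this sketch: blank lines and '#' lines are skipped.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected '<host> <port>': " + line);
            }
            servers.add(new Server(parts[0], Integer.parseInt(parts[1])));
        }
        return servers;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "# ServerName Port",
                "scluster02 1234",
                "scluster03 1235",
                "scluster04 1236");
        for (Server s : parse(lines)) {
            System.out.println(s.host + ":" + s.port);
        }
    }
}
```

Each port listed here has to match the port the corresponding search server is started with.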
This configuration file must be placed in the index root directory defined by searcher.dir in nutch-site.xml. nutch-site.xml is under WEB-INF/classes in the Tomcat root directory; edit it there.
<property>
  <name>searcher.dir</name>
  <value>/scluster01/idx</value>
</property>
If searcher.dir is set as above, put search-servers.txt under /scluster01/idx.
Now start the server with startup.sh in tomcat/bin. From now on, when a search query arrives, nutch on the search client side connects to the servers listed in search-servers.txt and receives their results.
nutch acts as a client if search-servers.txt exists in the searcher.dir path. The search servers must therefore not have a search-servers.txt file.
search server
Do the following on every search server. First edit hadoop-site.xml so that the index files are searched locally rather than through hadoop.
Now run nutch in server mode with the nutch command. The port number must match the contents of search-servers.txt on the search client.
scluster02 # bin/nutch server 1234 /scluster02/idx
Troubleshooting
On a recent Linux the IPv6 kernel module is probably loaded. In that case, starting the search server prints the following error message. (This problem gave me quite a bit of trouble.)
Exception: java.net.SocketException: Invalid argument or cannot assign requested address on Fedora Core 3 or 4
To fix this, edit the options in bin/nutch.
JAVA_IPV4=-Djava.net.preferIPv4Stack=true
# run it
exec "$JAVA" $JAVA_HEAP_MAX $NUTCH_OPTS $JAVA_IPV4 -classpath "$CLASSPATH" $CLASS "$@"