How to Detect Language for a String in PHP

I quite often have to check whether a text string is in English, or to detect which language it is written in. Most detection algorithms are based on how frequently particular sequences of letters appear. For example, the sequence “the” is more frequent in English than in French. However, there are not many implementations of such NLP algorithms in PHP. One option is the Text_LanguageDetect PEAR package. It can be used directly if it is installed as a PEAR package, or it can be downloaded and used as a standalone library.
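To make the letter-sequence idea concrete, here is a minimal sketch that counts three-letter sequences (trigrams) in a string. It is only an illustration of the general principle, not the code the package uses; the function name trigramFrequencies and both sample sentences are made up for this example, and it handles plain ASCII text only.

```php
<?php
// Hypothetical helper: count how often each 3-character sequence occurs.
// ASCII-only sketch; accented letters are treated as separators.
function trigramFrequencies(string $text): array
{
    // Lowercase the text and collapse every run of non-letters into one space.
    $clean = trim(preg_replace('/[^a-z]+/', ' ', strtolower($text)));

    $counts = [];
    for ($i = 0, $n = strlen($clean) - 2; $i < $n; $i++) {
        $gram = substr($clean, $i, 3);
        $counts[$gram] = ($counts[$gram] ?? 0) + 1;
    }

    arsort($counts); // most frequent trigram first
    return $counts;
}

// Made-up sample sentences, one per language.
$english = trigramFrequencies('The cat chased the bird over the other fence.');
$french  = trigramFrequencies('Le chat a poursuivi un oiseau dans le jardin.');

echo 'English "the" count: ', $english['the'] ?? 0, "\n";
echo 'French "the" count: ', $french['the'] ?? 0, "\n";
```

A real detector compares a whole profile of such counts against reference profiles built from large samples of each language, rather than looking at a single trigram.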

It is very easy to use:

require_once('libs/languagedetect/Text/LanguageDetect.php');

$text = 'There are several statistical approaches to language identification using different techniques to classify the data. One technique is to compare the compressibility of the text to the compressibility of texts in a set of known languages.';

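try {
	$l = new Text_LanguageDetect();
	$l->setNameMode(2); // return 2-letter language codes only
	$result = $l->detect($text, 4); // keep the 4 best-matching languages

	print_r($result);
} catch (Text_LanguageDetect_Exception $e) {
	// report the failure instead of silently ignoring it
	echo 'Language detection failed: ' . $e->getMessage();
}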

This prints the following array of scores. The array is sorted in descending order, so its first element is the most likely language. Because the array is keyed by language code, read that first element with key($result) rather than $result[0]:

Array ( 
	[en] => 0.336212121212 
	[it] => 0.279112554113 
	[es] => 0.257402597403 
	[fr] => 0.25 )
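Since the result is keyed by language code, picking the winner takes one extra step. The sketch below shows one way to do it with plain PHP arrays; the function name topLanguage and the threshold parameter are assumptions made for this example and are not part of the package.

```php
<?php
// Hypothetical helper: return the best-scoring language code, or null
// when the best score is below a caller-chosen threshold.
function topLanguage(array $scores, float $minScore = 0.0): ?string
{
    if ($scores === []) {
        return null;
    }

    arsort($scores);                  // defensive: highest score first
    $code = array_key_first($scores); // requires PHP 7.3 or later

    return $scores[$code] >= $minScore ? (string) $code : null;
}

// Sample scores copied from the output above.
$result = [
    'en' => 0.336212121212,
    'it' => 0.279112554113,
    'es' => 0.257402597403,
    'fr' => 0.25,
];

var_dump(topLanguage($result));      // best match, no threshold
var_dump(topLanguage($result, 0.5)); // stricter threshold
```

A threshold like this can be handy for short strings, where the scores of the leading languages tend to sit close together, as they do in the output above.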

If you have to use it in an environment where it is not available as a PEAR package, download it from the link mentioned above and unzip it in the location the script runs from. If you want to put the files in a separate directory instead of leaving them in the root folder of the application, you need to change the LanguageDetect.php file accordingly.

In the original file:


...
require_once 'Text/LanguageDetect/Exception.php';
require_once 'Text/LanguageDetect/Parser.php';
require_once 'Text/LanguageDetect/ISO639.php'; 

...

        } else {
            // assume this was just unpacked somewhere
            // try the local working directory if otherwise
            return __DIR__ . '/../libs/' . $fname;
        }

Make the following modifications (assuming the new location is “libs/languagedetect”):


...
require_once 'libs/languagedetect/Text/LanguageDetect/Exception.php';
require_once 'libs/languagedetect/Text/LanguageDetect/Parser.php';
require_once 'libs/languagedetect/Text/LanguageDetect/ISO639.php'; 

...

        } else {
            // assume this was just unpacked somewhere
            // try the local working directory if otherwise
            return __DIR__ . '/../libs/languagedetect/data/' . $fname;
        }
