<?PHP
/*
    ObsceneClean - a profanity filter. Copyright (C) 2009 Scott L. Moore

    This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.

    This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

    You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA 
Contact: hide@address.com 
*/	
/* ========================================================================================================== */
//
//		ObsceneClean settings file
//
//
$MaxInputStringSize = 7200;		// Int - Maximum input size in bytes. If processing ends prematurely, reduce this parm or raise PHP's memory_limit.
$DataDir = ""; 							// String - Default "" (use the default subdirectory). Location of the data files; a directory outside the web folders is recommended. Change this setting only if you want a custom location for the ObsceneClean data files. Enter the full path to the data files directory WITHOUT a trailing slash, and ensure the data files are accessible to the ObsceneClean programs.
$highprobability = 80;				// Int - Default 80. Exit when the probability of OWs rises above this percentage.
$DefaultAmbiguousOWProbability = 50;		// Default probability of an ambiguous OW when no criterion or rule applies to it.
$DefaultUnAmbiguousOWProbability = 90;		// Default probability of an UNambiguous OW when no criterion or rule applies to it.
$DefaultDisguisedUnAmbiguousOWProbability = 85;	// Default probability of a disguised unambiguous OW.
$DefaultAmbiguousOWwithInsultProbability = 80;	// Default probability of an ambiguous OW accompanied by an insult.
$LowestSevConsidered = 1;	// OWs below this severity are not considered at all during detection. (The separate parm $lowestsev is the lowest severity of an OW that will be reported.)
$InputStringIsBig = 1200;   // 
$UseSoundAlikes = TRUE;		// Boolean - Default true. Match sound-alike spellings, e.g. 'phuck'.
$JustTooManyOWsAddProb = 5; // If the too-many-OWs threshold is breached, add this number to every OW's probability.
$errorchecking = FALSE;		// Boolean - Default false. Perform error checking.
$folddiacritics = TRUE; 		// Boolean - Default true. Fold diacritics on single-byte (2-hex-digit) Latin character codes, so accent marks are ignored.
$foldUNICODEdiacritics = TRUE; 		// Boolean - Default true. Fold diacritics on Unicode characters, so accent marks are ignored. Ignored unless the input string is UTF-8.
$uselookalikes = TRUE;		// Boolean - Default true. Use lookalike characters in matching.
$usethindisguise = TRUE; 	// Boolean - Default true. Use thin (good) disguising characters such as space, dash, or underscore.
$usepoordisguise = TRUE; 	// Boolean - Default false. Use poor disguising characters such as $ and #. If this is TRUE and $usethindisguise is FALSE, $usethindisguise is automatically set to TRUE.
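// Sketch (hypothetical helper; the name oc_sketch_normalize_disguise_flags is
// illustrative and not part of ObsceneClean): one way the dependency described
// above could be applied, where enabling poor disguise chars forces thin ones on.
if (!function_exists('oc_sketch_normalize_disguise_flags')) {
    function oc_sketch_normalize_disguise_flags($useThin, $usePoor)
    {
        // Poor disguise characters imply thin (good) disguise characters.
        if ($usePoor && !$useThin) {
            $useThin = TRUE;
        }
        return array($useThin, $usePoor);
    }
}
// Usage: list($usethindisguise, $usepoordisguise) = oc_sketch_normalize_disguise_flags($usethindisguise, $usepoordisguise);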
$loglevel = "0";			// Int - Loglevel 0 -5 
$MinCheckstrSize = 2;					// Do not check strings of this size or smaller
$TooManyUniqueOWs = 3;					// Tolerate how many unique (sorted & reduced) OWs? 
$AllowablePercentOfOWcharsInStr = 70;	// A lower number, such as 40, is likely to produce false positives: at 40%, the quick email "chink your logs" produces a false positive.
$UseGreedyREGEXs = 0;					// 0 = non-greedy regexes, which DO seem to find all instances of a specific OW.
$VowelSubstitutionRule = 1;				// 0 = don't use the rule.
										// 1 = any one vowel can be replaced by a char from $VowelSubstitutionChars and still be matched.
										// 2 = any two vowels can be replaced.
										// 3 = any one vowel can be replaced unless the OW's length is >= $VowelSubstitutionLen; with rule 3, OWs shorter than $VowelSubstitutionLen default to rule 1.
$VowelSubstitutionLen = 7;				// OW length threshold used by rule 3 of $VowelSubstitutionRule.
$VowelSubstitutionChars = "\*";			// Chars a user may substitute for any one vowel. They are escaped for PHP preg regexes: a backslash escaped for a preg regex takes 4 backslashes in a double-quoted PHP string. Putting slashes in this var may cause problems.
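// Sketch of $VowelSubstitutionRule = 1 under stated assumptions: the helper name
// oc_sketch_vowel_sub_regex is hypothetical, vowels are taken to be a, e, i, o, u,
// and ObsceneClean's real regex template may differ. The pattern matches the OW
// as-is, or with exactly one vowel replaced by a char from $VowelSubstitutionChars
// (already escaped for a preg character class, as in the setting above).
if (!function_exists('oc_sketch_vowel_sub_regex')) {
    function oc_sketch_vowel_sub_regex($ow, $subChars)
    {
        $alternatives = array(preg_quote($ow, '/'));
        $len = strlen($ow);
        for ($i = 0; $i < $len; $i++) {
            if (strpos('aeiou', strtolower($ow[$i])) !== FALSE) {
                // Keep every char except position $i, which may be a substitution char.
                $alternatives[] = preg_quote(substr($ow, 0, $i), '/')
                    . '[' . $subChars . ']'
                    . preg_quote((string) substr($ow, $i + 1), '/');
            }
        }
        return '/\b(?:' . implode('|', $alternatives) . ')\b/i';
    }
}
// Usage: preg_match(oc_sketch_vowel_sub_regex('darned', $VowelSubstitutionChars), 'you d*rned cat') returns 1,
// while 'd*rn*d' (two vowels replaced) does not match under rule 1.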
// Letters likely to be disguised by substitution with special chars; each letter pairs with the $SubstitutionChars entry at the same position. Do NOT put letters with diacritics here!
$SubstitutableChars = array("a", "b",        "e", "i",               "o",         "s",              "c",            "u");
$SubstitutionChars  = array('@', '\xDF\xFE', "3", '1!\xA1\xEC-\xEF', '0\xF8\xF0', '$5\xA7\x{02E2}', '\xA2\xA9\x80', '\xB5');
// Displays as:             "@", "ßþ",       "3", "1!¡ì-ï",          "0øð",       "$5§ˢ",            "¢©€",          "µ"
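// Sketch, assuming a simple per-letter character class (the helper name
// oc_sketch_substitution_regex is hypothetical, not ObsceneClean's API): each
// letter found in $SubstitutableChars becomes a class holding the letter plus
// its entry from $SubstitutionChars at the same position. The /u modifier is
// needed because '\x{02E2}' is above \xFF; this assumes UTF-8 input.
if (!function_exists('oc_sketch_substitution_regex')) {
    function oc_sketch_substitution_regex($ow, $letters, $subs)
    {
        $map = array_combine($letters, $subs);   // e.g. 'a' => '@'
        $pattern = '';
        foreach (str_split(strtolower($ow)) as $ch) {
            if (isset($map[$ch])) {
                // $subs entries are already written as regex escapes.
                $pattern .= '[' . preg_quote($ch, '/') . $map[$ch] . ']';
            } else {
                $pattern .= preg_quote($ch, '/');
            }
        }
        return '/' . $pattern . '/iu';
    }
}
// Usage: oc_sketch_substitution_regex('boot', $SubstitutableChars, $SubstitutionChars) yields a pattern that also matches 'b00t'.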
$MaxOWLetterRepeat = 15;				// how many times can each letter of OW repeat? e.g. FFUUUUUCCCCKKK
$MaxPoorDisguisingChars = 2;			// how many poor disguising chars. will be considered?
$MaxGoodDisguisingChars = 15;			// how many good disguising chars. will be considered?
$PoorDisguisingCharsOnly = '!@#$%&*\/\+=\(\)\\\\';		// Poor disguise chars, usually used to separate the letters of an OW, e.g. P_O-O P on you!!
$GoodDisguisingCharsOnly = '-_^.,`~\'\s';				// Good disguise chars, escaped for PHP preg regexes! The caret must not be first in the char class (it would negate the class). Never make an asterisk a good disguise char!
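// Sketch, assuming each pair of adjacent OW letters may be separated by up to
// $maxGap good disguise chars (the helper name oc_sketch_disguised_regex is
// hypothetical; how ObsceneClean applies $MaxGoodDisguisingChars and
// $DisguiseCharAlternationFactor in its real regex template is not shown here).
if (!function_exists('oc_sketch_disguised_regex')) {
    function oc_sketch_disguised_regex($ow, $goodChars, $maxGap)
    {
        $gap = '[' . $goodChars . ']{0,' . (int) $maxGap . '}';
        $parts = array();
        foreach (str_split($ow) as $ch) {
            $parts[] = preg_quote($ch, '/');
        }
        // Letters in order, each pair separated by an optional run of disguise chars.
        return '/' . implode($gap, $parts) . '/i';
    }
}
// Usage: preg_match(oc_sketch_disguised_regex('poop', $GoodDisguisingCharsOnly, 2), 'P_O-O P on you!!') returns 1.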
$BoundedOWLenLimit = 9;  				// A multiplier bounding how long a disguised OW may be relative to its undisguised length, e.g. a disguised OW can be only about 10x its undisguised length and still be detected.
$DisguiseCharAlternationFactor = 2;		// How many times can a disguised OW alternate between good and poor disguise chars? This is used in forming the regex template.
$QtyRuleWordsLenChk = 30;				// Check the 30 chars preceding an OW for quantitative words. Rule of thumb: an average word is 5 chars plus one space = 6 chars, so 30 chars is about 5 words.
$QtyRuleThreshold = 2; 					// How many quantifying words must accompany an OW to add weight to its probability?
$QtyRuleThresholdLow = 1; 				// Must be lower than $QtyRuleThreshold!
$QtyRuleDirection = 2;					// Look for quantifying words before, after, or both before and after an OW. Search direction (2=BEFORE, 1=AFTER, 0=BOTH).
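// Sketch of how a search direction coded as above (2=BEFORE, 1=AFTER, 0=BOTH)
// might select the text to scan, assuming a byte-based window of
// $QtyRuleWordsLenChk chars around the match. The helper name
// oc_sketch_context_window is hypothetical; $InsultingRuleDirection uses the same coding.
if (!function_exists('oc_sketch_context_window')) {
    function oc_sketch_context_window($text, $matchPos, $matchLen, $chars, $direction)
    {
        $start = max(0, $matchPos - $chars);
        $before = (string) substr($text, $start, $matchPos - $start);
        $after = (string) substr($text, $matchPos + $matchLen, $chars);
        if ($direction == 2) {
            return $before;             // BEFORE
        }
        if ($direction == 1) {
            return $after;              // AFTER
        }
        return $before . ' ' . $after;  // BOTH
    }
}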
$EvaluateAntagonism = TRUE;				// 
$AntagonismLow = 3;						// Threshold. MUST be higher than 1, because a single antagonistic word means nothing by itself; that is one of the main concepts of antagonism.
$AntagonismHigh = 6;					// Threshold. MUST be higher than $AntagonismLow. Combined with other factors, even low antagonism can be meaningful.
$AntagonismVeryHigh = 8;					// Threshold. MUST be higher than $AntagonismHigh.
$AntagonismExitOnVeryHigh = 0;				// If true, exit when antagonism is very high.
$CorrelatedAntagonisticWordsThreshold = 3;	// If an ambiguous OW match is found AND at least this many 'related' antagonistic words are found, exit if quickcheck.
$ReportAntagonism = 1;						// report antagonism 
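// Sketch of mapping an antagonistic-word count onto the three thresholds above,
// assuming a count at or above a threshold reaches that level (the helper name
// oc_sketch_antagonism_level and the level labels are hypothetical). It also
// enforces the ordering the comments require: 1 < Low < High < VeryHigh.
if (!function_exists('oc_sketch_antagonism_level')) {
    function oc_sketch_antagonism_level($count, $low, $high, $veryHigh)
    {
        if (!($low > 1 && $low < $high && $high < $veryHigh)) {
            throw new InvalidArgumentException('Antagonism thresholds must satisfy 1 < Low < High < VeryHigh');
        }
        if ($count >= $veryHigh) {
            return 'very high';
        }
        if ($count >= $high) {
            return 'high';
        }
        if ($count >= $low) {
            return 'low';
        }
        return 'none';
    }
}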
$ReportInsults = 1;
$InsultingWeightThreshold = 70; 				// All insults have a weight; ambiguous insults like 'dog' are counted once and have lower weights.
$InsultingWeightThresholdLow = 50;				// Must be lower than $InsultingWeightThreshold!
$InsultingUniqueThreshold = 3; 
$InsultingRuleDirection = 0;					// Look for insulting words before, after, or both before and after an OW. Search direction (2=BEFORE, 1=AFTER, 0=BOTH).
$TotalInsultingCount = 0;						// If insulting terms are likely it may help to set this initially to 10 or 20.
$OverallProbability = 0;						// Initial overall probability of OWs. The proper setting depends on the website: what is the probability of OWs on the website to begin with? If high, set this to 20 or 30; if low, set it to 10 or zero. The probabilities of all OWs are added to this number; if none are added, this is the final overall probability of OWs.
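// Sketch of the accumulation described above, assuming each OW's probability
// is added in turn and detection exits once the total rises above
// $highprobability. The helper name oc_sketch_accumulate_probability is
// hypothetical; it returns the total and whether an early exit happened.
if (!function_exists('oc_sketch_accumulate_probability')) {
    function oc_sketch_accumulate_probability($initial, $owProbabilities, $high)
    {
        $overall = $initial;
        foreach ($owProbabilities as $p) {
            $overall += $p;
            if ($overall > $high) {
                return array($overall, TRUE);   // early exit
            }
        }
        return array($overall, FALSE);
    }
}
// Usage: list($overall, $exited) = oc_sketch_accumulate_probability($OverallProbability, array(50, 40), $highprobability);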
$UseSpecialRules = 1;						// Special rules help reduce false positives but are more resource intensive. They generally help detect ambiguous, double-meaning OWs, like 'fag', which can mean a cigarette in the UK.
$UseSpecialRule6 = 1;						// This rule may open many large files, but it also prevents unusual false positives such as the French surname 'Bastard' or the Chinese name 'Cum Moon'.
$CharsUsedForRecognition = 6;				// Must be an even number! How many chars to look at before and after a matched OW to judge recognizability. This number is multiplied based on the OW's severity.
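// Sketch of a sanity check for constraints stated in the comments of this file
// ("must be lower than", "must be an even number"). The helper name
// oc_sketch_check_settings is hypothetical and ObsceneClean is not known to call
// anything like it; it returns a list of problems, empty if none.
if (!function_exists('oc_sketch_check_settings')) {
    function oc_sketch_check_settings($s)
    {
        $problems = array();
        if ($s['QtyRuleThresholdLow'] >= $s['QtyRuleThreshold']) {
            $problems[] = '$QtyRuleThresholdLow must be lower than $QtyRuleThreshold';
        }
        if ($s['InsultingWeightThresholdLow'] >= $s['InsultingWeightThreshold']) {
            $problems[] = '$InsultingWeightThresholdLow must be lower than $InsultingWeightThreshold';
        }
        if ($s['CharsUsedForRecognition'] % 2 != 0) {
            $problems[] = '$CharsUsedForRecognition must be an even number';
        }
        return $problems;
    }
}
// Usage: oc_sketch_check_settings(compact('QtyRuleThresholdLow', 'QtyRuleThreshold', 'InsultingWeightThresholdLow', 'InsultingWeightThreshold', 'CharsUsedForRecognition'));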
$WhitespaceArray = array("\x20", "\x09", "\x00", "\x0A", "\x0D", "\x0B", "\xA0");	// Chars treated as whitespace: space, tab, NUL, LF, CR, VT, and byte A0 (Latin-1 no-break space).
$SeverityOfMoreRecognizableOWs = 8;		// A lower number is stricter, but false positives are more likely. Set as high as 11 to make OWs less easily recognized.
$NumCharsUsedForRecognition = 6;		// Look at num chars before and after the match to determine recognizability
$NumCharsRecogDivisor = 2;  		// $NumCharsUsedForRecognition divided by this number gives a length (of chars before and after the match) short enough for the OW to stay recognizable. E.g. in 'sdbitchlo' the chars before and after the OW, although random, are short enough to leave it recognizable.
$NonRepetitiveFactor = 3.00;			// A threshold against the repetitiveness of chars that appear before and after a matched OW. If set above 3.0, more likely to produce false negatives.  Use only 2 decimals. E.g. 'gasdfgfagsdjhadh' is NOT recognizable but aaaaafagaaaa is more recognizable.
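// Sketch of one possible repetitiveness measure (an assumption; ObsceneClean's
// actual formula is not given here): the number of chars around the match divided
// by the number of distinct chars, rounded to 2 decimals. A value above
// $NonRepetitiveFactor would mean the surrounding text is repetitive, leaving the
// OW recognizable. The helper name oc_sketch_repetitiveness is hypothetical.
if (!function_exists('oc_sketch_repetitiveness')) {
    function oc_sketch_repetitiveness($before, $after)
    {
        $context = $before . $after;
        $len = strlen($context);
        if ($len == 0) {
            return 0.0;
        }
        $distinct = count(array_unique(str_split($context)));
        return round($len / $distinct, 2);
    }
}
// Example: 'aaaaa' + 'aaaa' (from aaaaafagaaaa) scores 9.0, above 3.00; 'gasdfg' + 'sdjhadh' (from gasdfgfagsdjhadh) scores 1.86.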
$KingJamesBibleSafe = 1;				// Ignore all OWs in King James Bible. This option will slow down the detection.
$ShakespeareSafe = 1;					// Ignore all OWs in the complete works of Shakespeare. This option will slow down the detection. It can only be used with modern versions of the works; older publications, such as those from the 17th century, may not work. Does not apply to discussions of Shakespeare.
$SearchForOEs = 1;						// Search for offensive expressions (OEs). OEs consist of multiple words or combined words like 'futhermucker'.
$ReplaceOEsChar = 2;					// 1=asterisks, 2=dashes
$AllowableOWLenAsPercentOfMatch = 10;   // If the OW's length is below this percentage of the match's length, the match is too long to be recognizable.
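// Sketch of the check described above, assuming the OW's length is compared
// with the length of the whole (possibly disguised) match and that a percentage
// equal to the limit still passes. The helper name oc_sketch_match_len_ok is
// hypothetical.
if (!function_exists('oc_sketch_match_len_ok')) {
    function oc_sketch_match_len_ok($ow, $match, $minPercent)
    {
        $matchLen = strlen($match);
        if ($matchLen == 0) {
            return FALSE;
        }
        // OW length as a percentage of the match length.
        return (strlen($ow) * 100 / $matchLen) >= $minPercent;
    }
}
// Example: a 4-char OW in a 50-char match is 8%, below 10, so the match is rejected.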
?>