PHP MEMO

Detecting tab characters in data read from a file

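Writing the code point inside curly braces, as in
\u{0009}
lets you compare against a character by its Unicode code point. This syntax is available from PHP 7.0.
Leading zeros are optional, so
\u{9}
works as well.
JavaScript writes this as "\u0009", but PHP requires the brace form \u{...}.
If you simply write "\u0009", PHP does not convert it and treats it as the six literal characters
"backslash, u, 0, 0, 0, 9".
For \t or \u{...} to be recognized as an escape sequence, the string must be enclosed in double quotes (").
In single quotes, '\t' is just the two characters backslash and t.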
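
The quoting rules above can be checked with a short sketch. The variable names are illustrative only and are not part of the sample class; PHP 7.0 or later is assumed.

```php
<?php
// Illustrative sketch for PHP 7.0+; the variable names are assumptions.
$tab      = "\t";         // double quotes: a single tab character
$braced   = "\u{0009}";   // code point escape with braces
$short    = "\u{9}";      // leading zeros omitted
$unbraced = "\u0009";     // JavaScript-style, not converted by PHP
$single   = '\t';         // single quotes: backslash followed by t

var_dump($braced === $tab);    // bool(true)
var_dump($short === $tab);     // bool(true)
var_dump($unbraced === $tab);  // bool(false)
var_dump(strlen($unbraced));   // int(6)
var_dump($single === $tab);    // bool(false)
var_dump(strlen($single));     // int(2)
```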

[Regular expression case]

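$text = "a string\tcontaining a tab";
// The pattern is single-quoted, but PCRE itself interprets \t as a tab.
if (preg_match('/\t/', $text)) {
	echo "A tab character was found";
}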
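
In a regular expression the pattern is parsed by PCRE rather than by PHP's string parser, so the code point form differs. As a hedged sketch (variable names are illustrative): inside a single-quoted pattern PCRE understands \t and \x{9}, but it does not support \u{...}. To use \u{9}, the pattern has to be double-quoted so that PHP expands it before PCRE sees it. preg_match_all is used here only to count the matches.

```php
<?php
$text = "a\tb\tc";   // contains two tab characters

// \t and \x{9} are interpreted by PCRE itself, so single quotes work.
$countEscape = preg_match_all('/\t/', $text);
$countHex    = preg_match_all('/\x{9}/u', $text);

// "\u{9}" is expanded by PHP into a real tab before the pattern is compiled.
$countUnicode = preg_match_all("/\u{9}/", $text);

echo "$countEscape $countHex $countUnicode" . PHP_EOL;   // prints "2 2 2"
```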

class test1
{
	function test1():void
	{
		// Assumes aaa.html is in the same directory as this PHP script
		$path = "aaa.html";
		$ary = [];
		$this->loadFile($ary, $path);
		for($i=0; $i<3; $i++)
		{
			$this->loadFileData($ary, $i);
			echo "<hr>";
		}
	}
	private function loadFileData(array $ary, int $kind)
	{
		$flg = 0;
		$data = "";
		switch($kind)
		{
			case 0:
				$data = "Test using a tab character";
				break;
			case 1:
				$data = "Test using a tab input character";
				break;
			case 2:
				$data = "Test using a Unicode escape sequence";
				break;
		}
		echo $data.PHP_EOL;
		for($i=0; $i < count($ary); $i++)
		{
			$data = $ary[$i];
			$length = mb_strlen($data);
			$count = 0;
			// Judge each character in the read line.
			for($j=0; $j < $length; $j++)
			{
				// Tab detection process
				if($this->judgeTab(mb_substr($data, $j, 1), $kind) == 1)
				{
					$count++;
				}
			}
			if($count == 0) continue;
			// If a tab is detected, output the corresponding line index and data.
			echo "i:$i tab count:$count data:$data".PHP_EOL;
		}
	}
	// Tab detection process
	//$data: 1 character
	//$kind: Judgement type
	//0: Judge using a tab character
	//1: Judge using a tab input character
	//2: Judge using a Unicode escape sequence
	private function judgeTab(string $data, int $kind) : int
	{
		$result = 0;
		switch($kind)
		{
			case 0:
				if ($data === "\t")
				{
					$result = 1;
				}
				break;
			case 1:
				// Compare against a tab typed directly between the double quotes
				// (a literal U+0009 character, not a space).
				if ($data === "	")
				{
					$result = 1;
				}
				break;
			case 2:
				if ($data === "\u{0009}")
				{
					$result = 1;
				}
				break;
		}
		return $result;
	}
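	private function loadFile(array &$ary, string $path):void
	{
		$ary = [];
		$file = fopen($path, "r");
		if ($file === false)
		{
			// Could not open the file; leave the array empty.
			return;
		}
		// fgets() returns false at end of file, so no false entry is stored.
		while(($line = fgets($file)) !== false)
		{
			$ary[] = $line;
		}
		fclose($file);
	}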
}
echo "<pre>";
$cls1= new test1();
$cls1->test1();
echo "</pre>";

[Result]

Test using a tab character
i:4 tab count:1 data:	let lastNo = await lastPageNo();
i:5 tab count:1 data:	let no = 0;
i:8 tab count:1 data:	if(d > 0)
i:9 tab count:3 data:	{		
i:10 tab count:2 data:		if(no < 1)
i:11 tab count:2 data:		{
i:12 tab count:3 data:			no=1;
i:13 tab count:2 data:		}
i:14 tab count:2 data:		else if(no>lastNo)
i:15 tab count:2 data:		{
i:16 tab count:3 data:			no=lastNo;
i:17 tab count:2 data:		}
i:18 tab count:1 data:	}
i:19 tab count:1 data:	obj.value=no;
Test using a tab input character
i:4 tab count:1 data:	let lastNo = await lastPageNo();
i:5 tab count:1 data:	let no = 0;
i:8 tab count:1 data:	if(d > 0)
i:9 tab count:3 data:	{		
i:10 tab count:2 data:		if(no < 1)
i:11 tab count:2 data:		{
i:12 tab count:3 data:			no=1;
i:13 tab count:2 data:		}
i:14 tab count:2 data:		else if(no>lastNo)
i:15 tab count:2 data:		{
i:16 tab count:3 data:			no=lastNo;
i:17 tab count:2 data:		}
i:18 tab count:1 data:	}
i:19 tab count:1 data:	obj.value=no;
Test using a Unicode escape sequence
i:4 tab count:1 data:	let lastNo = await lastPageNo();
i:5 tab count:1 data:	let no = 0;
i:8 tab count:1 data:	if(d > 0)
i:9 tab count:3 data:	{		
i:10 tab count:2 data:		if(no < 1)
i:11 tab count:2 data:		{
i:12 tab count:3 data:			no=1;
i:13 tab count:2 data:		}
i:14 tab count:2 data:		else if(no>lastNo)
i:15 tab count:2 data:		{
i:16 tab count:3 data:			no=lastNo;
i:17 tab count:2 data:		}
i:18 tab count:1 data:	}
i:19 tab count:1 data:	obj.value=no;

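This sample reads an HTML file line by line, appends each line to an array, and then checks every line for tabs in three ways:
(1) using the \t escape sequence
(2) using a tab character typed directly
(3) using a Unicode escape sequence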

$data = $ary[$i];

Takes out one line that was read from the file.

$length = mb_strlen($data);

Gets the length of that line in characters.

for($j=0; $j < $length; $j++)

A for loop is used to check the line's data one character at a time.

mb_substr($data, $j, 1)

Gets a single character.
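
Put together, the four steps above amount to a per-line tab counter. The following is a minimal sketch that assumes it lives outside the sample class; countTabs is a hypothetical helper name, not from the original code, and the mbstring extension is assumed to be enabled. Because a tab is a single byte in UTF-8, the built-in substr_count should give the same number without a character loop, so it is shown for comparison.

```php
<?php
// Hypothetical helper: counts tabs by checking one character at a time.
function countTabs(string $line): int
{
    $count = 0;
    $length = mb_strlen($line);
    for ($j = 0; $j < $length; $j++) {
        // Take one character and compare it with the tab escape sequence.
        if (mb_substr($line, $j, 1) === "\t") {
            $count++;
        }
    }
    return $count;
}

$line = "\t\tif(no < 1)";
echo countTabs($line) . PHP_EOL;            // prints 2
echo substr_count($line, "\t") . PHP_EOL;   // prints 2
```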

[Types of check used in this sample]

(1) if ($data === "\t")

A check using the tab escape sequence

(2) if ($data === "	")

A check using a tab character typed directly between the quotes (a literal tab, not a space)

(3) if ($data === "\u{0009}")

A check using a Unicode escape sequence
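
All three checks give the same result because each literal is the same one-byte string by the time the comparison runs. One way to confirm this is to dump the bytes with bin2hex. In this sketch chr(9) stands in for the directly typed tab so that the example does not depend on an editor preserving the character; that substitution and the array name are assumptions of the sketch, not part of the original sample.

```php
<?php
$literals = [
    'escape'  => "\t",        // (1) escape sequence
    'typed'   => chr(9),      // (2) stand-in for a tab typed directly between the quotes
    'unicode' => "\u{0009}",  // (3) Unicode code point escape
];

foreach ($literals as $name => $value) {
    // Every entry is the single byte 0x09.
    echo $name . ': ' . bin2hex($value) . PHP_EOL;
}
```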