Title
Pattern theory for representation and inference of semantic structures in videos.
Highlights
The framework is robust to severe scenarios of feature classification errors.
A flexible representation scheme captures a large range of structural variations.
Interpretation performance improves on the state of the art in challenging scenarios.
Abstract
We develop a combinatorial approach to represent and infer semantic interpretations of video content using tools from Grenander's pattern theory. Semantic structures for video interpretation are formed using generators and bonds, the fundamental units of representation in pattern theory. Generators represent features and ontological concepts, such as actions and objects, whereas bonds are members of generators that encode ontological constraints and allow generators to connect to each other. The resulting configurations of partially connected generators provide scene interpretations; the inference goal is to parse given video data into high-probability configurations. The probabilistic models are imposed using energies that have contributions from both data (classification scores) and prior information (ontological constraints, co-occurrence frequencies, etc.). The search for optimal configurations is based on an MCMC, simulated-annealing algorithm that uses simple moves to propose configuration changes and to accept or reject them according to the posterior energy. In contrast to current graphical methods, this framework does not preselect a neighborhood structure but infers it from the data. The proposed framework obtains 20% higher classification rates than a purely machine-learning-based baseline, despite artificial insertion of low-level processing errors. In an uncontrolled scenario, video interpretation performance rates are double those of the baseline.
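The inference loop the abstract describes (simple moves proposing configuration changes, accepted or rejected by posterior energy under a simulated-annealing schedule) can be sketched generically. The sketch below is illustrative only, not the paper's model: the bond indicators, classifier scores, incompatibility set, and both energy terms are invented stand-ins for the data and prior contributions.

```python
import math
import random

def simulated_annealing(init_config, propose, energy, n_iters=5000,
                        t_start=5.0, t_end=0.01, seed=0):
    """Generic MCMC simulated-annealing search: propose a simple move,
    then accept/reject it by the Metropolis rule on the posterior energy."""
    rng = random.Random(seed)
    config = init_config
    e = energy(config)
    best, best_e = config, e
    for i in range(n_iters):
        # Geometric cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (i / max(n_iters - 1, 1))
        cand = propose(config, rng)
        e_cand = energy(cand)
        # Downhill moves are always accepted; uphill moves with prob exp(-dE/T).
        if e_cand <= e or rng.random() < math.exp(-(e_cand - e) / t):
            config, e = cand, e_cand
            if e < best_e:
                best, best_e = config, e
    return best, best_e

# Toy stand-in for a configuration: a tuple of 0/1 "bond" indicators.
scores = [0.9, 0.2, 0.8, 0.1]    # hypothetical classification scores (data term)
incompatible = {(0, 1), (2, 3)}  # hypothetical ontological constraints (prior term)

def energy(cfg):
    data = sum((b - s) ** 2 for b, s in zip(cfg, scores))
    prior = sum(1.0 for (i, j) in incompatible if cfg[i] and cfg[j])
    return data + prior

def propose(cfg, rng):
    # A "simple move": flip one randomly chosen bond.
    i = rng.randrange(len(cfg))
    out = list(cfg)
    out[i] = 1 - out[i]
    return tuple(out)

best, best_e = simulated_annealing((0, 0, 0, 0), propose, energy)
```

In this toy setting the minimum-energy configuration activates exactly the bonds with high scores while respecting the incompatibility constraints; the paper's actual moves operate on generator-bond configurations rather than bit flips.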
Year
2016
DOI
10.1016/j.patrec.2016.01.028
Venue
Pattern Recognition Letters
Keywords
Pattern theory, Graphical methods, Compositional approach, Video interpretation, Activity recognition
Field
Data mining, Ontology, Pattern theory, Markov chain Monte Carlo, Artificial intelligence, Probabilistic logic, Computer vision, Activity recognition, Pattern recognition, Inference, Thread (computing), Parsing, Mathematics
DocType
Journal
Volume
72
Issue
C
ISSN
0167-8655
Citations
0
PageRank
0.34
References
18
Authors
4
Name                           Order  Citations  PageRank
Fillipe Dias Moreira de Souza  1      55         5.81
Sudeep Sarkar                  2      2839       317.68
Anuj Srivastava                3      2853       199.47
Jing-yong Su                   4      156        10.93