Publications by Giampiero Salvi
Peer-reviewed
Articles
[1]
A. S. Shahrebabaki et al., "Acoustic-to-Articulatory Mapping With Joint Optimization of Deep Speech Enhancement and Articulatory Inversion Models," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 135-147, 2022.
[2]
J. Abdelnour, J. Rouat and G. Salvi, "NAAQA: A Neural Architecture for Acoustic Question Answering," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1-12, 2022.
[3]
G. Saponaro et al., "Beyond the Self: Using Grounded Affordances to Interpret and Describe Others' Actions," IEEE Transactions on Cognitive and Developmental Systems, vol. 12, no. 2, pp. 209-221, 2020.
[4]
K. Stefanov, J. Beskow and G. Salvi, "Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition," IEEE Transactions on Cognitive and Developmental Systems, vol. 12, no. 2, pp. 250-259, 2020.
[5]
A. Selamtzis et al., "Effect of vowel context in cepstral and entropy analysis of pathological voices," Biomedical Signal Processing and Control, vol. 47, pp. 350-357, 2019.
[6]
K. Stefanov et al., "Modeling of Human Visual Attention in Multiparty Open-World Dialogues," ACM Transactions on Human-Robot Interaction, vol. 8, no. 2, 2019.
[7]
S. Strömbergsson, G. Salvi and D. House, "Acoustic and perceptual evaluation of category goodness of /t/ and /k/ in typical and misarticulated children's speech," Journal of the Acoustical Society of America, vol. 137, no. 6, pp. 3422-3435, 2015.
[8]
C. Koniaris, G. Salvi and O. Engwall, "On mispronunciation analysis of individual foreign speakers using auditory periphery models," Speech Communication, vol. 55, no. 5, pp. 691-706, 2013.
[9]
D. Neiberg, G. Salvi and J. Gustafson, "Semi-supervised methods for exploring the acoustics of simple productive feedback," Speech Communication, vol. 55, no. 3, pp. 451-469, 2013.
[10]
G. Salvi et al., "Language bootstrapping: Learning Word Meanings From Perception-Action Association," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 3, pp. 660-671, 2012.
[11]
G. Salvi et al., "SynFace: Speech-Driven Facial Animation for Virtual Speech-Reading Support," EURASIP Journal on Audio, Speech, and Music Processing, vol. 2009, art. no. 191940, 2009.
[12]
G. Salvi, "Dynamic behaviour of connectionist speech recognition with strong latency constraints," Speech Communication, vol. 48, no. 7, pp. 802-818, 2006.
[13]
G. Salvi, "Segment boundary detection via class entropy measurements in connectionist phoneme recognition," Speech Communication, vol. 48, no. 12, pp. 1666-1676, 2006.
[14]
C. Siciliano et al., "Intelligibility of an ASR-controlled synthetic talking face," Journal of the Acoustical Society of America, vol. 115, no. 5, p. 2428, 2004.
[15]
G. Salvi, "Developing acoustic models for automatic speech recognition in Swedish," The European Student Journal of Language and Speech, vol. 1, 1999.
Conference papers
[16]
M. Adiban et al., "Hierarchical Residual Learning Based Vector Quantized Variational Autoencoder for Image Reconstruction and Generation," in The 33rd British Machine Vision Conference Proceedings, 2022.
[17]
Y. Getman et al., "wav2vec2-based Speech Rating System for Children with Speech Sound Disorder," in Interspeech, 2022.
[18]
M. Adiban, A. Safari and G. Salvi, "Step-GAN: A one-class anomaly detection model with applications to power system security," in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2021, pp. 2605-2609.
[19]
A. S. Shahrebabaki et al., "Sequence-to-sequence articulatory inversion through time convolution of sub-band frequency signals," in Interspeech, 2020, pp. 2882-2886.
[20]
K. Stefanov, M. Adiban and G. Salvi, "Spatial bias in vision-based voice activity detection," in 2020 25th International Conference on Pattern Recognition (ICPR), 2020, pp. 10433-10440.
[21]
A. S. Shahrebabaki et al., "Transfer learning of articulatory information through phone information," in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2020, pp. 2877-2881.
[22]
C. Zhang et al., "Active Mini-Batch Sampling Using Repulsive Point Processes," in AAAI'19/IAAI'19/EAAI'19: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, 2019, pp. 5741-5748.
[23]
A. Castellana et al., "Cepstral and entropy analyses in vowels excerpted from continuous speech of dysphonic and control speakers," in Proceedings of the Annual Conference of the International Speech Communication Association, Interspeech 2017, 2017, pp. 1814-1818.
[24]
G. Saponaro et al., "Interactive Robot Learning of Gestures, Language and Affordances," in Grounding Language Understanding, 2017.
[25]
A. Fahlström Myrman and G. Salvi, "Partitioning of Posteriorgrams using Siamese Models for Unsupervised Acoustic Modelling," in Grounding Language Understanding, 2017.
[26]
A. Kumar Dhaka and G. Salvi, "Sparse Autoencoder Based Semi-Supervised Learning for Phone Classification with Limited Annotations," in Grounding Language Understanding, 2017.
[27]
K. Stefanov, J. Beskow and G. Salvi, "Vision-based Active Speaker Detection in Multiparty Interaction," in Grounding Language Understanding, 2017.
[28]
G. Salvi, "An Analysis of Shallow and Deep Representations of Speech Based on Unsupervised Classification of Isolated Words," in Recent Advances in Nonlinear Speech Processing, 2016, pp. 151-157.
[29]
J. Lopes et al., "Detecting Repetitions in Spoken Dialogue Systems Using Phonetic Distances," in INTERSPEECH-2015, 2015, pp. 1805-1809.
[30]
A. Pieropan et al., "A dataset of human manipulation actions," in ICRA 2014 Workshop on Autonomous Grasping and Manipulation: An Open Challenge, 2014.
[31]
A. Pieropan et al., "Audio-Visual Classification and Detection of Human Manipulation Actions," in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2014), 2014, pp. 3045-3052.
[32]
N. Vanhainen and G. Salvi, "Free Acoustic and Language Models for Large Vocabulary Continuous Speech Recognition in Swedish," in Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), 2014.
[33]
S. Strömbergsson, G. Salvi and D. House, "Gradient evaluation of /k/-likeness in typical and misarticulated child speech," in Proceedings of ICPLA 2014, 2014.
[34]
N. Vanhainen and G. Salvi, "Pattern Discovery in Continuous Speech Using Block Diagonal Infinite HMM," in 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014), Florence, Italy, 4-9 May 2014, pp. 3719-3723.
[35]
G. Salvi and N. Vanhainen, "The WaveSurfer Automatic Speech Recognition Plugin," in LREC 2014 - Ninth International Conference on Language Resources and Evaluation, 2014, pp. 3067-3071.
[36]
C. Oertel and G. Salvi, "A Gaze-based Method for Relating Group Involvement to Individual Engagement in Multimodal Multiparty Dialogue," in ICMI 2013 - Proceedings of the 2013 ACM International Conference on Multimodal Interaction, 2013, pp. 99-106.
[37]
G. Salvi, "Biologically Inspired Methods for Automatic Speech Understanding," in Biologically Inspired Cognitive Architectures 2012, 2013, pp. 283-286.
[38]
G. Saponaro, G. Salvi and A. Bernardino, "Robot anticipation of human intentions through continuous gesture recognition," in Proceedings of the 2013 International Conference on Collaboration Technologies and Systems, CTS 2013, 2013, pp. 218-225.
[39]
C. Oertel et al., "The KTH Games Corpora: How to Catch a Werewolf," in IVA 2013 Workshop Multimodal Corpora: Beyond Audio and Video (MMC 2013), 2013.
[40]
C. Koniaris, O. Engwall and G. Salvi, "Auditory and Dynamic Modeling Paradigms to Detect L2 Mispronunciations," in 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, Vol. 1, 2012, pp. 898-901.
[41]
C. Koniaris, O. Engwall and G. Salvi, "On the Benefit of Using Auditory Modeling for Diagnostic Evaluation of Pronunciations," in International Symposium on Automatic Detection of Errors in Pronunciation Training (IS ADEPT), Stockholm, Sweden, June 6-8, 2012, pp. 59-64.
[42]
N. Vanhainen and G. Salvi, "Word Discovery with Beta Process Factor Analysis," in 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, Vol. 1, 2012, pp. 798-801.
[43]
G. Salvi et al., "Analisi Gerarchica degli Inviluppi Spettrali Differenziali di una Voce Emotiva" [Hierarchical analysis of differential spectral envelopes of an emotional voice], in 7° Convegno AISV, Contesto comunicativo e variabilità nella produzione e percezione della lingua (AISV), Lecce, Italy, 26-28 January 2011.
[44]
G. Ananthakrishnan and G. Salvi, "Using Imitation to learn Infant-Adult Acoustic Mappings," in 12th Annual Conference of the International Speech Communication Association 2011 (INTERSPEECH 2011), Vols 1-5, 2011, pp. 772-775.
[45]
G. Salvi et al., "Cluster Analysis of Differential Spectral Envelopes on Emotional Speech," in 11th Annual Conference of the International Speech Communication Association 2010 (INTERSPEECH 2010), Vols 1-4, 2010, pp. 322-325.
[46]
V. Krunic et al., "Affordance based word-to-meaning association," in ICRA: 2009 IEEE International Conference on Robotics and Automation, 2009, pp. 4138-4143.
[47]
J. Beskow, G. Salvi and S. Al Moubayed, "SynFace: Verbal and Non-verbal Face Animation from Audio," in Proceedings of the International Conference on Auditory-Visual Speech Processing AVSP'09, 2009.
[48]
S. Al Moubayed et al., "Virtual Speech Reading Support for Hard of Hearing in a Domestic Multi-Media Setting," in INTERSPEECH 2009: 10th Annual Conference of the International Speech Communication Association 2009, 2009, pp. 1443-1446.
[49]
V. Krunic et al., "Associating word descriptions to learned manipulation task models," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2008.
[50]
J. Beskow et al., "Hearing at Home: Communication support in home environments for hearing impaired persons," in INTERSPEECH 2008: 9th Annual Conference of the International Speech Communication Association 2008, 2008, pp. 2203-2206.
[51]
E. Agelfors et al., "User evaluation of the SYNFACE talking head telephone," in Computers Helping People With Special Needs, Proceedings, 2006, pp. 579-586.
[52]
G. Salvi, "Advances in regional accent clustering in Swedish," in Proceedings of the European Conference on Speech Communication and Technology (Eurospeech), 2005, pp. 2841-2844.
[53]
G. Salvi, "Ecological language acquisition via incremental model-based clustering," in Proceedings of the European Conference on Speech Communication and Technology (Eurospeech), 2005, pp. 1181-1184.
[54]
G. Salvi, "Segment boundaries in low latency phonetic recognition," in Nonlinear Analyses and Algorithms for Speech Processing, 2005, pp. 267-276.
[55]
J. Beskow et al., "SYNFACE - A talking head telephone for the hearing-impaired," in Computers Helping People With Special Needs: Proceedings, 2004, pp. 1178-1185.
[56]
K.-E. Spens et al., "SYNFACE, a talking head telephone for the hearing impaired," in IFHOH 7th World Congress for the Hard of Hearing, Helsinki, Finland, July 4-9, 2004.
[57]
G. Salvi, "Accent clustering in Swedish using the Bhattacharyya distance," in Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS), Barcelona, Spain, 2003, pp. 1149-1152.
[58]
I. Karlsson, A. Faulkner and G. Salvi, "SYNFACE - a talking face telephone," in Proceedings of EUROSPEECH 2003, 2003, pp. 1297-1300.
[59]
G. Salvi, "Truncation error and dynamics in very low latency phonetic recognition," in Proceedings of Non Linear Speech Processing (NOLISP), 2003.
[60]
G. Salvi, "Using accent information in ASR models for Swedish," in Proceedings of INTERSPEECH'2003, 2003, pp. 2677-2680.
[61]
F. T. Johansen et al., "The COST 249 SpeechDat multilingual reference recogniser," in Proceedings of the XLDB Workshop on Very Large Telephone Speech Databases, 2000.
[62]
F. T. Johansen et al., "The COST 249 SpeechDat multilingual reference recogniser," in Proceedings of the International Conference on Language Resources and Evaluation (LREC), 2000.
[63]
B. Lindberg et al., "A noise robust multilingual reference recogniser based on SpeechDat(II)," in Proceedings of the International Conference on Spoken Language Processing (ICSLP), 2000.
[64]
E. Agelfors et al., "A synthetic face as a lip-reading support for hearing impaired telephone users - problems and positive results," in European Audiology in 1999: Proceedings of the 4th European Conference in Audiology, Oulu, Finland, June 6-10, 1999.
[65]
E. Agelfors et al., "Synthetic visual speech driven from auditory speech," in Proceedings of Audio-Visual Speech Processing (AVSP'99), 1999.
Book chapters
[66]
D. S. Ásgrímsson et al., "Bayesian Deep Learning for Vibration-Based Bridge Damage Detection," in Structural Integrity. Springer Nature, 2022, pp. 27-43.
Non-peer-reviewed
Articles
[67]
G. Salvi and S. Al Moubayed, "Spoken Language Identification using Frame Based Entropy Measures," TMH-QPSR, vol. 51, no. 1, pp. 69-72, 2011.
[68]
T. Öhman and G. Salvi, "Using HMMs and ANNs for mapping acoustic to visual speech," TMH-QPSR, vol. 40, no. 1-2, pp. 45-50, 1999.
Conference papers
[69]
S. Al Moubayed et al., "Studies on Using the SynFace Talking Head for the Hearing Impaired," in Proceedings of Fonetik'09: The XXIIth Swedish Phonetics Conference, June 10-12, 2009, pp. 140-143.
[70]
B. Lindblom et al., "(Re)use of place features in voiced stop systems: Role of phonetic constraints," in Proceedings of Fonetik 2008, 2008, pp. 5-8.
[71]
S. Al Moubayed, J. Beskow and G. Salvi, "SynFace Phone Recognizer for Swedish Wideband and Narrowband Speech," in Proceedings of the Second Swedish Language Technology Conference (SLTC), 2008, pp. 3-6.
Book chapters
[72]
B. Lindblom et al., "Sound systems are shaped by their users: The recombination of phonetic substance," in Where Do Phonological Features Come From?: Cognitive, physical and developmental bases of distinctive speech categories, G. N. Clements and R. Ridouane, Eds. John Benjamins Publishing Company, 2011, pp. 67-97.
Theses
[73]
G. Salvi, "Mining Speech Sounds: Machine Learning Methods for Automatic Speech Recognition and Analysis," Doctoral thesis, KTH, Stockholm, Trita-CSC-A 2006:12, 2006.
Other
[74]
K. Stefanov et al., "Analysis and Generation of Candidate Gaze Targets in Multiparty Open-World Dialogues," (Manuscript).
[75]
K. Stefanov, J. Beskow and G. Salvi, "Self-Supervised Vision-Based Detection of the Active Speaker as a Prerequisite for Socially-Aware Language Acquisition," (Manuscript).
Last synchronized with DiVA:
2023-03-23 01:01:06