
Publications by Giampiero Salvi

Peer-reviewed

Articles

[1]
M. Adiban, S. M. Siniscalchi and G. Salvi, "A step-by-step training method for multi generator GANs with application to anomaly detection and cybersecurity," Neurocomputing, vol. 537, pp. 296-308, 2023.
[3]
Y. Getman et al., "Developing an AI-Assisted Low-Resource Spoken Language Learning App for Children," IEEE Access, vol. 11, pp. 86025-86037, 2023.
[4]
A. S. Shahrebabaki et al., "Acoustic-to-Articulatory Mapping With Joint Optimization of Deep Speech Enhancement and Articulatory Inversion Models," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 135-147, 2022.
[5]
E. Stenwig et al., "Comparative analysis of explainable machine learning prediction models for hospital mortality," BMC Medical Research Methodology, vol. 22, no. 1, 2022.
[6]
J. Abdelnour, J. Rouat and G. Salvi, "NAAQA: A Neural Architecture for Acoustic Question Answering," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1-12, 2022.
[7]
G. Saponaro et al., "Beyond the Self: Using Grounded Affordances to Interpret and Describe Others’ Actions," IEEE Transactions on Cognitive and Developmental Systems, vol. 12, no. 2, pp. 209-221, 2020.
[8]
K. Stefanov, J. Beskow and G. Salvi, "Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition," IEEE Transactions on Cognitive and Developmental Systems, vol. 12, no. 2, pp. 250-259, 2020.
[9]
A. Selamtzis et al., "Effect of vowel context in cepstral and entropy analysis of pathological voices," Biomedical Signal Processing and Control, vol. 47, pp. 350-357, 2019.
[10]
K. Stefanov et al., "Modeling of Human Visual Attention in Multiparty Open-World Dialogues," ACM Transactions on Human-Robot Interaction, vol. 8, no. 2, 2019.
[11]
S. Strömbergsson, G. Salvi and D. House, "Acoustic and perceptual evaluation of category goodness of /t/ and /k/ in typical and misarticulated children's speech," Journal of the Acoustical Society of America, vol. 137, no. 6, pp. 3422-3435, 2015.
[12]
C. Koniaris, G. Salvi and O. Engwall, "On mispronunciation analysis of individual foreign speakers using auditory periphery models," Speech Communication, vol. 55, no. 5, pp. 691-706, 2013.
[13]
D. Neiberg, G. Salvi and J. Gustafson, "Semi-supervised methods for exploring the acoustics of simple productive feedback," Speech Communication, vol. 55, no. 3, pp. 451-469, 2013.
[14]
G. Salvi et al., "Language bootstrapping : Learning Word Meanings From Perception-Action Association," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 42, no. 3, pp. 660-671, 2012.
[15]
G. Salvi et al., "SynFace - Speech-Driven Facial Animation for Virtual Speech-Reading Support," Eurasip Journal on Audio, Speech, and Music Processing, vol. 2009, art. no. 191940, 2009.
[16]
G. Salvi, "Dynamic behaviour of connectionist speech recognition with strong latency constraints," Speech Communication, vol. 48, no. 7, pp. 802-818, 2006.
[17]
G. Salvi, "Segment boundary detection via class entropy measurements in connectionist phoneme recognition," Speech Communication, vol. 48, no. 12, pp. 1666-1676, 2006.
[18]
C. Siciliano et al., "Intelligibility of an ASR-controlled synthetic talking face," Journal of the Acoustical Society of America, vol. 115, no. 5, pp. 2428, 2004.
[19]
G. Salvi, "Developing acoustic models for automatic speech recognition in Swedish," The European Student Journal of Language and Speech, vol. 1, 1999.

Conference papers

[20]
X. Cao et al., "An Analysis of Goodness of Pronunciation for Child Speech," in Interspeech 2023, 2023, pp. 4613-4617.
[21]
J. Rugayan, G. Salvi and T. Svendsen, "Perceptual and Task-Oriented Assessment of a Semantic Metric for ASR Evaluation," in Interspeech 2023, 2023, pp. 2158-2162.
[22]
M. Adiban et al., "Hierarchical Residual Learning Based Vector Quantized Variational Autoencoder for Image Reconstruction and Generation," in The 33rd British Machine Vision Conference Proceedings, 2022.
[23]
J. Rugayan, T. Svendsen and G. Salvi, "Semantically Meaningful Metrics for Norwegian ASR Systems," in Interspeech, 18-22 September 2022, Incheon, Korea, 2022.
[25]
A. Sabzi Shahrebabaki et al., "A DNN Based Speech Enhancement Approach to Noise Robust Acoustic-to-Articulatory Inversion," in IEEE International Symposium on Circuits and Systems, 2021.
[26]
M. Adiban, A. Safari and G. Salvi, "Step-gan : A one-class anomaly detection model with applications to power system security," in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2021, pp. 2605-2609.
[28]
K. Stefanov, M. Adiban and G. Salvi, "Spatial bias in vision-based voice activity detection," in 2020 25th International Conference on Pattern Recognition (ICPR), 2020, pp. 10433-10440.
[29]
A. S. Shahrebabaki et al., "Transfer learning of articulatory information through phone information," in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2020, pp. 2877-2881.
[30]
C. Zhang et al., "Active Mini-Batch Sampling Using Repulsive Point Processes," in AAAI'19/IAAI'19/EAAI'19: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, 2019, pp. 5741-5748.
[31]
A. Castellana et al., "Cepstral and entropy analyses in vowels excerpted from continuous speech of dysphonic and control speakers," in Proceedings of the Annual Conference of the International Speech Communication Association, Interspeech 2017, 2017, pp. 1814-1818.
[32]
G. Saponaro et al., "Interactive Robot Learning of Gestures, Language and Affordances," in Grounding Language Understanding, 2017.
[33]
A. Fahlström Myrman and G. Salvi, "Partitioning of Posteriorgrams using Siamese Models for Unsupervised Acoustic Modelling," in Grounding Language Understanding, 2017.
[34]
A. Kumar Dhaka and G. Salvi, "Sparse Autoencoder Based Semi-Supervised Learning for Phone Classification with Limited Annotations," in Grounding Language Understanding, 2017.
[35]
K. Stefanov, J. Beskow and G. Salvi, "Vision-based Active Speaker Detection in Multiparty Interaction," in Grounding Language Understanding, 2017.
[36]
G. Salvi, "An Analysis of Shallow and Deep Representations of Speech Based on Unsupervised Classification of Isolated Words," in Recent Advances in Nonlinear Speech Processing, 2016, pp. 151-157.
[37]
J. Lopes et al., "Detecting Repetitions in Spoken Dialogue Systems Using Phonetic Distances," in INTERSPEECH-2015, 2015, pp. 1805-1809.
[38]
A. Pieropan et al., "A dataset of human manipulation actions," in ICRA 2014 Workshop on Autonomous Grasping and Manipulation : An Open Challenge, 2014.
[39]
A. Pieropan et al., "Audio-Visual Classification and Detection of Human Manipulation Actions," in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2014), 2014, pp. 3045-3052.
[40]
N. Vanhainen and G. Salvi, "Free Acoustic and Language Models for Large Vocabulary Continuous Speech Recognition in Swedish," in Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), 2014.
[41]
S. Strömbergsson, G. Salvi and D. House, "Gradient evaluation of /k/-likeness in typical and misarticulated child speech," in Proceedings of ICPLA 2014, 2014.
[42]
N. Vanhainen and G. Salvi, "Pattern Discovery in Continuous Speech Using Block Diagonal Infinite HMM," in 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014, Florence, Italy, 4-9 May 2014, 2014, pp. 3719-3723.
[43]
G. Salvi and N. Vanhainen, "The WaveSurfer Automatic Speech Recognition Plugin," in LREC 2014 - Ninth International Conference on Language Resources and Evaluation, 2014, pp. 3067-3071.
[44]
C. Oertel and G. Salvi, "A Gaze-based Method for Relating Group Involvement to Individual Engagement in Multimodal Multiparty Dialogue," in ICMI 2013 - Proceedings of the 2013 ACM International Conference on Multimodal Interaction, 2013, pp. 99-106.
[45]
G. Salvi, "Biologically Inspired Methods for Automatic Speech Understanding," in Biologically Inspired Cognitive Architectures 2012, 2013, pp. 283-286.
[46]
G. Saponaro, G. Salvi and A. Bernardino, "Robot anticipation of human intentions through continuous gesture recognition," in Proceedings of the 2013 International Conference on Collaboration Technologies and Systems, CTS 2013, 2013, pp. 218-225.
[47]
C. Oertel et al., "The KTH Games Corpora : How to Catch a Werewolf," in IVA 2013 Workshop Multimodal Corpora: Beyond Audio and Video : MMC 2013, 2013.
[48]
C. Koniaris, O. Engwall and G. Salvi, "Auditory and Dynamic Modeling Paradigms to Detect L2 Mispronunciations," in 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, Vol 1, 2012, pp. 898-901.
[49]
C. Koniaris, O. Engwall and G. Salvi, "On the Benefit of Using Auditory Modeling for Diagnostic Evaluation of Pronunciations," in International Symposium on Automatic Detection of Errors in Pronunciation Training (IS ADEPT), Stockholm, Sweden, June 6-8, 2012, 2012, pp. 59-64.
[50]
N. Vanhainen and G. Salvi, "Word Discovery with Beta Process Factor Analysis," in 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, Vol 1, 2012, pp. 798-801.
[51]
G. Salvi et al., "Analisi Gerarchica degli Inviluppi Spettrali Differenziali di una Voce Emotiva" [Hierarchical Analysis of Differential Spectral Envelopes of an Emotional Voice], in 7th AISV Conference, Contesto comunicativo e variabilità nella produzione e percezione della lingua (AISV), Lecce, Italy, 26-28 January 2011, 2011.
[52]
G. Ananthakrishnan and G. Salvi, "Using Imitation to learn Infant-Adult Acoustic Mappings," in 12th Annual Conference of the International Speech Communication Association 2011 (INTERSPEECH 2011), Vols 1-5, 2011, pp. 772-775.
[53]
G. Salvi et al., "Cluster Analysis of Differential Spectral Envelopes on Emotional Speech," in 11th Annual Conference of the International Speech Communication Association 2010 (INTERSPEECH 2010), Vols 1-4, 2010, pp. 322-325.
[54]
V. Krunic et al., "Affordance based word-to-meaning association," in ICRA : 2009 IEEE International Conference on Robotics and Automation, 2009, pp. 4138-4143.
[55]
J. Beskow, G. Salvi and S. Al Moubayed, "SynFace : Verbal and Non-verbal Face Animation from Audio," in Proceedings of The International Conference on Auditory-Visual Speech Processing AVSP'09, 2009.
[56]
J. Beskow, G. Salvi and S. Al Moubayed, "SynFace - Verbal and Non-verbal Face Animation from Audio," in Auditory-Visual Speech Processing 2009, AVSP 2009, 2009.
[57]
S. Al Moubayed et al., "Virtual Speech Reading Support for Hard of Hearing in a Domestic Multi-Media Setting," in INTERSPEECH 2009 : 10th Annual Conference of the International Speech Communication Association 2009, 2009, pp. 1443-1446.
[58]
V. Krunic et al., "Associating word descriptions to learned manipulation task models," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2008.
[59]
J. Beskow et al., "Hearing at Home : Communication support in home environments for hearing impaired persons," in INTERSPEECH 2008 : 9th Annual Conference of the International Speech Communication Association 2008, 2008, pp. 2203-2206.
[60]
E. Agelfors et al., "User evaluation of the SYNFACE talking head telephone," in Computers Helping People With Special Needs, Proceedings, 2006, pp. 579-586.
[61]
G. Salvi, "Advances in regional accent clustering in Swedish," in Proceedings of European Conference on Speech Communication and Technology (Eurospeech), 2005, pp. 2841-2844.
[62]
G. Salvi, "Ecological language acquisition via incremental model-based clustering," in Proceedings of European Conference on Speech Communication and Technology (Eurospeech), 2005, pp. 1181-1184.
[63]
G. Salvi, "Segment boundaries in low latency phonetic recognition," in Nonlinear Analyses and Algorithms for Speech Processing, 2005, pp. 267-276.
[64]
J. Beskow et al., "SYNFACE - A talking head telephone for the hearing-impaired," in Computers Helping People with Special Needs : Proceedings, 2004, pp. 1178-1185.
[65]
K.-E. Spens et al., "SYNFACE, a talking head telephone for the hearing impaired," in IFHOH 7th World Congress for the Hard of Hearing, Helsinki, Finland, July 4-9, 2004, 2004.
[66]
G. Salvi, "Accent clustering in Swedish using the Bhattacharyya distance," in Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS), Barcelona, Spain, 2003, pp. 1149-1152.
[67]
I. Karlsson, A. Faulkner and G. Salvi, "SYNFACE - a talking face telephone," in Proceedings of EUROSPEECH 2003, 2003, pp. 1297-1300.
[68]
G. Salvi, "Truncation error and dynamics in very low latency phonetic recognition," in Proceedings of Non Linear Speech Processing (NOLISP), 2003.
[69]
G. Salvi, "Using accent information in ASR models for Swedish," in Proceedings of INTERSPEECH'2003, 2003, pp. 2677-2680.
[70]
F. T. Johansen et al., "The COST 249 SpeechDat multilingual reference recogniser," in Proceedings of XLDB Workshop on Very Large Telephone Speech Databases, 2000.
[71]
F. T. Johansen et al., "The COST 249 SpeechDat multilingual reference recogniser," in Proceedings of the International Conference on Language Resources and Evaluation (LREC), 2000.
[72]
B. Lindberg et al., "A noise robust multilingual reference recogniser based on SpeechDat(II)," in Proceedings of the International Conference on Spoken Language Processing (ICSLP), 2000.
[73]
E. Agelfors et al., "A synthetic face as a lip-reading support for hearing impaired telephone users - problems and positive results," in European audiology in 1999 : proceeding of the 4th European Conference in Audiology, Oulu, Finland, June 6-10, 1999, 1999.
[74]
E. Agelfors et al., "Synthetic visual speech driven from auditory speech," in Proceedings of Audio-Visual Speech Processing (AVSP'99), 1999.

Book chapters

[75]
D. S. Ásgrímsson et al., "Bayesian Deep Learning for Vibration-Based Bridge Damage Detection," in Structural Integrity, Springer Nature, 2022, pp. 27-43.

Non-peer-reviewed

Articles

[76]
G. Salvi and S. Al Moubayed, "Spoken Language Identification using Frame Based Entropy Measures," TMH-QPSR, vol. 51, no. 1, pp. 69-72, 2011.
[77]
T. Öhman and G. Salvi, "Using HMMs and ANNs for mapping acoustic to visual speech," TMH-QPSR, vol. 40, no. 1-2, pp. 45-50, 1999.

Conference papers

[78]
S. Al Moubayed et al., "Studies on Using the SynFace Talking Head for the Hearing Impaired," in Proceedings of Fonetik'09 : The XXIIth Swedish Phonetics Conference, June 10-12, 2009, 2009, pp. 140-143.
[79]
B. Lindblom et al., "(Re)use of place features in voiced stop systems : Role of phonetic constraints," in Proceedings of Fonetik 2008, 2008, pp. 5-8.
[80]
S. Al Moubayed, J. Beskow and G. Salvi, "SynFace Phone Recognizer for Swedish Wideband and Narrowband Speech," in Proceedings of The second Swedish Language Technology Conference (SLTC), 2008, pp. 3-6.

Book chapters

[81]
B. Lindblom et al., "Sound systems are shaped by their users : The recombination of phonetic substance," in Where Do Phonological Features Come From? : Cognitive, physical and developmental bases of distinctive speech categories, G. N. Clements and R. Ridouane, Eds., John Benjamins Publishing Company, 2011, pp. 67-97.

Theses

[82]
G. Salvi, "Mining Speech Sounds : Machine Learning Methods for Automatic Speech Recognition and Analysis," Doctoral thesis, KTH, Stockholm, Trita-CSC-A 2006:12, 2006.
Last synchronized with DiVA:
2024-05-07 00:02:42