
Publications by Giampiero Salvi

Peer-reviewed

Articles

[1]
M. Adiban, S. M. Siniscalchi, and G. Salvi, "A step-by-step training method for multi generator GANs with application to anomaly detection and cybersecurity," Neurocomputing, vol. 537, pp. 296-308, 2023.
[3]
Y. Getman et al., "Developing an AI-Assisted Low-Resource Spoken Language Learning App for Children," IEEE Access, vol. 11, pp. 86025-86037, 2023.
[4]
A. S. Shahrebabaki et al., "Acoustic-to-Articulatory Mapping With Joint Optimization of Deep Speech Enhancement and Articulatory Inversion Models," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 135-147, 2022.
[5]
E. Stenwig et al., "Comparative analysis of explainable machine learning prediction models for hospital mortality," BMC Medical Research Methodology, vol. 22, no. 1, 2022.
[6]
J. Abdelnour, J. Rouat, and G. Salvi, "NAAQA: A Neural Architecture for Acoustic Question Answering," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1-12, 2022.
[7]
G. Saponaro et al., "Beyond the Self: Using Grounded Affordances to Interpret and Describe Others’ Actions," IEEE Transactions on Cognitive and Developmental Systems, vol. 12, no. 2, pp. 209-221, 2020.
[8]
K. Stefanov, J. Beskow, and G. Salvi, "Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition," IEEE Transactions on Cognitive and Developmental Systems, vol. 12, no. 2, pp. 250-259, 2020.
[9]
A. Selamtzis et al., "Effect of vowel context in cepstral and entropy analysis of pathological voices," Biomedical Signal Processing and Control, vol. 47, pp. 350-357, 2019.
[10]
K. Stefanov et al., "Modeling of Human Visual Attention in Multiparty Open-World Dialogues," ACM Transactions on Human-Robot Interaction, vol. 8, no. 2, 2019.
[11]
S. Strömbergsson, G. Salvi, and D. House, "Acoustic and perceptual evaluation of category goodness of /t/ and /k/ in typical and misarticulated children's speech," Journal of the Acoustical Society of America, vol. 137, no. 6, pp. 3422-3435, 2015.
[12]
C. Koniaris, G. Salvi, and O. Engwall, "On mispronunciation analysis of individual foreign speakers using auditory periphery models," Speech Communication, vol. 55, no. 5, pp. 691-706, 2013.
[13]
D. Neiberg, G. Salvi, and J. Gustafson, "Semi-supervised methods for exploring the acoustics of simple productive feedback," Speech Communication, vol. 55, no. 3, pp. 451-469, 2013.
[14]
G. Salvi et al., "Language bootstrapping: Learning Word Meanings From Perception-Action Association," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 3, pp. 660-671, 2012.
[15]
G. Salvi et al., "SynFace-Speech-Driven Facial Animation for Virtual Speech-Reading Support," EURASIP Journal on Audio, Speech, and Music Processing, vol. 2009, Art. no. 191940, 2009.
[16]
G. Salvi, "Dynamic behaviour of connectionist speech recognition with strong latency constraints," Speech Communication, vol. 48, no. 7, pp. 802-818, 2006.
[17]
G. Salvi, "Segment boundary detection via class entropy measurements in connectionist phoneme recognition," Speech Communication, vol. 48, no. 12, pp. 1666-1676, 2006.
[18]
C. Siciliano et al., "Intelligibility of an ASR-controlled synthetic talking face," Journal of the Acoustical Society of America, vol. 115, no. 5, p. 2428, 2004.
[19]
G. Salvi, "Developing acoustic models for automatic speech recognition in Swedish," The European Student Journal of Language and Speech, vol. 1, 1999.

Conference papers

[20]
X. Cao et al., "An Analysis of Goodness of Pronunciation for Child Speech," in Interspeech 2023, 2023, pp. 4613-4617.
[21]
J. Rugayan, G. Salvi, and T. Svendsen, "Perceptual and Task-Oriented Assessment of a Semantic Metric for ASR Evaluation," in Interspeech 2023, 2023, pp. 2158-2162.
[22]
M. Adiban et al., "Hierarchical Residual Learning Based Vector Quantized Variational Autoencoder for Image Reconstruction and Generation," in The 33rd British Machine Vision Conference Proceedings, 2022.
[23]
J. Rugayan, T. Svendsen, and G. Salvi, "Semantically Meaningful Metrics for Norwegian ASR Systems," in Interspeech, 18-22 September 2022, Incheon, Korea, 2022.
[25]
A. Sabzi Shahrebabak et al., "A DNN Based Speech Enhancement Approach to Noise Robust Acoustic-to-Articulatory Inversion," in IEEE International Symposium on Circuits and Systems, 2021.
[26]
M. Adiban, A. Safari, and G. Salvi, "Step-GAN: A one-class anomaly detection model with applications to power system security," in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2021, pp. 2605-2609.
[28]
K. Stefanov, M. Adiban, and G. Salvi, "Spatial bias in vision-based voice activity detection," in 2020 25th International Conference on Pattern Recognition (ICPR), 2020, pp. 10433-10440.
[29]
A. S. Shahrebabaki et al., "Transfer learning of articulatory information through phone information," in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2020, pp. 2877-2881.
[30]
C. Zhang et al., "Active Mini-Batch Sampling Using Repulsive Point Processes," in AAAI'19/IAAI'19/EAAI'19: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, 2019, pp. 5741-5748.
[31]
A. Castellana et al., "Cepstral and entropy analyses in vowels excerpted from continuous speech of dysphonic and control speakers," in Proceedings of the Annual Conference of the International Speech Communication Association, Interspeech 2017, 2017, pp. 1814-1818.
[32]
G. Saponaro et al., "Interactive Robot Learning of Gestures, Language and Affordances," in Grounding Language Understanding, 2017.
[33]
A. Fahlström Myrman and G. Salvi, "Partitioning of Posteriorgrams using Siamese Models for Unsupervised Acoustic Modelling," in Grounding Language Understanding, 2017.
[34]
A. Kumar Dhaka and G. Salvi, "Sparse Autoencoder Based Semi-Supervised Learning for Phone Classification with Limited Annotations," in Grounding Language Understanding, 2017.
[35]
K. Stefanov, J. Beskow, and G. Salvi, "Vision-based Active Speaker Detection in Multiparty Interaction," in Grounding Language Understanding, 2017.
[36]
[37]
J. Lopes et al., "Detecting Repetitions in Spoken Dialogue Systems Using Phonetic Distances," in INTERSPEECH-2015, 2015, pp. 1805-1809.
[38]
A. Pieropan et al., "A dataset of human manipulation actions," in ICRA 2014 Workshop on Autonomous Grasping and Manipulation: An Open Challenge, 2014.
[39]
A. Pieropan et al., "Audio-Visual Classification and Detection of Human Manipulation Actions," in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2014), 2014, pp. 3045-3052.
[40]
N. Vanhainen and G. Salvi, "Free Acoustic and Language Models for Large Vocabulary Continuous Speech Recognition in Swedish," in Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), 2014.
[41]
S. Strömbergsson, G. Salvi, and D. House, "Gradient evaluation of /k/-likeness in typical and misarticulated child speech," in Proceedings of ICPLA 2014, 2014.
[42]
N. Vanhainen and G. Salvi, "Pattern Discovery in Continuous Speech Using Block Diagonal Infinite HMM," in 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014, Florence, Italy, 4-9 May 2014, 2014, pp. 3719-3723.
[43]
G. Salvi and N. Vanhainen, "The WaveSurfer Automatic Speech Recognition Plugin," in LREC 2014 - Ninth International Conference on Language Resources and Evaluation, 2014, pp. 3067-3071.
[44]
C. Oertel and G. Salvi, "A Gaze-based Method for Relating Group Involvement to Individual Engagement in Multimodal Multiparty Dialogue," in ICMI 2013 - Proceedings of the 2013 ACM International Conference on Multimodal Interaction, 2013, pp. 99-106.
[45]
G. Salvi, "Biologically Inspired Methods for Automatic Speech Understanding," in Biologically Inspired Cognitive Architectures 2012, 2013, pp. 283-286.
[46]
G. Saponaro, G. Salvi, and A. Bernardino, "Robot anticipation of human intentions through continuous gesture recognition," in Proceedings of the 2013 International Conference on Collaboration Technologies and Systems, CTS 2013, 2013, pp. 218-225.
[47]
C. Oertel et al., "The KTH Games Corpora: How to Catch a Werewolf," in IVA 2013 Workshop Multimodal Corpora: Beyond Audio and Video: MMC 2013, 2013.
[48]
C. Koniaris, O. Engwall, and G. Salvi, "Auditory and Dynamic Modeling Paradigms to Detect L2 Mispronunciations," in 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, Vol 1, 2012, pp. 898-901.
[49]
C. Koniaris, O. Engwall, and G. Salvi, "On the Benefit of Using Auditory Modeling for Diagnostic Evaluation of Pronunciations," in International Symposium on Automatic Detection of Errors in Pronunciation Training (IS ADEPT), Stockholm, Sweden, June 6-8, 2012, 2012, pp. 59-64.
[50]
N. Vanhainen and G. Salvi, "Word Discovery with Beta Process Factor Analysis," in 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, Vol 1, 2012, pp. 798-801.
[51]
G. Salvi et al., "Analisi Gerarchica degli Inviluppi Spettrali Differenziali di una Voce Emotiva," in 7° convegno AISV, Contesto comunicativo e variabilità nella produzione e percezione della lingua (AISV), Lecce, Italy, 26-28 January 2011, 2011.
[52]
G. Ananthakrishnan and G. Salvi, "Using Imitation to learn Infant-Adult Acoustic Mappings," in 12th Annual Conference of the International Speech Communication Association 2011 (INTERSPEECH 2011), Vols 1-5, 2011, pp. 772-775.
[53]
G. Salvi et al., "Cluster Analysis of Differential Spectral Envelopes on Emotional Speech," in 11th Annual Conference of the International Speech Communication Association 2010 (INTERSPEECH 2010), Vols 1-4, 2010, pp. 322-325.
[54]
V. Krunic et al., "Affordance based word-to-meaning association," in ICRA: 2009 IEEE International Conference on Robotics and Automation, 2009, pp. 4138-4143.
[55]
J. Beskow, G. Salvi, and S. Al Moubayed, "SynFace: Verbal and Non-verbal Face Animation from Audio," in Proceedings of The International Conference on Auditory-Visual Speech Processing AVSP'09, 2009.
[56]
J. Beskow, G. Salvi, and S. Al Moubayed, "SynFace - Verbal and Non-verbal Face Animation from Audio," in Auditory-Visual Speech Processing 2009, AVSP 2009, 2009.
[57]
S. Al Moubayed et al., "Virtual Speech Reading Support for Hard of Hearing in a Domestic Multi-Media Setting," in INTERSPEECH 2009: 10th Annual Conference of the International Speech Communication Association 2009, 2009, pp. 1443-1446.
[58]
V. Krunic et al., "Associating word descriptions to learned manipulation task models," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2008.
[59]
J. Beskow et al., "Hearing at Home: Communication support in home environments for hearing impaired persons," in INTERSPEECH 2008: 9th Annual Conference of the International Speech Communication Association 2008, 2008, pp. 2203-2206.
[60]
E. Agelfors et al., "User evaluation of the SYNFACE talking head telephone," in Computers Helping People With Special Needs, Proceedings, 2006, pp. 579-586.
[61]
G. Salvi, "Advances in regional accent clustering in Swedish," in Proceedings of European Conference on Speech Communication and Technology (Eurospeech), 2005, pp. 2841-2844.
[62]
G. Salvi, "Ecological language acquisition via incremental model-based clustering," in Proceedings of European Conference on Speech Communication and Technology (Eurospeech), 2005, pp. 1181-1184.
[63]
G. Salvi, "Segment boundaries in low latency phonetic recognition," in Nonlinear Analyses and Algorithms for Speech Processing, 2005, pp. 267-276.
[64]
J. Beskow et al., "SYNFACE - A talking head telephone for the hearing-impaired," in Computers Helping People with Special Needs: Proceedings, 2004, pp. 1178-1185.
[65]
K.-E. Spens et al., "SYNFACE, a talking head telephone for the hearing impaired," in IFHOH 7th World Congress for the Hard of Hearing, Helsinki, Finland, July 4-9, 2004, 2004.
[66]
G. Salvi, "Accent clustering in Swedish using the Bhattacharyya distance," in Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS), Barcelona, Spain, 2003, pp. 1149-1152.
[67]
I. Karlsson, A. Faulkner, and G. Salvi, "SYNFACE - a talking face telephone," in Proceedings of EUROSPEECH 2003, 2003, pp. 1297-1300.
[68]
G. Salvi, "Truncation error and dynamics in very low latency phonetic recognition," in Proceedings of Non Linear Speech Processing (NOLISP), 2003.
[69]
G. Salvi, "Using accent information in ASR models for Swedish," in Proceedings of INTERSPEECH'2003, 2003, pp. 2677-2680.
[70]
F. T. Johansen et al., "The COST 249 SpeechDat multilingual reference recogniser," in Proceedings of XLDB Workshop on Very Large Telephone Speech Databases, 2000.
[71]
F. T. Johansen et al., "The COST 249 SpeechDat multilingual reference recogniser," in Proceedings of the International Conference on Language Resources and Evaluation (LREC), 2000.
[72]
B. Lindberg et al., "A noise robust multilingual reference recogniser based on SpeechDat(II)," in Proceedings of the International Conference on Spoken Language Processing (ICSLP), 2000.
[73]
E. Agelfors et al., "A synthetic face as a lip-reading support for hearing impaired telephone users - problems and positive results," in European Audiology in 1999: Proceedings of the 4th European Conference in Audiology, Oulu, Finland, June 6-10, 1999, 1999.
[74]
E. Agelfors et al., "Synthetic visual speech driven from auditory speech," in Proceedings of Audio-Visual Speech Processing (AVSP'99), 1999.

Book chapters

[75]
D. S. Ásgrímsson et al., "Bayesian Deep Learning for Vibration-Based Bridge Damage Detection," in Structural Integrity, Springer Nature, 2022, pp. 27-43.

Non-peer-reviewed

Articles

[76]
G. Salvi and S. Al Moubayed, "Spoken Language Identification using Frame Based Entropy Measures," TMH-QPSR, vol. 51, no. 1, pp. 69-72, 2011.
[77]
T. Öhman and G. Salvi, "Using HMMs and ANNs for mapping acoustic to visual speech," TMH-QPSR, vol. 40, no. 1-2, pp. 45-50, 1999.

Conference papers

[78]
S. Al Moubayed et al., "Studies on Using the SynFace Talking Head for the Hearing Impaired," in Proceedings of Fonetik'09: The XXIIth Swedish Phonetics Conference, June 10-12, 2009, 2009, pp. 140-143.
[79]
B. Lindblom et al., "(Re)use of place features in voiced stop systems: Role of phonetic constraints," in Proceedings of Fonetik 2008, 2008, pp. 5-8.
[80]
S. Al Moubayed, J. Beskow, and G. Salvi, "SynFace Phone Recognizer for Swedish Wideband and Narrowband Speech," in Proceedings of The Second Swedish Language Technology Conference (SLTC), 2008, pp. 3-6.

Book chapters

[81]
B. Lindblom et al., "Sound systems are shaped by their users: The recombination of phonetic substance," in Where Do Phonological Features Come From? Cognitive, physical and developmental bases of distinctive speech categories, G. N. Clements and R. Ridouane, Eds., John Benjamins Publishing Company, 2011, pp. 67-97.

Theses

[82]
G. Salvi, "Mining Speech Sounds: Machine Learning Methods for Automatic Speech Recognition and Analysis," Doctoral thesis, Stockholm: KTH, Trita-CSC-A, 2006:12, 2006.
Last synchronized with DiVA:
2024-04-21 00:22:46