Publications by Giampiero Salvi
Peer-reviewed
Articles
[1]
A. S. Shahrebabaki et al., "Acoustic-to-Articulatory Mapping With Joint Optimization of Deep Speech Enhancement and Articulatory Inversion Models," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 135-147, 2022.
[2]
J. Abdelnour, J. Rouat and G. Salvi, "NAAQA: A Neural Architecture for Acoustic Question Answering," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1-12, 2022.
[3]
G. Saponaro et al., "Beyond the Self: Using Grounded Affordances to Interpret and Describe Others' Actions," IEEE Transactions on Cognitive and Developmental Systems, vol. 12, no. 2, pp. 209-221, 2020.
[4]
K. Stefanov, J. Beskow and G. Salvi, "Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition," IEEE Transactions on Cognitive and Developmental Systems, vol. 12, no. 2, pp. 250-259, 2020.
[5]
A. Selamtzis et al., "Effect of vowel context in cepstral and entropy analysis of pathological voices," Biomedical Signal Processing and Control, vol. 47, pp. 350-357, 2019.
[6]
K. Stefanov et al., "Modeling of Human Visual Attention in Multiparty Open-World Dialogues," ACM Transactions on Human-Robot Interaction, vol. 8, no. 2, 2019.
[7]
S. Strömbergsson, G. Salvi and D. House, "Acoustic and perceptual evaluation of category goodness of /t/ and /k/ in typical and misarticulated children's speech," Journal of the Acoustical Society of America, vol. 137, no. 6, pp. 3422-3435, 2015.
[8]
C. Koniaris, G. Salvi and O. Engwall, "On mispronunciation analysis of individual foreign speakers using auditory periphery models," Speech Communication, vol. 55, no. 5, pp. 691-706, 2013.
[9]
D. Neiberg, G. Salvi and J. Gustafson, "Semi-supervised methods for exploring the acoustics of simple productive feedback," Speech Communication, vol. 55, no. 3, pp. 451-469, 2013.
[10]
G. Salvi et al., "Language bootstrapping: Learning Word Meanings From Perception-Action Association," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 3, pp. 660-671, 2012.
[11]
G. Salvi et al., "SynFace: Speech-Driven Facial Animation for Virtual Speech-Reading Support," EURASIP Journal on Audio, Speech, and Music Processing, vol. 2009, art. no. 191940, 2009.
[12]
G. Salvi, "Dynamic behaviour of connectionist speech recognition with strong latency constraints," Speech Communication, vol. 48, no. 7, pp. 802-818, 2006.
[13]
G. Salvi, "Segment boundary detection via class entropy measurements in connectionist phoneme recognition," Speech Communication, vol. 48, no. 12, pp. 1666-1676, 2006.
[14]
C. Siciliano et al., "Intelligibility of an ASR-controlled synthetic talking face," Journal of the Acoustical Society of America, vol. 115, no. 5, p. 2428, 2004.
[15]
G. Salvi, "Developing acoustic models for automatic speech recognition in Swedish," The European Student Journal of Language and Speech, vol. 1, 1999.
Conference papers
[16]
M. Adiban et al., "Hierarchical Residual Learning Based Vector Quantized Variational Autoencoder for Image Reconstruction and Generation," in The 33rd British Machine Vision Conference Proceedings, 2022.
[17]
Y. Getman et al., "wav2vec2-based Speech Rating System for Children with Speech Sound Disorder," in Interspeech, 2022.
[18]
M. Adiban, A. Safari and G. Salvi, "Step-GAN: A one-class anomaly detection model with applications to power system security," in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2021, pp. 2605-2609.
[19]
A. S. Shahrebabaki et al., "Sequence-to-sequence articulatory inversion through time convolution of sub-band frequency signals," in Interspeech, 2020, pp. 2882-2886.
[20]
K. Stefanov, M. Adiban and G. Salvi, "Spatial bias in vision-based voice activity detection," in 2020 25th International Conference on Pattern Recognition (ICPR), 2020, pp. 10433-10440.
[21]
A. S. Shahrebabaki et al., "Transfer learning of articulatory information through phone information," in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2020, pp. 2877-2881.
[22]
C. Zhang et al., "Active Mini-Batch Sampling Using Repulsive Point Processes," in AAAI'19/IAAI'19/EAAI'19: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, 2019, pp. 5741-5748.
[23]
A. Castellana et al., "Cepstral and entropy analyses in vowels excerpted from continuous speech of dysphonic and control speakers," in Proceedings of the Annual Conference of the International Speech Communication Association, Interspeech 2017, 2017, pp. 1814-1818.
[24]
G. Saponaro et al., "Interactive Robot Learning of Gestures, Language and Affordances," in Grounding Language Understanding, 2017.
[25]
A. Fahlström Myrman and G. Salvi, "Partitioning of Posteriorgrams using Siamese Models for Unsupervised Acoustic Modelling," in Grounding Language Understanding, 2017.
[26]
A. Kumar Dhaka and G. Salvi, "Sparse Autoencoder Based Semi-Supervised Learning for Phone Classification with Limited Annotations," in Grounding Language Understanding, 2017.
[27]
K. Stefanov, J. Beskow and G. Salvi, "Vision-based Active Speaker Detection in Multiparty Interaction," in Grounding Language Understanding, 2017.
[28]
G. Salvi, "An Analysis of Shallow and Deep Representations of Speech Based on Unsupervised Classification of Isolated Words," in Recent Advances in Nonlinear Speech Processing, 2016, pp. 151-157.
[29]
J. Lopes et al., "Detecting Repetitions in Spoken Dialogue Systems Using Phonetic Distances," in INTERSPEECH-2015, 2015, pp. 1805-1809.
[30]
A. Pieropan et al., "A dataset of human manipulation actions," in ICRA 2014 Workshop on Autonomous Grasping and Manipulation: An Open Challenge, 2014.
[31]
A. Pieropan et al., "Audio-Visual Classification and Detection of Human Manipulation Actions," in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2014), 2014, pp. 3045-3052.
[32]
N. Vanhainen and G. Salvi, "Free Acoustic and Language Models for Large Vocabulary Continuous Speech Recognition in Swedish," in Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), 2014.
[33]
S. Strömbergsson, G. Salvi and D. House, "Gradient evaluation of /k/-likeness in typical and misarticulated child speech," in Proceedings of ICPLA 2014, 2014.
[34]
N. Vanhainen and G. Salvi, "Pattern Discovery in Continuous Speech Using Block Diagonal Infinite HMM," in 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014), Florence, Italy, 4-9 May 2014, pp. 3719-3723.
[35]
G. Salvi and N. Vanhainen, "The WaveSurfer Automatic Speech Recognition Plugin," in LREC 2014 - Ninth International Conference on Language Resources and Evaluation, 2014, pp. 3067-3071.
[36]
C. Oertel and G. Salvi, "A Gaze-based Method for Relating Group Involvement to Individual Engagement in Multimodal Multiparty Dialogue," in ICMI 2013 - Proceedings of the 2013 ACM International Conference on Multimodal Interaction, 2013, pp. 99-106.
[37]
G. Salvi, "Biologically Inspired Methods for Automatic Speech Understanding," in Biologically Inspired Cognitive Architectures 2012, 2013, pp. 283-286.
[38]
G. Saponaro, G. Salvi and A. Bernardino, "Robot anticipation of human intentions through continuous gesture recognition," in Proceedings of the 2013 International Conference on Collaboration Technologies and Systems, CTS 2013, 2013, pp. 218-225.
[39]
C. Oertel et al., "The KTH Games Corpora: How to Catch a Werewolf," in IVA 2013 Workshop Multimodal Corpora: Beyond Audio and Video (MMC 2013), 2013.
[40]
C. Koniaris, O. Engwall and G. Salvi, "Auditory and Dynamic Modeling Paradigms to Detect L2 Mispronunciations," in 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, Vol. 1, 2012, pp. 898-901.
[41]
C. Koniaris, O. Engwall and G. Salvi, "On the Benefit of Using Auditory Modeling for Diagnostic Evaluation of Pronunciations," in International Symposium on Automatic Detection of Errors in Pronunciation Training (IS ADEPT), Stockholm, Sweden, June 6-8, 2012, pp. 59-64.
[42]
N. Vanhainen and G. Salvi, "Word Discovery with Beta Process Factor Analysis," in 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, Vol. 1, 2012, pp. 798-801.
[43]
G. Salvi et al., "Analisi Gerarchica degli Inviluppi Spettrali Differenziali di una Voce Emotiva" [Hierarchical analysis of differential spectral envelopes of an emotional voice], in 7° Convegno AISV, Contesto comunicativo e variabilità nella produzione e percezione della lingua (AISV), Lecce, Italy, 26-28 January 2011.
[44]
G. Ananthakrishnan and G. Salvi, "Using Imitation to learn Infant-Adult Acoustic Mappings," in 12th Annual Conference of the International Speech Communication Association 2011 (INTERSPEECH 2011), Vols 1-5, 2011, pp. 772-775.
[45]
G. Salvi et al., "Cluster Analysis of Differential Spectral Envelopes on Emotional Speech," in 11th Annual Conference of the International Speech Communication Association 2010 (INTERSPEECH 2010), Vols 1-4, 2010, pp. 322-325.
[46]
V. Krunic et al., "Affordance based word-to-meaning association," in ICRA: 2009 IEEE International Conference on Robotics and Automation, 2009, pp. 4138-4143.
[47]
J. Beskow, G. Salvi and S. Al Moubayed, "SynFace: Verbal and Non-verbal Face Animation from Audio," in Proceedings of the International Conference on Auditory-Visual Speech Processing AVSP'09, 2009.
[48]
S. Al Moubayed et al., "Virtual Speech Reading Support for Hard of Hearing in a Domestic Multi-Media Setting," in INTERSPEECH 2009: 10th Annual Conference of the International Speech Communication Association 2009, 2009, pp. 1443-1446.
[49]
V. Krunic et al., "Associating word descriptions to learned manipulation task models," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2008.
[50]
J. Beskow et al., "Hearing at Home: Communication support in home environments for hearing impaired persons," in INTERSPEECH 2008: 9th Annual Conference of the International Speech Communication Association 2008, 2008, pp. 2203-2206.
[51]
E. Agelfors et al., "User evaluation of the SYNFACE talking head telephone," in Computers Helping People With Special Needs, Proceedings, 2006, pp. 579-586.
[52]
G. Salvi, "Advances in regional accent clustering in Swedish," in Proceedings of the European Conference on Speech Communication and Technology (Eurospeech), 2005, pp. 2841-2844.
[53]
G. Salvi, "Ecological language acquisition via incremental model-based clustering," in Proceedings of the European Conference on Speech Communication and Technology (Eurospeech), 2005, pp. 1181-1184.
[54]
G. Salvi, "Segment boundaries in low latency phonetic recognition," in Nonlinear Analyses and Algorithms for Speech Processing, 2005, pp. 267-276.
[55]
J. Beskow et al., "SYNFACE - A talking head telephone for the hearing-impaired," in Computers Helping People With Special Needs: Proceedings, 2004, pp. 1178-1185.
[56]
K.-E. Spens et al., "SYNFACE, a talking head telephone for the hearing impaired," in IFHOH 7th World Congress for the Hard of Hearing, Helsinki, Finland, July 4-9, 2004.
[57]
G. Salvi, "Accent clustering in Swedish using the Bhattacharyya distance," in Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS), Barcelona, Spain, 2003, pp. 1149-1152.
[58]
I. Karlsson, A. Faulkner and G. Salvi, "SYNFACE - a talking face telephone," in Proceedings of EUROSPEECH 2003, 2003, pp. 1297-1300.
[59]
G. Salvi, "Truncation error and dynamics in very low latency phonetic recognition," in Proceedings of Non Linear Speech Processing (NOLISP), 2003.
[60]
G. Salvi, "Using accent information in ASR models for Swedish," in Proceedings of INTERSPEECH'2003, 2003, pp. 2677-2680.
[61]
F. T. Johansen et al., "The COST 249 SpeechDat multilingual reference recogniser," in Proceedings of the XLDB Workshop on Very Large Telephone Speech Databases, 2000.
[62]
F. T. Johansen et al., "The COST 249 SpeechDat multilingual reference recogniser," in Proceedings of the International Conference on Language Resources and Evaluation (LREC), 2000.
[63]
B. Lindberg et al., "A noise robust multilingual reference recogniser based on SpeechDat(II)," in Proceedings of the International Conference on Spoken Language Processing (ICSLP), 2000.
[64]
E. Agelfors et al., "A synthetic face as a lip-reading support for hearing impaired telephone users - problems and positive results," in European Audiology in 1999: Proceedings of the 4th European Conference in Audiology, Oulu, Finland, June 6-10, 1999.
[65]
E. Agelfors et al., "Synthetic visual speech driven from auditory speech," in Proceedings of Audio-Visual Speech Processing (AVSP'99), 1999.
Book chapters
[66]
D. S. Ásgrímsson et al., "Bayesian Deep Learning for Vibration-Based Bridge Damage Detection," in Structural Integrity. Springer Nature, 2022, pp. 27-43.
Non-peer-reviewed
Articles
[67]
G. Salvi and S. Al Moubayed, "Spoken Language Identification using Frame Based Entropy Measures," TMH-QPSR, vol. 51, no. 1, pp. 69-72, 2011.
[68]
T. Öhman and G. Salvi, "Using HMMs and ANNs for mapping acoustic to visual speech," TMH-QPSR, vol. 40, no. 1-2, pp. 45-50, 1999.
Conference papers
[69]
S. Al Moubayed et al., "Studies on Using the SynFace Talking Head for the Hearing Impaired," in Proceedings of Fonetik'09: The XXIIth Swedish Phonetics Conference, June 10-12, 2009, pp. 140-143.
[70]
B. Lindblom et al., "(Re)use of place features in voiced stop systems: Role of phonetic constraints," in Proceedings of Fonetik 2008, 2008, pp. 5-8.
[71]
S. Al Moubayed, J. Beskow and G. Salvi, "SynFace Phone Recognizer for Swedish Wideband and Narrowband Speech," in Proceedings of the Second Swedish Language Technology Conference (SLTC), 2008, pp. 3-6.
Book chapters
[72]
B. Lindblom et al., "Sound systems are shaped by their users: The recombination of phonetic substance," in Where Do Phonological Features Come From?: Cognitive, physical and developmental bases of distinctive speech categories, G. N. Clements and R. Ridouane, Eds. John Benjamins Publishing Company, 2011, pp. 67-97.
Theses
[73]
G. Salvi, "Mining Speech Sounds: Machine Learning Methods for Automatic Speech Recognition and Analysis," Doctoral thesis, KTH, Stockholm, Trita-CSC-A 2006:12, 2006.
Other
[74]
K. Stefanov et al., "Analysis and Generation of Candidate Gaze Targets in Multiparty Open-World Dialogues," (Manuscript).
[75]
K. Stefanov, J. Beskow and G. Salvi, "Self-Supervised Vision-Based Detection of the Active Speaker as a Prerequisite for Socially-Aware Language Acquisition," (Manuscript).
Last synchronized with DiVA:
2023-03-23 01:01:06