Welcome to ROCLING 2022!
ROCLING 2022 is the 34th annual Conference on Computational Linguistics and Speech Processing in Taiwan, sponsored by the Association for Computational Linguistics and Chinese Language Processing (ACLCLP). The conference will be held at Taipei Medical University, Daan Campus, Taipei City, Taiwan, on November 21-22, 2022.
ROCLING 2022 will provide an international forum for researchers and industry practitioners to share their new ideas, original research results and practical development experiences from all language and speech research areas, including computational linguistics, information understanding, and signal processing. ROCLING 2022 will feature oral papers, posters, tutorials, special sessions and shared tasks.
The Conference on Computational Linguistics and Speech Processing (ROCLING) was initiated in 1988 by the Association for Computational Linguistics and Chinese Language Processing (ACLCLP) with the major goal of providing a platform for researchers and professionals from around the world to share their experiences related to natural language processing and speech processing. The following is a list of past ROCLING conferences.
ROCLING 2022 invites paper submissions reporting original research results and system development experiences as well as real-world applications. Each submission will be reviewed based on originality, significance, technical soundness, and relevance to the conference. Accepted papers will be presented orally or as poster presentations. Both oral and poster presentations will be published in the ROCLING 2022 conference proceedings and included in the ACL Anthology. A number of papers will be selected and invited for extension into journal versions and publication in a special issue of the International Journal of Computational Linguistics and Chinese Language Processing (IJCLCLP).
Papers can be written and presented in either Chinese or English. Papers should be prepared in PDF format and submitted online through the paper submission system. Submitted papers may consist of 4-8 pages of content, plus unlimited references. Upon acceptance, final versions will be allowed additional content (up to 9 pages) so that reviewers’ comments can be taken into account. ROCLING 2022 mainly targets two scientific tracks: natural language processing (NLP) and speech processing (Speech).
Relevant topics for the conference include, but are not limited to, the following areas (in alphabetical order):
Paper submissions must use the official ROCLING 2022 style templates (LaTeX and Word). Submission is electronic, using the EasyChair conference management system. The submission site is available at https://easychair.org/conferences/?conf=rocling2022
As the reviewing will be double-blind, papers must not include authors' names and affiliations. Furthermore, self-references that reveal the author's identity must be avoided. Papers that do not conform to these requirements will be rejected without review. Papers may be accompanied by a resource (software and/or data) described in the paper, but these resources should be anonymized as well.
According to the format of the paper template, the page limit for accepted papers is 9 pages (plus unlimited references) in PDF format. The first page of the camera-ready version of the accepted paper should bear the paper title, author name, affiliation, and email address. All these items should be properly centered at the top, followed by a concise abstract of the paper.
Every accepted paper should also be sent with a signed copyright form in PDF format via the online registration system.
Session 1: Speech Application-1
Chair: Yi-Chin Huang
(10:20 – 10:40)
AI Tutorial I - AI CUP Fall Competition: Briefing Session on the Explanatory Information Labeling Contest for Natural Language Understanding
Instructor: Hen-Hsen Huang
Shared Task: Chinese Healthcare Named Entity Recognition
Chair: Lung-Hao Lee
(13:30 – 13:50)
(1) SalesBot: Transitioning from Chit-Chat to Task-Oriented Dialogues, ACL 2022
Session 2: Best Paper Award Session
Chair: Yung-Chun Chang
(15:30 – 16:00)
A Quantitative Analysis of Comparison of Emoji Sentiment: Taiwan Mandarin Users and English Users
Session 3: Information Retrieval
Chair: Jheng-Long Wu
(10:20 – 10:40)
AI Tutorial II: The Taiwanese Across Taiwan (TAT) Corpus and Its Applications
Instructor: Yuan-Fu Liao
Session 4: Speech Application-2
Chair: Yi-Fen Liu
(13:00 – 13:20)
Thesis Award Session
Learning Deep Feature and Label Space to Enhance Discriminative Recognition Tasks
Session 5: NLP Applications
Chair: Ming-Hsiang Su
(15:40 – 16:00)
Special Session: Construction and Application of Hakka Language Resources
Introduction and Preliminary Results of the Hakka Across Taiwan Project
To help you prepare your presentation, here's the important information for presenters.
Download Slide Template
Presentations can be in either English or Chinese. Each presentation is allotted 15 minutes, followed by 4 minutes of questions and answers and 1 minute for the speaker change. Only under unavoidable circumstances (e.g., international speakers who cannot present live) can a presentation be pre-recorded and played during the conference. Presenters should introduce themselves to the session chairs before the start of their oral session. Each room will be equipped with:
● a laptop computer (Windows system), which can load PPT and PDF,
● a projector,
● a shared Internet connection,
● an audio system.
The display connectors for the screen are both HDMI and VGA. Presenters who would like to use their own laptop for their presentation must bring their own adapter to connect to the HDMI/VGA cable, as well as any audio connectors if they have a non-standard audio-out port. Prior to the session, presenters should inform the session chair and test that their computer and adapter work with the projector in the room. A wireless internet connection will be available in the presentation rooms.
Posters are in A1 size (59.4 cm wide x 84.1 cm high, or 23.4 inches x 33.1 inches). Presenters are advised to mount their posters before the start of the session and take them down after the end of the session. Materials to fix the posters will be available on site.
The idea behind the pre-recorded video is to give attendees an engaging way to gain some insight into your contributions. Pre-recorded videos will be released at designated venues at ROCLING 2022. We are aware that guidelines will be helpful to ensure a uniformly excellent experience for all. With that in mind, we would like to establish the following minimum expectations:
Duration: At least 5 minutes and at most 15 minutes. Within that interval, choose a duration that you feel will best engage your audience. Ways to engage the audience include having a video of the presenter in the corner of the slides.
File size: 200MB max
Video file format: mp4
Dimensions: Minimum height 720 pixels, aspect ratio: 16:9
Please note that final specifications will be checked at the time of submission, and non-compliant files may need to be re-recorded. A download link for the recorded video must be provided to the conference by November 18th.
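Presenters who want to check a recording against the expectations above before submitting could use something like the following minimal sketch. It is not an official ROCLING tool: the `VideoInfo` container and the `check_video` function are hypothetical names, the metadata is assumed to be read by the presenter from their own video software, and "200MB" is assumed to mean decimal megabytes.

```python
from dataclasses import dataclass
from typing import List

# Limits copied from the guidelines above.
MIN_DURATION_S = 5 * 60
MAX_DURATION_S = 15 * 60
MAX_SIZE_BYTES = 200 * 1000 * 1000  # assumption: "200MB" read as decimal megabytes
MIN_HEIGHT_PX = 720
REQUIRED_EXTENSION = ".mp4"


@dataclass
class VideoInfo:
    """Hypothetical container for metadata the presenter reads from their file."""
    filename: str
    duration_s: float
    size_bytes: int
    width_px: int
    height_px: int


def check_video(info: VideoInfo) -> List[str]:
    """Return a list of problems; an empty list means the file looks compliant."""
    problems = []
    if not info.filename.lower().endswith(REQUIRED_EXTENSION):
        problems.append("file format must be mp4")
    if not MIN_DURATION_S <= info.duration_s <= MAX_DURATION_S:
        problems.append("duration must be between 5 and 15 minutes")
    if info.size_bytes > MAX_SIZE_BYTES:
        problems.append("file size must not exceed 200MB")
    if info.height_px < MIN_HEIGHT_PX:
        problems.append("height must be at least 720 pixels")
    # Compare cross-products so that 16:9 is tested exactly, without floats.
    if info.width_px * 9 != info.height_px * 16:
        problems.append("aspect ratio must be 16:9")
    return problems
```

For example, `check_video(VideoInfo("talk.mp4", 600, 150_000_000, 1280, 720))` returns an empty list, since every listed expectation is met.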
More details to be announced.
Makoto P. Kato received his Ph.D. degree from the Graduate School of Informatics, Kyoto University, Sakyo Ward, Yoshidahonmachi, in 2012. Currently, he is an associate professor in the Faculty of Library, Information and Media Science, University of Tsukuba, Japan, in the Knowledge Acquisition System Laboratory (Kato Laboratory). In 2008, he and his co-authors were awarded the WISE 2008 Kambayashi Best Paper Award for the article 'Can Social Tagging Improve Web Image Search?'. In 2010, he served as a JSPS Research Fellow of the Japan Society for the Promotion of Science. Between 2010 and 2012, he also completed internships at Microsoft Research Asia (supervised by Dr. Tetsuya Sakai in the WIT group), Microsoft Research Asia (supervised by Dr. Tetsuya Sakai in the WSM group), and Microsoft Research (supervised by Dr. Susan Dumais in the CLUES group). From 2012, he worked as an assistant professor, and from 2019 as an associate professor, in the Graduate School of Informatics, Kyoto University, Japan. His research interests include Information Retrieval, Web Mining, and Machine Learning.
We are now facing the problem of misinformation and disinformation on the Web, and search engines are struggling to retrieve reliable information from a vast amount of Web data. One possible solution to this problem is to find reliable evidence supporting a claim on the Web. But what is “reliable evidence”? It can include authorities' opinions, scientific papers, or the wisdom of crowds. However, such evidence is also sometimes subjective, as it is produced by people.
This talk discusses some approaches that incorporate another, highly objective type of evidence, namely numerical data, for reliable information access.
(1) Entity Retrieval based on Numerical Attributes
Entity retrieval is the task of retrieving entities for a given text query and is usually based on text matching between the query and entity descriptions. Our recent work attempted to match the query against numerical attributes of entities and produce explainable rankings. For example, our approach ranks cameras based on their numerical attributes such as resolution, f-number, and weight, in response to queries such as “camera for astrophotography” and “camera for hiking”.
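As a rough illustration of ranking by numerical attributes with an explanation attached, consider the toy sketch below. It is not the method presented in the talk: the camera data, the hand-set preference table `QUERY_PREFERENCES`, and the function names are all hypothetical, and a real system would have to learn which attributes matter for a query rather than look them up.

```python
from typing import Dict, List, Tuple

# Hypothetical camera data; the values are invented for illustration only.
CAMERAS: Dict[str, Dict[str, float]] = {
    "camera_a": {"resolution_mp": 45.0, "f_number": 1.8, "weight_g": 900.0},
    "camera_b": {"resolution_mp": 20.0, "f_number": 2.8, "weight_g": 300.0},
    "camera_c": {"resolution_mp": 24.0, "f_number": 4.0, "weight_g": 500.0},
}

# Hypothetical hand-set preferences: +1 means "larger is better",
# -1 means "smaller is better".
QUERY_PREFERENCES: Dict[str, Dict[str, float]] = {
    "camera for astrophotography": {"resolution_mp": 1.0, "f_number": -1.0},
    "camera for hiking": {"weight_g": -1.0},
}


def min_max_normalize(values: Dict[str, float]) -> Dict[str, float]:
    """Scale values to [0, 1]; all-equal inputs map to 0."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {name: 0.0 for name in values}
    return {name: (v - lo) / (hi - lo) for name, v in values.items()}


def rank(query: str) -> List[Tuple[str, float, Dict[str, float]]]:
    """Return (entity, score, per-attribute contributions), best first."""
    prefs = QUERY_PREFERENCES[query]
    contributions: Dict[str, Dict[str, float]] = {name: {} for name in CAMERAS}
    for attr, direction in prefs.items():
        normalized = min_max_normalize({n: a[attr] for n, a in CAMERAS.items()})
        for name, value in normalized.items():
            # For "smaller is better", flip the scale so the smallest value scores 1.
            contributions[name][attr] = value if direction > 0 else 1.0 - value
    ranked = [(name, sum(parts.values()), parts) for name, parts in contributions.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

The per-attribute contributions returned alongside each score are what make the ranking explainable in this sketch: for "camera for hiking", the top result is simply the lightest camera in the toy data.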
(2) Data Search
When people encounter suspicious claims on the Web, data can be a reliable source for fact checking. NTCIR Data Search is an evaluation campaign that aims to foster data search research by developing an evaluation infrastructure and organizing shared tasks for data search. The first test collection for data search and some findings are introduced in this talk.
(3) Data Summarization
While the data search project attempts to develop a data search system for end users and help them make decisions based on data, it is still difficult for users to quickly interpret data. Therefore, data summarization techniques are also necessary to enable users to incorporate data in their information seeking process. Recent automatic visualization and text-based data summarization techniques are presented in this talk.
Junichi Yamagishi received the Ph.D. degree from Tokyo Institute of Technology in 2006 for a thesis that pioneered speaker-adaptive speech synthesis. He is currently a Professor with the National Institute of Informatics, Tokyo, Japan, and also a Senior Research Fellow with the Centre for Speech Technology Research, University of Edinburgh, Edinburgh, U.K. Since 2006, he has authored and co-authored more than 250 refereed papers in international journals and conferences. He was an area coordinator at Interspeech 2012. He was one of the organizers of special sessions on “Spoofing and Countermeasures for Automatic Speaker Verification” at Interspeech 2013, “ASVspoof evaluation” at Interspeech 2015, “Voice conversion challenge 2016” at Interspeech 2016, “2nd ASVspoof evaluation” at Interspeech 2017, and “Voice conversion challenge 2018” at Speaker Odyssey 2018. He is currently an organizing committee member for ASVspoof 2019, an organizing committee member for the 10th ISCA Speech Synthesis Workshop 2019, a technical program committee member for IEEE ASRU 2019, and an award committee member for ISCA Speaker Odyssey 2020. He was a member of the IEEE Speech and Language Technical Committee. He was also an Associate Editor of the IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING and a Lead Guest Editor for the IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING special issue on Spoofing and Countermeasures for Automatic Speaker Verification. He is currently a guest editor for the Computer Speech and Language special issue on speaker and language characterization and recognition: voice modeling, conversion, synthesis and ethical aspects. He also currently serves as the chairperson of ISCA SynSIG. He was the recipient of the Tejima Prize for the best Ph.D. thesis of Tokyo Institute of Technology in 2007.
He received the Itakura Prize from the Acoustical Society of Japan in 2010, the Kiyasu Special Industrial Achievement Award from the Information Processing Society of Japan in 2013, the Young Scientists’ Prize from the Minister of Education, Science and Technology in 2014, the JSPS Prize from the Japan Society for the Promotion of Science in 2016, and the DOCOMO Mobile Science Award from the Mobile Communication Fund in 2018.
The Yamagishi Laboratory at the National Institute of Informatics researches text-to-speech (TTS) and voice conversion (VC) technologies. Having achieved TTS and VC methods that reproduce human-level naturalness and speaker similarity, we introduce three challenging projects we are currently working on as the next phase of our research.
(1) Rakugo speech synthesis [1]
As an example of a challenging application of speech synthesis technology, and in particular of an entertainment application, we have concentrated on rakugo, a traditional Japanese performing art. We have been working on learning and reproducing the skills of a professional comic storyteller using speech synthesis. This project aims to achieve an "AI storyteller" that entertains listeners, which is entirely different from the conventional speech synthesis task, whose primary purpose is to convey information or answer questions. The main story of rakugo consists of conversations between characters, and various characters appear in the story. All of these characters are performed by a single rakugo storyteller, who changes voice appropriately so that listeners can follow the story and be entertained. To reproduce such characteristics of the rakugo voice by machine learning, rakugo performance data and advanced modeling techniques are required. Therefore, with the cooperation of an Edo-style rakugo performer, we constructed a corpus of rakugo speech free of noise and audience sounds, and modeled this data using deep learning. In addition, we benchmarked our system through subjective evaluation, comparing the generated rakugo speech with performances by rakugo storytellers of different ranks (“Zenza/前座,” “Futatsume/二つ目,” and “Shinuchi/真打”).
(2) Speech intelligibility enhancement [2]
In remote communication, such as online conferencing, there is environmental background noise on both the speaker and listener sides. Speech intelligibility enhancement is a technique that manipulates speech signals so that they are not masked by the noise on the listener's side, while maintaining the volume. This is not a simple conversion task since "correct teacher data" does not exist. For this reason, deep learning has not been used in the past, and there has been no significant technological progress. However, various practical applications are possible, such as intelligibility enhancement of station announcements. Therefore, we proposed a network structure called "iMetricGAN" and its learning method, in which complex and non-differentiable speech intelligibility and quality indexes are treated as output values of the discriminator in a generative adversarial network. The discriminator approximates the indexes, and, based on the approximated indexes, the generator automatically transforms an input speech signal into an enhanced, easy-to-hear speech signal. Subjective experiments confirmed that this transformation significantly improves keyword recognition in noisy environments.
(3) Speaker Anonymization [3, 4]
Now that it is becoming easier to build speech synthesis systems that digitally clone someone’s voice using "found" data on social media, there is a need to mask the speaker information in speech, as well as other sensitive attributes that should be protected. This is a new research topic, and it has not yet been clearly defined how speaker anonymization can be achieved. We proposed a speaker anonymization method that combines speech synthesis and speaker recognition technologies. Our approach decomposes speech into three pieces of information: prosody, phoneme information, and a speaker embedding vector called the X-vector, which is standard in speaker recognition. It then anonymizes the individuality of the speaker by averaging only the X-vector over K speakers. A neural vocoder is used to re-synthesize a high-quality speech waveform. We also introduce a speech database and evaluation metrics to compare speaker anonymization methods.
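The X-vector averaging step described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation from the cited papers: the `SpeechFeatures` container and the function names are hypothetical, the K vectors are assumed to be drawn at random from an external pool of other speakers' X-vectors (the abstract does not say how they are chosen), and the feature extraction and neural vocoder stages are not modeled.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Sequence


@dataclass(frozen=True)
class SpeechFeatures:
    """Hypothetical container for the three decomposed components."""
    prosody: Sequence[float]
    phonemes: Sequence[str]
    xvector: Sequence[float]


def average_xvectors(xvectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized speaker embeddings."""
    count = len(xvectors)
    return [sum(dims) / count for dims in zip(*xvectors)]


def anonymize(
    features: SpeechFeatures,
    pool: Sequence[Sequence[float]],
    k: int,
    rng: random.Random,
) -> SpeechFeatures:
    """Swap in the mean of k pool X-vectors; prosody and phonemes are kept."""
    # Assumption: the k speakers are sampled uniformly at random from the pool.
    chosen = rng.sample(list(pool), k)
    return replace(features, xvector=average_xvectors(chosen))
```

Only the speaker embedding changes in this sketch; prosody and phoneme information pass through untouched, which mirrors the statement that only the X-vector is averaged.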
[1] Shuhei Kato, Yusuke Yasuda, Xin Wang, Erica Cooper, Shinji Takaki, Junichi Yamagishi, “Modeling of Rakugo Speech and Its Limitations: Toward Speech Synthesis That Entertains Audiences,” IEEE Access, vol.8, pp.138149-138161, July 2020
[2] Haoyu Li, Junichi Yamagishi, “Multi-Metric Optimization Using Generative Adversarial Networks for Near-End Speech Intelligibility Enhancement,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.29, pp.3000-3011, Sept 2021
[3] Fuming Fang, Xin Wang, Junichi Yamagishi, Isao Echizen, Massimiliano Todisco, Nicholas Evans, Jean-Francois Bonastre, “Speaker Anonymization Using X-vector and Neural Waveform Models,” 10th ISCA Speech Synthesis Workshop (SSW10), Sept 2019
[4] Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Natalia Tomashenko, “Language-Independent Speaker Anonymization Approach using Self-Supervised Pre-Trained Models,” Odyssey 2022: The Speaker and Language Recognition Workshop, June 2022
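The Taiwanese Across Taiwan (TAT) Corpus and Its Applications
After three years of work, the Taiwanese Across Taiwan (TAT) Taiwanese speech corpus has, for the purpose of Taiwanese speech recognition, recruited 640 speakers from across Taiwan and collected a total of 312 hours of speech. For the purpose of Taiwanese speech synthesis, two male and two female speakers were recruited, each recording 10 hours, for a total of 41 hours of speech. The first-phase output of this corpus has been released by the Association for Computational Linguistics and Chinese Language Processing (ACLCLP), and the second-phase output will be released by the Ministry of Education in the near future. In the meantime, we have also used tools such as Kaldi and ESPnet to develop (1) Taiwanese speech recognition, (2) Taiwanese speech synthesis, and (3) a Taiwanese natural language parser, and (4) to realize applications such as Taiwanese voice conversion.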
According to the definition used in linguistics and language technology, a language resource is a linguistic material used in the construction, improvement, and evaluation of language processing applications or platforms. Language resources are roughly divided into linguistic data, including text, vocabulary, grammar, language models, or other types of data, and technology tools, referring to tools for language processing and maintenance. After Taiwan Mandarin Chinese and Taiwan Southern Min, Taiwan Hakka is the third largest language in Taiwan. According to the Hakka Affairs Council's 2017 survey, the Hakka population, following the Hakka Basic Law's definition of Hakka people as those who "have Hakka blood or origin, and who believe themselves as Hakka people", is about 4.537 million, accounting for 19.3% of the national population. However, this survey also shows that the Hakka speaking and listening proficiency of Hakka people is declining, while the loss rate of the Hakka language is increasing. Under the mission of saving and preserving endangered languages, the creation of Hakka-related corpora and language resources is a top priority. This symposium brings together three experts and scholars who are mainly engaged in Hakka corpus and related research. First, Professor Huei-ling Lai introduces and shares the construction process and experience of the "Taiwan Hakka Corpus" (THC). Currently, the THC is a relatively large-scale Hakka corpus built in a systematic manner. Its construction has overcome various challenges arising from Hakka's idiosyncratic characteristics, and it has also developed a retrieval and word segmentation system for Hakka. So far, it has collected Hakka spoken data (over 400,000 words in total) and Hakka written data (over 6 million words in total).
Second, Professor Yuan-Fu Liao introduces and shares the collection process and construction experience of the "Hakka Across Taiwan Corpus", which aims to collect and record a wide range of speech data from two Hakka sub-dialects, the Sixian dialect and the Hailu dialect, as a foundation for Hakka speech synthesis and AI speech recognition. Finally, Dr. Shu-Chuan Tseng introduces and shares her research on Taiwan Mandarin speech corpora of adults and children, as well as her recent work on discourse understanding of Hakka conversation.
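● (National Freeway 3) Coming off the Xinyi Expressway, use the two left lanes to take the exit onto Xinyi Road Section 5. Continue straight toward Keelung Road / City Hall for about 1.1 km, then turn left onto Keelung Road Section 2. After 1 km straight along Keelung Road Section 2, the Taipei Medical University Daan Campus will be on your right.
● (Huandong Boulevard) Follow the signs for Keelung Road, keep left to continue through the Keelung Road underpass, and go straight along Keelung Road Section 1. Continue straight onto Keelung Road Section 2 for 1 km, and the Taipei Medical University Daan Campus will be on your right.
Take the MRT Wenhu Line to Liuzhangli Station. From the single exit, walk along Keelung Road toward Taipei City Hall for nearly 300 meters (about 5 minutes) to reach the 7-ELEVEN George (喬治) store. The Taipei Medical University Daan Campus is directly across the street.
● (935): George Vocational High School (喬治商職) - City Hall (Songgao) (市政府(松高)), 220-meter walk (about 3 minutes)
● (Keelung Road Trunk Line, formerly 650): George Vocational High School (喬治商職) - City Hall (Songzhi) (市政府(松智)), 400-meter walk (about 5 minutes)
● (284, 611): George Vocational High School (喬治商職) - Songshou Road Intersection (松壽路口), 450-meter walk (about 6 minutes)
Take the MRT Bannan Line to Taipei City Hall Station. From Exit 3, walk along Xingya Road toward Songgao Road for nearly 300 meters (about 5 minutes) to reach Shin Kong Mitsukoshi Taipei Xinyi Place A8.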
ROCLING 2022: Taipei City, Taiwan
November 21 - 22, 2022, 09:00 AM – 05:00 PM